Martin Vechev

@mvechev

Professor of Computer Science, ETH Zurich. Founder of INSAIT (http://insait.ai). Works on Safe/Secure AI, LLMs, Quantum. Co-founder of 6 Deep-Tech start-ups.

Joined June 2012

26 Following

2K Followers

Martin Vechev Retweeted

Luca Beurer-Kellner@lbeurerkellner · Apr 7

🔴 New MCP attack leaks WhatsApp messages via MCP, side-stepping WhatsApp security. 1/n We show a new MCP attack that leaks your WhatsApp messages if you are connected via WhatsApp MCP. Our attack uses a sleeper design, circumventing the need for user approval. More 👇

Martin Vechev@mvechev · Jul 22

Interesting approach! However, we looked at the proofs and methodology and we found a few problems, specifically with the use of hints given to the model. While the scaffold indeed improves performance, it does not solve all problems accurately and would not get a gold medal.🧵

Lin Yang@lyang36 · Jul 22

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025

Martin Vechev Retweeted

Mislav Balunović@mbalunovic · Jul 14

We are launching Project Euler on MathArena to track the performance of LLMs on challenging new problems at the intersection of mathematics and programming, which are published every week on the Project Euler website 🧵(1/6)

Martin Vechev Retweeted

Jasper Dekoninck@j_dekoninck · Jul 11

As models are getting close to saturating our main automated benchmarks, we are currently looking towards more challenging competitions. Some very exciting updates coming up for that in the coming days and weeks, so stay tuned! (3/3)

Martin Vechev Retweeted

Jasper Dekoninck@j_dekoninck · Jul 11

On the SMT, a competition of 53 questions that is currently kept private, Grok-4 also convinces, but is not outperforming o4-mini and o3. (2/3)

Martin Vechev Retweeted

Jasper Dekoninck@j_dekoninck · Jul 11

Grok-4 takes first place on the MathArena Leaderboard! Convincing scores across the board, with an especially impressive performance on HMMT 2025. Full results are available on matharena.ai. (1/3)

Martin Vechev Retweeted

Mark Müller@mnmueller · Jul 8

🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code? We built Agents in the Wild – a live dashboard tracking autonomous AI agents across GitHub to answer that question: insights.logicstar.ai Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵

Martin Vechev Retweeted

INSAIT Institute@INSAITinstitute · Jul 1

🤝We are delighted to announce that INSAIT is starting a joint research program with the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), one of the world’s leading and most influential research labs! 🚀All details on the joint program will be announced…

Martin Vechev Retweeted

INSAIT Institute@INSAITinstitute · Jul 2

🌐 We are delighted to announce the launch of a new 1 million USD joint research program between INSAIT and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), one of the top research labs in the world! 🎓 The program enables incoming INSAIT tenure-track…

Martin Vechev Retweeted

Jasper Dekoninck@j_dekoninck · Jun 26

Thrilled to share a major step forward for AI for mathematical proof generation! We are releasing the Open Proof Corpus: the largest ever public collection of human-annotated LLM-generated math proofs, and a large-scale study over this dataset!

Martin Vechev Retweeted

Nikola Jovanović@ni_jovanovic · Jun 23

There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image generation? Yes, but it's not straightforward 🧵(1/10)

Martin Vechev Retweeted

Mislav Balunović@mbalunovic · Jun 2

Two updates from MathArena: - DeepSeek-R1-0528 shows strong performance very close to top closed source models on all competitions - We released a research paper about our evaluation methodology and more detailed analysis of results

Martin Vechev Retweeted

António Costa@eucopresident · Apr 29

Inspiring visit to @INSAITinstitute at @SofiaTechPark, the first institute of its kind in Eastern Europe. Its cutting-edge technology will allow countries to quickly catch up and advance on the AI front. And the upcoming BRAIN++ AI Factory, part of the EU-wide AI hub network,…

Martin Vechev Retweeted

INSAIT Institute@INSAITinstitute · Apr 29

🇪🇺 🇧🇬 Today, António Costa @eucopresident, visited INSAIT during his official visit to Bulgaria. The visit was also attended by Prime Minister of Bulgaria Rosen Zhelyazkov. Prof. @mvechev and Eng. Borislav Petrov presented Mr. Costa with the achievements of the institute, which…

Martin Vechev Retweeted

INSAIT Institute@INSAITinstitute · Apr 23

🚀 We are delighted to announce MamayLM, a new state-of-the-art efficient Ukrainian LLM! 📈 MamayLM surpasses all similar-sized models in both English and Ukrainian, while matching or overtaking up to 10x larger models. 📊 MamayLM is a 9B model that can run on a single GPU,…

Martin Vechev Retweeted

Mislav Balunović@mbalunovic · Apr 5

After many requests, we’ve evaluated Grok 3 on the USAMO 2025. The results are in: Grok 3 is tied with DeepSeek-R1 for second place, earning 4.76% of the total points!

Martin Vechev Retweeted

Mislav Balunović@mbalunovic · Apr 2

Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial share of the points (24.4%). The speed of progress is really mind-blowing.

Martin Vechev@mvechev · Apr 1

Designing a network of interconnected agents and servers will be a security nightmare if we don't first fix prompt injections. Cool work and demos from @InvariantLabsAI

Invariant Labs@InvariantLabsAI · Apr 1

🔴🔵 We have discovered a critical flaw in the widely-used Model Context Protocol (MCP) that enables a new form of LLM attack we term 'Tool Poisoning'. This vulnerability affects major platforms and agentic systems like OpenAI, Anthropic, Zapier, and Cursor. Full disclosure…
