Mark Müller
@mnmueller
PhD student at @the_sri_lab at @ETHZ
We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!
🚨 New SWT-Bench Submission! 🤖 Amazon Q Developer Agent leads the SWT-Bench leaderboard 🥇 with an impressive 49% of successfully tested issues and a coverage improvement of 57% on SWT-Bench Verified.
SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE performance), only slightly outperforming SWE-agent. What is going on? We dug through the data to find a simple trick and achieve almost 30%! 👇🧵 1/9
We have our first submission for SWT-Bench 🚀 AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆 , significantly outperforming our SWE-Agent+ baseline and demonstrating the potential of dedicated test generation agents. 1/3 🧵
🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and improve coverage where it matters. 🤖 Ready for the challenge? 👉 swtbench.com #AI #SoftwareTesting #SWTBench #CodeAgents
Meet me at this morning's NeurIPS poster session to discuss our work on generating reproducing test cases with Code Agents.
SRI Lab at #NeurIPS2024 - 1/8 SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev) ⏰ /📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406 📝 We explore software…
Excited to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵(1/3)
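The idea described above (the model gets a repo and an issue and must write a test that reproduces the issue) can be sketched as a fail-to-pass check: a generated test counts as reproducing the issue if it fails on the buggy code and passes once the gold patch is applied. A minimal toy sketch, with hypothetical function names (`buggy_slugify`, `patched_slugify`, `reproduces_issue` are illustrative, not SWT-Bench's actual harness):

```python
def buggy_slugify(s):
    # Buggy repository state: forgets to lowercase (the reported issue).
    return s.replace(" ", "-")

def patched_slugify(s):
    # Gold patch applied: lowercases as expected.
    return s.replace(" ", "-").lower()

def generated_test(slugify):
    # Test the agent wrote from the issue description alone.
    return slugify("Hello World") == "hello-world"

def reproduces_issue(test, buggy, patched):
    # Fail-to-pass criterion: must fail pre-patch, pass post-patch.
    return (not test(buggy)) and test(patched)

print(reproduces_issue(generated_test, buggy_slugify, patched_slugify))
```

A test that passes in both states (or fails in both) would not count, since it does not discriminate the bug from the fix.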
Presenting today @icmlconf 2024 Workshop FM in the Wild 🤖 🏞️ "Code Agents are State of The Art Software Testers" SWE-Agent, aider and co are competent at reproducing GitHub issues, performing as well as specialized methods. Looking forward to answering your questions!
On Tuesday at 11:30, in Poster Session 1, we will present Prompt Sketching, a novel decoder-driven approach for templated (and constrained) text generation of LLMs. 📄 arxiv.org/abs/2311.04954 👨‍💻 Work with @mnmueller, @lbeurerkellner, @mvechev.
Excited to share our latest work which we will present today at @iclr_conf
We show that neural network certification with all commonly used convex relaxations is imprecise for any NN expressing interesting (>1-d inputs) functions and discuss implications for cert. training. 🧑‍🔬 Maximilian Baader, @mnmueller, @MaoYuhao91443 📄 arxiv.org/abs/2311.04015
A couple of amazing PhD students graduated from our lab (@the_sri_lab) at ETH Zurich today: @mbalunovic and @mnmueller. Both made fantastic contributions to the area of Safe and Secure AI: impactful papers and systems the community built upon. Next steps should be exciting :)
Find us @NeurIPSConf #NeurIPS2023 to chat about our latest work. We are excited to share works on certified robustness, a large-scale study of image classifiers, and game theory. All works are supervised by @mvechev. 🧵
@mnmueller and @marc_r_fischer introduced a new form of Abstract Interpretation for challenging unbounded loops, enabling the analysis of fixpoint-based neural network architectures (monDEQs). 🌐 sri.inf.ethz.ch/publications/m… 📄 arxiv.org/abs/2110.08260 🧵 3/3
Super excited to talk about robustness guarantees for neural networks at @mlsec_lab's seminar!
We are excited to present a new event in our seminar series on ML Security! We will host Mark Müller (ETH Zurich) on June 6, 2023, at 15:00 CEST. Free registration: eventbrite.com/e/machine-lear… @adversarial_ML @trustworthy_ml @aivillage_dc @RedTeamVillage_
At @iclr_conf members of SRI lab presented 3 works: - ⚖️ Human-Guided Fair Classification for NLP - 📈 Robustness Verification & Training of Neural ODEs - 📦 Certified Training: Small Boxes are All You Need Find us around the workshops! 🧵 1/4