Mir Miroyan
@mirmiroyan
cs phd @UCBerkeley sky lab | bair
why would we want LLMs to generate "buggy" code? if we want models to be better assistants, they need to better understand the user! if the model can act (e.g. code) like the user, we're a step closer. and what better place to explore this than in education?
Can LLMs write code and learn like novice programmers? We release ParaStudent, a framework to study how to make LLMs generate realistic, student-like code, which is often imperfect, iterative, and stylistically diverse 👩‍🎓 Paper and code shared in the thread 👇
incredible
The NEW LMArena is officially live! 🎉 ✨ New Logo! ⚡️ Better, faster UI/UX for chat and leaderboard 📱 Mobile optimized 💬 Chat history 🧭 Clearer leaderboard navigation 🤖 Many modalities in one place: vision, image, and more coming soon Try it now at lmarena dot ai! (Link in…
🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n
a great improvement over the gradio UI!
We're excited to invite everyone to a new Beta version of LMArena! 🎉 For months, we’ve been poring through community feedback to improve the site—fixing errors/bugs, improving our UI layout, and more. To keep supporting the development and continual improvement of this…
excited to release the first checkpoint of the project. it's more than just a leaderboard -- we share some interesting findings in the LMArena blog (blog.lmarena.ai/blog/2025/sear…)
Exciting News! Search Arena Leaderboard🌐 🥇 Gemini-2.5-Pro-Grounding and Perplexity-Sonar-Reasoning-Pro top the leaderboard! Congrats @GoogleDeepMind and @perplexity_ai! 📊 We've open-sourced 7k battles with user votes! 📝 Check out our blog post for detailed analysis. Blog…
We finally have a platform to evaluate and rank how AI uses tools (in particular search) in the wild. Try asking questions that require (and don't require) search! The results are really interesting. I also want to congratulate my students @mirmiroyan and @tsunghan_wu for…
News: Search Arena is now LIVE! 🌐🔍 ✅ Test web-augmented LLM systems on real-time, real-world tasks — retrieval, writing, debugging & more. ✅ Perplexity, Gemini, OpenAI go head-to-head. ✅ Crowd-powered evals. Leaderboard 🏆 coming soon… ⚡Try it now at lmarena .ai!