Aryan Vichare
@aryanvichare10
member of technical staff @lmarena_ai | prev. @Berkeley_EECS @vercel @v0 @aisdk
Amazing that Claude 3.7 is gapping so hard. Great job @AnthropicAI ! WebDev Arena has been a high-signal eval in my experience. It's very easy to distinguish models in their ability to create great websites.
BREAKING: Claude 3.7 Sonnet claims the #1 spot in WebDev Arena with a +100 score jump 🚀 over Claude 3.5 Sonnet! 🔥 Huge congrats to @AnthropicAI on this incredible milestone! Have you tried Claude 3.7 Sonnet in the WebDev Arena yet? Test it now (link below)
It's genuinely mind-boggling how good models are getting at one-shotting complex visualizations from simple prompts. Prompt: "two black holes colliding animation". This model perfectly implemented: – 2-body gravity simulation – Dynamic particle accretion disks – Collision +…
🚨 BIG NEWS 🚨 Search Arena is live with 7 top models with search capabilities ready for testing. Be sure to have the "Search" modality selected in the chat box, and get testing. 🌐 @xAi: Grok 4 @anthropic: Claude Opus 4 @perplexity: Sonar Pro High & Reasoning Pro High…
We’re delivering a bundle of polish to the LMArena experience, most of it inspired directly by your feedback 💬 Here’s a look at what’s new👇
Thoughts on Grok 4 results in LMArena: Grok's API model is tied for #3 overall with style control (remember, style control is the default now in LMArena). Without style control, it's #2 overall. In Math, its preliminary ranking is tied for #1, along with Minimax-M1, Gemini-2.5-pro, and…
🚨 Breaking News: Grok 4's result is now live! With 4k+ community votes, xAI’s Grok-4 tied for #3 overall in Text Arena — a huge leap from Grok-3. It scores Top-3 across all categories (#1 in Math, #2 in Coding, #3 in Hard Prompts). Detailed analysis in the thread 🧵
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
grok 4 is live on lmarena for everyone!
🚨 New contender enters the Arena: @xAI’s Grok-4 is live! Grok-4 debuts impressively at #1 across many hard benchmarks. Now it’s time to put it to the real-world test: challenge Grok-4 with your toughest prompts!
Can't wait to see Grok 4 Performance in WebDev Arena "Visualization of 2 Black Holes Colliding"

Life update: excited to share I’ve joined @lmarena_ai as Member of Technical Staff! Excited to advance the future of AI progress through open, human-centered evaluation alongside such a talented team

The NEW LMArena is officially live! 🎉 ✨ New Logo! ⚡️ Better, faster UI/UX for chat and leaderboard 📱 Mobile optimized 💬 Chat history 🧭 Clearer leaderboard navigation 🤖 Many modalities in one place: vision, image, and more coming soon Try it now at lmarena dot ai! (Link in…
✨ Introducing Refract - The Ad Engine. Create 100+ video ads for your business in seconds. Live demo below 👇
Today I'm launching my new company @GeneralAgentsCo and our first product. Introducing Ace: The First Realtime Computer Autopilot Ace is not a chatbot. Ace performs tasks for you. On your computer. Using your mouse and keyboard. At superhuman speeds!
Introducing our latest blog on WebDev Arena: A Live LLM Leaderboard for Web App Development! How does WebDev Arena work? Submit a prompt → Two LLMs battle it out → You vote on the better web app. Since launching in Dec 2024, we've gathered 100,000+ community votes evaluating…