Lisa Dunlap
@lisabdunlap
messin around with model evals @berkeley_ai | prev @lmarena_ai
🎬 AI Hot Take 🔥 "What makes a bad AI product is a product that's too focused on AI." - UC Berkeley Professor @profjoeyg 👀 Watch the full mini-documentary here! youtu.be/MyjT2nBpbE8
Of all the coding assistants I use, Gemini is by far the cheekiest. I said it was querying the wrong key and it replied: “You’re absolutely right. The devil’s in the details—and in this case, the detail is the capitalization.”
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…
Congrats Elon on snagging such a good dev
Late, but big life update: I deferred my PhD and I started @xai a month ago as Member of Technical Staff, working on Post-training and Reasoning for @grok! I'm grateful for the chance to work alongside brilliant minds to solve the toughest problems standing between us and 🤖…
looking through some arena battles and came across this gem

Our lab at Stanford usually does research in AI & robotics, but very occasionally we indulge in being functional alcoholics -- Recently we hosted a lab cocktail night, and created drinks with research-related puns like 'reviewer#2' and 'make 6 figures', sharing the full recipes…
MegaSaM got an award! Big congrats to the team!!!!! 🥳🥳🎉🎉 @zhengqi_li, Richard, @forrestercole2, @jin_linyi, @QianqianWang5, Vickie @akanazawa, @Jimantha
Incredibly late to the game but if you are @CVPR come check out our poster on the vision arena (353) right now!
Exciting news - Chatbot Arena now supports image uploads📸 Challenge GPT-4o, Gemini, Claude, and LLaVA with your toughest questions. Plot to code, VQA, story telling, you name it. Let's get creative and have fun! Leaderboard coming soon. Credits to builders @chrischou03…
I’m going to give a talk at the AI4CC workshop at #CVPR2025 at 2 pm CDT about our recent world model project, in the Karl F. Dean Ballroom (or Grand Ballroom) on the 4th floor, room A1. @CVPR ai4cc.net Come say hi if you want to discuss :)
At @CVPR? Come see my talk on building evals that embrace the fuzziness of generative models at the EVAL-FoMo workshop today! This talk had everything, from Chatbot Arena to model vibes to designing UIs :P Details: June 11th, 4:30pm, room 210
✨New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation*, at inference time. The result is flexible control: parameterize tasks as multimodal inputs, visually inspect the images with the VLM, and update the generator.🧵
We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs. We also share a comprehensive report on user preferences and model performance in the search-enabled setting. Paper, dataset, and code in 🧵