Nithya Attaluri
@attaluri_nithya
research engineer @googledeepmind // @miteecs @mitcsail bs & meng ’23 // views my own
Evaluating LLMs is one of the most critical and nuanced challenges in AI today. I am super excited to be co-organizing this workshop @NeurIPSConf to discuss the most pressing evaluation challenges. Details 👇
We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3
Featuring talks from experts in the field: - @esindurmusnlp (Anthropic) - @isabela_alb (Google DeepMind) - @_jasonwei (OpenAI) - @seo_minjoon (KAIST) - @natolambert (Allen Institute) - @orf_bnw (Google DeepMind) - @sanmikoyejo (Stanford) - @lxuechen (xAI) 2/3
Proudly organized by: - @BerivanISIK (Google) - @beyzaermis (Cohere) - @Diyi_Yang (Stanford) - @MariusHobbhahn (Apollo Research) - @attaluri_nithya (Google DeepMind) - @RishiBommasani (Stanford) - @YangjunR (U Toronto) 3/3
Very excited to announce that I’ll be co-organizing a @NeurIPSConf workshop on LLM evals! Identifying shortcomings in model capabilities in a robust, scientific way is a critical part of model development. Looking forward to discussing ideas and hearing from some eval experts!
New coworkers just dropped 🎉 Welcome, everyone! Excited to work with you :)
Very excited to share that @windsurf_ai co-founders @_mohansolo & Douglas Chen, and some of their talented team have joined @GoogleDeepMind to help advance our work in agentic coding in Gemini. Welcome to our new team mates from Windsurf! theverge.com/openai/705999/…
Take a look at these metrics! Gemini 2.5 Pro leads in areas like AIDER, an editing benchmark. It continues to be awesome for code and reasoning
Simplex (@usesimplex) builds developer-first web agents that companies use to integrate with legacy portals. They're already in production, dispatching freight shipments, downloading customers’ invoices, and fetching websites’ internal APIs. ycombinator.com/launches/NbM-s… Congrats on…
Go vibe code with 2.5 Pro 🚀👩‍💻
🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆 Highlights: - #1 in all text arenas (Coding, Style Control, Creative Writing, etc) - #1 on the Vision leaderboard with a ~70 pts lead! - #1 on WebDev Arena, surpassing Claude…
Yay! Go @19kaushiks @m__dehghani ♊️💙
power duo @19kaushiks @m__dehghani
Cats made from different food items with Gemini Flash's native image gen 🧵
We’ve made millions of miles of memories over the past 15+ years, but today is special. We’re returning to where the journey began, gradually opening our doors to our first public riders in Mountain View, Los Altos, Palo Alto, and parts of Sunnyvale.
In case you missed it, Imagen 3 is really good.
Breaking news from Text-to-Image Arena! 🖼️✨ @GoogleDeepMind’s Imagen 3 debuts at #1, surpassing Recraft-v3 with a remarkable +70-point lead! Congrats to the Google Imagen team for setting a new bar! Try the best text2image at LMArena and cast your vote! More analysis👇
The golden rule of Gemini is that our naming is inversely correlated with model capability. On that note, you will be pleasantly surprised by 2.0 Flash Thinking Experimental with Apps 🤓
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
We’ve been *thinking* about how to improve model reasoning and explainability. Introducing Gemini 2.0 Flash Thinking, an experimental model trained to think out loud, leading to stronger reasoning performance. Excited to get this first model into the hands of developers to try…
It seems that I am giving the commencement address at MIT in May 😳 news.mit.edu/2024/hank-gree…
Web-browsing agents allow us to tackle a brand-new class of problems with AI. Lots of potential in this space…
We've released an early version of Project Mariner to trusted testers. Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction. As a research prototype, it’s able to understand and reason across information in…