Deepak Nathani
@deepaknathani11
PhD Student @UCSBNLP | Mentor @CAREMentorship | Prev: @AIatMeta | @AWS AI | @GoogleAI India | @IITHyderabad
🎉 Thrilled to share MLGym and MLGym-Bench, our new framework for AI Research Agents! 🚀 Developed during my Meta internship, MLGym provides a flexible environment for benchmarking and developing new agents for AI research tasks. 🔬 MLGym-Bench consists of 13 diverse AI research…
Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the…
MLGym has been accepted to #COLM2025! See you in Montreal 🇨🇦
1/ 🕵️ Algorithm discovery could lead to huge AI breakthroughs! But what is the best way to learn or discover new algorithms? I'm so excited to share our brand new @rl_conference paper which takes a step towards answering this! 🧵
@GoogleDeepMind India 🇮🇳 & Japan 🇯🇵 are looking for strong candidates in multilinguality, multicultural, & multimodality areas. RS Bangalore: job-boards.greenhouse.io/deepmind/jobs/… RS Tokyo: job-boards.greenhouse.io/deepmind/jobs/… RE Tokyo: job-boards.greenhouse.io/deepmind/jobs/…
If you are interested in open-ended discovery, this is an amazing opportunity! @robertarail is great and you will have fun working on a challenging problem
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
Excited to be at @IC2S2 #ic2s22025. I will be presenting this work at the plenary lightning talks (after keynotes) on Thursday, and in the poster session afterwards. Looking forward to making new friends :D If you are interested in culture and evaluation, let's chat!!!
🖋️ Curious how writing differs across (research) cultures? 🚩 Tired of “cultural” evals that don't consult people? We engaged with researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗ 📜 arxiv.org/abs/2506.00784 1/11
“Apple loses key AI leaders to Meta” I discovered this while doing the live demo of Reka Research 😂 Go watch the video and play with our agent
Reka Research is our AI agent that scours the web to answer your toughest questions. Ready to unlock its full potential? Learn directly from the team who built it!
We are hosting a student researcher this year on the Paradigms of Intelligence team at Google! If you're interested in working with @ninoscherrer and me on AGI, or whatever you think is the next big thing 🥰, please consider applying! docs.google.com/forms/u/2/d/e/…
How should we rank generalist agents across a wide set of benchmarks and tasks? Honored to receive the AAMAS best paper award for SCO, a voting-theory-based scheme that minimizes mistakes in predicting agent comparisons from the evaluation data. arxiv.org/abs/2411.00119
We’re excited to announce our next speaker: Roberta Raileanu (@robertarail) from @GoogleDeepMind! Roberta will discuss NetHack: A Grand Challenge for RL and LLM Agents Alike. ⚔️ Join us on August 5th to learn how to develop agents capable of tackling open-ended environments!
🎉🚀 Release Day 3: It’s #OpenSource time! We’re pleased to announce the open-sourcing of Reka Flash 3.1, our coding model achieving state-of-the-art performance on LiveCodeBench and other benchmarks. 📈🔍 🛠️✨ We’re also open-sourcing Reka Quant, our cutting-edge quantization…
📢 We are open sourcing ⚡Reka Flash 3.1⚡ and 🗜️Reka Quant🗜️. Reka Flash 3.1 is a much improved version of Reka Flash 3 that stands out on coding due to significant advances in our RL stack. 👩‍💻👨‍💻 Reka Quant is our state-of-the-art quantization technology. It achieves…
Try out Reka Vision today! You can create super cool social media reels with natural language prompts, search across many videos, and get summaries. Congrats to the team on the great work.
Excited to introduce Reka Vision, an agentic visual understanding and search platform. Transform your unstructured multimodal data into insights and actions.
🥳Our work UTGen & UTDebug on teaching LLMs to generate effective unit tests & improve code debugging/generation has been accepted to @COLM_conf #COLM2025! Stay tuned for more exciting results -- e.g., using 32B-scale UTGen models to improve debugging with frontier models like…
🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨 which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and debugging code from generated tests. UTGen+UTDebug improve LLM-based code debugging by addressing 3 key…
Solid work from @AIatMeta on ablating and improving AIDE on MLE-Bench! The rigor of empirical evaluation has reached a new level, making the experimental signals super strong. Highly recommended for anyone interested in AI-Driven R&D/Agentic Search!
AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench
Agent S2 is accepted to #COLM2025! My first COLM paper. See you in Montreal in October! 🍁
Since launching Agent S2, many folks working on GUI/computer-use agents asked for our tech report. Here we go! 🎉New SOTA on 3 major computer use benchmarks. • OSWorld (15 steps): 27.0% 🚀 (+18.9%) • OSWorld (50 steps): 34.5% 🚀 (+32.7%) • WindowsAgentArena: 29.8% 🚀…
Very proud of this work! If you're interested in AI agents and their current challenges, give this a read. Thanks to my incredible collaborators and to @Meta and @ucl for enabling me to tackle something of this scale for my first PhD paper. Excited for what's ahead!
Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough. We break down an agent into four fundamental components that shape…
Giving a talk at @jetbrains with @anton_iades on inference time scaling for SWE agents! Am I allowed to mention the word “cursor” here?
AIRA strikes again! This time we conduct an in-depth study of research agents on MLE-Bench (i.e. kaggle competitions). We find that while exploration and search matter, the biggest delta is due to our more robust software stack. We are open-sourcing all of this to allow YOU to…
Better Agents = Smarter Search Algorithms + Improved Operators + Precise Evaluation + Robust Infrastructure. After all, even the best F1 driver can't win without a great car! Great to see my PhD research on LLMs + Search come together in this amazing paper! #LLM #Search #AI