Jack Bai
@jackbai_jkb
I scale RL+VLMs 🧑‍🍳 CS PhD @UofIllinois | Intern @MSFTResearch | Prev @Berkeley_ai.
🧵 1/7 Should AI agents "think more" or "do more"? 🤔 The current trend is to scale test-time compute, making agents generate longer reasoning traces. But what if that’s the wrong approach for interactive tasks? In our new work, we argue for a new scaling dimension: Test-Time…

Kimi K2 tech report just dropped! Quick hits:
- MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale
- 20K+ tools, real & simulated: unlocking scalable agentic data
- Joint RL with verifiable + self-critique rubric rewards: alignment that adapts
- …
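The report credits MuonClip for stable pretraining at scale; the real recipe is in the tech report, but as a rough toy sketch of the general idea of keeping pre-softmax attention logits bounded by shrinking the query/key projections (the threshold tau, the shapes, and all names below are my own hypothetical choices, not Kimi's):

```python
import numpy as np

def qk_rescale(W_q, W_k, X, tau=100.0):
    """Toy illustration only: if the largest attention logit produced by
    (W_q, W_k) on a batch X exceeds tau, shrink both projections so the
    max logit is capped near tau. Not the actual MuonClip recipe."""
    d = W_q.shape[1]                      # head dimension
    Q, K = X @ W_q, X @ W_k               # (n, d) query/key activations
    logits = Q @ K.T / np.sqrt(d)         # (n, n) pre-softmax attention logits
    s_max = np.abs(logits).max()
    if s_max > tau:
        gamma = np.sqrt(tau / s_max)      # split the correction across Q and K
        W_q, W_k = W_q * gamma, W_k * gamma
    return W_q, W_k, s_max

# Tiny usage example with random weights deliberately scaled up to trigger the cap.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
W_q = rng.normal(size=(16, 16)) * 3.0
W_k = rng.normal(size=(16, 16)) * 3.0
W_q, W_k, s_max = qk_rescale(W_q, W_k, X)
print("max logit before rescale:", s_max)
```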
A late update: I am interning at Microsoft Research NYC this summer. Let's hang out and grab a coffee if you're in NYC!
After R1 was proposed, I have been thinking: is it a good thing that the reasoning trace keeps getting longer during the post-training phase? Since single-step RL tasks are often fully observable bandit problems, it makes sense that the model’s reasoning trace grows—the longer…
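One way to see the contrast being drawn here (my own framing, not the tweet's): single-step post-training optimizes a bandit-style objective, where the only lever is how much computation goes into the one response, while an interactive task optimizes a multi-step return, where the horizon H itself becomes a scaling knob.

```latex
% Single-step ("bandit") post-training: one prompt x, one response a, one reward.
\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\; a \sim \pi(\cdot \mid x)}\big[\, r(x, a) \,\big]

% Multi-step interactive task: the return sums rewards over an H-step rollout,
% so performance can also scale with acting for more steps (larger H).
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\, \textstyle\sum_{t=1}^{H} r(s_t, a_t) \,\Big]
```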
x.com/i/article/1933…
「 Agent, Test Time Interaction 」 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction. A new scaling dimension, test-time interaction, takes the agent from "thinking more" to "doing more". The authors propose Test-Time Interaction (TTI), which lets the agent, within a single rollout…
🔥Unlocking New Paradigm for Test-Time Scaling of Agents! We introduce Test-Time Interaction (TTI), which scales the number of interaction steps beyond thinking tokens per step. Our agents learn to act longer➡️richer exploration➡️better success Paper: arxiv.org/abs/2506.07976
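Paper and code aside, here is a minimal sketch of the contrast as I read it: instead of giving the agent a larger per-step thinking budget, give it a larger interaction budget (more environment steps per rollout). The env/agent interfaces and parameter names below are hypothetical placeholders, not the TTI codebase.

```python
from typing import Protocol, Tuple

class Env(Protocol):
    def reset(self) -> str: ...
    def step(self, action: str) -> Tuple[str, float, bool]: ...  # obs, reward, done

class Agent(Protocol):
    def act(self, obs: str, history: list, max_think_tokens: int) -> str: ...

def rollout(env: Env, agent: Agent, max_steps: int, max_think_tokens: int) -> float:
    """One episode. 'Thinking' scaling raises max_think_tokens;
    interaction scaling (the TTI-style knob) raises max_steps instead."""
    obs, history, total_reward = env.reset(), [], 0.0
    for _ in range(max_steps):
        action = agent.act(obs, history, max_think_tokens)
        history.append((obs, action))
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

# Scaling test-time compute:      rollout(env, agent, max_steps=10, max_think_tokens=4096)
# Scaling test-time interaction:  rollout(env, agent, max_steps=40, max_think_tokens=512)
```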
Top 50 LLM Interview Questions. Looks like a great resource to learn LLM basics:
Excited to share that EmbodiedBench was selected for an Oral at ICML 2025! We recently added results for new models (InternVL3, Gemma3, Ovis2) and released a large agent trajectory dataset on 🤗: embodiedbench.github.io Try training and evaluating your MLLM for embodied agents!
🤖Can MLLM agents reason about spatial relationships and plan atomic actions for navigation & manipulation? 🔥 Meet EmbodiedBench 🏆—the first fine-grained benchmark for MLLM-based embodied agents! 📄 Paper: arxiv.org/abs/2502.09560 🌐 Website & code: embodiedbench.github.io
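As a rough picture of what evaluating an MLLM agent on a benchmark like this involves, here is a generic instruction-plus-image action loop. This is my own hypothetical sketch; the object names and signatures are not the EmbodiedBench API.

```python
import base64
from dataclasses import dataclass, field

@dataclass
class EpisodeResult:
    task: str
    success: bool
    actions: list = field(default_factory=list)

def evaluate_episode(env, mllm, task_instruction: str, max_steps: int = 20) -> EpisodeResult:
    """Generic MLLM-agent eval loop: the model sees the instruction plus the
    current image observation and must emit one atomic action per step."""
    result = EpisodeResult(task=task_instruction, success=False)
    image = env.reset(task_instruction)               # hypothetical env returning raw image bytes
    for _ in range(max_steps):
        prompt = {
            "instruction": task_instruction,
            "image_b64": base64.b64encode(image).decode(),
            "history": result.actions,
        }
        action = mllm.generate_action(prompt)         # e.g. "move_forward", "pick(apple)"
        result.actions.append(action)
        image, done, success = env.step(action)
        if done:
            result.success = success
            break
    return result
```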
🚨Self-Challenging Language Model Agents🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to…
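Going only by the description in the tweet, a minimal sketch of the self-challenging loop might look like the following. Every function and method name here is a placeholder I am inventing for illustration, not the paper's code.

```python
def self_challenging_round(llm, tools, n_tasks: int = 32):
    """One round: the model proposes tool-use tasks plus verifiers for them,
    then tries to solve its own tasks; only verifier-passing trajectories are
    kept as training data."""
    training_data = []
    for _ in range(n_tasks):
        # 1. The model plays "challenger": invent a task and a checkable verifier.
        task = llm.generate(f"Propose a tool-use task using: {[t.name for t in tools]}")
        verifier = llm.generate(f"Write a verification rule for this task:\n{task}")

        # 2. The same model plays "solver" on its own task.
        trajectory = llm.solve_with_tools(task, tools)

        # 3. Keep only self-verified successes as supervision for the next round.
        if llm.judge(verifier, trajectory):
            training_data.append({"task": task, "trajectory": trajectory})
    return training_data
```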