Jack Bai
@jackbai_jkb
I scale RL+VLMs 🧑‍🍳 CS PhD @UofIllinois | Intern @MSFTResearch | Prev @Berkeley_ai.
🧵 1/7 Should AI agents "think more" or "do more"? 🤔 The current trend is to scale test-time compute, making agents generate longer reasoning traces. But what if that’s the wrong approach for interactive tasks? In our new work, we argue for a new scaling dimension: Test-Time…

Kimi K2 tech report just dropped! Quick hits:
- MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale
- 20K+ tools, real & simulated: unlocking scalable agentic data
- Joint RL with verifiable + self-critique rubric rewards: alignment that adapts
- …
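The report credits MuonClip for stable pretraining at scale; the real recipe is in the tech report, but as a rough toy sketch of the general idea of keeping pre-softmax attention logits bounded by shrinking the query/key projections (the threshold tau, the shapes, and all names below are my own hypothetical choices, not Kimi's):

```python
import numpy as np

def qk_rescale(W_q, W_k, X, tau=100.0):
    """Toy illustration only: if the largest attention logit produced by
    (W_q, W_k) on a batch X exceeds tau, shrink both projections so the
    max logit is capped near tau. Not the actual MuonClip recipe."""
    d = W_q.shape[1]                      # head dimension
    Q, K = X @ W_q, X @ W_k               # (n, d) query/key activations
    logits = Q @ K.T / np.sqrt(d)         # (n, n) pre-softmax attention logits
    s_max = np.abs(logits).max()
    if s_max > tau:
        gamma = np.sqrt(tau / s_max)      # split the correction across Q and K
        W_q, W_k = W_q * gamma, W_k * gamma
    return W_q, W_k, s_max

# Tiny usage example with random weights deliberately scaled up to trigger the cap.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
W_q = rng.normal(size=(16, 16)) * 3.0
W_k = rng.normal(size=(16, 16)) * 3.0
W_q, W_k, s_max = qk_rescale(W_q, W_k, X)
print("max logit before rescale:", s_max)
```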
A late update: I am interning at Microsoft Research NYC this summer. Let's hang out and grab a coffee if you're in NYC!
After R1 was proposed, I have been thinking: is it a good thing that the reasoning trace keeps getting longer during the post-training phase? Since single-step RL tasks are often fully observable bandit problems, it makes sense that the model’s reasoning trace grows—the longer…
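One way to see the contrast being drawn here (my own framing, not the tweet's): single-step post-training optimizes a bandit-style objective, where the only lever is how much computation goes into the one response, while an interactive task optimizes a multi-step return, where the horizon H itself becomes a scaling knob.

```latex
% Single-step ("bandit") post-training: one prompt x, one response a, one reward.
\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\; a \sim \pi(\cdot \mid x)}\big[\, r(x, a) \,\big]

% Multi-step interactive task: the return sums rewards over an H-step rollout,
% so performance can also scale with acting for more steps (larger H).
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\, \textstyle\sum_{t=1}^{H} r(s_t, a_t) \,\Big]
```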
x.com/i/article/1933…
「 Agent, Test Time Interaction 」 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction. A new scaling dimension, test-time interaction, takes the agent from "thinking more" to "doing more". The authors propose Test-Time Interaction (TTI), which lets the agent, within a single rollout…
🔥Unlocking New Paradigm for Test-Time Scaling of Agents! We introduce Test-Time Interaction (TTI), which scales the number of interaction steps beyond thinking tokens per step. Our agents learn to act longer➡️richer exploration➡️better success Paper: arxiv.org/abs/2506.07976
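Paper and code aside, here is a minimal sketch of the contrast as I read it: instead of giving the agent a larger per-step thinking budget, give it a larger interaction budget (more environment steps per rollout). The env/agent interfaces and parameter names below are hypothetical placeholders, not the TTI codebase.

```python
from typing import Protocol, Tuple

class Env(Protocol):
    def reset(self) -> str: ...
    def step(self, action: str) -> Tuple[str, float, bool]: ...  # obs, reward, done

class Agent(Protocol):
    def act(self, obs: str, history: list, max_think_tokens: int) -> str: ...

def rollout(env: Env, agent: Agent, max_steps: int, max_think_tokens: int) -> float:
    """One episode. 'Thinking' scaling raises max_think_tokens;
    interaction scaling (the TTI-style knob) raises max_steps instead."""
    obs, history, total_reward = env.reset(), [], 0.0
    for _ in range(max_steps):
        action = agent.act(obs, history, max_think_tokens)
        history.append((obs, action))
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

# Scaling test-time compute:      rollout(env, agent, max_steps=10, max_think_tokens=4096)
# Scaling test-time interaction:  rollout(env, agent, max_steps=40, max_think_tokens=512)
```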
Top 50 LLM Interview Questions. Looks like a great resource to learn LLM basics:
Excited to share that EmbodiedBench was selected for an Oral at ICML 2025! We recently added results for new models (InternVL3, Gemma3, Ovis2) and released a large agent trajectory dataset on 🤗: embodiedbench.github.io Try training and evaluating your MLLM for embodied agents!
🤖Can MLLM agents reason about spatial relationships and plan atomic actions for navigation & manipulation? 🔥 Meet EmbodiedBench 🏆—the first fine-grained benchmark for MLLM-based embodied agents! 📄 Paper: arxiv.org/abs/2502.09560 🌐 Website & code: embodiedbench.github.io
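As a rough picture of what evaluating an MLLM agent on a benchmark like this involves, here is a generic instruction-plus-image action loop. This is my own hypothetical sketch; the object names and signatures are not the EmbodiedBench API.

```python
import base64
from dataclasses import dataclass, field

@dataclass
class EpisodeResult:
    task: str
    success: bool
    actions: list = field(default_factory=list)

def evaluate_episode(env, mllm, task_instruction: str, max_steps: int = 20) -> EpisodeResult:
    """Generic MLLM-agent eval loop: the model sees the instruction plus the
    current image observation and must emit one atomic action per step."""
    result = EpisodeResult(task=task_instruction, success=False)
    image = env.reset(task_instruction)               # hypothetical env returning raw image bytes
    for _ in range(max_steps):
        prompt = {
            "instruction": task_instruction,
            "image_b64": base64.b64encode(image).decode(),
            "history": result.actions,
        }
        action = mllm.generate_action(prompt)         # e.g. "move_forward", "pick(apple)"
        result.actions.append(action)
        image, done, success = env.step(action)
        if done:
            result.success = success
            break
    return result
```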
🚨Self-Challenging Language Model Agents🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to…
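Going only by the description in the tweet, a minimal sketch of the self-challenging loop might look like the following. Every function and method name here is a placeholder I am inventing for illustration, not the paper's code.

```python
def self_challenging_round(llm, tools, n_tasks: int = 32):
    """One round: the model proposes tool-use tasks plus verifiers for them,
    then tries to solve its own tasks; only verifier-passing trajectories are
    kept as training data."""
    training_data = []
    for _ in range(n_tasks):
        # 1. The model plays "challenger": invent a task and a checkable verifier.
        task = llm.generate(f"Propose a tool-use task using: {[t.name for t in tools]}")
        verifier = llm.generate(f"Write a verification rule for this task:\n{task}")

        # 2. The same model plays "solver" on its own task.
        trajectory = llm.solve_with_tools(task, tools)

        # 3. Keep only self-verified successes as supervision for the next round.
        if llm.judge(verifier, trajectory):
            training_data.append({"task": task, "trajectory": trajectory})
    return training_data
```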