Jessy Lin
@realJessyLin
PhD @Berkeley_AI, visiting researcher @AIatMeta. Interactive language agents 🤖 💬
I’ll be at #ICLR2025 this week! ✈️ A couple of things I’m excited about lately: 1) Real-time multimodal models: how do we post-train assistants for real-time (and real world) tasks beyond the chat box? 2) Continual learning and memory: to have models / agents that learn from…
The Bitter Lesson does not say not to bother with methods research. It says not to bother with methods that are handcrafted datapoints in disguise.
💯 Can't wait for the second blog! This could be an important step towards making AI agents more "human-centered". We want AI agents to help users (safely ofc), yet solely optimizing for tasks w/o "users" in the picture might not get us there, e.g., x.com/metr_evals/sta…
User simulators bridge RL with real-world interaction // jessylin.com/2025/07/10/use… How do we get the RL paradigm to work on tasks beyond math & code? Instead of designing datasets, RL requires designing environments. Given that most non-trivial real-world tasks involve…
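For context, the post's framing is that in interactive tasks the RL "environment" is largely a simulated user, with task success as the reward. Below is a minimal sketch of that idea, not code from the post: the class names are illustrative assumptions, and the keyword-matching "user" is a placeholder where a real simulator would be an LLM role-playing a persona.

```python
import random

class SimulatedUser:
    """Stands in for a real user: holds a hidden goal and replies to the agent."""
    def __init__(self, goal: str):
        self.goal = goal
        self.satisfied = False

    def respond(self, agent_message: str) -> str:
        # Placeholder logic: a real simulator would be an LLM prompted with a
        # user persona and goal, not a substring check.
        if self.goal in agent_message:
            self.satisfied = True
            return "Yes, that's exactly what I needed, thanks!"
        return "Not quite - let me explain again what I'm after."

class UserSimEnv:
    """RL environment where each episode is a dialogue with a simulated user."""
    GOALS = ["book a flight", "refund an order", "reset a password"]

    def reset(self) -> str:
        self.user = SimulatedUser(random.choice(self.GOALS))
        self.turns = 0
        return "Hi, I need some help."  # first user utterance is the observation

    def step(self, agent_message: str):
        self.turns += 1
        user_reply = self.user.respond(agent_message)
        done = self.user.satisfied or self.turns >= 10
        reward = 1.0 if self.user.satisfied else 0.0  # interaction outcome as reward
        return user_reply, reward, done

env = UserSimEnv()
obs = env.reset()
obs, reward, done = env.step("Sure - shall we book a flight?")
```

The point of the sketch: the reward comes from interaction outcomes rather than a labeled dataset, so "designing the environment" mostly means designing the user simulator.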
underrated idea to learn passively about people from everyday computer use - I think the natural extension is learning from *trajectories* of how people prefer to do things, which is hard to get from prompting / static user data otherwise
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵
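One way to read the "trajectories" point above: the *sequence* of actions carries preference information that a static profile misses. A hedged sketch of the passive-learning loop, with hypothetical names throughout (`fake_llm` stands in for any chat-completion call; this is not the GUM paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    text: str          # e.g. "checks negative reviews before buying"
    confidence: float  # 0..1, to be revised as more evidence arrives

def infer_propositions(observation: str, llm) -> list[Proposition]:
    """Ask an LLM what a single observation suggests about the user."""
    reply = llm(f"Observation of the user's screen: {observation}\n"
                "List what this suggests about the user, one claim per line.")
    return [Proposition(line.strip(), confidence=0.5)
            for line in reply.splitlines() if line.strip()]

def fake_llm(prompt: str) -> str:  # stand-in so the sketch runs end to end
    return "checks negative reviews before buying\nshops deliberately, not impulsively"

# A trajectory, not a snapshot: the ordering (open reviews -> sort by lowest
# rating -> read those first) is itself evidence about how the user works.
trajectory = [
    "opened a product page, then immediately clicked 'Reviews'",
    "sorted reviews by lowest rating before reading any",
]
user_model = []
for obs in trajectory:
    user_model.extend(infer_propositions(obs, fake_llm))
print(user_model)
```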
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
Today, we are launching the first publicly available AI Scientist, via the FutureHouse Platform. Our AI Scientist agents can perform a wide variety of scientific tasks better than humans. By chaining them together, we've already started to discover new biology really fast. With…
New on Rising Tide, I break down 2 factors that will play a huge role in how much AI progress we see over the next couple years: verification & generalization. How well these go will determine if AI just gets super good at math & coding vs. mastering many domains. Post excerpts:
chatgpt memory is like the buzzfeed quiz of 2025
ChatGPT just got an INSANE new memory update. It remembers things about you between chats, in a sophisticated and intelligent way. Best prompt to try? “Tell me some unexpected things you remember about me”
We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and jumps in to help. This assistant *wasn't* trained with RLHF. Instead, it's powered by *assistance games*, a better path forward for building AI assistants. 🧵
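For context on the paradigm: in an assistance game the agent never observes the human's reward. It keeps a belief over possible goals, updates that belief from the human's actions, and helps with whatever it currently believes the goal is. A toy sketch of that loop follows; the goal set, likelihoods, and numbers are made-up assumptions, not the project's model.

```python
GOALS = ["wood_house", "stone_tower"]

# Toy likelihood model: P(human action | goal).
LIKELIHOOD = {
    ("place_wood", "wood_house"): 0.8, ("place_wood", "stone_tower"): 0.1,
    ("place_stone", "wood_house"): 0.2, ("place_stone", "stone_tower"): 0.9,
}

def update_belief(belief: dict, human_action: str) -> dict:
    """Bayesian update: P(goal | action) is proportional to P(action | goal) * P(goal)."""
    posterior = {g: LIKELIHOOD.get((human_action, g), 0.05) * p
                 for g, p in belief.items()}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

def assist(belief: dict) -> str:
    """Help with the goal the agent currently finds most likely."""
    goal = max(belief, key=belief.get)
    return f"gather materials for {goal}"

belief = {g: 1 / len(GOALS) for g in GOALS}  # uniform prior over goals
for action in ["place_wood", "place_wood"]:   # watch the human start building
    belief = update_belief(belief, action)
    print(belief, "->", assist(belief))
```

The contrast the thread draws with RLHF is that the goal uncertainty stays live during the interaction: the agent watches what you build before committing to help.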
1/ LLM agents can code—but can they ask clarifying questions? 🤖💬 Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀
Fascinating interviews. I'm not sure humans will ever be "out of the loop" in math. Even if humans have no advantages in proving theorems, they are still going to matter in asking questions. Mathematics is not just about what is true, but also what is interesting - to humans!
8/ If these challenges are overcome, what then? One thing that all four mathematicians agreed on is that full automation of math research is possible in principle, although this would likely be preceded by a period of human-AI collaboration.
Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
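A rough illustration of the extrapolation idea (emphatically not the paper's actual emergence law): finetune the available small checkpoints on the task, fit a parametric curve to their post-finetuning accuracy as a function of scale, then evaluate the curve at the next scale, where the pretrained models still score at chance. All numbers below are fabricated for demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_c, mid, slope, floor, ceil):
    """Task accuracy as a logistic function of log10(pretraining compute)."""
    return floor + (ceil - floor) / (1 + np.exp(-slope * (log_c - mid)))

# log10 compute of the available checkpoints and their accuracy AFTER finetuning
log_compute = np.array([19.0, 20.0, 21.0, 22.0])
finetuned_acc = np.array([0.27, 0.35, 0.55, 0.78])  # hypothetical numbers

params, _ = curve_fit(sigmoid, log_compute, finetuned_acc,
                      p0=[21.0, 1.0, 0.25, 1.0], maxfev=10_000)

# Extrapolate to a scale none of the checkpoints reach
print("predicted accuracy at 1e23 FLOPs:", sigmoid(23.0, *params))
```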
+1 to the key idea here - it's def important to iterate on algorithms with clean benchmarks like math+code with known reward functions, but almost every task we care about in the real world has a fuzzy / human-defined reward func. I'm interested to see how we'll end up applying…
i wrote a new essay called The Problem with Reasoners where i discuss why i doubt o1-like models will scale beyond narrow domains like math and coding (link below)
Using AI agents to help humans understand and audit complex AI systems — I'm really excited by the long-term vision Jacob and Sarah are working on here!
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…