Jacob Phillips
@jacob_dphillips
Engineering Fellow @a16z, American Dynamism. prev ML @scale_AI, CTO @Themis_AI, AI + History @MIT
We’re entering a new era in robotics where generalized systems are starting to work in the real world, but researchers still don’t have good tools for understanding their data. That’s why I built ARES, an open-source platform for ingesting, annotating, and curating robotics data.

Exciting news on @diodeinc published on Business Insider today. 1/ We raised capital! Over $14.5m, most recently in an $11.4m Series A round led by @a16z (details below) 2/ We are working with Fortune 100 companies and fast-growing startups. If you are building hardware, we want…
We’re proud to announce our $11.4 million Series A, led by @a16z, with the continued support of @ycombinator, @caffeinatedcap and @BoxGroup. We are working with Fortune 100 companies and fast-growing startups to design and manufacture their circuit boards faster than ever.
Conflicts are won not just by what we produce, but how fast we move it. Yet military logistics still run on spreadsheets and whiteboards. We're proud to continue our support of Rune as they build the logistics platform the military needs 🚚
I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. sergeylevine.substack.com/p/sporks-of-agi
Her brain went 6 hours without oxygen before they could operate. On New Year's Eve in 2024, a 3mm-wide clump of cells wedged itself into an artery in my grandmother's brain. She was in the passenger seat, next to her husband of 58 years, when she began to drool and lose motor…
RoboArena from @pranav_atreya -- real-world, scalable benchmarking for robots! Another step towards infrastructure for robot learning, similar to @lmarena_ai
I wrote a second piece on “How to Build ChatGPT for Robotics”, covering the history of robot data labeling, current best practices, and what the future holds for robots – across benchmarks, safety, red-teaming, and real-world deployment. a16z.com/how-to-build-c…
What will the learning environments of the future look like that train artificial superintelligence? In recent work at @scale_AI, we show that training systems that combine verifiable rewards with multi-agent interaction accelerate learning.
PufferLib 3.0: We trained reinforcement learning agents on 1 Petabyte / 12,000 years of data with 1 server. Now you can, too! Our latest release includes algorithmic breakthroughs, massively faster training, and 10 new environments. Live demos on our site. Volume on for trailer!
Want to tinker with robots but don't have one on hand? @jacob_dphillips on our team @a16z built MALLET, a simple toolkit for anyone to become a robotics researcher by letting VLMs control real robots. Check it out below 👇
Have you ever wondered if o4-mini could control a robot? Ever wanted to do robotics research, but didn't have any robots or GPUs? MALLET is a toolkit and benchmark for letting vision-language models like GPT-4o drive robots in the real world. MALLET is built on top of…
who is building American Dynamism and will be @CVPR in Nashville next week? hit us up @oyhsu @jacob_dphillips @MillenAnand
Releasing updated data and datasets on @huggingface! Now compatible with @MLCommons Croissant metadata format. huggingface.co/datasets/jacob…
28 miles, 5k vertical feet of elevation gain, 4 tiny bass, 1 unknown skull




The recent Sonnet release actually showed a small regression on MMMU, a visual reasoning benchmark, despite large advances in long-context reasoning for agentic coding and AIME. Excited to see better embodied reasoning benchmarks in the future!
Feels like there’s more discussion lately around evaluation criteria for physical reasoning abilities of AI. Maybe an extension of evaluating visual reasoning, but likely something wholly different. “The people yearn for benchmarks” — @jacob_dphillips