Justin Zhao
@justinxzhao
AI Research Engineer @ Meta Superintelligence Labs Past: ML Lead @ Predibase, Research Engineer @ Google, CS+Music @ Columbia
In a world with AI, doing isn’t the hard part anymore. The hard part is trusting. Reviewing. Verifying. Embracing. Deciding what matters. These are the bottlenecks now, and they are deeply human. We talk about AI agents accelerating science and automating research, and they…
I’m slowly beginning to accept that my productivity, when working with AI coding agents, is limited by my human brain. AI can do many tasks in parallel, but I can only track the context of a few, so I only run a few tasks at a time. I am the bottleneck.
Kimi-K2 just took top spot on both EQ-Bench3 and Creative Writing! Another win for open models. Incredible job @Kimi_Moonshot
Big sister energy. Congratulations!
We're pleased to welcome Jinger Zhao to the Society for Science Board of Trustees! societyforscience.org/press-release/…
Editing your reward function is like publishing an amendment to your model's constitution.
"AI as Normal Technology" knightcolumbia.org/content/ai-as-… also advocates the idea that the impact of superintelligence will be extremely gradual because knowing how to improve requires 1) implementing and 2) getting feedback from the real world, both of which are slow.
We don’t have AI that self-improves yet, and when we do, it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that…
🥳🥳
Big news! We will be joining @RubrikInc to accelerate agentic AI adoption from pilot to production at scale! ⚡️ Together, we can deliver radical simplicity in models and data. This is an exciting next step in our journey. More from @devvret_rishi here: pbase.ai/45yUL2O
Love the idea of presenting your work as someone else's as a way of getting past sycophancy, which seems to be getting worse these days. I suppose most LLM-as-a-Judge setups embody this inherently, presenting outputs for rating as those written by anonymous third parties.…
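A rough sketch of what that anonymized, order-shuffled judge prompt could look like (call_llm here is a hypothetical stand-in for whatever chat client you use, not anything from the original setup):

import random

def judge_pair(question, response_a, response_b, call_llm):
    # Present both answers as written by anonymous third parties and
    # shuffle the order so the judge can't lean on a positional bias.
    labeled = list(zip(["A", "B"], [response_a, response_b]))
    random.shuffle(labeled)
    order = [tag for tag, _ in labeled]  # remember which slot holds which answer
    prompt = (
        "Two different people wrote the answers below; neither is yours.\n\n"
        f"Question: {question}\n\n"
        f"Answer 1:\n{labeled[0][1]}\n\n"
        f"Answer 2:\n{labeled[1][1]}\n\n"
        "Which answer is better, 1 or 2? Reply with just the number."
    )
    verdict = call_llm(prompt).strip()
    return order[0] if verdict.startswith("1") else order[1]  # map back to A/B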
There are many moving pieces when turning a project into a machine learning conference paper, and best practices/nuances no one writes up. I made a comprehensive paper writing checklist for my mentees and am sharing a public version below - hopefully it's useful, esp for NeurIPS!
"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and…
🚨🚨🚨 @justinxzhao just presented our paper and is around #NAACL2025 if you want to have a chat!!
Language Model Council paper was accepted to NAACL main 🎉🎉 @florplaza22 @CurriedAmanda See you in New Mexico! #NAACL
Feel bad for all the people who actually write with em dashes — only to get accused of AI slop.
another installment of non-determinism evals with @justinxzhao ! we ran an experiment with claude, making 100 API calls per query to test consistency with numerical data like population figures, GDP, and measurements. results below were interesting
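A minimal sketch of that repeated-call consistency check, assuming a hypothetical call_llm stand-in for the actual Claude client:

from collections import Counter

def consistency(query, call_llm, n_calls=100):
    # Ask the same question n_calls times and tally the distinct answers.
    answers = [call_llm(query).strip() for _ in range(n_calls)]
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / n_calls, counts  # modal answer, its share, full tally

# e.g. consistency("What is the population of Iceland? Answer with a number only.", call_llm)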
Give LLMs access to free entropy, and they will kinda make use of it. 💭
working with @justinxzhao again on some randomness evals. he had a great idea to add in a 64-bit random seed at the top of the prompt before simulating a coin flip / dice roll and it helps randomize the results a lot! (left is with random seed)
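A minimal sketch of the seeded-prompt trick, assuming a hypothetical coin-flip setup; secrets.randbits supplies a fresh 64-bit seed on every call:

import secrets

def coin_flip_prompt():
    # Prepend a fresh 64-bit random seed so the model has some entropy
    # to condition on before "flipping" the coin.
    seed = secrets.randbits(64)
    return (
        f"Random seed: {seed}\n"
        "Using the seed above as your only source of randomness, "
        "simulate one fair coin flip. Reply with exactly 'heads' or 'tails'."
    )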
People think that automating jobs will be easy, but they're wrong. You can’t just ask the AI to do things. You need to understand what your employee is doing - instructions, evals, monitoring. You have to make the role legible. Only then can you know whether the AI will do the job well.