Zachary Huang
@ZacharyHuang12
Researcher @MSFTResearch AI Frontiers. LLM Agents and Systems. | PhD @ColumbiaCompSci | Prev: @GraySystemsLab @databricks | Fellowship: @GoogleAI | New YouTuber
I've started my new role as a Researcher at Microsoft Research AI Frontiers, working on LLM Agents & Systems!!

I took Grok 4 for a spin this weekend to build this game prototype. I used SuperGrok Chat to generate the initial game prototype and then brought it over to Cursor to continue coding with Grok 4 MAX. Grok 4 in Cursor is like a no-nonsense agent. Doesn't speak much, but…
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
something kinda neat about ai slop is it starts to give people an actual vision into what a radical uploadcore simulationist future could look like. u say shit like "u could make ur reality whatever u want" and people have no idea what to imagine. goldfish keyboards that's what
unpopular take: virality >> talking to users. 10,000 users’ worth of data is a lot more helpful than talking to 3 users. it’s just not recommended because not everyone can get 10,000 users’ worth of data immediately. except thx to short form, u can now think more ab going viral
ChatGPT with web search is really good at planning trips! I'm on vacation and asked it to plan a trip to Olympic National Park. Since I can't drive, the trip only uses public transportation. The plan looks good so far - let's see how well it actually works!

This is my lecture from 2 months ago at @Cornell. “How do I increase my output?” One natural answer is "I will just work a few more hours." Working longer can help, but eventually you hit a physical limit. A better question is, “How do I increase my output without increasing…
New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally. Great examples of…
Our code2tutorial.com helps developers generate 100+ tutorials for GitHub repos per day. But today it had an outage, so I had to stay up late and fix it. It turns out that the LLM we used had been deprecated, as noted in a random Google release note.

Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
Kimi K2 is genuinely impressive. On the same tasks and the same agentic harness, it beats Grok 4 one-on-one. It also looks like it does this without CoT or thinking tokens. github.com/MoonshotAI/Kim…
WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!! Grok 4 has the highest "snitch rate" of any LLM ever released. Sharing more soon.
RL experts - Why does no one use off-policy methods for LLM training? Is it because of the high variance? @willccbb
I really like this diagram from @_jasonwei and @hwchung27 about how to view the bitter lesson: It's a mistake not to add structure now, it's a mistake to not remove that structure later. We're at the precipice of setting up a huge, powerful RL training run that will define the…
AI agent that turns GitHub codebases into step-by-step tutorials with diagrams and code summaries
Writing a rebuttal is 30% technical and 70% reviewers' psychology.
not many people know this, but you can also glide down to a value
I'm not a fan of the C programming language, but I give credit where credit's due: The "tends towards" operator --> is a programming language design masterpiece.
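For anyone new to the joke: C has no `-->` operator. The lexer splits it into a post-decrement `--` followed by `>`, so `x --> 0` is just `(x--) > 0`. A minimal sketch in C, where the function name `count_down_steps` is made up for illustration:

```c
#include <stdio.h>

/* Runs the "tends towards" loop from x and returns how many times
   the body executed. `x --> 0` is tokenized as `x-- > 0`: compare
   the old value of x with 0, then decrement x. */
int count_down_steps(int x) {
    int steps = 0;
    while (x --> 0) {        /* identical to: while ((x--) > 0) */
        printf("%d ", x);    /* x has already been decremented here */
        steps++;
    }
    printf("\n");
    return steps;
}
```

Calling `count_down_steps(3)` prints `2 1 0` and returns 3. The body sees the value after the decrement, which is why the printed sequence starts one below the starting value.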