clhong1248
@CarinaLHong
Hiring: resume to [email protected] | to love math is to see the face of God | Morgan Prize, Rhodes Scholar | Math PhD@Stanford; Neuro@Oxford; Math+Physics@MIT
[email protected] come join Axiom; help us figure out why prime numbers are hiding from their ex


I'm so pumped
Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across @Google. 🧵
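At its core, the discovery loop in a system like this is evolutionary search over candidate programs: keep a pool, mutate the most promising candidate, keep improvements. A toy sketch of that loop, purely illustrative (in AlphaEvolve the mutator is a Gemini model editing real code; here it is a random numeric tweak, and all function names are my own):

```python
import random

def evolve(initial, mutate_fn, score_fn, generations=50, population=8):
    """Toy evolutionary loop: maintain a pool of candidates, mutate the
    current best, and keep the top scorers. Illustrative only -- the real
    system mutates programs with an LLM and scores them with an evaluator."""
    pool = [initial]
    for _ in range(generations):
        parent = max(pool, key=score_fn)      # pick the best candidate
        child = mutate_fn(parent)             # propose a variation
        pool.append(child)
        pool = sorted(pool, key=score_fn, reverse=True)[:population]
    return max(pool, key=score_fn)

# Illustrative "task": find x maximizing -(x - 3)^2, i.e. x near 3.
random.seed(1)
best = evolve(0.0,
              lambda x: x + random.uniform(-1, 1),   # random local mutation
              lambda x: -(x - 3) ** 2)               # evaluator / fitness
print(best)
```

The interesting engineering is all hidden inside `mutate_fn` (an LLM proposing code edits) and `score_fn` (automated benchmarks for speed or correctness); the outer loop stays this simple.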
Once upon a time, I was tired of emailing Word files back and forth, and so I built Writely (Google Docs). Today I’m tired of drowning in wildly conflicting takes on the most important issue of our era, so I'm launching the Golden Gate Institute for AI. (These are both team…
Elegance is expressiveness and simplicity combined -- achieving much with little. It's closely related to compression.
music.youtube.com/watch?v=svasMn… one of the saddest post rock melodies i've heard - mixed with a sample from the last Columbia mission, "Columbia, Houston: comm check" repeating 7x (the shuttle broke apart on reentry). I listen to this every day I run; chew the pain of failure b4 any talk about success
I'm surprised OpenAI doesn't report o3/o4-mini's performance on FrontierMath. They commissioned the benchmark, so you'd think they'd want to use it, given how many benchmarks are in the post. This was the most impressive thing about the o3 preview, so I'm very curious
Here is Epoch's second AI+Math Chat, feat. Kevin Buzzard, Yang-Hui He, @AlexKontorovich, and @KristinLauter! Watch as we discuss the potential and obstructions for AI formalizing mathematics in Lean, and how LLMs learning the rules of arithmetic through pattern-matching might…
Rich Sutton just published his most important essay on AI since The Bitter Lesson: "Welcome to the Era of Experience" Sutton and his advisee Silver argue that the “era of human data,” dominated by supervised pre‑training and RL‑from‑human‑feedback, has hit diminishing returns;…
Recently, we were courting an extraordinary candidate, who was deciding between joining FutureHouse or taking a job at one of the frontier AI labs. He was looking for somewhere to do his life’s work, where he would be able to bet on himself. He recognized that he would have more…
Love this quote from Sir Michael Atiyah: "Algebra is the offer made by the devil to a mathematician. The devil says, 'I will give you this powerful machine...all you need to do is give up your soul: give up geometry...' The danger to our soul is there... when you pass into…
On the heels of Humanity's Last Exam, @scale_AI & @ai_risks have released a new very-hard reasoning eval: EnigmaEval: 1,184 multimodal puzzles so hard they take groups of humans many hours to days to solve. All top models score 0 on the Hard set, and <10% on the Normal set 🧵
Without mentioning drugs or alcohol, what is the best way to escape reality?
RL + CoT works great for DeepSeek-R1 & o1, but: 1️⃣ Linear-in-log scaling in train & test-time compute 2️⃣ Likely bounded by difficulty of training problems Meet STP—a self-play algorithm that conjectures & proves indefinitely, scaling better! 🧠⚡🧵🧵 arxiv.org/abs/2502.00212
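The self-play loop the STP abstract describes can be sketched in a few lines, with toy stand-ins for the model and the verifier (every name here is illustrative, not from the paper; in the real system the prover is an LLM and the verifier is a Lean kernel):

```python
import random

def self_play_round(conjecture_fn, prove_fn, verify_fn, seed_statements):
    """One round of conjecture-then-prove self-play.

    conjecture_fn: proposes a new statement from an existing one.
    prove_fn:      attempts a proof of a statement.
    verify_fn:     checks a candidate proof (a formal checker in practice).
    Returns the statements whose proofs verified; these seed the next round,
    so the difficulty of the training pool grows with the prover.
    """
    proved = []
    for stmt in (conjecture_fn(s) for s in seed_statements):
        proof = prove_fn(stmt)
        if verify_fn(stmt, proof):
            proved.append(stmt)
    return proved

# Toy instantiation: "statements" are integers, a "proof" of n is a factor pair.
conjecture = lambda n: n + random.randint(1, 5)           # propose a harder statement
prove = lambda n: (2, n // 2) if n % 2 == 0 else None     # only even n are "provable"
verify = lambda n, p: p is not None and p[0] * p[1] == n  # check the certificate

random.seed(0)
survivors = self_play_round(conjecture, prove, verify, [4, 6, 8])
print(survivors)  # only the even (provable) conjectures survive
```

The point of the structure is the feedback loop: because next round's seeds are this round's verified conjectures, the process can keep generating training signal without a fixed pool of human-written problems.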
just published my full @arcprize analysis of deepseek's r1-zero and r1. link below. key points: r1-zero is more important than r1. both r1-zero and r1 score ~15% on ARC-AGI-1. this is fascinating. it matches deepseek's own benchmarking showing comparable results in logical…
DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data. We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention. 📜arxiv.org/abs/2501.19393
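The "simple test-time intervention" in s1 is budget forcing: suppress the model's end-of-thinking token until a minimum token budget is spent, appending a cue like "Wait" to force continued reasoning. A toy sketch with a stubbed decoder (the helper names and the stub are mine, not from the paper):

```python
def budget_forced_decode(step_fn, min_steps, max_steps,
                         stop_token="</think>", cue="Wait"):
    """Decode with budget forcing: if the model emits its stop token before
    min_steps tokens, replace it with a cue that forces more reasoning;
    after max_steps, decoding is cut off regardless."""
    trace = []
    while len(trace) < max_steps:
        tok = step_fn(trace)
        if tok == stop_token:
            if len(trace) >= min_steps:
                break              # budget spent: allow the model to stop
            tok = cue              # too early to stop: force more thinking
        trace.append(tok)
    return trace

# Stub "model": always tries to stop after 3 tokens (unless just cued).
def stub_step(trace):
    return "</think>" if len(trace) >= 3 and trace[-1] != "Wait" else "tok"

out = budget_forced_decode(stub_step, min_steps=6, max_steps=10)
print(len(out), out.count("Wait"))  # prints: 7 2
```

The same knob turned the other way (a low `max_steps`) truncates thinking early, which is how the paper traces out a test-time compute scaling curve from a single model.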