Ross Taylor
@rosstaylor90
Universal intelligence at @GenReasoning. Previously lots of other things like: Llama 3/2, Galactica, Papers with Code.
Congrats @n_latysheva !
Introducing AlphaGenome: an AI model to help scientists better understand our DNA – the instruction manual for life 🧬 Researchers can now quickly predict what impact genetic changes could have - helping to generate new hypotheses and drive biological discoveries. ↓
What seems like an exponential in AI is just a series of S curves. Each era rides on a wave of increasing compute but finds a new way to utilise it - overcoming limitations of the previous stage. Eg pre-training was the dominant way to utilise compute, but the limitations of…
It’s funny that people on this site think major LLM efforts are talent-bound rather than org-bound. The talent differential has never been big between major orgs. Most of the difference in outcomes is due to organisational factors - like allocating compute to the right bets, and…
Nice work on prediction vs understanding.
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: a transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
If you take ASI seriously, then you care about where you want to build it and who you want to build it for.
Too many are being sanctimonious about human intelligence in face of the first real thinking machines. They'll be left behind like many who failed to understand technology in the past.
The rise of reasoning machines
And a debate that doesn't warrant repeating.
interconnects.ai/p/the-rise-of-…
Finally proof that a British accent makes you smarter.
Definitely weird stuff with "o1 pro"
- Says it's o3
- It has access to memory tool
- Can search the web
- It says "optimised" not "optimized" (only o3 slips into British English)
Feels like o3 pro @apples_jimmy @chatgpt21 @btibor91 @iruletheworldmo @scaling01 @kimmonismus @chetaslua
The best way to judge new results in ML is how much complexity they introduce for their stated performance gain. Most new things get small improvements for large complexity gains. They trade on novelty bias in the short term, and nerd-snipe people into thinking their approach is…
This is a nice thread by @MinqiJiang.
It's so fun to see RL finally work on complex real-world tasks with LLM policies, but it's increasingly clear that we lack an understanding of how RL fine-tuning leads to generalization. In the same week, we got two (awesome) papers: Absolute Zero Reasoner: Improvements on code…
RL is very expensive compared to SFT, which makes it impractical to scale for most folks outside of big labs. And yet, RL is perfect for businesses because you can optimise the metric you actually care about. Not the next token; but the next sale or the next customer. Already…
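The SFT-vs-RL objective contrast above can be shown in a toy sketch. This is a minimal, illustrative example, not anyone's production setup: the one-step "policy" is a single softmax over four tokens, and the reward function (token 2 "makes the sale") is entirely hypothetical. SFT pushes up the log-likelihood of the token the dataset contains; the REINFORCE-style RL step samples a token and reinforces it in proportion to an arbitrary scalar reward — which is why the reward can be any business metric rather than next-token likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 4  # tiny toy vocabulary

def probs(logits):
    # Softmax over logits, numerically stabilised.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sft_step(logits, observed_token, lr=0.5):
    # SFT: gradient ascent on log p(observed_token).
    p = probs(logits)
    grad = -p
    grad[observed_token] += 1.0  # d log p(token) / d logits
    return logits + lr * grad

def rl_step(logits, reward_fn, lr=0.5):
    # REINFORCE: sample an action, scale its log-prob gradient by reward.
    p = probs(logits)
    token = rng.choice(VOCAB, p=p)
    r = reward_fn(token)  # e.g. 1.0 if this reply led to a sale
    grad = -p
    grad[token] += 1.0
    return logits + lr * r * grad

# Hypothetical business metric: only token 2 converts to a sale.
reward = lambda t: 1.0 if t == 2 else 0.0

# SFT imitates its dataset, which always contains token 1.
sft_logits = np.zeros(VOCAB)
for _ in range(100):
    sft_logits = sft_step(sft_logits, observed_token=1)

# RL chases the reward instead, and drifts toward token 2.
rl_logits = np.zeros(VOCAB)
for _ in range(500):
    rl_logits = rl_step(rl_logits, reward)

print(probs(sft_logits).argmax())  # the token the data contained
print(probs(rl_logits).argmax())   # the token the reward favoured
```

The point of the sketch is that the two loops share the same gradient machinery; only the signal differs — a fixed dataset label for SFT versus sampled rollouts scored by a reward for RL, which is also why RL needs far more samples (and compute) to converge.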
When your model has emergent swearing in its internal monologue.

Neural networks were once in the “graveyard of ideas” because the conditions weren’t right for them to shine (data, hardware). So maybe it’s a waiting room rather than a graveyard 🙂. I’m not sure a lot of the ideas below are dead actually - eg SSM-transformer hybrids look more…
Making a list of graveyard of ideas, the ultimate nerd snipes where efforts go and die
DPO-*variant
SSM-transformer hybrids
SAEs
MCTS
Diffusion for large vision models
Attention-less
JEPA (lecun lovers)
what else?
Happy Qwen day to all who celebrate.
Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…
All that is old is new again.
how sure are we that one epoch is optimal for pretraining in the data-scarce regime