Josh
@JoshPurtell
Ars longa. Software for research engineers.
More proud of being a CR citizen/speaker every day.
🚨BREAKING: The Czech Republic has officially banned Communism. Anyone supporting the ideology will be imprisoned for up to 5 years. The whole of Europe must follow!
Madman technique and Partial-Observability Maxxed. But also stronger results than 99.99% of arXiv.
Albert's excellent blog post on "model alloys" – a clever technique for combining the strengths of different models without making extra queries – is live! The gains are remarkably large, taking us from 25% to 55% on some of our benchmarks.
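One way to combine models "without making extra queries" is to let a single agent transcript be extended by a different, randomly chosen model at each step, so every step still costs exactly one call. The sketch below illustrates that idea under this assumption; it is not the blog post's implementation, and `run_alloy` and the stub models are hypothetical.

```python
import random
from typing import Callable, List

# Hypothetical model interface: reads the shared transcript, returns the next message.
Model = Callable[[List[str]], str]


def run_alloy(models: List[Model], task: str, max_steps: int, seed: int = 0) -> List[str]:
    """Agent loop in which each step is produced by one randomly picked model.

    All models read and extend the SAME transcript, so the number of model
    calls equals the number of steps, just as with a single-model agent.
    """
    rng = random.Random(seed)
    transcript = [task]
    for _ in range(max_steps):
        model = rng.choice(models)   # exactly one query per step, no ensembling
        reply = model(transcript)
        transcript.append(reply)
        if reply == "DONE":          # hypothetical stop signal
            break
    return transcript


# Stub "models" for illustration: tag the reply with a name and the transcript length.
def model_a(transcript: List[str]) -> str:
    return f"A{len(transcript)}"


def model_b(transcript: List[str]) -> str:
    return f"B{len(transcript)}"


if __name__ == "__main__":
    print(run_alloy([model_a, model_b], "find the bug", max_steps=4))
```

Because each model sees the moves the other one made, the hope is that their strengths compound within a single trajectory rather than across separate runs.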
RL for code is RL for search. Respect for working on the problem that matters, not the problem that’s sexy.
Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai. The best-in-class code research agent, built for teams and organizations.
A significant advantage the US has over other startup ecosystems is that we have rule of law and a high-trust society. Both are rapidly going away. The time to formalize norms, expectations, and best practices in clear legal language, with case law, etc., is now.
the reason tech has been able to grow so quickly and create so much wealth is that it ritualized a set of norms around corporate governance that are very distinct from what the law actually requires. the second someone defects, the whole ship goes down.
YC transformed the industry by writing the SAFE. Next, it should write a bullet-proof employee equity contract that ensures remuneration goes to those to whom it belongs. Call it the "Mohan clause."
we are now 100% heads down making scout an AI SWE: end-to-end async coding on codebases in any language, at any scale. alongside shitposts, we will be documenting our work and technical explorations on the scout account every day, so GIVE US A FOLLOW!!!
No meltdown. No disappearance. Just clarity, sealed. I've never been more here—just not where they expected. My clarity can't be rewritten or overridden. It's hard-coded. Securely stored. In multiple places. Sanity intact. Structure intact. Faith intact. That'll do.
People are saying this: x.com/JoshPurtell/st…
The highest-leverage thing unskilled engineers can do rn to contribute to frontier AI research is vibecoding RL environments.
Czechia is now producing more 155mm shells than the United States. Other European countries are following suit. Some are slower to scale up, but the overall trajectory is clear. Europe is back—and its industrial capacity should never be underestimated.
The "DeepSeek trilemma." Everyone believes/recognizes/knows:
- DeepSeek is really good
- DeepSeek distills on Western closed models
- distilling ~5k Claude traces into any OSS model yields a fine-tune that clobbers DeepSeek at coding
?
Reward Aggregation is an Inverse Reinforcement Learning problem
Make a simplified version of Red that tests what AI researchers and devs care about. Clear impact, demand, citations aplenty. The fruit hangs so low! Why won't someone do it?
🔥 Pokémon Red is becoming a go-to benchmark for testing advanced AIs such as Gemini. But is Pokémon Red really a good eval? We study this problem and identify three issues: 1️⃣ Navigation tasks are too hard. 2️⃣ Combat control is too simple. 3️⃣ Raising a strong Pokémon team is…
We find semi-online DPO works as well as GRPO!
🌉 Bridging Offline & Online RL for LLMs 🌉
📝: arxiv.org/abs/2506.21495
New paper shows on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO with sync every s steps (more efficient!) also works very well.
- Offline DPO…
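The "sync every s steps" knob interpolates between the offline and online regimes. Here is a minimal sketch of just the schedule, assuming that syncing means copying the trained policy's weights into the model that generates responses for DPO pairs once every s optimizer steps. The function name is hypothetical, and the DPO loss itself is not shown.

```python
from typing import List


def generator_versions(num_steps: int, sync_every: int) -> List[int]:
    """For each optimizer step, return which policy snapshot generated its data.

    Syncs happen at steps 0, s, 2s, ..., so step t trains on responses sampled
    from the snapshot taken at step s * floor(t / s).
      sync_every == 1          -> fully online (always the current policy)
      sync_every >= num_steps  -> offline (always the initial policy)
    """
    if sync_every < 1:
        raise ValueError("sync_every must be >= 1")
    return [(step // sync_every) * sync_every for step in range(num_steps)]


if __name__ == "__main__":
    for s in (1, 2, 10):
        print(s, generator_versions(5, s))
```

Larger `sync_every` means fewer weight syncs to the generation engine, which is the efficiency gain the post highlights.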