sacha🥝
@alexUnder_sky
living in self-hatred
"We spend too much energy because almost all of us live ahead of time. We think about what will happen next week, completely forgetting that tomorrow morning we might simply not wake up." Ed Harris.
Hmm
the number of frontier AI researchers i interview that have not used ai is shocking. NOT EVEN THE MODELS THEY TRAIN. I talk about claude code running my experiments and they are surprised. this is a failure of their incentive systems not a knock on them but it is still shocking
Is he missing anything?
Learning about GSPO, proposed by the Qwen team: fig 1. They propose using the sequence likelihood for importance sampling. fig 2. But from the RL course by @svlevine, this is the original form of off-policy PG. fig 3. The per-token IS in (Dr) GRPO is an approximation of it. Am I missing anything?
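A minimal sketch of the three importance weights being compared, assuming per-token log-probabilities of one sampled response under the current policy (`logp_new`) and the behaviour policy (`logp_old`) are already available. The function names are hypothetical, and the length normalization in `gspo_ratio` reflects our reading of the GSPO paper (sequence ratio raised to 1/|y|); clipping, advantages, and the full objectives are left out.

```python
import math
from typing import List


def per_token_ratios(logp_new: List[float], logp_old: List[float]) -> List[float]:
    """Per-token ratios pi_theta(y_t | x, y_<t) / pi_old(y_t | x, y_<t), as used in (Dr) GRPO / PPO."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_ratio(logp_new: List[float], logp_old: List[float]) -> float:
    """Full sequence-likelihood ratio pi_theta(y | x) / pi_old(y | x).

    This is the importance weight of textbook off-policy policy gradient,
    equal to the product of the per-token ratios.
    """
    return math.exp(sum(logp_new) - sum(logp_old))


def gspo_ratio(logp_new: List[float], logp_old: List[float]) -> float:
    """Length-normalized sequence ratio (pi_theta(y | x) / pi_old(y | x)) ** (1 / |y|).

    Assumption: this is the GSPO-style weight, i.e. the geometric mean of the per-token ratios.
    """
    return math.exp((sum(logp_new) - sum(logp_old)) / len(logp_new))


# Hypothetical log-probs for a 4-token response under the behaviour and current policies.
logp_old = [-1.2, -0.7, -2.0, -0.3]
logp_new = [-1.0, -0.8, -1.5, -0.3]

print(per_token_ratios(logp_new, logp_old))
print(sequence_ratio(logp_new, logp_old))  # ≈ 1.822, i.e. exp(0.6)
print(gspo_ratio(logp_new, logp_old))      # ≈ 1.162, i.e. exp(0.15)
```

The raw sequence ratio is a product over every token, so it can blow up or vanish on long responses; taking the 1/|y| power keeps its scale comparable across response lengths, while the per-token ratios drop the dependence on earlier tokens' ratios entirely.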
British people, why do you choose a VPN instead of going through the age verification process? It sounds much more doable than yet another subscription.
Great to see my writing from over 2 years ago still resonates enough to inspire a new team at GDM, and is even used verbatim in the job listing. 🥲
LLM Economist creates optimal tax policy <—> TaxCalcBench does your taxes. AI Tax Civilization: who is building this? arxiv.org/abs/2507.15815
1/ Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties:
Superb idea and paper! Big fan!
1/ 🕵️ Algorithm discovery could lead to huge AI breakthroughs! But what is the best way to learn or discover new algorithms? I'm so excited to share our brand new @rl_conference paper which takes a step towards answering this! 🧵
I felt so bad that I might have missed the talk, but it's here. Thank you, sir!
The Nash75 talks are on YouTube! Below is my talk "Game Theory for AI Agents" (link also gives all other talks on the side). youtube.com/watch?v=WO5xJI…
So you just become an adult and then empty and fill the dishwasher until you die? This is bullshit.
🚀 New preprint! 🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓
If I am allowed to say this, I definitely recommend it
New podcast about multi-agent RL and our @mitpress textbook (marl-book.com), done during @RLDMDublin2025. Thanks Robin for putting it together!