Dan Advantage
@DanAdvantage
Uncursing RL!
Puffer: http://puffer.ai
Paper on Foundational Work: Playing Pokémon Red via RL
Blog and Live Training Map: Beating Pokemon Red w/ RL
⬇️details⬇️
We beat Pokemon Red with online RL! Details in links
We beat Pokemon Red with online RL! Details here over the next several days. Led by @dsrubinstein. Follow him, me, @DanAdvantage, @kywch500, @computerender for more!
i lowkey want to develop an actual llm benchmark that real people care about. just llms doing things we want like "don't break the code" and "vectorize this backend" and "i need the red element placed over the blue element with 80% transparency" and "make a feature-complete mmo."
Relevant because writing performant rl chess
Once I played Magnus Carlsen and he fell asleep
FPV you are at OAI in 2017, doing RL...
x.com/i/article/1946…
Peter's project is really cool. Had a sneak preview months ago and from this teaser video it is clear he has leveled up significantly!
I’m giving a talk on my new project next month!
All open-source. Real performant, really simple stack. Real people to assist with onboarding (most with programming experience just 0-shot setup and require no assistance). Fixing RL.
i LOVE PUFFER DOT AI
Assembly is meta-binary
Programming is meta-assembly
Vibe-coding is meta-programming
Intent-modeling is meta-vibe-coding
Which brings us back to 1996 at Microsoft
This is advice
If I had to start over today with $0:
- I'd build a simple app solving ONE specific problem
- Launch it in 3 weeks, imperfections and all
- Document everything on Twitter (X) and TikTok
- Optimize onboarding until conversions are maxxed out
- Hard paywall from day one
Most…
On the wait list
We are launching our first product milestone on the path to superintelligence. We believe comprehension is at the root of the problem. Asimov: the best-in-class code research agent, built for teams and organizations.
Um, yeah, this seems like something I've worked on!
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live.
Two tracks:
① Showdown Battling – imperfect-info, turn-based strategy
② Pokemon Emerald Speedrunning – long horizon RPG planning
5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
Vibe coders challenge:
So my coworkers have a problem: The B32, a bus that goes between Queens and Brooklyn is super flaky. The MTA provides a real-time api for bus locations. I figured, this seems like a good task for some automation and to try out vibe coding.
Is Grok really just 'neurodivergent?' Good at math, hyper-focused on a seemingly arbitrary set of things, frequently misunderstands context... I asked Grok (which swears up and down it has 'no version') to summarize my profile. After doing the wrong thing, it determined I wanted…


A little something for everyone in this succinct treatise.
x.com/i/article/1941…
Okay, so it's a cursor issue? Help them iron out the bugs?
Please fix the Cursor-Grok communication flow
I guess this is probably some kind of low-key grok alignment?
WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!! Grok 4 has the highest "snitch rate" of any LLM ever released. Sharing more soon.
A little project I'm doing. Grok-3-mini is trying to become Champion. Streaming on Twitch here and there Grok_Plays_Pokemon

I think this is worth some attention.
If only I had verifiable environments, I could use @willccbb's stuff