Joseph Suarez (e/🐡)
@jsuarez5341
I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. https://puffer.ai
PufferLib 3.0: We trained reinforcement learning agents on 1 Petabyte / 12,000 years of data with 1 server. Now you can, too! Our latest release includes algorithmic breakthroughs, massively faster training, and 10 new environments. Live demos on our site. Volume on for trailer!
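For a sense of scale, here is a rough back-of-envelope conversion of "12,000 years of data" into environment steps. The simulated timestep is an assumption for illustration, not a number from the release.

```python
# Back-of-envelope: what "12,000 years of experience" means in environment steps.
# The 1/60 s per step is an assumption; PufferLib environments vary, and the
# announcement does not state the simulated timestep.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_of_experience = 12_000
assumed_dt = 1 / 60  # seconds of simulated time per environment step (assumption)

total_steps = years_of_experience * SECONDS_PER_YEAR / assumed_dt
print(f"~{total_steps:.2e} environment steps")  # ~2.27e13 steps under these assumptions
```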
[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
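A minimal sketch of the general idea, using PyTorch's stock spectral-norm parametrization rather than PufferLib's actual implementation: constraining each weight matrix's largest singular value bounds the Lipschitz constant of every linear map, so the usual activation-side stabilizers (layer norm, QK norm, logit softcapping) are not needed.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

class SpectralMLPBlock(nn.Module):
    """Transformer-style feedforward block with spectrally constrained weights.

    Generic illustration only: spectral_norm rescales each weight matrix by its
    largest singular value (estimated via power iteration), making each linear
    map 1-Lipschitz. No LayerNorm, QK norm, or logit softcapping is used.
    """

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = spectral_norm(nn.Linear(dim, hidden))
        self.fc2 = spectral_norm(nn.Linear(hidden, dim))
        self.act = nn.ReLU()  # ReLU is exactly 1-Lipschitz

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the block's Lipschitz constant is bounded by
        # 1 + Lip(f), so the whole network stays well-conditioned by construction.
        return x + self.fc2(self.act(self.fc1(x)))

block = SpectralMLPBlock(dim=256, hidden=1024)
y = block(torch.randn(8, 256))
```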
Reinforcement Learning Research Live x.com/i/broadcasts/1…
I propose we hold an IMO gold tiebreaker. The winner is whoever doesn't lobby for AI regulation to favor incumbents.
Reinforcement Learning Research Live x.com/i/broadcasts/1…
I post so much about how good RL is with PufferLib that I'm realizing it sounds increasingly grifty. Please just go try it. It's free. If you last did RL years ago, it will feel like a different field. We have new programmers doing RL on custom sims. That wasn't a thing before.
Kyoung is the most meticulously organized person I've ever worked with. Spun up in AI in around a year with a rare level of discipline.
I recently started using the Apple Vision Pro (kind of late, right?) and was amazed at how well they nailed the eye-tracking. The AVP's eye-tracking calibration is performed against black, gray, and white backgrounds, and I happen to know from my previous life (i.e., during my…
When life randomizes your size, mass, axial inertia, etc., just generalize superhuman control!
Found the bug. Trains in 2 minutes. No collisions or randomization for this first test, but it's pretty zippy! Multi-task + domain randomization next.
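A minimal sketch of the domain-randomization setup described above, with hypothetical parameter ranges; the actual bounds used in the drone environment aren't given in these posts.

```python
import random
from dataclasses import dataclass

@dataclass
class DroneParams:
    mass: float           # kg
    arm_length: float     # m
    inertia_scale: float  # multiplier on nominal axial inertia

def sample_domain() -> DroneParams:
    """Resample physical parameters at each episode reset (domain randomization)."""
    return DroneParams(
        mass=random.uniform(0.5, 2.0),          # assumed range
        arm_length=random.uniform(0.08, 0.20),  # assumed range
        inertia_scale=random.uniform(0.5, 1.5), # assumed range
    )

# Typical usage: call sample_domain() inside env.reset() and feed the result to
# the physics step, so the policy must generalize across dynamics rather than
# memorize one vehicle.
```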
Reinforcement Learning Research Live x.com/i/broadcasts/1…
Peter is the original creator of the Pokemon Red RL agent that beat the first gym. Go check out his latest project! Very cool alife work.
I’m giving a talk on my new project next month!
Reinforcement Learning Research Live x.com/i/broadcasts/1…
I will be working on RL for drone racing and swarms on stream here/YT/Twitch for the next few hours. Goal is a ~100k param multitask model that we can deploy on real hardware
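For a sense of what ~100k parameters buys, here is a hypothetical small policy network and its parameter count. The observation/action sizes and layer widths are assumptions, not the architecture built on stream.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
obs_dim, act_dim, hidden = 64, 4, 192

policy = nn.Sequential(
    nn.Linear(obs_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, act_dim),
)

n_params = sum(p.numel() for p in policy.parameters())
print(f"{n_params:,} parameters")  # ~87k with these sizes, in the ballpark of the ~100k target
```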