Gokul Swamy
@g_k_swamy
phd candidate @CMU_Robotics. bs/ms @berkeley_ai. summers @GoogleAI, @msftresearch, @aurora_inno, @nvidia, @spacex. no model is an island. taking a break from x.
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!
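A toy sketch of what a generation-verification gap buys you (my own illustration, not the paper's experiments): when checking an answer is easier than producing one, even a weak generator plus a decent verifier improves output quality, e.g. via best-of-N sampling. Everything below is hypothetical stand-in code.

import random

def best_of_n(generate, verify, n=16):
    # Sample n candidates and keep the one the verifier scores highest.
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=verify)

target = 42
generate = lambda: random.randint(0, 100)  # generation is hard here: a uniform random guess
verify = lambda x: -abs(x - target)        # verification is easy: score closeness to the target
print(best_of_n(generate, verify, n=32))   # typically far closer to 42 than a single guess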

This was crystal-clear without losing nuance — highly recommended!
like everyone else i am hopping on the blog post trend gene.ttic.edu/blog/incomplet…
Check out @nico_espinosa_d's blog post on how we can enable test-time scaling of policies learned via offline RL! I am particularly impressed by the figures :).
New in the #DeeperLearningBlog: Researchers from the #KempnerInstitute, @Cornell and @CMU_Robotics introduce a new method for improving offline RL by scaling up test-time compute. kempnerinstitute.harvard.edu/research/deepe… #AI #RL (1/2)
Using OT to define rewards for imitating video demos is popular, but it breaks down when demos are temporally misaligned—a frequent challenge in practice. We present ORCA at #ICML2025 , which defines rewards by aligning sequences, rather than matching individual frames via OT.
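A minimal sketch of the sequence-level idea (my own DTW-style illustration, not the exact ORCA objective): align the whole rollout to the whole demo and use the negative alignment cost as the reward, so temporal misalignment is absorbed by the alignment instead of corrupting frame-wise matches.

import numpy as np

def dtw_cost(agent_feats, demo_feats):
    # Dynamic-time-warping cost between two (T, d) feature sequences.
    T, S = len(agent_feats), len(demo_feats)
    cost = np.full((T + 1, S + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            d = np.linalg.norm(agent_feats[i - 1] - demo_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[T, S]

def alignment_reward(agent_feats, demo_feats):
    # Reward = negative cost of aligning the full sequences.
    return -dtw_cost(agent_feats, demo_feats)

# A demo and a rollout of the same motion at different speeds still align well.
demo = np.linspace(0, 1, 20)[:, None]     # 20-frame demo features (hypothetical)
rollout = np.linspace(0, 1, 40)[:, None]  # 40-step rollout of the same motion
print(alignment_reward(rollout, demo))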
Heading to Vancouver for #ICML2025 to present our work: Temporal Difference Flows. Make sure to check out the oral to learn how we’re now able to scale this exciting world model framework based on the successor representation! Also, feel free to reach out to discuss anything RL!
Honored that our paper Temporal Difference Flows received the Best Paper Award at the #ICLR2025 World Models Workshop, and has also been accepted as a spotlight for #ICML2025! All made possible with the exceptional team @AIatMeta! 📄arxiv.org/abs/2503.09817 x.com/JesseFarebro/s…
This blog post was a wonderful read -- it is a rare and precious opportunity to understand the "latent reasoning process" underneath a beautiful idea :)
This was an incredibly important project to me - I’ve wanted to solve it for years, but had no idea how. This was all @sukjun_hwang and @fluorane's amazing work! I wrote about the story of its development, and what might be coming next. The H-Net: goombalab.github.io/blog/2025/hnet…
I'm very excited to share some new work arxiv.org/abs/2506.06488. This work started out in conversations with @thorn, where we realized that shadow-model MIAs (membership inference attacks) couldn't be used to audit models for harmful content involving children. See 🧵 for why, and our progress on solving this...
👋 I’ll be at EC 2025 @Stanford (July 7–10) presenting 2 papers at the Swap Regret & Strategic Learning workshop on July 10: 📄 arxiv.org/abs/2505.16141 📄 arxiv.org/abs/2504.15615 I’ll post more about the papers. Feel free to DM if you’re around and want to chat/grab coffee!☕
The culmination of several PhD years — today LAML is published! LAML infers max likelihood time-resolved cell lineage trees from dynamic lineage tracing data accurately and efficiently. Thanks to @benjraphael for his guidance! genomebiology.biomedcentral.com/articles/10.11…
Presenting DemoDiffusion: An extremely simple approach enabling a pre-trained 'generalist' diffusion policy to follow a human demonstration for a novel task during inference. One-shot human imitation *without* requiring any paired human-robot data or online RL 🙂 1/n
Teleoperation is slow, expensive, and difficult to scale. So how can we train our robots instead? Introducing X-Sim: a real-to-sim-to-real framework that trains image-based policies 1) learned entirely in simulation 2) using rewards from human videos. portal-cornell.github.io/X-Sim
We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.