Gokul Swamy
@g_k_swamy
phd candidate @CMU_Robotics. bs/ms @berkeley_ai. summers @GoogleAI, @msftresearch, @aurora_inno, @nvidia, @spacex. no model is an island. taking a break from x.
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!
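A toy sketch of what a generation-verification gap buys you (my own illustration, not the paper's experiments): when checking an answer is easier than producing one, even a weak generator plus a decent verifier improves output quality, e.g. via best-of-N sampling. Everything below is hypothetical stand-in code.

import random

def best_of_n(generate, verify, n=16):
    # Sample n candidates and keep the one the verifier scores highest.
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=verify)

target = 42
generate = lambda: random.randint(0, 100)  # generation is hard here: a uniform random guess
verify = lambda x: -abs(x - target)        # verification is easy: score closeness to the target
print(best_of_n(generate, verify, n=32))   # typically far closer to 42 than a single guess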

This was crystal-clear without losing nuance — highly recommended!
like everyone else i am hopping on the blog post trend gene.ttic.edu/blog/incomplet…
Check out @nico_espinosa_d's blog post on how we can enable test-time scaling of policies learned via offline RL! I am particularly impressed by the figures :).
New in the #DeeperLearningBlog: Researchers from the #KempnerInstitute, @Cornell and @CMU_Robotics introduce a new method for improving offline RL by scaling up test-time compute. kempnerinstitute.harvard.edu/research/deepe… #AI #RL (1/2)
Using OT to define rewards for imitating video demos is popular, but it breaks down when demos are temporally misaligned—a frequent challenge in practice. We present ORCA at #ICML2025 , which defines rewards by aligning sequences, rather than matching individual frames via OT.
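A minimal sketch of the sequence-level idea (my own DTW-style illustration, not the exact ORCA objective): align the whole rollout to the whole demo and use the negative alignment cost as the reward, so temporal misalignment is absorbed by the alignment instead of corrupting frame-wise matches.

import numpy as np

def dtw_cost(agent_feats, demo_feats):
    # Dynamic-time-warping cost between two (T, d) feature sequences.
    T, S = len(agent_feats), len(demo_feats)
    cost = np.full((T + 1, S + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            d = np.linalg.norm(agent_feats[i - 1] - demo_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[T, S]

def alignment_reward(agent_feats, demo_feats):
    # Reward = negative cost of aligning the full sequences.
    return -dtw_cost(agent_feats, demo_feats)

# A demo and a rollout of the same motion at different speeds still align well.
demo = np.linspace(0, 1, 20)[:, None]     # 20-frame demo features (hypothetical)
rollout = np.linspace(0, 1, 40)[:, None]  # 40-step rollout of the same motion
print(alignment_reward(rollout, demo))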
Heading to Vancouver for #ICML2025 to present our work: Temporal Difference Flows. Make sure to check out the oral to learn how we’re now able to scale this exciting world model framework based on the successor representation! Also, feel free to reach out to discuss anything RL!
Honored that our paper Temporal Difference Flows received the Best Paper Award at the #ICLR2025 World Models Workshop, and has also been accepted as a spotlight for #ICML2025! All made possible with the exceptional team @AIatMeta! 📄arxiv.org/abs/2503.09817 x.com/JesseFarebro/s…
This blog post was a wonderful read -- it is a rare and precious opportunity to understand the "latent reasoning process" underneath a beautiful idea :)
This was an incredibly important project to me - I’ve wanted to solve it for years, but had no idea how. This was all @sukjun_hwang and @fluorane's amazing work! I wrote about the story of its development, and what might be coming next. The H-Net: goombalab.github.io/blog/2025/hnet…
I'm very excited to share some new work arxiv.org/abs/2506.06488. This work started out in conversations with @thorn, where we realized that shadow-model MIAs (membership inference attacks) couldn't be used to audit models for harmful content involving children. See 🧵 for why, and our progress on solving this...
👋 I’ll be at EC 2025 @Stanford (July 7–10) presenting 2 papers at the Swap Regret & Strategic Learning workshop on July 10: 📄 arxiv.org/abs/2505.16141 📄 arxiv.org/abs/2504.15615 I’ll post more about the papers. Feel free to DM if you’re around and want to chat/grab coffee!☕
The culmination of several PhD years — today LAML is published! LAML infers max likelihood time-resolved cell lineage trees from dynamic lineage tracing data accurately and efficiently. Thanks to @benjraphael for his guidance! genomebiology.biomedcentral.com/articles/10.11…
Presenting DemoDiffusion: An extremely simple approach enabling a pre-trained 'generalist' diffusion policy to follow a human demonstration for a novel task during inference. One-shot human imitation *without* requiring any paired human-robot data or online RL 🙂 1/n
Teleoperation is slow, expensive, and difficult to scale. So how can we train our robots instead? Introducing X-Sim: a real-to-sim-to-real framework that trains image-based policies 1) learned entirely in simulation 2) using rewards from human videos. portal-cornell.github.io/X-Sim
We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.