Ananye Agarwal
@anag004
building robot brains @SkildAI | MLD PhD at CMU | Prev CS @ IITD.
As a founding researcher, I have seen @SkildAI grow exponentially. We changed 3 offices, grew 10x in human (and robot) numbers, and became a unicorn in less than a year. If you want to scale up robotics and work with a cracked team of engineers and scientists, come to @SkildAI.
Thrilled to announce @SkildAI! Over the past year, @gupta_abhinav_ and I have been working with our top-tier team to build an AI foundation model grounded in the physical world. Today, we’re taking Skild AI out of stealth with $300M in Series A funding: forbes.com/sites/rashishr…
Research arc: ⏪ 2 yrs ago, we introduced VRB: learning from hours of human videos to cut down teleop (Gibson🙏) ▶️ Today, we explore a wilder path: robots deployed with no teleop, no human demos, no affordances. Just raw video generation magic 🙏 Day 1 of faculty life done! 😉…
🚀 Introducing RIGVid: Robots Imitating Generated Videos! Robots can now perform complex tasks—pouring, wiping, mixing—just by imitating generated videos, purely zero-shot! No teleop. No OpenX/DROID/Ego4D. No videos of human demonstrations. Only AI generated video demos 🧵👇
Humans grasp objects with a purpose! Web2Grasp enables such functional grasping for dexterous robot hands via hand-object reconstruction from web images - without *any* robot teleop data collection 1/n
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️ Compute-constrained? Train Autoregressive models ▶️ Data-constrained? Train Diffusion models Get ready for 🤿 1/n
Big news! Our paper "Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of LLMs" has been accepted to TACL — a top-tier ACL-sponsored journal (Impact Factor > 9)! 🎉 📄 Paper: arxiv.org/abs/2408.14470 🔧 Code: github.com/Aradhye2002/se… 🧵Thread below 👇
Pure continuous-space reasoning isn’t practical. Reasoning requires decision-making, which is naturally enforced when hidden states are decoded into discrete tokens.
It is intuitively obvious that reasoning in continuous embedding space is dramatically more powerful than reasoning in discrete token space. This paper from @tydsh and team shows that it is the case theoretically.
PPO is often frustrating to tune for many continuous control tasks since it keeps getting stuck in local minima. In our SAPG paper (sapg-rl.github.io), we showed how training multiple followers with PPO and combining their data can mitigate this issue. In EPO,…
(1/n) Since its publication in 2017, PPO has essentially become synonymous with RL. Today, we are excited to provide you with a better alternative - EPO.
Excited to share our work: "Maximizing Confidence Alone Improves Reasoning". Humans rely on confidence to learn when answer keys aren’t available (e.g., taking an exam). Surprisingly, LLMs can also learn w/o ground-truth answers, simply by reinforcing high-confidence answers via RL!
Cool (and somewhat counterintuitive) finding from my brother - conciseness and correctness are correlated in LLM reasoning! This neat fact can be used to design an efficient and more accurate test-time search algorithm.
For the past couple of months we've been working on test-time scaling, and we've discovered a huge thing:
Very exciting Handy Moves workshop at ICRA 2025 this year! It's an honor to be hosting this morning session! Please join us in Room 302 😀 sites.google.com/view/dexterity…
Maybe real-world robot generalization doesn’t need massive teleop datasets? 🤔 In DexWild, we show that human demos 🙌 + a little robot data 🤖 = policies that generalize across scenes 🏞️, tasks 🛠️, and embodiments 🦾!
Training robots for the open world needs diverse data. But collecting robot demos in the wild is hard! Presenting DexWild 🙌🏕️ Human data collection system that works in diverse environments, without robots 💪🦾 Human + Robot Cotraining pipeline that unlocks generalization 🧵👇
Exciting to see (at 5:55) Nvidia adopting LEAP Hand in their sim2real efforts! Build your own at leaphand.com! Lots more coming this summer, stay tuned :) @pathak2206 @anag004
The Physical Turing Test: your house is a complete mess after a Sunday hackathon. On Monday night, you come home to an immaculate living room and a candlelight dinner. And you couldn't tell whether a human or a machine had been there. Deceptively simple, insanely hard. It is the…
Great to see Nvidia using LEAP Hand and the sim2real pipeline we developed for it (5:55)! We trained a policy to in-hand rotate a cube using only proprioception: v1.leaphand.com. Work w/ @kenny__shaw @pathak2206
Skild AI is on the @Forbes AI 50 list of the most promising privately-held AI companies in the world!! #ForbesAI50 Join us: skild.ai/career Full list: forbes.com/lists/ai50/
Are current reasoning models optimal for test-time scaling? 🌠 No! Models make the same incorrect guess over and over again. We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math! 1/N
1/ Happy to share UniDisc - Unified Multimodal Discrete Diffusion – We train a 1.5 billion parameter transformer model from scratch on 250 million image/caption pairs using a **discrete diffusion objective**. Our model has all the benefits of diffusion models but now in…
RL is notoriously sample inefficient. How can we scale RL on tasks much slower to simulate than rigid body physics, such as soft bodies? In our #ICLR2025 spotlight, we introduce both a new first-order RL algorithm, SAPO, and a differentiable simulation platform, Rewarped. 1/n
Low-cost teleop systems have democratized robot data collection, but they lack any force feedback, making it challenging to teleoperate contact-rich tasks. Many robot arms provide force information — a critical yet underutilized modality in robot learning. We introduce: 1. 🦾A…
Model-free deep RL algorithms like NFSP, PSRO, ESCHER, & R-NaD are tailor-made for games with hidden information (e.g. poker). We performed the largest-ever comparison of these algorithms. We find that they do not outperform generic policy gradient methods, such as PPO. 1/N