Keshigeyan Chandrasegaran
@keshigeyan
CS PhD student @Stanford. Research @StanfordAILab, @StanfordSVL & @LiquidAI_. Prev: research @sutdsg (Temasek Labs), undergrad @sutdsg.
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 grafting.stanford.edu Co-led with @MichaelPoli6
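To make the "edit a pretrained architecture" idea concrete, here is a minimal PyTorch sketch of one way grafting-style operator replacement could look: swap a block's self-attention for a cheaper gated convolution and regress the new operator onto the frozen original's activations. The module names and training recipe below are illustrative assumptions, not the released code.

```python
# Hypothetical sketch: replace a self-attention operator from a pretrained DiT block
# with a cheaper gated convolution, then regress the new operator's outputs onto the
# frozen original's (local activation matching). Names and sizes are illustrative.
import torch
import torch.nn as nn

class GatedConvMixer(nn.Module):
    """Cheap token mixer that could stand in for self-attention in a DiT block."""
    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, tokens, dim)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        u = self.conv(u.transpose(1, 2)).transpose(1, 2)
        return self.out_proj(u * torch.sigmoid(gate))

dim, tokens = 256, 64
teacher = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)  # stands in for the pretrained operator
teacher.requires_grad_(False)
student = GatedConvMixer(dim)

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
for step in range(100):                        # small compute budget: match activations on sampled batches
    x = torch.randn(8, tokens, dim)
    with torch.no_grad():
        target, _ = teacher(x, x, x)
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```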
AI models segment scenes based on how things appear, but babies segment based on what moves together. We use a visual world model our lab has been developing to capture this idea, and what's cool is that it beats SOTA models on zero-shot segmentation and physical…
Evo 2 update: new dependency versions (torch, transformer engine, flash attn) and a docker option mean it should be easy to set up without needing to compile locally. Happy ATGC-ing! github.com/ArcInstitute/e…
It's easy (and fun!) to get nerdsniped by complex architecture designs. But over the years, I've seen hybrid gated convolutions always come out on top in the right head-to-head comparisons. The team brings a new suite of StripedHyena-style decoder models, in the form of SLMs…
Liquid AI open-sources a new generation of edge LLMs! 🥳 I'm so happy to contribute to the open-source community with this release on @huggingface! LFM2 is a new architecture that combines best-in-class inference speed and quality into 350M, 700M, and 1.2B models.
we just released LFMs! go check them out. So proud of our team @LiquidAI_ 🙏🏻
Today we introduce Liquid Foundation Models (LFMs) to the world with the first series of our Language LFMs: a 1B, a 3B, and a 40B model. (1/n)
Today, we release the 2nd generation of our Liquid Foundation Models, LFM2. LFM2 sets the bar for quality, speed, and memory efficiency in on-device AI. Built for edge devices like phones, laptops, AI PCs, cars, wearables, satellites, and robots, LFM2 delivers the fastest…
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…
Huge milestone from the team! A blazing-fast diffusion LLM built for chat, delivering real-time performance at commercial scale. If you liked Mercury Coder for code, you'll love this for conversation.
We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.
FlowMo, our paper on diffusion autoencoders for image tokenization, has been accepted to #ICCV2025! See you in Hawaii! 🏄‍♂️
Modern generative models of images and videos rely on tokenizers. Can we build a state-of-the-art discrete image tokenizer with a diffusion autoencoder? Yes! I’m excited to share FlowMo, with @kylehkhsu, @jcjohnss, @drfeifei, @jiajunwu_cs. A thread 🧵:
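For readers new to the idea, here is a toy sketch of a diffusion-autoencoder tokenizer: an encoder quantizes an image into discrete codes, and a decoder trained with a flow-matching-style objective reconstructs the image conditioned on those codes. This is a rough illustration of the concept only, not FlowMo's architecture or training recipe; all names and sizes are made up.

```python
# Conceptual sketch (not FlowMo): an encoder quantizes an image into discrete tokens,
# and a diffusion/flow-style decoder learns to reconstruct the image given those tokens.
import torch
import torch.nn as nn

class BinaryQuantizer(nn.Module):
    """Lookup-free quantizer: the sign of each latent channel gives a binary code."""
    def forward(self, z):
        codes = torch.sign(z)
        return z + (codes - z).detach()        # straight-through estimator

class ToyDiffusionTokenizer(nn.Module):
    def __init__(self, dim=3 * 32 * 32, latent=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(dim, latent))
        self.quant = BinaryQuantizer()
        # decoder predicts the velocity from a noisy image, the tokens, and time t
        self.decoder = nn.Sequential(nn.Linear(dim + latent + 1, 512), nn.SiLU(), nn.Linear(512, dim))

    def loss(self, x):                          # x: (batch, 3, 32, 32)
        x_flat = x.flatten(1)
        tokens = self.quant(self.encoder(x))
        t = torch.rand(x.size(0), 1)
        noise = torch.randn_like(x_flat)
        x_t = (1 - t) * x_flat + t * noise      # linear interpolation path (flow-matching style)
        pred = self.decoder(torch.cat([x_t, tokens, t], dim=-1))
        return nn.functional.mse_loss(pred, noise - x_flat)  # regress the velocity

model = ToyDiffusionTokenizer()
print(model.loss(torch.randn(4, 3, 32, 32)))
```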
#ICCV2025 🤩3D world generation is cool, but it is cooler to play with the worlds using 3D actions 👆💨, and see what happens! — Introducing *WonderPlay*: Now you can create dynamic 3D scenes that respond to your 3D actions from a single image! Web: kyleleey.github.io/WonderPlay/ 🧵1/7
🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to generate rewards for such preferences cheaply. 🧵⬇️
✨The Mercury tech report from Inception Labs is now available on arXiv. It took us a bit of time to get this one out, but it's a nice complement to the blog post, with many more experiments. Stay tuned for more updates soon! arxiv.org/abs/2506.17298
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
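A minimal sketch of the weak-verifier idea, assuming the simple strategy of weighting and summing verifier scores to pick a candidate answer; the function names and stub verifiers below are hypothetical and do not reflect Weaver's actual aggregation method.

```python
# Hedged sketch of combining weak verifiers (not the Weaver implementation):
# score each candidate answer with several imperfect verifiers, combine the scores
# with per-verifier weights, and pick the top candidate. Verifiers here are stubs.
from typing import Callable, Sequence

def select_answer(
    candidates: Sequence[str],
    verifiers: Sequence[Callable[[str], float]],   # each returns a score in [0, 1]
    weights: Sequence[float],                       # e.g. estimated verifier accuracies
) -> str:
    def combined(ans: str) -> float:
        return sum(w * v(ans) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# Toy usage with stub verifiers (a real setup would call reward models / LM judges).
length_prior = lambda a: min(len(a) / 100, 1.0)
has_number = lambda a: 1.0 if any(c.isdigit() for c in a) else 0.0
print(select_answer(["x = 42", "I am not sure"], [length_prior, has_number], [0.3, 0.7]))
```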
Join us tomorrow in SGM 124 for the SWOMO workshop at #RSS2025! We will have 6 amazing talks and a panel at the end to discuss structured world modeling for robotics! Latest schedule and information at swomo-rss.github.io
Excited to announce the “Structured World Models for Robotic Manipulation” workshop at #RSS2025 in LA! Website: swomo-rss.github.io Call for Papers (Deadline: May 23): swomo-rss.github.io/index.html#call Come join us with a stellar lineup of speakers to discuss the various important &…
This is a really cool paper by my student @keshigeyan and collaborators! 😍
Chipmunks for everyone!
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation. Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
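Here is an illustrative sketch of product-of-experts-style candidate ranking at inference time: each expert assigns a probability to a candidate, and candidates are ranked by the product of those scores (equivalently, the sum of logs). The experts below are toy stubs, not the paper's actual models or interface.

```python
# Illustrative product-of-experts combination at inference time (not the paper's
# framework): each "expert" assigns a probability to a candidate, and candidates
# are ranked by the sum of log-scores, i.e. the product of expert probabilities.
import math
from typing import Callable, Sequence

def poe_rank(candidates: Sequence[str], experts: Sequence[Callable[[str], float]]):
    """Experts return probabilities in (0, 1]; rank by sum of logs = log of product."""
    scored = [(sum(math.log(e(c)) for e in experts), c) for c in candidates]
    return sorted(scored, reverse=True)

# Toy experts standing in for a generative prior, a VLM checker, and a simulator.
gen_prior = lambda c: 0.9 if "red cube" in c else 0.5
vlm_check = lambda c: 0.8 if "on the table" in c else 0.2
physics_ok = lambda c: 0.99 if "floating" not in c else 0.01
print(poe_rank(["red cube on the table", "red cube floating"], [gen_prior, vlm_check, physics_ok]))
```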
The Inception paper arxiv.org/abs/1409.4842 was awarded the Longuet-Higgins Prize (Test of Time). The architecture represented a significant step forward in inference efficiency, especially on CPU, and variants of Inception networks were used in Google products for years.
Check out our work on long-form video understanding!
Want to process 1-hour-long videos? Come talk with us at ExHall D, Poster #306, Fri Jun 13, 4-6pm @CVPR about temporal search! T* can plug into any VLM! Try it with your own demos!
Congrats Chaitanya on winning the BEST PAPER AWARD 🥇🏆 at #CVPR2025! Check out the details of our work: arxiv.org/abs/2504.12513
Our first poster is up! 🕐 Come check it out right now, until 13:00. “AdaVid: Adaptive Video-Language Pretraining” 🪧 ExHall D, Poster #203 📝 arxiv.org/abs/2504.12513
😍!
Take out your VR headset and “touch” the Gaussians while enjoying a chill stroll through different worlds.