Yi-Ting Chen
@chen_yiting_TW
Associate Professor of CS @ National Yang Ming Chiao Tung University working on human-centered intelligent systems
📢 The first X-Sense Workshop: Ego-Exo Sensing for Smart Mobility at #ICCV2025! 🎤 We’re honored to host an outstanding speaker lineup, featuring Manmohan Chandraker, @BharathHarihar3, @wucathy, Holger Caesar, @zhoubolei, @Boyiliee, Katie Luo x-sense-ego-exo.github.io
We have developed a new tactile sensor, called e-Flesh, with a simple working principle: measure deformations in 3D printable microstructures. Now all you need to make tactile sensors is a 3D printer, magnets, and magnetometers! 🧵
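The working principle is simple enough to sketch in code. A hedged illustration, assuming a single magnet on the sensor axis and an ideal 1/r³ dipole falloff — the constants, function names, and geometry here are hypothetical stand-ins, not e-Flesh's actual calibration:

```python
import numpy as np

MU = 1.0  # magnet's dipole strength (arbitrary units, hypothetical)

def field(distance):
    """On-axis dipole field magnitude falls off as 1/r^3."""
    return MU / distance**3

def deformation(b_measured, rest_distance):
    """Invert the field model: recover how much the microstructure
    compressed, i.e. how much closer the magnet moved to the sensor."""
    return rest_distance - (MU / b_measured) ** (1.0 / 3.0)

rest = 10.0                      # mm, undeformed magnet-sensor gap
b0 = field(rest)                 # calibration reading at rest
b_pressed = field(rest - 2.0)    # a press compresses the structure by 2 mm
assert abs(deformation(b_pressed, rest) - 2.0) < 1e-6
```

In practice such a sensor would be calibrated against known indentations rather than an analytic field model, but inverting a monotone field-vs-distance curve is the core idea.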
Finally, a good modern book on causality for ML: causalai-book.net by @eliasbareinboim. This looks like a worthy successor to the groundbreaking book by @yudapearl which I read in grad school. (h/t @JoshuaSafyan for the ref).
I'm observing a mini Moravec's paradox within robotics: gymnastics that are difficult for humans are much easier for robots than "unsexy" tasks like cooking, cleaning, and assembling. This creates cognitive dissonance for people outside the field: "so, robots can parkour &…
OpenArm has been released: a fully open-source bimanual robot arm built for physical AI. Some stats:
– 7 DoF
– 633mm reach
– 6kg payload/arm
All at a BOM cost of $6.5k. Thanks for sharing - @enactic_ai @hiro_yams
How can we leverage diverse human videos to improve robot manipulation? Excited to introduce EgoVLA — a Vision-Language-Action model trained on egocentric human videos by explicitly modeling wrist & hand motion. We build a shared action space between humans and robots, enabling…
I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. sergeylevine.substack.com/p/sporks-of-agi
I believe all professors in the field of AI and machine learning at top universities need to face a soul-searching question: What can you still teach your top (graduate) students about AI that they cannot learn by themselves or elsewhere? It had bothered me for quite some years…
Don’t just predict the mean of your clean data given your noisy data, predict the full distribution.
Distributional diffusion models with scoring rules at #icml25
Fewer, larger denoising steps using distributional losses!
Wednesday 11am, poster E-1910
arxiv.org/pdf/2502.02483
@agalashov @ValentinDeBort1 Guntupalli @zhouguangyao @sirbayes @ArnaudDoucet1
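The "predict the full distribution, not the mean" idea can be made concrete with a scoring rule. A minimal numpy sketch of the energy score, one proper scoring rule that rewards a set of samples for matching the whole conditional distribution rather than just its mean — illustrative only, not the paper's exact training loss:

```python
import numpy as np

def energy_score(samples, target):
    """Negatively oriented energy score: lower is better.
    samples: (m, d) model samples; target: (d,) observed clean datum."""
    m = samples.shape[0]
    # fit term: average distance from each sample to the target
    term1 = np.mean(np.linalg.norm(samples - target, axis=1))
    # spread term: half the average pairwise distance between samples
    term2 = np.mean([np.linalg.norm(samples[i] - samples[j])
                     for i in range(m) for j in range(m)]) / 2.0
    return term1 - term2

rng = np.random.default_rng(0)
target = np.array([1.0, -1.0])
good = rng.normal(target, 0.1, size=(64, 2))  # samples around the target
bad = rng.normal(0.0, 0.1, size=(64, 2))      # samples that miss it
assert energy_score(good, target) < energy_score(bad, target)
```

A mean-squared-error loss would only compare a point estimate to the target; a scoring rule like this lets the model be trained end-to-end to produce calibrated samples.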
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
Top AI models based on use case, post Grok-4:
Coding - Sonnet-4
General purpose - GPT-4.1
Planning and reasoning - Grok-4
Video - Veo Fast, SeaDance
Image - SeaDance, GPT-image
Cheap all-purpose - Flash
Real-time search - Grok-4
What if we use imitation learning in LLMs? "Reinforcement Learning with Action Chunking" Q-chunking makes RL agents pick fixed action chunks, giving clean multi-step updates and faster exploration, outperforming earlier offline-to-online methods on long-horizon, sparse-reward tasks
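A toy sketch of the chunking idea: the policy commits to a fixed-length chunk of actions and executes it open-loop, so each decision (and each value update) spans multiple primitive steps. The environment, names, and chunk length here are hypothetical, not the paper's implementation:

```python
import numpy as np

H = 4  # chunk length (hypothetical choice)

def policy_chunk(obs, rng):
    """Stand-in for a learned chunked policy: returns H actions at once."""
    return rng.normal(size=(H, 2))

def rollout(env_step, obs, n_chunks, rng):
    """Execute whole chunks open-loop; a critic would then score each
    chunk as one macro-action, giving the multi-step update."""
    traj = []
    for _ in range(n_chunks):
        chunk = policy_chunk(obs, rng)
        for a in chunk:          # all H actions run before re-planning
            obs, r = env_step(obs, a)
            traj.append((a, r))
    return obs, traj

# toy environment: reward is negative distance to the origin
def env_step(obs, a):
    obs = obs + 0.1 * a
    return obs, -float(np.linalg.norm(obs))

rng = np.random.default_rng(0)
final_obs, traj = rollout(env_step, np.ones(2), n_chunks=3, rng=rng)
assert len(traj) == 3 * H  # 12 primitive steps from only 3 decisions
```

Fewer decision points per episode is what speeds up both temporal-difference propagation and exploration on sparse-reward tasks.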
I've been a bit quiet on X recently. The past year has been a transformational experience. Grok-4 and Kimi K2 are awesome, but the world of robotics is a wondrous wild west. It feels like NLP in 2018 when GPT-1 was published, along with BERT and a thousand other flowers that…
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Our new work on adaptive image tokenization: Image —> T tokens
* variable T, based on image complexity
* single forward pass both infers T and tokenizes to T tokens
* approximates minimum description length encoding of the image
Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, a token budget T, and a target quality 𝜖, KARL finds the smallest t ≤ T to reconstruct it within 𝜖 🧵
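The objective is easy to state in code. A hedged sketch of "smallest t ≤ T that reconstructs within 𝜖", written as an explicit search for clarity — KARL itself infers t in a single forward pass, and `recon_error` below is a made-up stand-in, not the actual tokenizer:

```python
import numpy as np

def recon_error(image, t):
    """Stand-in for encode-with-t-tokens -> decode -> error.
    Hypothetical: error shrinks as the token budget grows."""
    return 1.0 / (t + 1)

def smallest_budget(image, T, eps):
    """Return the smallest t <= T whose reconstruction error is <= eps,
    mirroring the minimum-description-length flavor of the objective."""
    for t in range(1, T + 1):
        if recon_error(image, t) <= eps:
            return t
    return T  # budget exhausted: fall back to the full allowance

img = np.zeros((8, 8))
assert smallest_budget(img, T=32, eps=0.1) == 9  # 1/(9+1) = 0.1 <= 0.1
```

Simple images satisfy the quality target with few tokens; complex images consume more of the budget — variable-length codes by construction.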
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Contributions:
• LangSplatV2 achieves real-time performance with 476.2 FPS for high-dimensional feature splatting and 384.6 FPS for 3D open-vocabulary text querying.
• Delivers a 42× speedup and 47×…
What are robot world models? If you follow the robotics space, you'll have almost certainly heard the term "world model." In practice, it means using generative AI to build data-driven simulators or planners in order to build general-purpose robots. In this post I write about…
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
If you have a policy that uses diffusion/flow (e.g. a diffusion VLA), you can run RL where the actor chooses the noise, which is then denoised by the policy to produce an action. This approach, which we call diffusion steering (DSRL), yields a remarkably efficient RL method! 🧵👇
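A minimal sketch of the steering idea: the frozen diffusion/flow policy deterministically maps initial noise to an action given the observation, and RL optimizes the noise input instead of the action. Everything below is a toy stand-in, not the DSRL implementation:

```python
import numpy as np

def denoise(obs, noise, steps=8):
    """Stand-in for a frozen diffusion/flow policy: deterministically
    maps initial noise (given obs) to an action via a toy contraction."""
    x = noise
    for _ in range(steps):
        x = 0.5 * x + 0.5 * obs   # pulls x toward an obs-dependent action
    return x

def actor(obs, theta):
    """The RL actor outputs the *noise*, not the action."""
    return theta * obs            # toy linear noise policy

obs = np.array([0.2, -0.4])
action = denoise(obs, actor(obs, theta=1.5))
# RL would improve theta from the reward earned by `action`; the
# pretrained diffusion policy itself stays frozen throughout.
assert action.shape == obs.shape
```

Because the denoiser already maps any noise to a plausible action, the RL problem lives in a well-shaped latent space, which is one intuition for the sample efficiency.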
Meet ProVox: a proactive robot teammate that gets you 🤖❤️🔥 ProVox models your goals and expectations before a task starts — enabling personalized, proactive help for smoother, more natural collaboration. All powered by LLM commonsense. Recently accepted at @ieeeras R-AL! 🧵1/7