Pavel Kopanev
@kopanevp
3D Computer Vision | Embodied Reasoning | Robotics | Foundational Models and beyond
Chelsea Finn (@chelseabfinn) on building general-purpose robotics and bringing intelligence into the physical world. At AI Startup School in San Francisco.
00:00 - General Purpose Robots
00:11 - Challenges in Robotics Applications
00:57 - Physical Intelligence: A New Approach…
We’re organizing the RoboArena Challenge at CoRL this year! Show the performance of your best generalist policy, in a fair, open benchmark for the robotics community! 🤖 Sign up, even if you don’t have a robot! More details in 🧵👇
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
🚀 We release SpatialTrackerV2: the first feedforward model for dynamic 3D reconstruction and 3D point tracking — all at once! Reconstruct dynamic scenes and predict pixel-wise 3D motion in seconds. 🔗 Webpage: spatialtracker.github.io 🔍 Online Demo: huggingface.co/spaces/Yuxihen…
🚀Introducing Hunyuan3D-PolyGen, our newly upgraded and industry-first art-grade 3D generative model. It brings effortless intelligent retopology, making AI-generated models ready for professional art pipelines. ✅ Superior Mesh Topology: Our self-developed mesh autoregressive…
🎦 Amazing YouTube channel with videos explaining different concepts and new ideas in CV, RL, Gen AI, and more: youtube.com/@jbhuang0604/f… Author: @jbhuang0604
🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and panelists, and are inviting you to submit your papers with a July 13 deadline. Website: robot-world-modeling.github.io
🚨Do frontier VLMs (o3, Gemini 2.5, Claude 3.5, Qwen…) actually learn an internal world model🌍? Surprisingly, the answer appears to be a hard NO—as revealed by our WM Atomic Benchmark⚛️. Even o3 struggles with the most basic, atomic-level questions: ❌Confuse triangles📐 with…
Some interesting Gemini CLI use cases and tutorials 🧵⬇️
Can we teach dexterous robot hands manipulation without human demos or hand-crafted rewards? Our key insight: Use Vision-Language Models (VLMs) to scaffold coarse motion plans, then train an RL agent to execute them with 3D keypoints as the interface. 1/7
Small Language Models are the Future of Agentic AI. Lots to gain from building agentic systems with small language models. Capabilities are increasing rapidly! AI devs should be exploring SLMs. Here are my notes:
Camera rigs are now supported by COLMAP🔥. It can also work with unknown rig sensor poses by computing average relative poses via partial reconstruction of the scene (obtained by modeling each camera as its own rig)
We released COLMAP v3.12, which adds long-awaited end-to-end support for multi-camera rigs and 360° panoramas 👀 COLMAP just got better at handling your robotics, AR/VR, or 360 data - try it and let us know! github.com/colmap/colmap/… Kudos to Johannes & team for this great work 🚀
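To make the "average relative poses" idea concrete, here is a minimal NumPy sketch. It is not COLMAP's code or API: the function names, the world-to-camera pose convention (x_cam = R x_world + t), and the SVD-based rotation average are all assumptions chosen for illustration. Given per-frame poses of a reference camera and a second camera, it computes the second camera's pose relative to the reference in each frame and averages the results.

```python
import numpy as np

def relative_pose(R_ref, t_ref, R_cam, t_cam):
    """Pose of `cam` relative to `ref`, for world-to-camera poses x_c = R @ x_w + t."""
    R_rel = R_cam @ R_ref.T
    t_rel = t_cam - R_rel @ t_ref
    return R_rel, t_rel

def average_rotation(rotations):
    """Chordal mean: project the sum of rotation matrices back onto SO(3) via SVD."""
    U, _, Vt = np.linalg.svd(np.sum(rotations, axis=0))
    d = np.sign(np.linalg.det(U @ Vt))  # flip the last axis if needed so det = +1
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def average_rig_pose(frames):
    """frames: iterable of (R_ref, t_ref, R_cam, t_cam), one tuple per rig frame."""
    rels = [relative_pose(*f) for f in frames]
    R_avg = average_rotation(np.stack([R for R, _ in rels]))
    t_avg = np.mean([t for _, t in rels], axis=0)
    return R_avg, t_avg
```

With noisy per-frame estimates, the averaged pose would serve only as an initial guess for the rig extrinsics; the sketch makes no claim about how COLMAP refines it afterwards.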
What a crazy week in AI 🤯
- Google Gemini CLI
- HeyGen new Agent
- Higgsfield Soul Model
- DeepMind AlphaGenome
- Anthropic Upgrade Artifacts
- ElevenLabs 11a Voice Assistant
- Flux.1 Kontext Dev Open-Sources
- Google’s On-Device AI Gemma 3n
Here’s EVERYTHING you need to know:
Say hello to the @geminicli, a local CLI to help you build and maintain software with 1,000 free Gemini 2.5 Pro requests per day : )
New paper on RL with a diffusion/flow policy! The idea is really a one-liner: train a new policy that outputs noise (z) as its actions. Check out Andrew's thread below for more details! I'm leaving additional remarks on algorithms in this thread ↓
Diffusion policies have demonstrated impressive performance in robot control, yet are difficult to improve online when 0-shot performance isn’t enough. To address this challenge, we introduce DSRL: Diffusion Steering via Reinforcement Learning. (1/n) diffusion-steering.github.io
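A toy sketch of what "outputs noise (z) as actions" could look like in code. This is not the DSRL implementation: `FrozenBasePolicy`, `NoisePolicy`, and `act` are hypothetical names, and a fixed tanh map stands in for a pretrained diffusion policy, which would really run its iterative denoising process starting from z. The point is only the composition: the trainable policy chooses z, and the frozen base policy turns (observation, z) into the action sent to the robot.

```python
import numpy as np

class FrozenBasePolicy:
    """Toy stand-in for a pretrained diffusion/flow policy: a fixed map (obs, z) -> action."""

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_obs = rng.normal(size=(act_dim, obs_dim))  # never updated
        self.W_z = rng.normal(size=(act_dim, act_dim))    # never updated

    def decode(self, obs, z):
        # Assumption: a real diffusion policy would denoise iteratively from z here.
        return np.tanh(self.W_obs @ obs + self.W_z @ z)

class NoisePolicy:
    """Hypothetical steering policy: a trainable linear map obs -> z."""

    def __init__(self, obs_dim, act_dim):
        self.theta = np.zeros((act_dim, obs_dim))  # the only trainable parameters

    def __call__(self, obs):
        return self.theta @ obs

def act(base, noise_policy, obs):
    """The RL agent's 'action' is z; the frozen base policy decodes it into a motor command."""
    z = noise_policy(obs)
    return base.decode(obs, z)
```

Any RL algorithm could then update `theta` from environment rewards while the base policy's weights stay untouched; which algorithm the paper uses is not specified in the post.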
1X World Model: A bridge between the world of atoms and the world of bits. Check out our video to see how we at @1x_tech are solving a missing link in general-purpose robotics: reliable model evaluation. More details below…
1X World Model: Scaling Evaluation for Robots