Chenhao Zheng
@Michael3014018
Computer Vision PhD student @uwcse | Student Researcher @allen_ai | ex Undergrad @UMichCSE and @sjtu1896
Having trouble dealing with the excessive number of tokens when processing a video? Check out our paper, accepted to ICCV 2025 with an average score of 5.5! We tokenize video with tokens grounded in the trajectories of all objects rather than fixed-size patches. Trained with a…

🔥We are excited to present our work Synthetic Visual Genome (SVG) at #CVPR25 tomorrow! 🕸️ Dense scene graph with diverse relationship types. 🎯 Generate scene graphs with SAM segmentation masks! 🔗Project link: bit.ly/4e1uMDm 📍 Poster: #32689, Fri 2-4 PM 👇🧵
Calling all #CVPR2025 attendees! Join us at the SynData4CV Workshop at @CVPR (Jun 11 full day at Grand C2, starting at 9am) to learn more about recent advancements in synthetic data for CV! Explore more: syndata4cv.github.io
The 2nd Synthetic Data for Computer Vision workshop at @CVPR! We had a wonderful time last year, and we want to build on that success by fostering fresh insights into synthetic data for CV. Join us, and please consider submitting your work! (deadline: March…
🎉 Excited to introduce "The One RING: a Robotic Indoor Navigation Generalist" – our latest work on achieving cross-embodiment generalization in robot visual navigation! 🤖🌍 RING is a universal navigation policy trained entirely in simulation on diverse, random embodiments at…
The slide is bad, and her response to an audience member is even worse… “Maybe there is one, maybe they are common, who knows what. I hope it was an outlier."
Mitigating racial bias in LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf. We have ethical reviews for authors, but none for invited speakers? 😡
Humans learn and improve from failures. Similarly, foundation models adapt based on human feedback. Can we leverage this failure understanding to enhance robotics systems that use foundation models? Introducing AHA—a vision-language model for detecting and reasoning over…