Chuanyang Jin
@chuanyang_jin
PhD @JohnsHopkins | Intern @AIatMeta FAIR ⏰ Past: @MITCoCoSci & @MIT_CSAIL & @nyuniversity
I am so flattered that our paper “MMToM-QA: Multimodal Theory of Mind Question Answering” won the Outstanding Paper Award at #ACL2024 @aclmeeting. Huge thanks to all my amazing collaborators!!
Can machines understand people’s minds from multimodal inputs? We introduce a comprehensive benchmark: “MMToM-QA: Multimodal Theory of Mind Question Answering” 📜 arxiv.org/abs/2401.08743
Heading to ICML to present our work Rejecting Instruction Preferences (RIP) for better data curation and synthesis on Wed 07/16 (4:30pm - 7:00pm)! Excited to connect with folks interested in synthetic data, reasoning, RL, and anything in general @FAIR. #ICML2025
💀 Introducing RIP: Rejecting Instruction Preferences 💀 A method to *curate* high-quality data, or *create* high-quality synthetic data. Large performance gains across benchmarks (AlpacaEval2, Arena-Hard, WildBench). Paper 📄: arxiv.org/abs/2501.18578
🔍 WM-ABench: a new benchmark for world models. WM-ABench reveals that current VLMs lack a disentangled understanding of physical concepts and the foundational knowledge needed for next-state prediction, and it provides a fine-grained checklist to help close that gap.
🤔 Have @OpenAI o3, Gemini 2.5, and Claude 3.7 formed an internal world model to understand the physical world, or do they just align pixels with words? We introduce WM-ABench, the first systematic evaluation of VLMs as world models. Using a cognitively inspired framework, we test 15 SOTA…
We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces…
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation: "We introduce WM-ABench, a large-scale benchmark comprising 23 fine-grained evaluation dimensions across 6 diverse simulated environments with controlled counterfactual simulations. Through 660…
Join us tomorrow! 🗓️ June 21 | 8:50 AM – 12:30 PM PT 📍 USC (OHE 132) & Zoom (wse.zoom.us/j/95095685281)
The #RSS2025 Workshop on Continual Robot Learning from Humans is happening on June 21. We have an amazing lineup of speakers discussing how we can enable robots to continuously acquire new skills and knowledge from humans. Join us in person and on Zoom (info on our website)!
Existing robot-manipulation benchmarks stop at object-level tasks, missing the part-level semantics essential for fine-grained control. Very excited to see PartInstruct, which finally fills this gap with a large-scale dataset for training and evaluating precise, long-horizon,…
🚀 New robot manipulation benchmark! How can we teach robots to reason about and interact with the relevant object parts for a given fine-grained manipulation task? To address this challenge, our #RSS2025 paper introduces PartInstruct, the first large-scale benchmark for fine-grained…
🚀 Excited to introduce SimWorld: an embodied simulator for infinite photorealistic world generation 🏙️ populated with diverse agents 🤖 If you are at #CVPR2025, come check out the live demo 👇
Jun 14, 12:00-1:00 pm, JHU booth, ExHall B
Jun 15, 10:30 am-12:30 pm, #7, ExHall B
🚨 Announcing the RAM 2 workshop @ COLM25 - call for papers 🚨 10 years on, we present the sequel to the classic RAM 🐏 (Reasoning, Attention, Memory) workshop that took place in 2015, at the cusp of major change in the area. Now, in 2025, we reflect on what's happened and discuss the…
Check out this exciting workshop on continual learning from humans at RSS 2025 in LA! I am happy to be speaking and will share our work on observational learning through visual imitation of humans.
Excited to announce the 1st Workshop on Continual Robot Learning from Humans @ #RSS2025 in LA! We're bringing together interdisciplinary researchers to explore how robots can continuously learn through human interactions! Full details: …-robot-learning-from-humans.github.io @RoboticsSciSys
Human-AI cooperation is an important problem, but many existing papers focus on training agents in the same 5 fixed Overcooked layouts, and use population-based training (PBT) to try to cover the diversity of human partner strategies. Diving into this problem, we find that…
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks. shorturl.at/fqsNN 🧵
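To make the training recipe in this thread concrete, here is a minimal, self-contained sketch (my own illustration, not the paper's code or environments): self-play with one shared policy across many randomly generated layouts of a toy "lever coordination" game, then zero-shot evaluation on fresh layouts with a scripted human-like partner. The game, the REINFORCE update, and the "pick the best lever" partner are all assumptions made for illustration.

```python
# Toy illustration of "environment diversity" in cooperative self-play
# (hypothetical example, not the paper's code). Two copies of one shared policy
# play a "lever coordination" game: reward = the chosen lever's payoff, but only
# if both players pick the same lever. Training samples a brand-new random
# layout (payoff vector) every episode; evaluation uses novel layouts and a
# scripted human-like partner that picks the highest-payoff lever.
import numpy as np

rng = np.random.default_rng(0)
N_LEVERS = 5

def random_layout():
    """A 'layout' is just a fresh random payoff vector over the levers."""
    return rng.uniform(0.0, 1.0, size=N_LEVERS)

def policy_probs(w, payoffs):
    """One-parameter policy: score each lever by w * payoff, then softmax."""
    scores = w * payoffs
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def train_self_play(n_iters=2000, lr=0.5):
    """REINFORCE on the shared parameter w, with both players using the same policy."""
    w = 0.0
    for _ in range(n_iters):
        payoffs = random_layout()           # environment diversity: new layout each episode
        probs = policy_probs(w, payoffs)
        a1 = rng.choice(N_LEVERS, p=probs)  # self-play: both actions come from
        a2 = rng.choice(N_LEVERS, p=probs)  # the same shared policy
        reward = payoffs[a1] if a1 == a2 else 0.0
        # d/dw [log p(a1) + log p(a2)] for the softmax-of-linear-scores policy
        grad = (payoffs[a1] - probs @ payoffs) + (payoffs[a2] - probs @ payoffs)
        w += lr * reward * grad
    return w

def eval_with_human(w, n_eval=1000):
    """Zero-shot test: novel layouts, partnered with a 'pick the best lever' human proxy."""
    total = 0.0
    for _ in range(n_eval):
        payoffs = random_layout()
        agent = rng.choice(N_LEVERS, p=policy_probs(w, payoffs))
        human = int(np.argmax(payoffs))
        total += payoffs[agent] if agent == human else 0.0
    return total / n_eval

w = train_self_play()
print(f"learned w = {w:.2f}, avg reward with human-like partner on novel layouts = {eval_with_human(w):.3f}")
```

The only point of the sketch is the loop structure: diversity comes from sampling a new layout every episode under a single self-play policy, rather than from a population of training partners.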
📊 Summary of updates on the MMToM-QA leaderboard: chuanyangjin.com/mmtom-qa-leade…
- Recent LLMs with inference-time scaling (e.g., o3-mini) have significantly improved ToM performance but still fall short of human levels. Notably, they excel in belief questions but score below random on…
Check out our latest work on machine Theory of Mind: #AutoToM! We propose an approach that (1) combines the open-endedness of LLMs with the robustness of Bayesian models, and (2) leverages uncertainty to refine the model, achieving better performance while maintaining low compute.
How to achieve human-level open-ended machine Theory of Mind? Introducing #AutoToM: a fully automated and open-ended ToM reasoning method combining the flexibility of LLMs with the robustness of Bayesian inverse planning, achieving SOTA results across five benchmarks. 🧵[1/n]
Very excited to introduce AutoToM, our latest effort toward open-ended machine Theory of Mind. Given any context and ToM question, AutoToM automatically formulates a minimally sufficient probabilistic model to produce a confident inference of the target mental variable.
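For readers new to the Bayesian side of these tweets, below is a minimal, self-contained sketch of Bayesian inverse planning, the classic technique AutoToM builds on. It is not the AutoToM implementation, which automatically formulates the probabilistic model rather than assuming a fixed one; the 1D world, the two goal hypotheses, and the rationality parameter BETA are made-up illustrative choices.

```python
# Toy illustration of Bayesian inverse planning for goal inference
# (hypothetical example, not the AutoToM implementation). An agent walks on a
# 1D line of cells; given its observed moves, we compute a posterior over which
# goal cell it is heading to, assuming Boltzmann-rational (noisily optimal) actions.
import numpy as np

N_CELLS = 10        # cells 0 .. 9 on a line
GOALS = [0, 9]      # hypothesis space over the agent's goal
ACTIONS = [-1, +1]  # step left or right
BETA = 2.0          # rationality: higher = closer to a perfectly rational agent

def action_likelihood(pos, action, goal):
    """P(action | state, goal): softmax over how close each action gets the agent to the goal."""
    qs = np.array([-abs(int(np.clip(pos + a, 0, N_CELLS - 1)) - goal) for a in ACTIONS])
    probs = np.exp(BETA * qs) / np.exp(BETA * qs).sum()
    return probs[ACTIONS.index(action)]

def goal_posterior(start_pos, observed_actions):
    """Bayes rule over GOALS with a uniform prior; actions are conditionally independent given the goal."""
    log_post = np.zeros(len(GOALS))                 # log of a uniform prior (up to a constant)
    pos = start_pos
    for a in observed_actions:
        for i, g in enumerate(GOALS):
            log_post[i] += np.log(action_likelihood(pos, a, g))
        pos = int(np.clip(pos + a, 0, N_CELLS - 1)) # advance the observed state
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# The agent starts at cell 5 and moves right twice: the posterior should strongly favor goal 9.
post = goal_posterior(5, [+1, +1])
print({g: round(float(p), 3) for g, p in zip(GOALS, post)})
```

The sketch only performs inference in a hand-specified model; per the tweets above, AutoToM's contribution is to automatically decide what that model should contain for a given context and Theory of Mind question.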
🚀 Exciting to see how recent advances like OpenAI's o1/o3 & DeepSeek's R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. We analyzed over 300 papers to chart the progress. Paper: arxiv.org/pdf/2502.17419 GitHub: github.com/zzli2022/Aweso…