Chan Hee (Luke) Song (@luke_ch_song)

Pinned

C

Chan Hee (Luke) Song@luke_ch_song · Mar 31

🔥 VLMs aren’t built for spatial reasoning — yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind We built RoboSpatial to tackle that — a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores @CVPR 2025

6

71

402

291

43.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jul 16

Reach out to Boyuan!

BBoyuan Zheng@ICML@boyuan__zheng · Jul 16

Attending #ICML2025 🇨🇦 this week! I’ll be co-organizing the Computer Use Agent Workshop @workshopcua on July 19th! Happy to chat about anything related to language agents — especially world modeling, scaling RL for agents, and multi-turn RL. Excited to meet old friends and…

0

1

0

260

C

Chan Hee (Luke) Song@luke_ch_song · Jul 15

Huan and I are looking for a postdoc to join us on agent research (broadly defined: planning, reasoning, safety, memory, continual learning, etc.). If you have a strong record in this space, drop us an email with CV! Retweet appreciated.

HHuan Sun (OSU)@hhsun1 · Jul 15

🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,…

0

15

49

9

9.0K

Chan Hee (Luke) Song Retweeted

H

Huan Sun (OSU)@hhsun1 · Jul 15

🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,…

0

27

61

6

14.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jul 8

Online-Mind2Web is accepted by #COLM2025 ! If you want a realistic, challenging, and easy-to-use testbed for your web agents, use our benchmark and LLM judge!

YYu Su@ysu_nlp · Mar 25

🔥2025 is the year of agents, but are we there yet?🤔 🤯 "An Illusion of Progress? Assessing the Current State of Web Agents" –– our new study shows that frontier web agents may be far less competent (up to 59%) than previously reported! Why were benchmark numbers inflated? -…

0

3

22

4

967

C

Chan Hee (Luke) Song@luke_ch_song · Jul 4

🧐Curious how far Claude Research can go in freeing you from tedious daily tasks? 🚀Check out our new results on Mind2Web 2! 💡 Looking forward to seeing even better agentic search systems! 🙌 Join the effort and test your system on Mind2Web 2 today!

YYu Su@ysu_nlp · Jun 27

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…

2

4

21

4

2.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jun 27

New Deep Research benchmark and I’m quite proud of it!

YYu Su@ysu_nlp · Jun 27

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…

0

12

1

341

C

Chan Hee (Luke) Song@luke_ch_song · Jun 20

🚀 GR00T Dreams code is live! NVIDIA GEAR Lab's open-source solution for robotics data via video world models. Fine-tune on any robot, generate 'dreams', extract actions with IDM, and train visuomotor policies with LeRobot datasets (GR00T N1.5, SmolVLA). github.com/NVIDIA/GR00T-D…

JJoel Jang@jang_yoel · May 20

Introducing 𝐃𝐫𝐞𝐚𝐦𝐆𝐞𝐧! We got humanoid robots to perform totally new 𝑣𝑒𝑟𝑏𝑠 in new environments through video world models. We believe video world models will solve the data problem in robotics. Bringing the paradigm of scaling human hours to GPU hours. Quick 🧵

6

43

148

78

26.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jun 20

Exciting to see the VIRDO framework extend to both (1) in-hand pose + extrinsic contact estimation and (2) high-res tactile sensing — very interesting direction!

NNima Fazeli@NimaFazeli7 · Jun 20

👀🤚 Robots that see and feel at once! ViTaSCOPE fuses point-cloud vision with high-res tactile shear to nail in-hand 6-DoF pose plus contact patches—trained entirely in sim, zero-shot on hardware. Dive into the demos 👇 jayjunlee.github.io/vitascope/ #RSS2025 #robotics #tactileSensing

1

2

9

3

1.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jun 14

Are you at #CVPR2025? RoboSpatial Oral is today! 📅 June 14 (Sat) | 🕐 1:00 PM | 📍Oral Session 4B @ ExHall A2

CChan Hee (Luke) Song@luke_ch_song · Mar 31

🔥 VLMs aren’t built for spatial reasoning — yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind We built RoboSpatial to tackle that — a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores @CVPR 2025

0

6

18

0

2.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jun 12

Come and say 👋 tomorrow (06/13) for our oral (1pm, Karl Dean Ballroom) and poster sessions (4pm, ExHall D, #81)! #CVPR2025 @CVPR @CVPRConf @NVIDIAAIDev @NVIDIARobotics #NVIDIA

BBowen Wen@bowenwen_me · Mar 13

📢Time to upgrade your depth camera! Introducing **FoundationStereo**, a foundation model for stereo depth estimation in zero-shot (accepted to CVPR 2025 with full scores) [1/n] Code: github.com/NVlabs/Foundat… Website: nvlabs.github.io/FoundationSter… Paper: arxiv.org/abs/2501.09898

0

2

19

2

1.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jun 12

📢Excited to announce the first project of my PhD! Through our work we address the training data scarcity to develop AI co-scientist models via AutoSDT, a fully automatic pipeline that collects high quality scientific coding tasks at scale! Read more in the full post here 👇

YYifei Li@YifeiLiPKU · Jun 12

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

1

6

18

1

3.0K

Chan Hee (Luke) Song Retweeted

Y

Yu Su@ysu_nlp · Jun 11

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

5

56

269

153

22.0K

C

Chan Hee (Luke) Song@luke_ch_song · Jun 11

Heading to #CVPR2025 to present our Oral paper with @NVIDIARobotics! 📅 June 14 (Sat) | 🕐 1:00 PM | 📍Oral Session 4B @ ExHall A2 I’ll also be at the 3D-VLA/VLM and EVAL-FoMo 2 workshops presenting the same work. Come say hi!

CChan Hee (Luke) Song@luke_ch_song · Mar 31

🔥 VLMs aren’t built for spatial reasoning — yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind We built RoboSpatial to tackle that — a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores @CVPR 2025

3

5

29

6

1.0K

Chan Hee (Luke) Song Retweeted

B

Botao Yu@BotaoYu24 · Jun 6

🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall…

3

30

68

20

8.0K

Chan Hee (Luke) Song Retweeted

V

Vardaan Pahuja@vardaanpahuja · May 29

🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:…

5

24

54

15

5.0K

C

Chan Hee (Luke) Song@luke_ch_song · May 30

Controlled sandbox environment for both OS and Web to test your agents on adversarial attacks!

ZZeyi Liao@LiaoZeyi · May 30

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…

0

3

0

258

Chan Hee (Luke) Song Retweeted

Y

Yu Su@ysu_nlp · May 23

Glad to get the 'little stamp' on my appointment letter one year ahead of the clock 🥰 It came just in time amid the peak of the AI hype week. With a bit more job security, now it's time to think about the next chapter of my career. How can one continue to make meaningful…

33

12

231

6

17.0K

Chan Hee (Luke) Song Retweeted

K

Kai Zhang@DrogoKhal4 · May 22

Tired of editing methods that require training, handcrafted subjects, or external memory? 🚀 #UltraEdit — Training-, subject-, and memory-free, for Lifelong Model Editing Compare to the prior best ✅New SOTA on 4 datasets and 6 models 🏎️7× faster – 20K samples within 5 mins on a…

1

10

35

7

4.0K

C

Chan Hee (Luke) Song@luke_ch_song · May 20

Now I believe in diffusion LLMs. Wondering if diffusion will lead to a more native multimodal input/output?

GGoogle DeepMind@GoogleDeepMind · May 20

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO

0

247

Chan Hee (Luke) Song Retweeted

C

ComputerUseAgents Workshop@workshopcua · May 20

⏳ Less than 1 day left to submit! 🔦 Speaker Spotlight Time! We’re thrilled to welcome Yu Su (@ysu_nlp), Distinguished Assistant Professor at The Ohio State University, as an invited speaker at the ICML 2025 Workshop on Computer Use Agents! His work bridges LLM agents, memory,…

1

9

26

5

3.0K