Boshi Wang

@BoshiWang2

Fourth-year Ph.D. @OhioState. Prev intern @MSFTResearch

The Ohio State University

Joined May 2021

505 Following

2K Followers

Pinned

Boshi Wang@BoshiWang2 · Apr 9

LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why? Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design…

130

869

895

133.0K

Boshi Wang Retweeted

Huan Sun (OSU)@hhsun1 · Jul 15

🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,…

14.0K

Boshi Wang Retweeted

Yu Su@ysu_nlp · Jun 27

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…

221

132

39.0K

Boshi Wang Retweeted

Yifei Li@YifeiLiPKU · Jun 12

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

9.0K

Boshi Wang Retweeted

Yu Su@ysu_nlp · Jun 11

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

269

153

22.0K

Boshi Wang Retweeted

Botao Yu@BotaoYu24 · Jun 6

🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall…

8.0K

Boshi Wang Retweeted

Rui Qiu@RuiQiu18 · Jun 3

Systematic reviews (SRs) drive evidence-based medicine, but months-long workflows can’t keep pace with today’s literature flood. Fully autonomous solutions promise speed, but the magic often fizzles - these models still skip pivotal trials, hallucinate findings, and bury the…

4.0K

Boshi Wang Retweeted

Zeyi Liao@LiaoZeyi · May 30

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…

22.0K

Boshi Wang Retweeted

Vardaan Pahuja@vardaanpahuja · May 29

🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:…

5.0K

Boshi Wang Retweeted

Huan Sun (OSU)@hhsun1 · May 1

I will miss #NAACL2025 unfortunately, but please check out our work on chemistry agents, "ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving" today (May 1) during 2:00-3:30pm (local time) at Hall 3, Poster Session 5! Some updates: We have renamed…

6.0K

Boshi Wang@BoshiWang2 · Apr 25

🚨We just released the data generation code for RoboSpatial! 💾 github.com/NVlabs/RoboSpa… 📢 And yes, RoboSpatial is a #CVPR2025 Oral 🏆🔥

Chan Hee (Luke) Song@luke_ch_song · Mar 31

🔥 VLMs aren’t built for spatial reasoning — yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind We built RoboSpatial to tackle that — a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores @CVPR 2025

5.0K

Boshi Wang Retweeted

Yu Gu@yugu_nlp · Apr 22

“What's the role of NLP/LLM researchers in agent research?” “Natural language is merely a tool for communication.” … These doubts and criticisms have circulated widely over the past two years. In my PhD dissertation, I want to provide a perspective that addresses these doubts…

10.0K

Boshi Wang Retweeted

Huan Sun (OSU)@hhsun1 · Apr 16

It's a great honor to give a keynote at the @Molecule_Maker symposium at UIUC! Many thanks to Prof. @hengjinlp and Prof. Jiawei Han for invitation. The symposium’s theme this year is “AI scientist? What would it take?”, which I hold close to heart and made a talk titled “Language…

11.0K

Boshi Wang Retweeted

Boyuan Zheng@ICML@boyuan__zheng · Apr 10

🔧What if your web agent could abstract its experience into programmatic skills—and improve itself autonomously? 🌟 Introducing SkillWeaver: a framework to enable self-improvement through autonomous exploration and constructing an ever-growing library of programmatic skills. 🧠…

12.0K

Boshi Wang@BoshiWang2 · Apr 10

🚀Big WebDreamer update! We train 💭Dreamer-7B, a small but strong world model for real-world web planning. 💥Beats Qwen2-72B ⚖️Matches #GPT-4o Trained on 3M synthetic examples — and yes, all data + models are open-sourced.

Yu Gu@yugu_nlp · Nov 21

❓Wondering how to scale inference-time compute with advanced planning for language agents? 🙋‍♂️Short answer: Using your LLM as a world model 💡More detailed answer: Using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and…

15.0K