Botao Yu
@BotaoYu24
PhD student @ OSU NLP Group @osunlp.
🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall…
Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025
🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,…
BREAKING: xAI announces Grok 4 "It can reason at a superhuman level!" Here is everything you need to know:
⬇️ Check out SDE-Harness, our general framework for evaluating LLMs/agents on scientific discovery. It features easy integration, broad LLM support, dynamic prompting, comprehensive logging, and customizable metrics, applicable for all domains and tasks.
🚀🔬 Introducing SDE-Harness: The Scientific Discovery Evaluation Framework A discovery-first, open-source toolkit built to accelerate LLM-driven scientific research and amplify discovery. Why SDE-Harness? Scientific discovery is an iterative process to search for hypotheses…
Holy moly, what a massive effort, proud to be part of it! 🥳 As agentic search continues to evolve and increasingly support our work and daily lives, Mind2Web 2 arrives as a timely, rigorous benchmark for evaluation and progress tracking. (Now get to work, agent builders! This…
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…
Had a great time at this CVPR community-building workshop---lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209 Schedule: sites.google.com/view/standoutc… We have an exciting lineup of invited talks and candid…
🚀 A fantastic lineup of work in #AI4Science #DrugDiscovery #AI4Chemistry!
While #AI becomes a driving force for scientific discovery (#AIforScience), it is time to summarize our work on #GenAI and #LLMs for #DrugDiscovery over the past 5 years: 📌 𝐋𝐋𝐌-𝐛𝐚𝐬𝐞𝐝 𝐀𝐠𝐞𝐧𝐭 𝐟𝐨𝐫 𝐃𝐫𝐮𝐠 𝐃𝐢𝐬𝐜𝐨𝐯𝐞𝐫𝐲: 📍 "Liddia: Language-based…
📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)
It’s so exciting to see BioCLIP 2 demonstrates a biologically meaningful embedding space while only trained to distinguish species. Can’t wait to see more applications of BioCLIP 2 in solving real world problems. I’m attending #CVPR2025 in Nashville. Happy to chat about it!
📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵
📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵
Systematic reviews (SRs) drive evidence-based medicine, but months-long workflows can’t keep pace with today’s literature flood. Fully autonomous solutions promise speed, but the magic often fizzles - these models still skip pivotal trials, hallucinate findings, and bury the…
Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secure is … hard. Is @AnthropicAI Claude 4 Opus more robust to indirect prompt injection than previous versions like Claude 3.7? Not really. Why hard?…
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…
🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:…
Super excited to get funded by @schmidtsciences to study computer-use agents (CUAs) under adversarial attacks. Many thanks to the student leads including @LiaoZeyi, Jaylen Jones, Linxi Jiang, and amazing co-PIs @ysu_nlp and @cszlin. As the capabilities of CUAs improve,…
Proud moment for @OhioStateCSE! Prof. @hhsun1 has been awarded funding from @SchmidtSciences' for AI Safety initiative — a first for Ohio State. Her work will help defend AI agents from adversarial attacks. engineering.osu.edu/news/2025/05/h…
⏳ Less than 1 day left to submit! 🔦 Speaker Spotlight Time! We’re thrilled to welcome Yu Su (@ysu_nlp), Distinguished Assistant Professor at The Ohio State University, as an invited speaker at the ICML 2025 Workshop on Computer Use Agents! His work bridges LLM agents, memory,…