Shijie Chen
@ShijieChen98
PhD student @osunlp

✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇. I'm on the faculty job market this year. My work focuses on advancing the reasoning abilities of LLMs across modalities and contexts. Ping me for a chat☕
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…
📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)
📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵
🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall…
Check out InsightAgent (ACL'25 main), our latest work on accelerating systematic reviews from months to just hours with interactive AI agents! While full automation is handy, human expertise is still a must in many high-stakes domains. Different from the regular…
Systematic reviews (SRs) drive evidence-based medicine, but months-long workflows can’t keep pace with today’s literature flood. Fully autonomous solutions promise speed, but the magic often fizzles - these models still skip pivotal trials, hallucinate findings, and bury the…
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet: @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…
🔧What if your web agent could abstract its experience into programmatic skills—and improve itself autonomously? 🌟 Introducing SkillWeaver: a framework that enables self-improvement through autonomous exploration, constructing an ever-growing library of programmatic skills. 🧠…
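A minimal sketch of the "ever-growing programmatic skill library" idea above: skills as named, documented code snippets distilled from successful exploration and retrieved by description at task time. The `Skill`/`SkillLibrary` dataclasses and the keyword-overlap retrieval are illustrative assumptions, not SkillWeaver's actual implementation.

```python
# Sketch of a growing programmatic skill library (illustrative, not SkillWeaver's code).
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str
    code: str  # executable API wrapper synthesized from a successful trajectory

@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        # Once autonomous exploration verifies a skill, register (or overwrite) it.
        self.skills[skill.name] = skill

    def retrieve(self, task: str, k: int = 3) -> list[Skill]:
        # Naive keyword overlap stands in for real embedding-based retrieval.
        words = set(task.lower().split())
        return sorted(self.skills.values(),
                      key=lambda s: len(words & set(s.description.lower().split())),
                      reverse=True)[:k]
```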
LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why? Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design…
🚀 Excited to co-organize the Workshop on Computer Use Agents (CUA) at #ICML2025 in Vancouver! This workshop takes a comprehensive look at computer use agents—covering learning algorithms, orchestration, interfaces, safety, benchmarking, applications, and more. We’re also…
🚀Announcing the Workshop on Computer Use Agents at #ICML2025 in July, Vancouver! Join us, to advance research on AI agents performing real-world computer tasks. 🤖Call for Papers & Demos: Deadline May 18, 2025 🎙️Exciting speaker lineup announced! ✍️Interested in…
🔥2025 is the year of agents, but are we there yet?🤔 🤯 "An Illusion of Progress? Assessing the Current State of Web Agents" –– our new study shows that frontier web agents may be far less competent (up to 59%) than previously reported! Why were benchmark numbers inflated? -…
Introducing ✨HippoRAG 2 ✨ 📣 📣 “From RAG to Memory: Non-Parametric Continual Learning for Large Language Models” HippoRAG 2 is a memory framework for LLMs that elevates our brain-inspired HippoRAG system to new levels of performance and robustness. 🔓 Unlocks Memory…
What's actually different between CLIP and DINOv2? CLIP knows what "Brazil" looks like: Rio's skyline, sidewalk patterns, and soccer jerseys. We mapped 24,576 visual features in vision models using sparse autoencoders, revealing surprising differences in what they understand.
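For readers curious how such feature maps are built, here is a minimal sparse-autoencoder sketch over frozen vision-model embeddings, matching the 24,576-feature dictionary mentioned above. The L1 coefficient, optimizer settings, and untied linear encoder/decoder are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal SAE sketch over frozen CLIP/DINOv2 embeddings (illustrative assumptions).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, n_features: int = 24_576, l1_coeff: float = 5e-4):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) activations from a frozen CLIP or DINOv2 backbone
        z = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(z)           # reconstruction of the original embedding
        loss = ((x_hat - x) ** 2).mean() + self.l1_coeff * z.abs().mean()
        return z, x_hat, loss

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
embeddings = torch.randn(4096, 768)       # stand-in for real CLIP/DINOv2 features
for batch in embeddings.split(256):
    _, _, loss = sae(batch)
    opt.zero_grad(); loss.backward(); opt.step()
```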
🚀Our ScienceAgentBench is covered by @Nature News! With the help of @ShijieChen98 and @YifeiLiPKU, we sampled 20 tasks from ScienceAgentBench to conduct a head-to-head comparison of OpenAI o1 (2024-12-17) and DeepSeek R1. 🔹Performance: Given three attempts, R1 can solve 7 out…
DeepSeek's open AI model is giving scientists worldwide the opportunity to train custom reasoning models designed to solve problems in their disciplines. go.nature.com/42zO92D
🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven R&D tasks across diverse scientific disciplines. 🔬 👇 Resources and previous posts below:
🚀 Can language agents automate data-driven scientific discovery? Not yet. But we're making strides. Introducing **ScienceAgentBench**: a new benchmark to rigorously evaluate language agents on 102 tasks from 44 peer-reviewed publications across 4 scientific disciplines. (1/10)
Thrilled to announce that our work, In-context Re-ranking, is accepted to #ICLR2025! TL;DR: By simply aggregating attention weights, we turn LLMs into powerful and efficient re-rankers without generating a single token. More details below 👇:
Is generation always the best way to use LLMs? 🤔 At least not for re-ranking! Excited to share our latest work: Attention in LLMs yields efficient zero-shot re-rankers. Introducing In-Context Re-ranking (ICR) - an efficient zero-shot re-ranking method leveraging LLM’s…
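A minimal sketch of the attention-aggregation idea behind ICR, assuming a Hugging Face causal LM: score each in-context passage by the attention mass it receives from the query tokens in a single forward pass. The paper's layer/head weighting and content-free calibration query are omitted, the model name is a placeholder, and token spans are approximate if the tokenizer merges across passage boundaries.

```python
# Attention-based zero-shot re-ranking sketch in the spirit of ICR (simplified).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, attn_implementation="eager", torch_dtype=torch.bfloat16
)

def rerank(query: str, passages: list[str]) -> list[int]:
    # Lay out passages first, then the query, remembering each passage's token span.
    spans, parts, cursor = [], [], 0
    for i, p in enumerate(passages):
        text = f"Passage {i + 1}: {p}\n"
        n = len(tok(text, add_special_tokens=False).input_ids)
        spans.append((cursor, cursor + n))
        parts.append(text)
        cursor += n
    prompt = "".join(parts) + f"Query: {query}"
    inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)  # one forward pass, no generation
    # attentions: one (batch, heads, seq, seq) tensor per layer; sum over layers and heads.
    attn = torch.stack(out.attentions).float().sum(dim=(0, 2))[0]  # (seq, seq)
    scores = [attn[cursor:, s:e].sum().item() for (s, e) in spans]  # query rows -> passage cols
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)

# Example: rerank("who proposed ICR?", [passage_a, passage_b]) -> passage indices by relevance.
```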
🚀ScienceAgentBench evaluation is now containerized! Inspired by SWE-Bench, we leverage Docker for task isolation, enabling multi-threaded execution and slashing evaluation time to under 30 minutes. Plus, evaluate your agents with just one bash command! Great work done by…
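A rough sketch of what Docker-based task isolation with thread-level parallelism can look like; the image tag, `evaluate.py` entrypoint, and flags below are hypothetical placeholders, not ScienceAgentBench's actual harness or its one-command interface.

```python
# Sketch of containerized, parallel benchmark evaluation (hypothetical image/entrypoint).
import subprocess
from concurrent.futures import ThreadPoolExecutor

IMAGE = "scienceagentbench-eval:latest"  # hypothetical image name

def run_task(task_id: int) -> tuple[int, int]:
    # Each task gets a fresh container, so side effects and failures stay isolated.
    proc = subprocess.run(
        ["docker", "run", "--rm", IMAGE, "python", "evaluate.py", "--task-id", str(task_id)],
        capture_output=True, text=True,
    )
    return task_id, proc.returncode

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_task, range(102)))  # 102 tasks in the benchmark
print(sum(rc == 0 for _, rc in results), "tasks passed evaluation")
```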
With recent advancements like Claude 3.5 Computer Use and Gemini 2.0, the field of GUI Agents is rapidly evolving. 🚀 Excited to introduce GUI Agent Paper List, your go-to repo for the latest in GUI Agent research! 🌟 ✨ Key Features: - 170+ Papers grouped by environments,…
❓Wondering how to scale inference-time compute with advanced planning for language agents? 🙋‍♂️Short answer: Using your LLM as a world model 💡More detailed answer: Using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and…
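A minimal sketch of the "LLM as world model" planning loop described above, using the OpenAI chat API: simulate each candidate action's outcome with GPT-4o, score the imagined state against the goal, and act greedily. The prompts and the 0-10 scoring scheme are simplified assumptions, not the paper's exact setup.

```python
# World-model planning sketch with GPT-4o (simplified prompts and scoring).
import re
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def simulate(observation: str, action: str) -> str:
    # World-model step: imagine the webpage state resulting from the action.
    return ask(f"Current webpage:\n{observation}\n\nIf the agent performs the action "
               f"'{action}', describe the webpage state that would result.")

def score(imagined_state: str, goal: str) -> float:
    # Value estimate: how much progress toward the goal does the imagined state show?
    reply = ask(f"Goal: {goal}\nPredicted state: {imagined_state}\n"
                f"Rate progress toward the goal from 0 to 10. Answer with a single number.")
    match = re.search(r"\d+(\.\d+)?", reply)
    return float(match.group()) if match else 0.0

def choose_action(observation: str, goal: str, candidates: list[str]) -> str:
    # Pick the candidate whose simulated outcome scores highest.
    return max(candidates, key=lambda a: score(simulate(observation, a), goal))
```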
🤔 Can LLMs with tools always outperform those without? Perhaps not... 🚀 In our new work, we introduce ChemAgent, an enhanced language agent with 29 tools for tackling chemistry problems. We evaluated it on both specialized chemistry tasks (e.g., compound synthesis, compound…