Steffi Chern
@steffichern
Incoming CS PhD @Penn | @NSF Graduate Fellow | undergrad @CarnegieMellon 🤠
🚀How can we effectively evaluate and prevent deception by superintelligent LLMs? We introduce 🤝BeHonest, a pioneering benchmark designed to comprehensively assess honesty in LLMs. Paper 📄: [arxiv.org/abs/2406.13261] Code 👨🏻💻: [github.com/GAIR-NLP/BeHon…]…
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:…
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: ethanc111.github.io/factool_websit… (1/n)
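FacTool-style factuality checking follows a claim-extract-then-verify pattern with external tools. Below is a stubbed Python sketch of that general pipeline; the helpers (`extract_claims`, `search_evidence`, `verify`) are illustrative placeholders, not FacTool's actual API.

```python
# Stubbed sketch of tool-augmented factuality checking: split text into
# claims, retrieve evidence per claim, and verify each claim against it.
# All three helpers are placeholders, not FacTool's real implementation.
from dataclasses import dataclass

@dataclass
class Verdict:
    claim: str
    supported: bool
    evidence: str

def extract_claims(text: str) -> list[str]:
    """Placeholder: a real system prompts an LLM to extract atomic claims."""
    return [s.strip() for s in text.split(".") if s.strip()]

def search_evidence(claim: str) -> str:
    """Placeholder: a real system queries a search engine or knowledge base."""
    return f"retrieved passage about: {claim[:40]}"

def verify(claim: str, evidence: str) -> bool:
    """Placeholder: a real system asks an LLM whether evidence entails the claim."""
    return claim.lower()[:20] in evidence.lower()

def check_factuality(text: str) -> list[Verdict]:
    verdicts = []
    for claim in extract_claims(text):
        evidence = search_evidence(claim)
        verdicts.append(Verdict(claim, verify(claim, evidence), evidence))
    return verdicts

for v in check_factuality("The Eiffel Tower is in Paris. It was built in 1700."):
    print(v)
```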
Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…
New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often matching or beating human supervision. Using this approach, we train a Claude 3.5-based assistant that beats its human-supervised counterpart.
What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.
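The loop the tweet describes can be sketched in a few lines. Everything below (the `generate_self_edit` helper, the toy dict-valued model, the greedy accept rule) is an illustrative stand-in under assumed names, not SEAL's actual implementation.

```python
# Toy sketch of a SEAL-style loop: the model proposes its own training
# data (a "self-edit"), a fine-tuning step applies it, and the change in
# downstream performance serves as the RL reward for edit generation.
import copy
import random

def generate_self_edit(model, new_input, temperature=1.0):
    """Toy stand-in: the model proposes synthetic training data."""
    return {"text": f"note about {new_input}", "strength": random.uniform(0.0, temperature)}

def finetune(model, self_edit):
    """Toy stand-in: apply a small weight update derived from the self-edit."""
    updated = copy.deepcopy(model)
    updated["knowledge"] += self_edit["strength"]
    return updated

def downstream_score(model):
    """Toy stand-in: evaluate the updated model on a downstream task."""
    k = model["knowledge"]
    return k - 0.1 * k ** 2

model = {"knowledge": 0.0}
for step in range(10):
    edit = generate_self_edit(model, new_input=f"doc-{step}")
    candidate = finetune(model, edit)
    reward = downstream_score(candidate) - downstream_score(model)  # RL reward
    if reward > 0:  # crude policy improvement: keep edits that help downstream
        model = candidate
    print(f"step={step} reward={reward:+.3f}")
```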
To further boost the "think with images" community, we've systematically summarized the latest research in our new repository: github.com/zhaochen0110/A… 🧠🖼️Let's make LVLMs see & think! A comprehensive survey paper will be released soon! Stay tuned.
🧐When do LLMs admit their mistakes when they should know better? In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong. LLMs can retract—but they rarely do.🤯 arxiv.org/abs/2505.16170 👇🧵
"How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @slatestarcodex, @eli_lifland, and @thlarsen
One RL to See Them All: Visual Triple Unified Reinforcement Learning
🔥 Excited to share our work "Efficient Agent Training for Computer Use" Q: Do computer use agents need massive data or complex RL to excel? A: No, with just 312 high-quality trajectories, Qwen2.5-VL can outperform Claude 3.7, setting a new SOTA for Windows computer use. 1/6
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
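The "refine noise step-by-step" idea is the standard diffusion sampling loop. Here is a toy numeric version for intuition only; it is a generic illustration, not Gemini Diffusion's architecture or sampler.

```python
# Miniature denoising loop: start from pure noise and repeatedly refine
# it toward a clean output. A real text diffusion model would use a
# trained network over token representations instead of this toy nudge.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5, 3.0])   # stands in for a clean output

def denoiser(x, t):
    """Toy 'model': nudges the noisy sample toward the target."""
    return x + 0.3 * (target - x)

x = rng.normal(size=target.shape)          # start from pure noise
for t in reversed(range(20)):              # iterative step-by-step refinement
    x = denoiser(x, t)
print(np.round(x, 3))                      # close to the target after refinement
```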
Thrilled to know that our paper, `Safety Alignment Should be Made More Than Just a Few Tokens Deep`, received the ICLR 2025 Outstanding Paper Award. We sincerely thank the ICLR committee for awarding one of this year's Outstanding Paper Awards to AI Safety / Adversarial ML.…
Outstanding Papers:
- Safety Alignment Should be Made More Than Just a Few Tokens Deep. Xiangyu Qi, et al.
- Learning Dynamics of LLM Finetuning. Yi Ren and Danica J. Sutherland.
- AlphaEdit: Null-Space Constrained Model Editing for Language Models. Junfeng Fang, et al.
🔥 Happy to share our paper on test-time scaling (TTS)! 🚀 We take the position that generative AI has entered Act II: cognition engineering driven by TTS. 🛠️ We provide many valuable resources to help the community use TTS to develop models' cognitive abilities.
I finally wrote another blogpost: ysymyth.github.io/The-Second-Hal… AI just keeps getting better over time, but NOW is a special moment that I call "the halftime". Before it, training > eval. After it, eval > training. The reason: RL finally works. Lmk ur feedback so I can polish it.
🔍Excited to introduce DeepResearcher, the first end-to-end trained #DeepResearch model with #RL scaling in real-world environments! ✨No more controlled simulations - this is RL in the wild with authentic search interactions! Paper: arxiv.org/pdf/2504.03160 1/7
🥁🥁 Happy to share our latest efforts on math pre-training data: the MegaMath dataset! This 9-month project started in the summer of 2024, and we finally deliver the largest math pre-training dataset to date, containing 💥370B💥 tokens of web, code, and synthetic data!
We've written a paper (145 pages!!) about our approach for AGI safety at @GoogleDeepMind. It's not just scalable oversight and interp -- so much more needs to come together. deepmind.google/discover/blog/…
Check out our newest, fully open-source RL framework for VLMs—built from scratch, reproducible, and tested on real benchmarks!
🔥 New paper drop! 🔥 🔍 In the fast-paced world of RL scaling, where leaderboard performance and rapid results take priority, the value of transparent, step-by-step exploration is often overlooked. Our latest work, MAYE, addresses this gap by introducing: 1️⃣ A from-scratch RL…
#LIMR Less is More for RL Scaling! Less is More for RL Scaling! Less is More for RL Scaling! - What makes a good example for RL scaling? We demonstrate that a strategically selected subset of just 1,389 samples can outperform the full 8,523-sample dataset. - How to make a…
🔥 Excited to share our work "LIMR: Less is More for RL Scaling" Q: What determines the effectiveness of RL training data ? A: Alignment with model's learning journey 1,389 strategic samples ≥ 8,523 full dataset 🤯 📄: github.com/GAIR-NLP/LIMR/… 💻: github.com/GAIR-NLP/LIMR 1/6
Introducing CodeI/O (codei-o.github.io), a systematic way to condense diverse reasoning patterns via code input-output prediction to build massive training data for more reasoning tasks beyond commonly focused math problem-solving and code generation, which usually suffer…
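A minimal example of the input-output-prediction format the tweet describes: execute a snippet on a concrete input to get the ground-truth answer, then frame output prediction as a reasoning task. The field and function names below are illustrative, not CodeI/O's actual schema.

```python
# Build one training example: run the code to obtain the true output,
# then pair a predict-the-output prompt with that ground-truth answer.
import json

snippet = """
def longest_run(xs):
    best = cur = 0
    for a, b in zip(xs, xs[1:]):
        cur = cur + 1 if b == a + 1 else 0
        best = max(best, cur)
    return best + 1 if xs else 0
"""

def make_example(code: str, fn_name: str, args: tuple) -> dict:
    namespace: dict = {}
    exec(code, namespace)                     # run the snippet for ground truth
    output = namespace[fn_name](*args)
    return {
        "prompt": f"Given this code:\n{code}\nPredict the output of "
                  f"{fn_name}{args!r}, reasoning step by step.",
        "answer": repr(output),
    }

example = make_example(snippet, "longest_run", ([3, 4, 5, 9, 10],))
print(json.dumps(example, indent=2))          # answer is 3 (run: 3, 4, 5)
```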