Ben Cohen-Wang

@bcohenwang

Machine learning PhD student at MIT advised by Aleksander Madry

Joined September 2022

229 Following

171 Followers

Ben Cohen-Wang @bcohenwang · Apr 29

Popular reasoning benchmarks just reward correct answers (they don't penalize guessing). This incentivizes models to guess when they're not sure, which (beyond hurting usability) seems like it would encourage hallucinations more broadly. Is this why o3 etc. hallucinate a lot?
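The incentive can be made concrete with a small expected-value calculation. This is a minimal sketch under an assumed, hypothetical scoring rule (+1 for a correct answer, minus `wrong_penalty` for a wrong one, 0 for abstaining); it is not the scoring rule of any specific benchmark. If the model is right with probability p, answering is worth p(1 + λ) − λ in expectation, so with λ = 0 guessing never scores worse than abstaining, while with λ > 0 guessing only pays when p > λ/(1 + λ).

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when the model is right with probability p_correct.

    Hypothetical scoring rule (an assumption for illustration): +1 for a correct
    answer, -wrong_penalty for a wrong answer, 0 for abstaining ("I don't know").
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_guess(p_correct: float, wrong_penalty: float = 0.0) -> bool:
    """Answer only if answering strictly beats abstaining, which always scores 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence above which guessing pays off: p > penalty / (1 + penalty)."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # Compare a "correct answers only" rule (penalty 0) with penalized rules.
    for penalty in (0.0, 1.0, 3.0):
        threshold = break_even_confidence(penalty)
        print(f"penalty={penalty}: guess iff p > {threshold:.2f}; "
              f"guess at p=0.3? {should_guess(0.3, penalty)}")
```

With no penalty, a model that is only 30% sure still gains by answering; with a penalty of 1 per wrong answer, the same model should abstain, because the break-even confidence rises to 0.5.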

Ben Cohen-Wang @bcohenwang · Feb 14

Increasingly, LLMs cite sources for the claims they make, but are the sources they cite actually the ones they are using? In work led by @YungSungChuang, we design a reward to quantify this and use this reward to (automatically) improve citation quality! 🧵

Yung-Sung Chuang @YungSungChuang · Feb 14

(1/5)🚨LLMs can now self-improve to generate better citations✅ 📝We design automatic rewards to assess citation quality 🤖Enable BoN/SimPO w/o external supervision 📈Perform close to “Claude Citations” API w/ only 8B model 📄arxiv.org/abs/2502.09604 🧑‍💻github.com/voidism/SelfCi…

Ben Cohen-Wang @bcohenwang · May 6, 2024

We introduce ContextCite, a tool that can help us understand when and how an LLM uses in-context information! w/ @harshays_, @kris_georgiev1, @aleks_madry Check out our demo: huggingface.co/spaces/context… Thread ⤵️

Aleksander Madry @aleks_madry · May 6, 2024

How is an LLM actually using the info given to it in its context? Is it misinterpreting anything or making things up? Introducing ContextCite: a simple method for attributing LLM responses back to the context: gradientscience.org/contextcite w/ @bcohenwang, @harshays_, @kris_georgiev1

Ben Cohen-Wang Retweeted

Aleksander Madry @aleks_madry · Mar 4, 2024

Models often fail under distribution shifts. Can pre-training on a large and diverse dataset and then fine-tuning on a task-specific dataset help? w/ @bcohenwang, @josh_vendrow we show that this depends on the specific failure mode. In particular, pre-training can help with…

Ben Cohen-Wang Retweeted

Aleksander Madry @aleks_madry · Feb 16, 2023

Will your model identify a polar bear on the moon? How would you know? Dataset Interfaces let you generate images from your dataset under whatever distribution shift you desire! arxiv.org/abs/2302.07865 gradientscience.org/dataset-interf… W/ @josh_vendrow @saachi_jain_ @logan_engstrom

Ben Cohen-Wang Retweeted

Aleksander Madry @aleks_madry · Feb 14, 2023

Our paper on immunizing images against malicious manipulation powered by diffusion models is out: arxiv.org/abs/2302.06588! This approach, combined with policy incentives, aims to raise the cost of such unauthorized image editing. w/ @hadisalmanX @Alaa_Khaddaj @gpoleclerc @andrew_ilyas
