Ori Yoran
@OriYoran
NLP researcher / PhD candidate (Tel-Aviv University)
Can AI agents solve realistic, time-consuming web tasks such as “Which gyms near me have fitness classes on the weekend, before 7AM?” We introduce AssistantBench, a benchmark with 214 such tasks. Our new GPT-4 based agent gets just 25% accuracy! assistantbench.github.io
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
A Vision-Language Model can answer questions about Robin Williams. It can also recognize him in a photo. So why does it FAIL when asked the same questions using his photo instead of his name? A thread on our new #acl2025 paper that explores this puzzle 🧵
[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! @urielsinger @itai_gat @lipmanya
✨MLP layers have just become more interpretable than ever ✨ In a new paper: * We show a simple method for decomposing MLP activations into interpretable features * Our method uncovers hidden concept hierarchies, where sparse neuron combinations form increasingly abstract ideas
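As a concrete (and purely illustrative) sketch of what "decomposing activations into sparse feature combinations" can look like, here is a greedy matching-pursuit decomposition of an activation vector over a dictionary of candidate feature directions. The dictionary, the sparsity budget, and the greedy solver are my own assumptions for the toy, not the paper's method.

```python
import numpy as np

def sparse_decompose(activation, features, k=3):
    """Greedily express `activation` as a sparse combination of `features`.

    activation: (d,) MLP activation vector
    features:   (n_features, d) candidate feature directions (unit-norm rows)
    k:          sparsity budget (number of features to select)
    Returns a list of (feature_index, coefficient) pairs.
    """
    residual = activation.astype(float).copy()
    selected = []
    for _ in range(k):
        scores = features @ residual                # correlation with each feature
        idx = int(np.argmax(np.abs(scores)))        # best-matching feature
        coef = float(scores[idx])
        selected.append((idx, coef))
        residual = residual - coef * features[idx]  # remove the explained part
    return selected

# Toy usage: an activation built from two dictionary rows is recovered.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 64))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
act = 2.0 * feats[3] + 0.5 * feats[7]
print(sparse_decompose(act, feats, k=2))  # indices 3 and 7 should dominate
```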
🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
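A minimal toy of the kind of edit operations described, just to make "inserts and deletes instead of mask tokens" concrete; the op encoding ("ins"/"del", position, token) is an illustrative assumption, not the Edit Flows parameterization.

```python
# Toy: variable-length refinement via insert/delete edits (illustrative only).

def apply_edits(tokens, edits):
    """Apply position-relative edits right-to-left so earlier indices stay valid."""
    out = list(tokens)
    for op, pos, tok in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "ins":      # insert `tok` before position `pos`
            out.insert(pos, tok)
        elif op == "del":    # delete the token at position `pos`
            del out[pos]
    return out

seq = ["the", "cat", "sat"]
step1 = apply_edits(seq, [("ins", 2, "quietly"), ("ins", 3, "down")])
step2 = apply_edits(step1, [("del", 0, None), ("ins", 0, "a")])
print(step1)  # ['the', 'cat', 'quietly', 'sat', 'down']
print(step2)  # ['a', 'cat', 'quietly', 'sat', 'down']
```

Note that the sequence length changes freely across steps, with no padding or mask tokens involved.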
Corrector Sampling in Language Models "Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by…
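From the abstract alone (it is truncated here), the core idea seems to be revisiting already-generated tokens instead of committing to them forever. Below is a toy sketch of that general idea, not the RPT procedure itself: the `next_token` sampler, the resampling window, and the schedule are all assumptions.

```python
import random

random.seed(0)
VOCAB = list("abcde")

def next_token(prefix):
    """Hypothetical stand-in for an LM's next-token sampler."""
    return random.choice(VOCAB)

def sample_with_resampling(length, p_resample=0.2, window=3):
    """Left-to-right sampling that sometimes revisits a recent position.

    With probability `p_resample`, jump back to one of the last `window`
    positions, resample that token, and regenerate what followed it,
    rather than treating every past choice as irrevocable.
    """
    tokens = []
    while len(tokens) < length:
        if tokens and random.random() < p_resample:
            pos = random.randrange(max(0, len(tokens) - window), len(tokens))
            tokens = tokens[:pos]          # drop the revisited token and its suffix
        tokens.append(next_token(tokens))
    return "".join(tokens)

print(sample_with_resampling(10))
```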
Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses? Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Can we precisely erase conceptual knowledge from LLM parameters? Most methods are shallow, coarse, or overreach, adversely affecting related or general knowledge. We introduce🪝𝐏𝐈𝐒𝐂𝐄𝐒 — a general framework for Precise In-parameter Concept EraSure. 🧵 1/
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Our new benchmark is finally out! Lots of cool demo vids in this thread:
Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇
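A skeletal sketch of the screenshot-in / keypress-out loop this kind of evaluation implies; `DummyEmulator`, `query_vlm`, and the key set are placeholders I made up, not the VideoGameBench API.

```python
import random

VALID_KEYS = ["up", "down", "left", "right", "a", "b", "start", "select"]

def query_vlm(frame, instruction):
    """Placeholder for a vision-language model call; swap in a real client."""
    return random.choice(VALID_KEYS)

class DummyEmulator:
    """Toy emulator: serves fake frames and ends the 'game' after 20 presses."""
    def __init__(self):
        self.presses = 0
    def screenshot(self):
        return b"\x00" * 160 * 144      # stand-in for raw Game Boy pixels
    def press(self, key):
        self.presses += 1
    def done(self):
        return self.presses >= 20

def play_episode(emulator, max_steps=500):
    actions = []
    while not emulator.done() and len(actions) < max_steps:
        frame = emulator.screenshot()    # the agent sees only the raw screen
        action = query_vlm(frame, "Reply with one key: " + ", ".join(VALID_KEYS))
        if action not in VALID_KEYS:     # guard against free-form replies
            action = "a"
        emulator.press(action)
        actions.append(action)
    return actions

print(len(play_episode(DummyEmulator())))  # 20
```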
The longer a reasoning LLM thinks, the more likely it is to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n
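One simple way to operationalize "prefer shorter thinking chains" at inference time is to sample several chains and keep the answer from the shortest one; the sketch below does exactly that, with `generate_chain` as a hypothetical stand-in for the model call (the paper's actual recipe may differ).

```python
import random

random.seed(0)

def generate_chain(question):
    """Hypothetical stand-in for sampling one (thinking_chain, answer) pair."""
    n_steps = random.randint(2, 12)
    chain = " ".join(f"step{i}" for i in range(n_steps))
    return chain, "42"

def answer_from_shortest_chain(question, k=5):
    """Sample k reasoning chains and return the answer of the shortest one."""
    samples = [generate_chain(question) for _ in range(k)]
    chain, answer = min(samples, key=lambda s: len(s[0].split()))
    return answer, chain

ans, chain = answer_from_shortest_chain("What is 6 * 7?")
print(ans, "| chain length:", len(chain.split()))
```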
Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs and show that it matters quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻
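For readers unfamiliar with interleaving: the training stream mixes spans of text tokens with spans of discrete speech units in a single sequence. The toy below only illustrates that sequence layout; the tag tokens, span boundaries, and unit IDs are made-up assumptions, not the paper's scheme.

```python
# Toy illustration of a speech-text interleaved training sequence (illustrative only).

def interleave(text_spans, speech_spans):
    """Alternate aligned text spans and discrete speech-unit spans in one stream."""
    seq = []
    for text, units in zip(text_spans, speech_spans):
        seq += ["<text>"] + text.split()
        seq += ["<speech>"] + [f"unit_{u}" for u in units]
    return seq

text_spans = ["the cat sat", "on the mat"]
speech_spans = [[12, 7, 93], [5, 61, 61, 8]]   # e.g. clustered acoustic-unit IDs
print(interleave(text_spans, speech_spans))
```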
Hi ho! New work: arxiv.org/pdf/2503.14481 With amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fantinehuot @mlapata @vicky_zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
The success of RLHF depends heavily on the quality of the reward model (RM), but how should we measure this quality? 📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers! 🧵
Does your LLM truly comprehend the complexity of the code it generates? 🥰 Introducing our new non-saturated (for at least the coming week? 😉) benchmark: ✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity? Check out the details below! 👇
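As a rough illustration of how one might sanity-check the runtime complexity of generated code empirically (this is a simple heuristic, not the BigO(Bench) evaluation protocol, and `candidate_sort` is just a stand-in for a generated function):

```python
import time

def candidate_sort(xs):
    """Stand-in for a generated function under test (here, an O(n log n) sort)."""
    return sorted(xs)

def runtime(fn, n, repeats=3):
    """Best-of-`repeats` wall-clock time of `fn` on an input of size n."""
    data = list(range(n, 0, -1))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - t0)
    return best

# Double the input size and inspect runtime ratios: ~2x suggests roughly
# linear / n log n growth, ~4x suggests quadratic, and so on.
sizes = [50_000, 100_000, 200_000]
times = [runtime(candidate_sort, n) for n in sizes]
for n, prev, cur in zip(sizes[1:], times, times[1:]):
    print(f"n={n}: runtime ratio vs. half size = {cur / prev:.2f}")
```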