Eran Hirsch @ACL2025 🇦🇹 (@hirscheran)

Pinned

E

Eran Hirsch @ACL2025 🇦🇹@hirscheran · Jun 3

🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2…

hirscheran's tweet image. 🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)!

LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2…

3

32

82

22

12.0K

E

Eran Hirsch @ACL2025 🇦🇹@hirscheran · Jul 27

Presenting LAQuer tomorrow (Monday) from 11:00 to 12:30 at poster session 1!

EEran Hirsch @ACL2025 🇦🇹@hirscheran · Jun 3

🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2…

0

3

13

1

869

Eran Hirsch @ACL2025 🇦🇹 Retweeted

Y

Yumo Xu@yumo_xu · Jul 25

Excited to share our #ACL2025NLP paper, "𝐂𝐢𝐭𝐞𝐄𝐯𝐚𝐥: 𝐏𝐫𝐢𝐧𝐜𝐢𝐩𝐥𝐞-𝐃𝐫𝐢𝐯𝐞𝐧 𝐂𝐢𝐭𝐚𝐭𝐢𝐨𝐧 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐟𝐨𝐫 𝐒𝐨𝐮𝐫𝐜𝐞 𝐀𝐭𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧"! 📜 If you’re working on RAG, Deep Research and Trustworthy AI, this is for you. Why? Citation quality is…

2

8

43

20

4.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

G

Google Labs@GoogleLabs · Jul 24

We just discovered the 🔥 COOLEST 🔥 trick in Flow that we have to share: Instead of wordsmithing the perfect prompt, you can just... draw it. Take the image of your scene, doodle what you'd like on it (through any editing app), and then briefly describe what needs to happen…

109

399

3.0K

2.0K

364.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

V

Vishakh Padmakumar@vishakh_pk · Jul 25

Maybe don't use an LLM for _everything_? Last summer, I got to fiddle again with content diversity @AdobeResearch @Adobe and we showed that agentic pipelines that mix LLM-prompt steps with principled techniques can yield better, more personalized summaries

1

11

61

29

6.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

D

Denny Zhou@denny_zhou · Jul 24

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial…

26

348

2.0K

3.0K

179.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

N

Nitay Calderon @ACL🇦🇹@NitCal · Jul 25

Everyone uses LLMs to annotate data or evaluate models in their research. But how can we convince others (readers, collaborators, reviewers!!!) that LLMs are reliable? 🤖 Here’s a simple (and low-effort) solution: show the LLM is a *comparable alternative annotator* ✅

3

19

61

33

5.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

S

Shmulik Amar@pyshmulik · Jul 24

🚨 Introducing IGCS, accepted to #TACL! Instruction Guided Content Selection (IGCS) unifies many tasks such as extractive summarization, evidence retrieval and argument mining under one scheme for selecting extractive spans in given sources. arxiv.org/abs/2507.16922 @biunlp (1/n)

2

8

27

2

717

Eran Hirsch @ACL2025 🇦🇹 Retweeted

A

AI Safety Papers@safe_paper · Jul 21

Chain-of-Thought Is Not Explainability @FazlBarez, Tung-Yu Wu, Iván Arcuschin (@IvanArcus), Michael Lan, Vincent Wang (@vinnylarouge), Noah Siegel (@noahysiegel), Nicolas Collignon (@nccollignon), Clement Neo (@_clementneo), Isabelle Lee (@wordscompute), Alasdair Paren,…

4

20

122

86

7.0K

E

Eran Hirsch @ACL2025 🇦🇹@hirscheran · Jul 21

🎉 Our paper, GenerationPrograms, which proposes a modular framework for attributable text generation, has been accepted to @COLM_conf! GenerationPrograms produces a program that executes to text, providing an auditable trace of how the text was generated and major gains on…

DDavid Wan@meetdavidwan · Jun 18

Excited to share GenerationPrograms! 🚀 How do we get LLMs to cite their sources? GenerationPrograms is attributable by design, producing a program that executes text w/ a trace of how the text was generated! Gains of up to +39 Attribution F1 and eliminates uncited sentences,…

0

23

37

6

3.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

A

Ai2@allen_ai · Jul 16

We’ve upgraded ScholarQA, our agent that helps researchers conduct literature reviews efficiently by providing detailed answers. Now, when ScholarQA cites a source, it won’t just tell you which paper it came from–you’ll see the exact quote, highlighted in the original PDF. 🧵

6

33

198

98

17.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

Y

Yuntian Deng@yuntiandeng · Jul 14

NeuralOS is trained to predict the next screen image conditioned on previous frames and user inputs (mouse, keyboard). We use a training dataset of Ubuntu XFCE desktop recordings, collected via automated random exploration and demonstrations by humans and AI agents. 3/5

1

2

7

0

718

Eran Hirsch @ACL2025 🇦🇹 Retweeted

V

Valentina Pyatkin@valentina__py · Jul 3

💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of constraints and verifier functions is limited and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.

5

87

347

183

47.0K

E

Eran Hirsch @ACL2025 🇦🇹@hirscheran · Jul 3

This new benchmark created by @valentina__py should be the new default replacing IFEval. Some of the best frontier models get <50% and it comes with separate training prompts so people don’t effectively train on test. Wild gap from o3 to Gemini 2.5 pro of like 30 points.

AAi2@allen_ai · Jul 3

Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵

10

22

199

92

23.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

A

Ai2@allen_ai · Jul 3

Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵

3

52

316

150

46.0K

Eran Hirsch @ACL2025 🇦🇹 Retweeted

A

Ahmad Beirami@abeirami · Jul 3

In fact, a reviewer can and should become much more efficient with the help of LLMs in each step of the process. How? For example, they can prompt the LLM to dig into related literature, find relevant papers, and understand the positioning of the paper with respect to them.

2

1

3

0

356

Eran Hirsch @ACL2025 🇦🇹 Retweeted

I

Itay Nakash@itay__nakash · Jun 23

🚨New preprint! Are your policy-adherent agents really safe? We present CRAFT: a multi-agent, strategic, red-teaming system that shows current evaluations vastly UNDERESTIMATE how easily these agents break under realistic, strategic attacks. 🧵 1/n

1

7

21

3

925

Eran Hirsch @ACL2025 🇦🇹 Retweeted

j

johann@vaelev · Jun 19

Claude made me this satisfying visualization the other day where it clicked for me.

2

5

51

30

9.0K

E

Eran Hirsch @ACL2025 🇦🇹@hirscheran · Jun 18

In RAG applications, self-citation methods are prone to make attribution mistakes because there is no inductive bias for LLMs to track which source supports each statement. We propose GenerationPrograms: first generate a clear plan, then use that plan to guide generation. That…

DDavid Wan@meetdavidwan · Jun 18

Excited to share GenerationPrograms! 🚀 How do we get LLMs to cite their sources? GenerationPrograms is attributable by design, producing a program that executes text w/ a trace of how the text was generated! Gains of up to +39 Attribution F1 and eliminates uncited sentences,…

0

6

11

2

1.0K