Gabriel Stanovsky
@GabiStanovsky
Assistant Professor at @CseHuji
Check out @niveckhaus's excellent work: a model capable of playing with human players in asynchronous settings, deciding when to intervene and when to stay quiet 🤐
🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7
Ever wondered how Transformers refine their top-k predictions over their layers? 📊 Is there an order to the madness? Come find out at my poster presentation tomorrow at @icmlconf 📍East Exhibition Hall E-2512, 11:00-13:30
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
🕊️ DOVE is a living benchmark! Just pushed major updates: 📊 Dataset expansion: Added ~5700 MMLU examples with Llama-70B - each tested across 100 different prompt variations = 570K new predictions! 📈 Website upgrades: New interactive plots throughout: slab-nlp.github.io/DOVE/
Care about LLM evaluation? 🤖 🤔 We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs on different prompts, domains, tokens, models... Join our community effort to expand it with YOUR model predictions & become a co-author!
We built and released the #LLMafia Dataset 🕵️‍♂️ 🎲 21 games 💬 2558 messages 🤖 211 messages from the LLM agent 🤗 Available on HuggingFace: huggingface.co/datasets/nivec… In the image: a real sample from our dataset 🧵5/7
🎉 Our paper DOVE 🕊️ has been accepted to #ACL2025 Findings! DOVE 🕊️ is a massive collection (250M!) of LLM outputs across different prompts, domains, and models, aimed at democratizing LLM evaluation research! Thanks to all collaborators! Paper: slab-nlp.github.io/DOVE/
Over the jet lag but missing #NAACL2025 and the famous gazebo - best time for highlights! 1. @radamihalcea's "long tail of the world" metaphor really stuck with me: most of us are from small, often-overlooked cultures. Many papers in the special track try to bridge this gap
Had an awesome time presenting both my talk and poster @naaclmeeting! Will miss having beer at the Sister pub 🍻 🎤 arxiv.org/abs/2409.16646 📌 arxiv.org/abs/2406.13274
Accepted at #icml2025🥳 Camera ready version (with newer models like Llama-3 and Qwen-Audio) coming soon!
📢Paper release📢 What computation is the Transformer performing in the layers after the top-1 becomes fixed (a so-called "saturation event")? We show that the next highest-ranked tokens also undergo saturation *in order* of their ranking. Preprint: arxiv.org/abs/2410.20210 1/4
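A minimal sketch of what "saturation in order of ranking" means in practice (toy logits and a logit-lens-style readout of my own construction, not the paper's code): for each rank r, find the earliest layer after which the rank-r token never changes again, and check that lower ranks saturate earlier.

```python
import numpy as np

def saturation_layers(layer_logits: np.ndarray, k: int) -> list[int]:
    """For each rank r < k, return the earliest layer after which the
    rank-r token (under a per-layer readout) stays fixed to the end."""
    # layer_logits: (num_layers, vocab_size) logits for one position
    ranked = np.argsort(-layer_logits, axis=1)[:, :k]  # token id at each rank, per layer
    num_layers = ranked.shape[0]
    sat = []
    for r in range(k):
        final_tok = ranked[-1, r]
        layer = num_layers - 1
        # walk backwards while the rank-r token already equals its final value
        while layer > 0 and ranked[layer - 1, r] == final_tok:
            layer -= 1
        sat.append(layer)
    return sat

# Toy example: the top-1 token fixes at layer 1, the rank-2 token only at layer 2
logits = np.array([
    [0.1, 0.9, 0.5, 0.2],   # layer 0: top-2 = [1, 2]
    [0.9, 0.1, 0.5, 0.2],   # layer 1: top-2 = [0, 2]
    [0.9, 0.6, 0.5, 0.2],   # layer 2: top-2 = [0, 1]
    [0.9, 0.7, 0.5, 0.2],   # layer 3: top-2 = [0, 1]
])
print(saturation_layers(logits, k=2))  # → [1, 2]: rank 0 saturates before rank 1
```

On real models the per-layer readout would come from projecting intermediate hidden states through the unembedding matrix; the detection logic stays the same.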
If you're in @naaclmeeting and interested in cross-cultural research (like everyone else here...) come see my talk today. Ruidoso room 17:00, see you there :)
Have you ever wondered if speakers of different languages focus on different entities when viewing the same image? Check our recent work to find out! arxiv.org/abs/2409.16646 w/ @PontiEdoardo
We're at #NAACL2025! Presenting: 📍Cross-Lingual and Cross-Cultural Variation in Image Descriptions Thu May 1, 5:00 PM Ruidoso 📍The State and Fate of Summarization Datasets: A Survey Fri May 2, 12:00 PM Ruidoso @uriberger88, @Shachar_Don, @Dahan_Noam
Only three more days to submit your evaluation papers to our ACL workshop!
Are you recovering from your @COLM_conf abstract submission? Did you know that GEM has a non-archival track that allows you to submit a two-page abstract in parallel? Our workshop deadline is coming up, please consider submitting your evaluation paper!
"Summarize this text" out ❌ "Provide a 50-word summary, explaining it to a 5-year-old" in ✅ The way we use LLMs has changed—user instructions are now longer, more nuanced, and packed with constraints. Interested in how LLMs keep up? 🤔 Check out WildIFEval, our new benchmark!
Can RAG performance get * worse * with more relevant documents?📄 We put the number of retrieved documents in RAG to the test! 💥Preprint💥: arxiv.org/abs/2503.04388 1/3
In-context learning assumes access to annotated datasets, but in new domains we often label data ourselves with a limited budget. Given raw samples, how should we select demonstration samples for labeling? Read our paper: arxiv.org/abs/2406.13274 w/ @GabiStanovsky @talbaumel
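One simple baseline for this selection problem (a sketch under my own assumptions — not necessarily the paper's method) is greedy farthest-point sampling over sample embeddings, so the few examples you can afford to annotate cover the input space rather than cluster together:

```python
import numpy as np

def select_for_labeling(embeddings: np.ndarray, budget: int, seed: int = 0) -> list[int]:
    """Greedy farthest-point sampling: pick `budget` diverse samples to annotate."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    chosen = [int(rng.integers(n))]                       # arbitrary first pick
    # distance from every sample to its nearest already-chosen sample
    dist = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dist))                        # farthest from current set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen

# Two tight clusters: a budget of 2 should pick one sample from each
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
picks = select_for_labeling(X, budget=2)
print(sorted(X[picks][:, 0]))  # one point near 0, one near 5
```

The selected indices are the raw samples you would hand to annotators; the resulting labeled pairs then serve as in-context demonstrations.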
🚨 Just Out Can LLMs extract experimental data about themselves from scientific literature to improve understanding of their behavior? We propose a semi-automated approach for large-scale, continuously updatable meta-analysis to uncover intriguing behaviors in frontier LLMs. 🧵
🚨New arXiv preprint!🚨 LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯 In our latest work with @Itay_itzhak_, @FazlBarez, @GabiStanovsky, and @boknilev, we challenge assumptions about hallucination origin!