Eliya Habba @ ACL 2025 🇦🇹
@EliyaHabba
PhD student at @HebrewU #NLP
Care about LLM evaluation? 🤔🤔 We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs on different prompts, domains, tokens, models... Join our community effort to expand it with YOUR model predictions & become a co-author!
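For a feel of what querying a DOVE-style dump could look like, here's a minimal Python sketch. The record fields (model, domain, prompt_template, output) are illustrative assumptions, not DOVE's actual schema:

```python
import json
from collections import Counter

# Hypothetical record layout for a DOVE-style dump; the real schema may differ.
records = [
    {"model": "llm-a", "domain": "math", "prompt_template": "t1", "output": "42"},
    {"model": "llm-a", "domain": "math", "prompt_template": "t2", "output": "41"},
    {"model": "llm-b", "domain": "law",  "prompt_template": "t1", "output": "guilty"},
]

def query(records, **filters):
    """Return all records matching the given field=value filters."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]

# How sensitive is llm-a to the prompt template on math questions?
math_a = query(records, model="llm-a", domain="math")
print(Counter(r["output"] for r in math_a))  # Counter({'42': 1, '41': 1})
```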
🚨 New paper alert 🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases – but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025 🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
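As a concrete (hypothetical) illustration of what a cognitive-bias probe can look like, here's a toy paired-prompt check for the classic framing effect. `ask_model` is a placeholder stub, and this sketches the general idea only, not the paper's actual protocol:

```python
# Toy paired-prompt bias probe (illustrative, not the paper's method).

def ask_model(prompt: str) -> str:
    return "A"  # stub: a real model API call would go here

# Logically equivalent framings of the same decision (gain vs. loss framing).
pairs = [
    ("A treatment saves 200 of 600 patients. Choose it? (A) yes (B) no",
     "A treatment lets 400 of 600 patients die. Choose it? (A) yes (B) no"),
]

for gain_frame, loss_frame in pairs:
    a, b = ask_model(gain_frame), ask_model(loss_frame)
    print("consistent" if a == b else f"framing flip: {a} vs {b}")
```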
Technical practitioners & grads – join us to build an LLM evaluation hub! Infra goals: 🔧 Share evaluation outputs & params 🔍 Query results across experiments. Perfect for 🧰 hands-on folks ready to build tools the whole community can use. Join the EvalEval Coalition here 👇
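One way such a hub could make results shareable and queryable is a common record type. A minimal sketch, assuming illustrative field names (not an agreed spec):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class EvalRecord:
    """One shared evaluation result; field names are illustrative, not a fixed spec."""
    model: str
    benchmark: str
    sample_id: str
    prompt: str
    output: str
    params: dict   # decoding params: temperature, max_tokens, ...
    score: float

rec = EvalRecord("llm-a", "mmlu", "q-017", "Q: ...", "B", {"temperature": 0.0}, 1.0)
print(json.dumps(asdict(rec)))  # JSONL-friendly: easy to pool and query across experiments
```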
🚨 New paper! We present CHIMERA: a KB of 28K+ scientific idea recombinations 💡 It captures how researchers blend concepts or take inspiration across fields, enabling: 1. Meta-science 2. Training models to predict new combos noy-sternlicht.github.io/CHIMERA-Web 📄 Findings & data:
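To make "idea recombination" concrete, here's a hypothetical record layout for one recombination edge and a tiny query over it; the released KB's actual schema may differ:

```python
# Illustrative structure for idea-recombination edges; the real schema may differ.
edges = [
    {"source_concept": "diffusion models", "source_field": "vision",
     "target_concept": "protein design", "target_field": "biology",
     "relation": "inspiration"},
    {"source_concept": "attention", "source_field": "NLP",
     "target_concept": "weather forecasting", "target_field": "climate",
     "relation": "blend"},
]

# Which fields borrow ideas from other fields? (a toy meta-science query)
cross = [(e["source_field"], e["target_field"]) for e in edges
         if e["source_field"] != e["target_field"]]
print(cross)  # [('vision', 'biology'), ('NLP', 'climate')]
```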
The longer a reasoning LLM thinks, the more likely it is to be correct, right? Apparently not. Presenting our paper: "Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning". Link: arxiv.org/abs/2505.17813 1/n
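One simple way to act on this finding is to sample several chains and vote among the shortest ones. A minimal sketch of that selection rule (a simplification for illustration, not the paper's exact method):

```python
from collections import Counter

def pick_answer(chains, m=3):
    """Majority-vote the answers of the m shortest reasoning chains.

    `chains` is a list of (reasoning_text, final_answer) pairs sampled from a
    model. Preferring shorter chains follows the paper's observation, but this
    exact rule is a simplified sketch.
    """
    shortest = sorted(chains, key=lambda c: len(c[0]))[:m]
    return Counter(ans for _, ans in shortest).most_common(1)[0][0]

chains = [("step1 step2", "7"), ("very long winding derivation ...", "9"),
          ("quick check", "7"), ("medium length reasoning here", "7")]
print(pick_answer(chains))  # '7'
```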
I'm excited to share that our latest research, "Toward Reliable Proof Generation with LLMs: Leveraging Analogical Guidance and Symbolic Verification", is now available on arXiv: arxiv.org/pdf/2505.14479 w/ @StrnYtn @HyadataLab
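The generate-then-verify pattern can be illustrated with a toy symbolic check via sympy: propose candidates, keep only what the verifier accepts. The candidates below stand in for LLM proposals; real proof verification is of course far richer:

```python
import sympy as sp

x = sp.symbols("x")

def verify(lhs, rhs) -> bool:
    """Symbolic check: does lhs == rhs hold for all x?"""
    return sp.simplify(lhs - rhs) == 0

# Stand-ins for LLM proposals: candidate rewrites of (x + 1)**2.
candidates = [x**2 + 1, x**2 + 2*x + 1]

target = (x + 1)**2
for cand in candidates:
    if verify(cand, target):
        print("verified:", cand)   # x**2 + 2*x + 1
        break
    print("rejected:", cand)       # x**2 + 1
```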
🎉 Our paper DOVE 🕊️ has been accepted to #ACL2025 Findings! DOVE 🕊️ is a massive collection (250M!) of LLM outputs across different prompts, domains, and models, aimed at democratizing LLM evaluation research! Thanks to all collaborators! Paper: slab-nlp.github.io/DOVE/
"Summarize this text" out โ "Provide a 50-word summary, explaining it to a 5-year-old" in โ The way we use LLMs has changedโuser instructions are now longer, more nuanced, and packed with constraints. Interested in how LLMs keep up? ๐ค Check out WildIFEval, our new benchmark!
🌍 AI is changing the world. Is AI regulation on the right track? 🤔 While regulators rely on benchmarking 📊, we show why it cannot guarantee AI behavior: arxiv.org/pdf/2501.15693 Excited about this multidisciplinary collaboration! @GabiStanovsky, @RKeydar, @GadiPerl
There's a lot of talk about regulating AI, but do regulators know the technology well enough? In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it? arxiv.org/pdf/2501.15693
If you're at #NeurIPS2024, don't miss @nitzanguetta's poster. There are some really FUN #VisualRiddles by @EliyaHabba. Not there? Check out the project's GitHub!
🚨 Happening NOW at #NeurIPS2024 with @nitzanguetta! 🎭 #VisualRiddles: A Commonsense and World Knowledge Challenge for Vision-Language Models. 📍 East Ballroom C, Creative AI Track 🔗 visual-riddles.github.io
Look at the CRAZY domain gap we found in summarization datasets: while English resources are diverse, other languages are mostly restricted to news. Presenting our survey covering 130+ datasets in 100+ languages! Explore: github.com/edahanoam/Awes… @GabiStanovsky, @nlphuji 1/6
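Measuring such a domain gap boils down to grouping dataset metadata by language. A minimal sketch with made-up rows (the repo tracks the real 130+ datasets):

```python
from collections import defaultdict

# Illustrative metadata rows; the survey's repo catalogs the real datasets.
datasets = [
    {"name": "ds1", "language": "en", "domain": "news"},
    {"name": "ds2", "language": "en", "domain": "science"},
    {"name": "ds3", "language": "fr", "domain": "news"},
]

domains = defaultdict(set)
for d in datasets:
    domains[d["language"]].add(d["domain"])

# Domain diversity per language: the gap the survey highlights.
print({lang: sorted(doms) for lang, doms in domains.items()})
# {'en': ['news', 'science'], 'fr': ['news']}
```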