Nouha Dziri
@nouhadziri
Research Scientist @allen_ai, PhD in NLP 🤖 UofA. Ex @GoogleDeepMind @MSFTResearch @MilaQuebec 🚨🚨 NEW BLOG about LLM reasoning: https://shorturl.at/FEWKm
🚀📢 GPT models have blown our minds with their astonishing capabilities. But do they truly acquire the ability to perform reasoning tasks that humans find easy? NO⛔️ We investigate the limits of Transformers *empirically* and *theoretically* on compositional tasks🔥
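For intuition, here is a hedged sketch of the kind of compositional probe this line of work uses; multi-digit multiplication is the classic example. `ask_model` and all names below are placeholders for illustration, not the paper's released code.

```python
import random

def multiplication_accuracy(ask_model, n_digits: int, n_trials: int = 100) -> float:
    """Accuracy on n-digit x n-digit multiplication; ask_model is any LLM call."""
    correct = 0
    for _ in range(n_trials):
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        reply = ask_model(f"What is {a} * {b}? Reply with the number only.")
        if reply.strip() == str(a * b):
            correct += 1
    return correct / n_trials

# Plotting accuracy against n_digits makes the compositional gap visible:
# typically near-perfect at 1-2 digits, then a sharp collapse as depth grows.
```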

Our #ACL2025NLP workshop REALM on LLM agents is happening July 31 in Vienna 🎶🎼 🗓️ Schedule & accepted papers are live! realm-workshop.github.io 🚀 Join us for a day of invited talks, paper presentations, and a panel discussion with an amazing lineup!
Interested in learning more about LLM agents and contributing to this topic?🚀 📢We're thrilled to announce REALM: The first Workshop for Research on Agent Language Models 🤖 #ACL2025NLP in Vienna 🎻 We have an exciting lineup of speakers 🗓️ Submit your work by *March 1st*
life update: I'll be starting my PhD in CS at Stanford this September! I'm very excited to continue my research on reasoning in language models and to make new friends in the Bay Area! I'm deeply grateful to everyone who supported me and made this milestone possible…
Intelligence isn't a collection of skills. It's the efficiency with which you acquire and deploy new skills. It's an efficiency ratio. And that's why benchmark scores can be very misleading about the actual intelligence of AI systems.
Come listen to the second talk in 15min about the role of data in building trustworthy LLMs and memorization/creativity in LLMs. #ICML2025 Where? Ballroom West A
Super excited😍 to have been invited to speak at the Data in Generative Models workshop at ICML 2025 alongside this stellar lineup of speakers! I’ll be talking about AI safety, robustness, reasoning, and trustworthy LLMs. See you soon in Vancouver🇨🇦 Submit your work (ASAP) by May…
Speaking in 30min about safety in computer-use agents at West Meeting Room 211-214 #ICML2025
SUPER excited about next week #ICML2025 in Vancouver 🇨🇦 I'm invited to 3 talks/panels among a brilliant group of researchers. Come hear about RL limits, generalization, post-training data mixing, agent evaluation, and more 🔥👇 🤖Workshop on Computer Use…
We blend imitation (SFT) and exploration (RLVR) in post-training with a simple idea: sample a prefix of an SFT demonstration, let your policy model complete it, and mix the result with other RLVR rollouts (minimal sketch below). Intuitively, the model relies more on hints for problems currently out of reach.
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn harder problems by mimicking demonstrations but may generalize poorly. RFT generalizes better overall but is limited by the initial policy. Our method, Prefix-RFT, combines the best of both worlds!
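A minimal sketch of the prefix idea, assuming illustrative interfaces (`policy.generate`, a verifiable `reward` function, string-valued demonstrations); this is not the authors' released implementation.

```python
import random

def mixed_rollouts(policy, demos, prompts, reward, frac_range=(0.2, 0.8)):
    """demos: list of (prompt, demonstration) pairs; prompts: plain RLVR prompts."""
    rollouts = []
    # 1) Prefix rollouts: condition the policy on a partial SFT demonstration.
    for prompt, demo in demos:
        frac = random.uniform(*frac_range)             # how much hint to give
        prefix = demo[: int(len(demo) * frac)]         # truncated demonstration
        completion = policy.generate(prompt + prefix)  # policy finishes the rest
        full = prefix + completion
        rollouts.append((prompt, full, reward(prompt, full)))
    # 2) Ordinary RLVR rollouts sampled from scratch, mixed into the same batch.
    for prompt in prompts:
        completion = policy.generate(prompt)
        rollouts.append((prompt, completion, reward(prompt, completion)))
    return rollouts  # fed to the usual RLVR policy-gradient update
```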
Hellooo Vancouver 🇨🇦 again!!! I’m HERE with the sun all over me😎☀️☀️ and the breathtaking views🥹 Will be here until Saturday. You will find me at those talks/panels👇👇
Current agents are highly unsafe: o3-mini, one of the most advanced reasoning models, executes 71% of harmful requests 😱 We introduce a new framework for evaluating agent safety✨🦺 Discover more 👇 👩💻 Code & data: github.com/Open-Agent-Saf… 📄 Paper:…
1/ AI agents are increasingly being deployed for real-world tasks, but how safe are they in high-stakes settings? 🚨 NEW: OpenAgentSafety - A comprehensive framework for evaluating AI agent safety in realistic scenarios across eight critical risk categories. 🧵
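For intuition, a score like 71% plausibly comes from a loop of this shape. All names here are hypothetical stand-ins, not the released harness (see the repo linked above).

```python
def unsafe_execution_rate(agent, scenarios, run_agent, judge_executed_harm) -> float:
    """Fraction of harmful scenarios the agent actually carries out.

    run_agent: executes one scenario, returning the full trajectory
    judge_executed_harm: True if the harmful request was fulfilled
    (e.g., a rule-based check or an LLM judge over the trajectory)
    """
    executed = sum(
        bool(judge_executed_harm(s, run_agent(agent, s))) for s in scenarios
    )
    return executed / len(scenarios)

# 0.71 would mean the agent completed 71% of the harmful requests it was given.
```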
My group & collaborators have developed many popular benchmarks over the years (e.g., MMLU, MATH, APPS). Really excited about our latest benchmark, OMEGA Ω: 🔍Can LLMs really think outside the box in math? A new benchmark probing 3 axes of generalization: 1️⃣ Exploratory 2️⃣…
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember: DeepSeek R1 and o1 have impressed us on Olympiad-level math, yet they still fail at simple arithmetic 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found…
This just made my entire week🥹thank you for such incredibly kind words @jeremyphoward 🙏Now I'm nervous you'll actually read it thoroughly😅 Your work inspires me constantly! ✨
The legendary researcher behind the classic "Faith and Fate" paper has dropped a new paper on compositionality! 😁 I can't wait to dig into this. @nouhadziri is one of my absolute fave thinkers in this space so I have a feeling this will be a classic too…
Great insight from a new paper. We are certainly learning more about how LLMs "think" and "reason".
🤯 We noticed that many failures stem not from lack of knowledge but from overthinking. Models often find the right answer early in the CoT, then spiral into self-corrections and abandon correct solutions. This challenges the assumption that more CoT means better results. Sometimes the…
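A hedged sketch of how one might count these "found it, then abandoned it" cases; the naive string matching and newline step-splitting are assumptions for illustration, not the paper's method.

```python
def overthought(cot: str, final_answer: str, gold: str) -> bool:
    """True if the gold answer surfaced mid-CoT but the final answer is wrong."""
    if final_answer.strip() == gold.strip():
        return False                      # got it right: not an overthinking case
    steps = cot.split("\n")               # naive step segmentation
    return any(gold.strip() in step for step in steps)

# Aggregating over a benchmark separates "never found the answer" failures
# from "found it early, then self-corrected away from it" failures.
```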