Parishad BehnamGhader
@ParishadBehnam
NLP PhD student at @Mila_Quebec and @mcgillu
The video is online now! 3min speed science talk on "From a soup of raw pixels to abstract meaning" youtu.be/AHsoMYG2Vqk?si…
Turns out condensing your research into 3min is very hard but also teaches you a lot
"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).
Current KL estimation practices in RLHF can yield high-variance and even negative estimates! We propose a provably better estimator that takes only a few lines of code to implement.🧵👇 w/ @xtimv and Ryan Cotterell paper: arxiv.org/pdf/2504.10637 code: github.com/rycolab/kl-rb
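For context, a minimal PyTorch sketch of the contrast, assuming a sequence-level KL between a policy and a reference model; the function and tensor names are illustrative assumptions, not the repo's actual API or the paper's exact estimator:

import torch.nn.functional as F

def naive_mc_kl(logp_policy, logp_ref):
    # logp_policy, logp_ref: (batch, seq) log-probs of the sampled tokens.
    # Single-sample Monte Carlo estimate: sum of per-token log-ratios.
    # Unbiased, but high variance, and individual estimates can be negative.
    return (logp_policy - logp_ref).sum(dim=-1)

def per_token_analytic_kl(policy_logits, ref_logits):
    # policy_logits, ref_logits: (batch, seq, vocab) next-token logits.
    # Rao-Blackwell-style alternative: at each position, compute the exact KL
    # between the two next-token distributions (a sum over the vocabulary)
    # instead of using only the sampled token's log-ratio. Each term is >= 0.
    logp = F.log_softmax(policy_logits, dim=-1)
    logq = F.log_softmax(ref_logits, dim=-1)
    kl_per_pos = (logp.exp() * (logp - logq)).sum(dim=-1)  # (batch, seq)
    return kl_per_pos.sum(dim=-1)                          # (batch,)

Averaging analytically over the vocabulary at each position, rather than relying on the single sampled token, is what removes the negative estimates; see the paper for the actual estimator and its variance guarantees.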
Excited to be part of this panel today at the WiML social, 12:30 PM - 2:00 PM, Hall 1 Apex
We are honored to be joined by 4 amazing women in ML on the "Papers, Patents, or Products?" panel: - @ReyhaneAskari (FAIR) - @nouhadziri (AI2) - @vernadec (U Tübingen) - Katherine Driscoll (Graph Therapeutics)
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to measure how well automatic evaluators, such as LLM judges, assess web agent trajectories. We find that rule-based evals underreport success rates, and…
Introducing nanoAhaMoment: a Karpathy-style, single-file RL library for LLMs (<700 lines) - super hackable - no TRL / Verl, no abstraction 💆‍♂️ - Single GPU, full-param tuning, 3B LLM - Efficient (R1-zero countdown < 10h) Comes with a from-scratch, fully spelled-out YT video [1/n]
Talking about "DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning" Going live at 11am PDT (i.e., in 20 mins). Last-minute change of plans. You might be able to watch it live here: youtube.com/watch?v=aO_cTI…
I will be giving a talk about this work @SimonsInstitute tomorrow (Apr 2nd, 3PM PT). Join us, in person or virtually. simons.berkeley.edu/workshops/futu…
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour. 🔗: mcgill-nlp.github.io/thoughtology/
me when I see Promptriever has the highest score in some columns
Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣 Retrievers need to be aligned too! 🚨🚨🚨 Work done with the wonderful @ncmeade and @sivareddyg 🔗 mcgill-nlp.github.io/malicious-ir/ Thread: 🧵👇
Agents like OpenAI Operator can solve complex computer tasks, but what happens when people use them to cause harm, e.g. to automate hate speech or spread misinformation? To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web…
📢New Paper Alert!🚀 Human alignment balances social expectations, economic incentives, and legal frameworks. What if LLM alignment worked the same way?🤔 Our latest work explores how social, economic, and contractual alignment can address incomplete contracts in LLM alignment🧵
🚀 New Paper Alert! Can we generate informative synthetic data that truly helps a downstream learner? Introducing Deliberate Practice for Synthetic Data (DP)—a dynamic framework that focuses on where the model struggles most to generate useful synthetic training examples. 🔥…
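Purely as a hypothetical illustration (the Generator/Learner interfaces below are made up, not the paper's code), the "focus on where the model struggles" loop can be pictured like this in Python:

from typing import List, Protocol

class Generator(Protocol):
    def sample(self, n: int) -> List[dict]: ...          # propose candidate synthetic examples

class Learner(Protocol):
    def uncertainty(self, example: dict) -> float: ...   # e.g. predictive entropy on the example
    def train_on(self, examples: List[dict]) -> None: ...

def deliberate_practice_round(gen: Generator, learner: Learner,
                              n_candidates: int = 1024, n_keep: int = 128) -> None:
    # One hypothetical round: oversample candidates, rank them by the learner's
    # current uncertainty, and train only on the hardest ones.
    candidates = gen.sample(n_candidates)
    hardest = sorted(candidates, key=learner.uncertainty, reverse=True)[:n_keep]
    learner.train_on(hardest)

Selecting only the highest-uncertainty candidates each round is what makes the generation "deliberate" rather than random, in the spirit of the framework described in the thread.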
Presenting ✨ 𝐂𝐇𝐀𝐒𝐄: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐢𝐧𝐠 𝐬𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 ✨ Work w/ fantastic advisors @DBahdanau and @sivareddyg Thread 🧵: