Arkil Patel
@arkil_patel
CS PhD Student at Mila and McGill | Worked at AllenNLP and Microsoft Research
𝐓𝐡𝐨𝐮𝐠𝐡𝐭𝐨𝐥𝐨𝐠𝐲 paper is out! 🔥🐋 We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and find several surprising and interesting phenomena! Incredible effort by the entire team! 🌐: mcgill-nlp.github.io/thoughtology/
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour. 🔗: mcgill-nlp.github.io/thoughtology/
Nice work! We observed a similar trend on certain math tasks in our work: arxiv.org/abs/2504.07128 Section 4.1 has a discussion of our findings. You might want to consider citing it :) cc @saraveramarjano @arkil_patel @sivareddyg
If you’re at ICML and work on interpretability or causality, go talk to @_shruti_joshi_, she has a fantastic paper!
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
Come find us at the #ICML2025 poster if you are interested in the safety of web agents!
I'll be at #ICML2025 this week presenting SafeArena (Wednesday 11AM - 1:30PM in East Exhibition Hall E-701). Come by to chat with me about web agent safety (or anything else safety-related)!
SafeArena is being presented at #ICML2025!! Check out our poster and talk to @ncmeade for all things ‘safety ∪ agents ∪ LLMs’!
Congrats @vernadankers!! We’re lucky to have you join our lab!
Congratulations Verna! This was one of the best theses I've ever read; I highly recommend checking out Verna's work on the tradeoffs between memorization and generalization in language models! vernadankers.com
I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen and @PontiEdoardo for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg's wonderful lab @Mila_Quebec 🤩
Huge congratulations to Dr. @vernadankers for passing her viva today! 🥳🎓 It's been an honour sharing the PhD journey with you. I wasn’t ready for the void your sudden departure left (in the office and in my life!). Your new colleagues are lucky to have you! 🥺🥰 @Edin_CDT_NLP
"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).
Do LLMs hallucinate randomly? Not quite. Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode — revealing how LLMs generalize using abstract classes + context cues, albeit unreliably. 📎 Paper: arxiv.org/abs/2505.22630 1/n
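A minimal sketch of the kind of probe this finding implies, assuming a generic text-generation interface (`query_model` is a hypothetical placeholder, not the paper's code):

```python
# Hypothetical sketch: probing hallucination under irrelevant context.
# `query_model` is a stand-in for any LLM API; swap in a real client.

IRRELEVANT_CONTEXT = "The Eiffel Tower was completed in 1889 and is 330 metres tall."
QUESTION = "Who wrote the novel 'Beloved'?"

def query_model(prompt: str) -> str:
    # Placeholder mock so the sketch runs; replace with a real model call.
    return "Gustave Eiffel" if "Eiffel" in prompt else "Toni Morrison"

baseline = query_model(QUESTION)
with_distractor = query_model(f"{IRRELEVANT_CONTEXT}\n\n{QUESTION}")

# If hallucinations were random, the two answers would differ unpredictably;
# the paper reports a systematic failure mode, where context cues steer the
# model toward answers from the wrong abstract class.
print("baseline:       ", baseline)
print("with distractor:", with_distractor)
```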
📢 New Paper! Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎 W/ @melaniesclar and @tsvetshop 1/n
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
A key reason RL for web agents hasn’t fully taken off is the lack of robust reward models. No matter the algorithm (PPO, GRPO), we can’t reliably do RL without a reward signal. With AgentRewardBench, we introduce the first benchmark aiming to kickstart progress in this space.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can evaluate web agent trajectories. We find that rule-based evals underreport success rates, and…
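For intuition, here is a back-of-the-envelope sketch of how one might score an automatic evaluator against human annotations (hypothetical data structures, not the AgentRewardBench API):

```python
# Hypothetical sketch: comparing an automatic evaluator's verdicts on
# web-agent trajectories against human labels. Not the benchmark's code.
from dataclasses import dataclass

@dataclass
class Trajectory:
    human_success: bool   # human annotator's verdict
    judge_success: bool   # LLM-judge / rule-based evaluator's verdict

def precision_recall(trajs: list[Trajectory]) -> tuple[float, float]:
    tp = sum(t.human_success and t.judge_success for t in trajs)
    fp = sum((not t.human_success) and t.judge_success for t in trajs)
    fn = sum(t.human_success and (not t.judge_success) for t in trajs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A rule-based evaluator that rejects genuinely successful trajectories
# shows up here as low recall, i.e. an underreported success rate.
trajs = [Trajectory(True, True), Trajectory(True, False), Trajectory(False, False)]
print(precision_recall(trajs))  # (1.0, 0.5)
```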
many many many thanks to @kchonyc and @Yoshua_Bengio for enabling the wildest ever start of my research career. 2014 was a very special time to do deep learning; a commit that changes 50 lines of code could give you a Test of Time (ToT) award 10 years later 😲
Super timely work led by @xhluca with extensive human evaluation of agent trajectories across multiple benchmarks and LLMs!
DeepSeek-R1 Thoughtology is now #2 on @huggingface daily papers. Thanks for building this great platform for sharing new papers, @_akhaliq!
DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning
A 142-page report diving into the reasoning chains of R1. It spans 9 unique axes: safety, world modeling, faithfulness, long context, etc.
I think one of the most underrated sources of insight in research is just looking at the model's outputs. The Thoughtology paper is what happens when an entire lab of grad students at Mila does this cumbersome task for R1's CoT and actually quantifies all the patterns we saw.
Thoughtology is trending today on hf daily papers! Read our paper for a detailed analysis of R1’s long chains of thoughts across a variety of settings. huggingface.co/papers/2504.07…
And Thoughtology is now on arXiv! Read more about R1 reasoning 🐋💭 across visual, cultural and psycholinguistic tasks at the link below: 🔗 arxiv.org/abs/2504.07128
Introducing nanoAhaMoment: a Karpathy-style, single-file RL-for-LLMs library (<700 lines)
- super hackable
- no TRL / Verl, no abstractions 💆‍♂️
- single GPU, full-param tuning, 3B LLM
- efficient (R1-Zero countdown < 10h)
Comes with a from-scratch, fully spelled-out YT video [1/n]
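For a sense of what such a single-file loop boils down to, here is a toy REINFORCE-style sketch with a binary countdown-style reward. The model name, task, and hyperparameters are placeholders; nanoAhaMoment's actual implementation may differ substantially:

```python
# Toy single-file RL-for-LLMs loop: REINFORCE with a binary reward.
# Sketch under stated assumptions, not nanoAhaMoment's actual code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder; any small causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-6)

def reward(completion: str, target: int) -> float:
    # Countdown-style check: reward 1 if the target number appears.
    return 1.0 if str(target) in completion else 0.0

prompt = "Combine 3, 5 and 7 with + and * to reach 22. Answer:"
for step in range(10):
    inputs = tok(prompt, return_tensors="pt")
    prompt_len = inputs.input_ids.shape[1]
    out = model.generate(**inputs, max_new_tokens=32, do_sample=True,
                         return_dict_in_generate=True)
    completion_ids = out.sequences[0, prompt_len:]
    r = reward(tok.decode(completion_ids, skip_special_tokens=True), 22)

    # REINFORCE: push up the log-prob of sampled tokens, scaled by reward.
    logits = model(out.sequences).logits[0, prompt_len - 1:-1]
    logp = torch.log_softmax(logits, -1).gather(-1, completion_ids[:, None]).sum()
    loss = -r * logp
    opt.zero_grad(); loss.backward(); opt.step()
```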
Watch Siva’s talk on thoughtology: youtube.com/live/aO_cTIY9K…
I will be giving a talk about this work @SimonsInstitute tomorrow (Apr 2nd 3PM PT). Join us in person or virtually. simons.berkeley.edu/workshops/futu…
Introducing the DeepSeek-R1 Thoughtology -- the most comprehensive study of R1 reasoning chains/thoughts ✨. Probably everything you need to know about R1 thoughts. If we missed something, please let us know.