Negin Raoof
@NeginRaoof_
Ph.D. student @Berkeley_EECS advised by @AlexGDimakis. Ex: SWE @microsoft, collaborator @PyTorch
OpenThinker3-7B now outperforms Nvidia’s Nemotron-Nano-8B and DeepSeek-R1-Distill-Qwen-7B on reasoning benchmarks. It’s the strongest open-data reasoning model at the 7B scale 🧠 Today we’re releasing the full data curation recipe, dataset and model along with our paper. Huge…
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average across code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
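For anyone who wants to poke at the release: a minimal sketch for pulling the dataset from the Hugging Face Hub. The repo id open-thoughts/OpenThoughts3-1.2M is an assumption, inferred from the dataset name and the open-thoughts links shared below.

```python
# Minimal sketch: pull the released dataset from the Hugging Face Hub.
# The repo id below is an assumption, not confirmed by this thread.
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts3-1.2M", split="train")
print(len(ds))   # ~1.2M question/reasoning-trace pairs
print(ds[0])     # inspect one training example
```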
Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.
📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀 OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
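A minimal inference sketch for the 1.5B model, assuming the checkpoint lives at open-thoughts/OpenThinker3-1.5B and ships a chat template (both are assumptions, not confirmed in the thread):

```python
# Minimal inference sketch; the repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker3-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 odd numbers?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated reasoning trace, not the prompt.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```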
Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments, and integrating one can take days. We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks. Now…
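The "npm of agent benchmarks" idea boils down to resolving a name-plus-version spec to a ready-to-run benchmark package. A minimal sketch of that resolution step, with made-up names (Benchmark, REGISTRY, resolve) that are not the actual Terminal-Bench API:

```python
# Hypothetical sketch of a benchmark registry, npm-style: a name==version
# spec resolves to a runnable benchmark package. Not the real Terminal-Bench API.
from dataclasses import dataclass

@dataclass
class Benchmark:
    name: str
    version: str
    tasks: list  # task ids bundled with this benchmark release

REGISTRY = {
    ("swe-tasks", "1.0.0"): Benchmark("swe-tasks", "1.0.0", ["fix-build", "git-bisect"]),
}

def resolve(spec: str) -> Benchmark:
    """Resolve 'name==version' to a benchmark, like a package manager would."""
    name, _, version = spec.partition("==")
    try:
        return REGISTRY[(name, version)]
    except KeyError:
        raise KeyError(f"{spec} not found in registry") from None

bench = resolve("swe-tasks==1.0.0")
print(bench.tasks)
```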
We evaluated more than 1,000 reasoning LLMs on 12 reasoning-focused benchmarks and made fascinating observations about cross-benchmark comparisons. You can explore all of that data yourself on our Hugging Face Spaces page. (1/4)
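One way to explore cross-benchmark comparisons like these: treat each benchmark as a column of per-model scores and check how consistently the benchmarks rank the same models. A toy sketch with made-up numbers and model names (the benchmark columns are illustrative):

```python
# Toy sketch (made-up scores): with per-model results on several benchmarks,
# cross-benchmark agreement is just a rank-correlation matrix.
import pandas as pd

scores = pd.DataFrame(
    {
        "AIME24":        [0.30, 0.55, 0.62],
        "GPQA":          [0.35, 0.48, 0.57],
        "LiveCodeBench": [0.22, 0.41, 0.50],
    },
    index=["model-a", "model-b", "model-c"],
)
# Spearman correlation: do these benchmarks rank the models the same way?
print(scores.corr(method="spearman"))
```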
OpenThoughts3 is the #1 trending dataset on Hugging Face! Thank you to everyone who is using the dataset and giving us great feedback 🚀!
Open weights, Open data, Open code -- SOTA reasoning model with only 7B parameters. Excited to see LlamaFactory powering its training 🥳
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Amazing work by the drivers of this project 🥳 @etash_guha @ryanmart3n @sedrickkeh2 @NeginRaoof_
Paper: arxiv.org/abs/2506.04178 Model: huggingface.co/open-thoughts/… Dataset: huggingface.co/datasets/open-… Code: github.com/open-thoughts/… Blog: openthoughts.ai/blog/ot3 (10/N)
OpenThoughts3-1.2M and OpenThinker3-7B are a major milestone for open-data reasoning models, improving over DeepSeek-R1-Distill-Qwen-7B as well as Nemotron-Nano-8B! Excited to be part of this work, and a big thank you to everyone on the team!
I love how counterintuitive rigorous empirical research can be. We found that the best models (R1) aren't necessarily the best teachers (QwQ taught better), and that scaling the number of answers sampled per question is as efficient as scaling the number of questions. Great work team!
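The "answers per question" axis is cheap to implement: instead of mining new questions, sample k reasoning traces per prompt from the teacher. A minimal sketch, assuming a Hugging Face teacher checkpoint; the Qwen/QwQ-32B id and k=4 here are illustrative choices, not the paper's exact pipeline:

```python
# Sketch of the "multiple answers per question" data-scaling axis:
# sample k traces per prompt from a teacher rather than adding new prompts.
# Teacher id and k are illustrative assumptions.
from transformers import pipeline

teacher = pipeline("text-generation", model="Qwen/QwQ-32B")
question = "Prove that the sum of two odd integers is even."
traces = teacher(
    question,
    num_return_sequences=4,   # k answers for the same question
    do_sample=True,           # sampling gives diverse traces
    temperature=0.7,
    max_new_tokens=1024,
)
dataset_rows = [{"question": question, "answer": t["generated_text"]} for t in traces]
```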
Congrats. One of the pioneering efforts on open reasoning models right now. Had no idea this was such a big team! For smaller models, distilling from R1 is the easiest path to performance. I'm more interested in the RL side (the work is more fun), but this is very impactful.
I'm excited to finally announce what we have been working on for months: OpenThinker3, the strongest 7B reasoning model with open data, plus more than 1,000 experiments on what works and what doesn't for post-training data curation.
we created a SOTA reasoning dataset. a massive effort and an extremely fun time working with an awesome team :)
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr…
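Conceptually, a terminal-task eval is: drop the agent into a sandbox with an instruction, let it run shell commands, then run a task-specific check on the resulting state. A heavily simplified, hypothetical loop; the agent callable and the check command are stand-ins, not the real Terminal-Bench harness:

```python
# Hypothetical, heavily simplified terminal-task evaluation loop.
# `agent` and the per-task check are stand-ins, not the real harness.
import subprocess
import tempfile

TASKS = [
    {
        "instruction": "Create a git repo named 'repo' with exactly one commit.",
        "check": "git -C repo log --oneline | wc -l",  # success => prints "1"
    },
]

def evaluate(agent, tasks):
    passed = 0
    for task in tasks:
        with tempfile.TemporaryDirectory() as sandbox:
            agent(task["instruction"], cwd=sandbox)  # agent issues shell commands
            result = subprocess.run(
                task["check"], shell=True, cwd=sandbox,
                capture_output=True, text=True,
            )
            passed += result.stdout.strip() == "1"   # task-specific success check
    return passed / len(tasks)
```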
Happy #WomeninMathematics day!
Happy #WomeninMathematics day! May 12 marks the birthday of Maryam Mirzakhani, a mathematician awarded the Fields Medal (the highest honor in mathematics) for her contributions to geometry and dynamical systems. Two of my favorite mathematicians: Maryam Mirzakhani & Ingrid Daubechies
Pretty pleased to see OpenThinker as a baseline in a frontier lab report like Phi-4-reasoning! Open-data reasoning models are going strong. Thanks @MSFTResearch!
Had so much fun, thanks to @berkeley_csge!
1/ Berkeley AI is just next level
OpenAI co-founder @johnschulman2
Perplexity co-founder @denisyarats
Databricks co-founder @andykonwinski
Bespoke Labs co-founder @AlexGDimakis
All on one panel w/ audience of brilliant Berkeley AI grad student researchers
My kinda Fri…
The Berkeley entrepreneurs student club has some cool alums that started a few small startups.