Jean Mercat
@MercatJean
📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀 OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
The short version is: LBMs work! We see consistent and statistically significant improvements as we increase the amount of pretraining data. But doing the science is still hard; as a field we have more work to do to improve the statistical power of our experiments.
🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can…
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
Excited to share what I've been up to: Gemini Diffusion is FAST! I'm convinced this will revolutionise iterative workflows: refine, get instant feedback, repeat! So proud of what our small team achieved here🪐
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
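The "refine noise, step-by-step" idea can be illustrated with a toy iterative-denoising loop. This is a minimal sketch of the general diffusion-style refinement pattern, not Gemini Diffusion's actual architecture; the `denoise_step` schedule and the target vector are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])   # stand-in for the clean output
x = rng.normal(size=target.shape)     # start from pure noise

def denoise_step(x, t, total):
    # Hypothetical denoiser: blend the current noisy state toward the
    # target, mimicking a learned model's prediction at refinement step t.
    alpha = (t + 1) / total
    return (1 - alpha) * x + alpha * target

steps = 10
for t in range(steps):
    x = denoise_step(x, t, steps)

print(np.allclose(x, target))  # True: the final step fully recovers the target
```

The key property this sketch shares with text diffusion is that every intermediate `x` is a full (noisy) draft of the whole output, which is what makes fast, parallel iteration over solutions possible.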
Turns out, it’s possible to outperform DeepSeek-R1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
1/ DeepSeek-VL is trained from DeepSeek LLM. Qwen-VL is trained from Qwen-7B. PaliGemma is trained from Gemma-2B. Is this really the best way to train a VLM? What if we had access to model checkpoints -- would it be better to train with images before the LLM fully converges? 🧵
Pretty happy that our OpenThinker-32B is in the #4 position on the General Reasoning Leaderboard. It's also worth pointing out which models use open (post-training) data: OpenThinker, LIMO, OpenHermes, and DeepScaleR.
Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1. Our results show that large, carefully curated datasets with verified R1 annotations produce SoTA reasoning models. Our 32B model outperforms all 32B models including…
Want to evaluate your models on reasoning benchmarks? We have integrated many math and coding benchmarks into Evalchemy: AIME24, AMC23, MATH500, LiveCodeBench, GPQA, HumanEvalPlus, MBPPPlus, BigCodeBench, MultiPL-E, and CRUXEval. Further, Evalchemy now supports vLLM and OpenAI,…
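Under the hood, math benchmarks like AIME24 and MATH500 come down to normalizing a model's answer string and checking it against the reference. A minimal sketch of that scoring step (the `normalize` helper and sample data are illustrative assumptions, not Evalchemy's actual API):

```python
def normalize(ans: str) -> str:
    # Lowercase, trim whitespace, and drop a leading "answer:" prefix
    # so that superficial formatting differences don't count as errors.
    ans = ans.strip().lower()
    return ans.removeprefix("answer:").strip()

def exact_match_accuracy(predictions, references):
    # Fraction of predictions whose normalized form matches the reference.
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["Answer: 204", " 7/2 ", "113"]
golds = ["204", "7/2", "120"]
print(exact_match_accuracy(preds, golds))  # 2 of 3 match
```

Real harnesses add benchmark-specific extraction (boxed answers, code execution for LiveCodeBench-style tasks), but the accuracy computation has this shape.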
Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open. Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
github.com/mlfoundations/… I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training…
Excited to share our new-and-improved 1B models trained with DataComp-LM! - 1.4B model trained on 4.3T tokens - 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning) - Fully open models: public code, weights, dataset!
Incredible work saving thousands of GPU hours. And all of that in short, very readable code.
Training DataComp-LM models meant we needed fast training code: here's a quick summary of how we sped up training in OpenLM by 60%, reducing costs by ~40%!
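The two numbers are consistent: a 60% speedup means 1.6x throughput, so the same training run takes 1/1.6 ≈ 62.5% of the wall-clock time, i.e. roughly 40% fewer GPU-hours (assuming cost scales with time on the same hardware). A quick check:

```python
speedup = 1.60                     # training is 60% faster => 1.6x throughput
time_ratio = 1 / speedup           # fraction of original wall-clock time
cost_reduction = 1 - time_ratio    # savings, assuming cost ~ GPU-hours
print(f"{cost_reduction:.1%}")     # 37.5%, matching the quoted ~40%
```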