SynthLabs
@synth_labs
Scaling Up Good Synthetic Reasoning We're hiring! ➡️ https://jobs.synthlabs.ai / DM 💬 @NathanThinks
Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM reasoning!
📝 250,000+ problems, 47k NEW Q's
✅ 10x larger than existing datasets like MATH
🧑‍⚖️ Verifiable—we eliminated 400k+ problems
Details below! 🧵👇
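The point of "verifiable" is that every problem ships with a closed-form answer an RL loop can check automatically, turning correctness into a binary reward. A minimal sketch of such a reward, assuming simple string normalization (a production verifier would compare expressions symbolically, e.g. with sympy):

```python
def exact_match_reward(model_answer: str, gold_answer: str) -> float:
    """Binary RL reward: 1.0 if the model's final answer matches the
    verified gold answer, else 0.0."""
    # Placeholder normalization -- a real verifier would parse both sides
    # and compare them symbolically rather than as strings.
    def normalize(s: str) -> str:
        return s.strip().rstrip(".").replace(" ", "").lower()

    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0
```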


Read how @synth_labs, a startup developing AI solutions tailored for logical reasoning, is advancing AI post-training with our @TractoAI: nebius.com/customer-stori… 🔹 Goal: Develop an ML system that empowers reasoning models to surpass pattern matching and implement sophisticated…
the future is about smart tokens
What if models could learn which problems _deserve_ deep thinking? No labels. Just let the model discover difficulty through its own performance during training. Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more on problems that actually need it ↓
Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
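A minimal sketch of the mechanism described above, assuming a linear length penalty scaled by the group's empirical solve rate; the coefficient beta and the exact reward shaping are assumptions, not the paper's formulation:

```python
import numpy as np

def alp_rewards(correct: np.ndarray, lengths: np.ndarray,
                beta: float = 1e-3) -> np.ndarray:
    """Adaptive length penalty over k RL rollouts of the same problem.

    correct: (k,) binary solve outcomes; lengths: (k,) token counts.
    The empirical solve rate acts as an online difficulty estimate drawn
    from the model's own performance -- no difficulty labels needed.
    """
    solve_rate = correct.mean()             # high on easy problems, low on hard ones
    penalty = beta * solve_rate * lengths   # easy problems pay more per token
    return correct.astype(float) - penalty  # hard problems keep their token budget
```

Because the penalty scales with solve rate, short answers stay cheap on trivial problems while long reasoning chains remain nearly free on the hardest ones.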
Generative Reward Models' impact compounds daily. way stronger interest now than when we published last fall 👇 many excellent recent extensions—cool seeing where researchers take GenRM
we bootstrapped our way to generalized meta-reasoning capabilities with generative reward models
classical reward models can be worse than random on new reasoning tasks 🎲
we see improvements in robustness, generalization, interpretability and an opportunity to unify RLHF/RLAIF…
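The core mechanic, roughly: instead of a scalar value head, the judge LLM writes out its reasoning and the reward is parsed from the verdict it emits. A hedged sketch, where `judge` stands in for any text-generation callable (an assumption, not the paper's API):

```python
def generative_reward(judge, question: str, answer: str) -> float:
    """Generative reward model sketch: the judge reasons in natural
    language, then emits an explicit verdict; the reward is parsed from
    that verdict rather than produced by a learned scalar head."""
    prompt = (
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Think step by step about whether the answer is correct, "
        "then end with 'Verdict: Yes' or 'Verdict: No'."
    )
    reasoning = judge(prompt)  # chain-of-thought judgment, inspectable by humans
    return 1.0 if "Verdict: Yes" in reasoning else 0.0
```

The generated reasoning is what buys the robustness and interpretability gains mentioned above, since the judgment itself can be audited.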
btw we have ongoing research on this front! we're open-science, pro-publication, and love collaboration. want to push this frontier forward? we're growing our SF team & always open to research partners—reach out, my DMs are open 📩
excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment scalable (synthetic) evaluation continues to be AI's key bottleneck!
btw, random fun fact we pointed out months ago: the only MATH example @OpenAI published with o1 announcement included an unsubstantiated assumption 😬
> still hacks at a fairly high rate
> we wouldn't notice this agent was misaligned
Meanwhile, industry: aggressively distilling Meta-CoT slop directly into models 🫡
The final stop in our meetup series will be in San Francisco! 🌁 nebius.com/events/nebius-… Join us at Convene 100 Stockton near Union Square on Thursday, March 13, for a deep dive into our AI cloud. Our developers, AI R&D engineers and architects will share insights with the tech…
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Author's Explanation: x.com/synth_labs/sta… Overview: Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, is purposefully designed for…
Big-Math: Massive Math Dataset for RL Training
- 10x larger than GSM8k/MATH
- 3 core properties: uniquely verifiable, open-ended, closed-form
- Human-validated 90%+ precision filters
- Difficulty metrics for curriculum learning
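One way to put the difficulty metrics to work for curriculum learning, sketched under the assumption that each problem carries a per-problem solve rate; the `solve_rate` field name is hypothetical:

```python
def curriculum_order(problems: list[dict]) -> list[dict]:
    """Sort problems easiest-first for curriculum RL, using a per-problem
    solve rate as the difficulty metric. The 'solve_rate' key is a
    hypothetical name -- adapt it to the dataset's actual schema."""
    return sorted(problems, key=lambda p: p["solve_rate"], reverse=True)
```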