Sedrick Keh
@sedrickkeh2
research engineer @ToyotaResearch interested in pre-training, post-training, and multimodality
📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀 OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
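A minimal quick-start sketch for trying the released model, assuming the checkpoint lives under the Open Thoughts org on Hugging Face (the repo id below is an assumption inferred from the naming of earlier OpenThinker releases; check the model card for the exact id and recommended generation settings):

```python
# Hedged sketch: load OpenThinker3-1.5B with Hugging Face transformers.
# "open-thoughts/OpenThinker3-1.5B" is an assumed repo id, not confirmed by the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker3-1.5B"  # assumption: check the Open Thoughts HF org
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Reasoning SFT models in this family are chat-tuned, so apply the chat template.
messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```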

Just did a version of this! And it’s such an energizing experience. Travel award for 8 builders and engineers from the Philippines to spend a week in a retreat here in Silicon Valley. All are working on tough problems to deploy AI in a frontier market with poor infra but…
If you make >$300k/yr why aren’t you announcing random $1,000 prizes every Saturday for whatever you want to see happen in the world? $1k prize for best blog post on X, $1k for best art like Y, $1k for best _____. High agency mindset
a lot of data research to be done across all sorts of tasks and modalities!
Some of the most impactful work you can do in academia isn’t cool new algos or novel architectures. It’s data research. Data research isn’t just dumping tokens into a json. It requires a ton of rigorous experimentation, algorithmic thinking, and actually talking to your models.
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
Open Thoughts delivers again. Congrats team for a small but powerful reasoning model. Writeup: open-thoughts.ai/blog/ot3_small
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
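To make the idea of "dynamic chunking inside the model" concrete, here is a toy illustration of a learned boundary predictor that groups raw bytes into variable-length chunks. This is explicitly not the H-Net architecture; the module name, sizes, thresholding rule, and mean-pooling are illustrative assumptions only, sketching the general concept of replacing a fixed tokenizer with an in-model chunking decision:

```python
# Toy sketch of learned dynamic chunking over raw bytes (NOT the actual H-Net).
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    def __init__(self, d_model: int = 256, vocab: int = 256):
        super().__init__()
        self.byte_embed = nn.Embedding(vocab, d_model)   # embed raw bytes, no tokenizer
        self.boundary_scorer = nn.Linear(d_model, 1)     # learned "end of chunk" score

    def forward(self, byte_ids: torch.Tensor, threshold: float = 0.5):
        x = self.byte_embed(byte_ids)                                     # (T, d_model)
        p_boundary = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1)   # (T,)
        chunks, start = [], 0
        for t, p in enumerate(p_boundary):
            if p > threshold or t == byte_ids.shape[0] - 1:   # close chunk at predicted boundary
                chunks.append(x[start : t + 1].mean(dim=0))   # pool bytes into one chunk vector
                start = t + 1
        return torch.stack(chunks), p_boundary                # variable-length chunk sequence

byte_ids = torch.tensor(list("tokenization".encode("utf-8")))
chunk_vecs, probs = ToyDynamicChunker()(byte_ids)
print(chunk_vecs.shape)  # (num_chunks, d_model); num_chunks is decided by the model itself
```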
🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can…
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
If you’re working on robotics and AI, the recent Stanford talk from @RussTedrake on scaling multitask robot manipulation is a mandatory watch, full stop. No marketing, no hype. Just solid hypothesis-driven science, evidence-backed claims. A gold mine in today’s landscape!
This plot is a thing of beauty. Great visualization by @MercatJean! One of many cool artifacts that arose from conducting 1000+ experiments for OpenThoughts 😀
We evaluated more than 1000 reasoning LLMs on 12 reasoning-focused benchmarks and made fascinating observations about cross-benchmark comparisons. You can explore all that data yourself on our HuggingFace spaces page. (1/4)
Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔 We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats! arxiv.org/abs/2506.04689
How can we achieve both common sense understanding that can deal with varying levels of ambiguity in language and dextrous manipulation? Check out CodeDiffuser, a really neat work that bridges Code Gen with a 3D Diffusion Policy! This was a fun project with cool experiments! 🤖
🤖 Do VLA models really listen to language instructions? Maybe not 👀 🚀 Introducing our RSS paper: CodeDiffuser -- using VLM-generated code to bridge the gap between **high-level language** and **low-level visuomotor policy** 🎮 Try the live demo: robopil.github.io/code-diffuser/ (1/9)
OpenThoughts3 is the #1 trending dataset on Huggingface! Thank you to everyone who is using the dataset and giving us great feedback 🚀!
#CVPR2025 starts in two days, and I can’t wait to share our new work! 🎉 We present ZeroGrasp, a unified framework for 3D reconstruction and grasp prediction that generalizes to unseen objects. Paper📄: arxiv.org/abs/2504.10857 Webpage🌐: sh8.io/#/zerograsp (1/4 🧵)
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Very proud of this work and the team! Nvidia recently released Nemotron, which is a great open reasoning model. The OpenThinker team worked tirelessly and heroically and curated what's arguably the best reasoning data, and got the model to be better than Nemotron (and GPT-4.1).…
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
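A small sketch for inspecting the accompanying dataset with the `datasets` library. The repo id below is an assumption based on the dataset's name (verify against the dataset card for the actual id and schema); streaming avoids downloading all 1.2M examples just to look at one:

```python
# Hedged sketch: peek at one example of the OpenThoughts3-1.2M reasoning dataset.
# "open-thoughts/OpenThoughts3-1.2M" is an assumed repo id, not confirmed by the post.
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts3-1.2M", split="train", streaming=True)
example = next(iter(ds))
print(example.keys())     # inspect the actual fields (prompt / reasoning-trace / answer style columns)
print(len(str(example)))  # rough sense of how long a single reasoning trace is
```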