hazyresearch
@HazyResearch
A research group in @StanfordAILab working on the foundations of machine learning & systems. http://hazyresearch.stanford.edu/ Ostensibly supervised by Chris Ré
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
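For context on the Pass@1 number above: it is the standard unbiased pass@k estimator used in code-generation evals, sketched below. This is general background, not code from the DeepSWE release, and the rollout counts in the example are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    attempts succeeds, given that c out of n sampled rollouts passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 16 rollouts per task, 7 of them resolve the issue.
print(pass_at_k(n=16, c=7, k=1))  # expected single-attempt success rate for this task
```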
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run…
Incredibly honored and grateful to receive the Overton Prize at #ISMBECCB2025♥️ Many thanks to ISCB and my amazing students, collaborators and mentors!🙏
Infinite Wiki ⁕ Every word is a hyperlink. Every description is generated in real-time (in ~1 second) ⁕ Runs on Gemini 2.5 Flash Lite. ASCII diagrams using 2.5 Flash
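A minimal sketch of how an app like Infinite Wiki could work, assuming the google-generativeai Python client; the model string follows the tweet, but the prompt, the `/wiki/` route, and the `linkify` helper are hypothetical placeholders rather than the actual implementation.

```python
import os
import re
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash-lite")  # model string per the tweet

def describe(term: str) -> str:
    """Generate a short, encyclopedia-style description of `term`."""
    resp = model.generate_content(
        f"Write a two-sentence encyclopedia entry for the term: {term}"
    )
    return resp.text

def linkify(text: str) -> str:
    """Turn every word into a link back into the wiki (hypothetical /wiki/ route)."""
    return re.sub(
        r"[A-Za-z][A-Za-z-]+",
        lambda m: f'<a href="/wiki/{m.group(0)}">{m.group(0)}</a>',
        text,
    )

print(linkify(describe("entropy")))
```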
join us in leading the fight against Big Token 🏴☠️⚔️ go.cartesia.ai/join
Just saw the phrase "Big Token" to describe OAI/Anthropic/GDM/xAI/Meta and now I can't stop thinking about it.
I just saw @_albertgu call the major AI labs "Big Token" and it has to be the most hilarious shit ever lol
Excited to share our latest at ICML 2025: pushing LoRA fine-tuning to below 2 bits (as low as 1.15 bits), unlocking up to 50% memory savings. Another step toward cheaper, democratized LLMs on commodity hardware! w/ the amazing team: @zhou_cyrus68804 @KumbongHermann @KunleOlukotun
🚀 New #ICML2025 drop! LowRA slashes LoRA to 1.15 bits / param and outperforms every sub-4-bit baseline. w/ @qizhengz_alex @KumbongHermann @KunleOlukotun 👇 (1 / N)
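LowRA's actual quantization scheme is in the paper; the toy uniform quantizer below only illustrates the memory arithmetic of "bits per parameter" for a LoRA factor. The matrix shape, rank, and bit-widths are made-up examples.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Toy symmetric uniform quantizer: snap weights onto 2**bits levels.
    Illustrative only; LowRA's sub-2-bit scheme is more sophisticated."""
    levels = 2 ** bits
    scale = np.abs(w).max() / (levels / 2)
    q = np.clip(np.round(w / scale), -(levels // 2), levels // 2 - 1)
    return q * scale

# Hypothetical rank-16 LoRA factor for a 4096x4096 projection.
rank, d = 16, 4096
A = (np.random.randn(d, rank) * 0.01).astype(np.float32)
A_q = quantize_uniform(A, bits=2)

fp16_mib = A.size * 16 / 8 / 2**20      # 16 bits per parameter
lowra_mib = A.size * 1.15 / 8 / 2**20   # 1.15 bits per parameter, as reported
print(f"fp16: {fp16_mib:.3f} MiB  vs  1.15-bit: {lowra_mib:.4f} MiB")
print("mean quantization error:", float(np.abs(A - A_q).mean()))
```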
Big Token is quaking in their boots. Don't worry, we're here to free you all
...wtf anthropic?
hyped to announce this collab. minions ❤️ @AMD. edge compute ftw 🚀
We’re thrilled to collaborate with the @HazyResearch @StanfordAILab, led by Chris Ré, to power Minions, their cutting-edge agentic framework tackling the cost-accuracy tradeoff in modern AI systems. This innovation is enabled on AMD Ryzen AI, thanks to seamless integration with…
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
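To make "dynamic chunking" concrete, here is a heavily simplified sketch rather than the H-Net architecture itself: a learned boundary scorer over raw bytes marks chunk ends, and each chunk of byte embeddings is mean-pooled into a single vector for a higher-level model. All module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    """Illustrative only: scores each byte as a potential chunk boundary and
    mean-pools bytes into chunk vectors. H-Net's real mechanism (including how
    it keeps boundary decisions trainable end-to-end) is described in the paper."""

    def __init__(self, d_model: int = 256, vocab: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.boundary = nn.Linear(d_model, 1)  # score: does a chunk end here?

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(byte_ids)                                  # (seq, d_model)
        is_end = torch.sigmoid(self.boundary(x)).squeeze(-1) > 0.5
        chunks, start = [], 0
        for i, end in enumerate(is_end.tolist()):
            if end or i == len(byte_ids) - 1:
                chunks.append(x[start : i + 1].mean(dim=0))       # pool bytes -> chunk
                start = i + 1
        return torch.stack(chunks)                                # (num_chunks, d_model)

byte_ids = torch.tensor(list("tokenization is just chunking".encode()))
print(ToyDynamicChunker()(byte_ids).shape)
```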
We're excited to announce a new research release from the Cartesia team, as part of a long-term collaboration to advance deep learning architectures. We've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. H-Nets are…
At Cartesia, we've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. Intelligence that can interact and reason over massive amounts of context over decade-long timescales. This research is an important step in our…
Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out here: code: github.com/NVlabs/HMAR checkpoints: huggingface.co/nvidia/HMAR
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from…
Together AI’s first GB200 cluster built by Dell!
Good morning
Introducing Weaver, a test time scaling method for verification! Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny…
LLMs can generate 100 answers, but which one is right? Check out our latest work closing the generation-verification gap by aggregating weak verifiers and distilling them into a compact 400M model. If this direction is exciting to you, we’d love to connect.
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
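As an illustration of the verifier-aggregation idea (not the actual Weaver algorithm or its learned weights), the sketch below scores each candidate answer with several weak verifiers, combines the scores with per-verifier weights, and selects the top candidate.

```python
import numpy as np

def weave(verifier_scores: np.ndarray, weights: np.ndarray) -> int:
    """verifier_scores: (num_candidates, num_verifiers) scores in [0, 1].
    weights: per-verifier weights, e.g. fit on held-out data so that more
    reliable verifiers count for more (the weak-to-strong idea).
    Returns the index of the selected candidate."""
    combined = verifier_scores @ weights
    return int(np.argmax(combined))

# Hypothetical setup: 100 sampled answers scored by 3 weak verifiers
# (say, two reward models and one LM judge).
rng = np.random.default_rng(0)
scores = rng.uniform(size=(100, 3))
weights = np.array([0.5, 0.2, 0.3])  # made-up learned weights
print("selected candidate:", weave(scores, weights))
```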
New Notebook: LLM Evals with Batch Inference! The new batch API is perfect for running large benchmarks - 50% cost savings with 24h turnaround. We evaluate DeepSeek-V3-0324 on SimpleQA as an example. Link below! 🧵
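A sketch of the batch-eval workflow, assuming an OpenAI-style batch JSONL format; the endpoint path, model identifier, and questions below are placeholders, not taken from the notebook itself.

```python
import json

# Hypothetical SimpleQA-style items; the real benchmark file would be loaded here.
questions = [
    {"id": "simpleqa-0001", "question": "In what year was the Eiffel Tower completed?"},
    {"id": "simpleqa-0002", "question": "Who wrote 'On the Origin of Species'?"},
]

# One JSONL line per request, in the OpenAI-style batch format.
with open("simpleqa_batch.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps({
            "custom_id": q["id"],
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "deepseek-ai/DeepSeek-V3",  # placeholder model identifier
                "messages": [{"role": "user", "content": q["question"]}],
                "max_tokens": 64,
            },
        }) + "\n")

# The file is then uploaded to the provider's batch endpoint; results come back
# asynchronously, which is what enables the 50% discount with a 24h turnaround.
```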
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️
Some updates to Chipmunk! 🐿️ Chipmunk now supports Wan 2.1, with up to 2.67x speedup - completely training-free! The paper is up on arXiv - take a look to see more in-depth analysis of sparsity in video models. Only 5-25% of activations account for >90% of the output!
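The "5-25% of activations account for >90% of the output" observation can be probed with a quick measurement like the one below (a generic sketch, not Chipmunk's sparse kernels): sort activation magnitudes and find the smallest fraction whose cumulative magnitude reaches 90% of the total.

```python
import numpy as np

def fraction_for_mass(activations: np.ndarray, mass: float = 0.9) -> float:
    """Smallest fraction of entries (by magnitude) whose cumulative absolute
    value reaches `mass` of the total, a rough proxy for exploitable sparsity."""
    mags = np.sort(np.abs(activations).ravel())[::-1]
    cum = np.cumsum(mags)
    k = int(np.searchsorted(cum, mass * cum[-1])) + 1
    return k / mags.size

# Heavy-tailed toy activations: a small fraction carries most of the mass.
acts = np.random.standard_cauchy(size=1_000_000)
print(f"{fraction_for_mass(acts):.1%} of entries hold 90% of the total magnitude")
```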