hazyresearch
@HazyResearch
A research group in @StanfordAILab working on the foundations of machine learning & systems. http://hazyresearch.stanford.edu/ Ostensibly supervised by Chris Ré
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
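For context on the Pass@1 number above: it is the standard unbiased pass@k estimator used in code-generation evals, sketched below. This is general background, not code from the DeepSWE release, and the rollout counts in the example are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    attempts succeeds, given that c out of n sampled rollouts passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 16 rollouts per task, 7 of them resolve the issue.
print(pass_at_k(n=16, c=7, k=1))  # expected single-attempt success rate for this task
```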
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run…
Incredibly honored and grateful to receive the Overton Prize at #ISMBECCB2025♥️ Many thanks to ISCB and my amazing students, collaborators and mentors!🙏
Infinite Wiki ⁕ Every word is a hyperlink. Every description is generated in real-time (in ~1 second) ⁕ Runs on Gemini 2.5 Flash Lite. ASCII diagrams using 2.5 Flash
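A minimal sketch of how an app like Infinite Wiki could work, assuming the google-generativeai Python client; the model string follows the tweet, but the prompt, the `/wiki/` route, and the `linkify` helper are hypothetical placeholders rather than the actual implementation.

```python
import os
import re
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash-lite")  # model string per the tweet

def describe(term: str) -> str:
    """Generate a short, encyclopedia-style description of `term`."""
    resp = model.generate_content(
        f"Write a two-sentence encyclopedia entry for the term: {term}"
    )
    return resp.text

def linkify(text: str) -> str:
    """Turn every word into a link back into the wiki (hypothetical /wiki/ route)."""
    return re.sub(
        r"[A-Za-z][A-Za-z-]+",
        lambda m: f'<a href="/wiki/{m.group(0)}">{m.group(0)}</a>',
        text,
    )

print(linkify(describe("entropy")))
```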
join us in leading the fight against Big Token 🏴☠️⚔️ go.cartesia.ai/join
Just saw the phrase "Big Token" to describe OAI/Anthropic/GDM/xAI/Meta and now I can't stop thinking about it.
I just saw @_albertgu call the major AI labs "Big Token" and it has to be the most hilarious shit ever lol
Excited to share our latest at ICML 2025: pushing LoRA fine-tuning to below 2 bits (as low as 1.15 bits), unlocking up to 50% memory savings. Another step toward cheaper, democratized LLMs on commodity hardware! w/ the amazing team: @zhou_cyrus68804 @KumbongHermann @KunleOlukotun
🚀 New #ICML2025 drop! LowRA slashes LoRA to 1.15 bits / param and outperforms every sub-4-bit baseline. w/ @qizhengz_alex @KumbongHermann @KunleOlukotun 👇 (1 / N)
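LowRA's actual quantization scheme is in the paper; the toy uniform quantizer below only illustrates the memory arithmetic of "bits per parameter" for a LoRA factor. The matrix shape, rank, and bit-widths are made-up examples.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Toy symmetric uniform quantizer: snap weights onto 2**bits levels.
    Illustrative only; LowRA's sub-2-bit scheme is more sophisticated."""
    levels = 2 ** bits
    scale = np.abs(w).max() / (levels / 2)
    q = np.clip(np.round(w / scale), -(levels // 2), levels // 2 - 1)
    return q * scale

# Hypothetical rank-16 LoRA factor for a 4096x4096 projection.
rank, d = 16, 4096
A = (np.random.randn(d, rank) * 0.01).astype(np.float32)
A_q = quantize_uniform(A, bits=2)

fp16_mib = A.size * 16 / 8 / 2**20      # 16 bits per parameter
lowra_mib = A.size * 1.15 / 8 / 2**20   # 1.15 bits per parameter, as reported
print(f"fp16: {fp16_mib:.3f} MiB  vs  1.15-bit: {lowra_mib:.4f} MiB")
print("mean quantization error:", float(np.abs(A - A_q).mean()))
```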
Big Token is quaking in their boots. Don't worry, we're here to free you all
...wtf anthropic?
hyped to announce this collab. minions ❤️ @AMD. edge compute ftw 🚀
We’re thrilled to collaborate with the @HazyResearch @StanfordAILab, led by Chris Ré, to power Minions, their cutting-edge agentic framework tackling the cost-accuracy tradeoff in modern AI systems. This innovation is enabled on AMD Ryzen AI, thanks to seamless integration with…
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
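To make "dynamic chunking" concrete, here is a heavily simplified sketch rather than the H-Net architecture itself: a learned boundary scorer over raw bytes marks chunk ends, and each chunk of byte embeddings is mean-pooled into a single vector for a higher-level model. All module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    """Illustrative only: scores each byte as a potential chunk boundary and
    mean-pools bytes into chunk vectors. H-Net's real mechanism (including how
    it keeps boundary decisions trainable end-to-end) is described in the paper."""

    def __init__(self, d_model: int = 256, vocab: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.boundary = nn.Linear(d_model, 1)  # score: does a chunk end here?

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(byte_ids)                                  # (seq, d_model)
        is_end = torch.sigmoid(self.boundary(x)).squeeze(-1) > 0.5
        chunks, start = [], 0
        for i, end in enumerate(is_end.tolist()):
            if end or i == len(byte_ids) - 1:
                chunks.append(x[start : i + 1].mean(dim=0))       # pool bytes -> chunk
                start = i + 1
        return torch.stack(chunks)                                # (num_chunks, d_model)

byte_ids = torch.tensor(list("tokenization is just chunking".encode()))
print(ToyDynamicChunker()(byte_ids).shape)
```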
We're excited to announce a new research release from the Cartesia team, as part of a long-term collaboration to advance deep learning architectures. We've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. H-Nets are…
At Cartesia, we've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. Intelligence that can interact and reason over massive amounts of context over decade-long timescales. This research is an important step in our…
Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out here: code: github.com/NVlabs/HMAR checkpoints: huggingface.co/nvidia/HMAR
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from…
Together AI’s first GB200 cluster built by Dell!
Good morning
Introducing Weaver, a test time scaling method for verification! Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny…
LLMs can generate 100 answers, but which one is right? Check out our latest work closing the generation-verification gap by aggregating weak verifiers and distilling them into a compact 400M model. If this direction is exciting to you, we’d love to connect.
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
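As an illustration of the verifier-aggregation idea (not the actual Weaver algorithm or its learned weights), the sketch below scores each candidate answer with several weak verifiers, combines the scores with per-verifier weights, and selects the top candidate.

```python
import numpy as np

def weave(verifier_scores: np.ndarray, weights: np.ndarray) -> int:
    """verifier_scores: (num_candidates, num_verifiers) scores in [0, 1].
    weights: per-verifier weights, e.g. fit on held-out data so that more
    reliable verifiers count for more (the weak-to-strong idea).
    Returns the index of the selected candidate."""
    combined = verifier_scores @ weights
    return int(np.argmax(combined))

# Hypothetical setup: 100 sampled answers scored by 3 weak verifiers
# (say, two reward models and one LM judge).
rng = np.random.default_rng(0)
scores = rng.uniform(size=(100, 3))
weights = np.array([0.5, 0.2, 0.3])  # made-up learned weights
print("selected candidate:", weave(scores, weights))
```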
New Notebook: LLM Evals with Batch Inference! The new batch API is perfect for running large benchmarks - 50% cost savings with 24h turnaround. We evaluate DeepSeek-V3-0324 on SimpleQA as an example. Link below! 🧵
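A sketch of the batch-eval workflow, assuming an OpenAI-style batch JSONL format; the endpoint path, model identifier, and questions below are placeholders, not taken from the notebook itself.

```python
import json

# Hypothetical SimpleQA-style items; the real benchmark file would be loaded here.
questions = [
    {"id": "simpleqa-0001", "question": "In what year was the Eiffel Tower completed?"},
    {"id": "simpleqa-0002", "question": "Who wrote 'On the Origin of Species'?"},
]

# One JSONL line per request, in the OpenAI-style batch format.
with open("simpleqa_batch.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps({
            "custom_id": q["id"],
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "deepseek-ai/DeepSeek-V3",  # placeholder model identifier
                "messages": [{"role": "user", "content": q["question"]}],
                "max_tokens": 64,
            },
        }) + "\n")

# The file is then uploaded to the provider's batch endpoint; results come back
# asynchronously, which is what enables the 50% discount with a 24h turnaround.
```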
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️
Some updates to Chipmunk! 🐿️ Chipmunk now supports Wan 2.1, with up to 2.67x speedup - completely training-free! The paper is up on arXiv - take a look to see more in-depth analysis of sparsity in video models. Only 5-25% of activations account for >90% of the output!
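The "5-25% of activations account for >90% of the output" observation can be probed with a quick measurement like the one below (a generic sketch, not Chipmunk's sparse kernels): sort activation magnitudes and find the smallest fraction whose cumulative magnitude reaches 90% of the total.

```python
import numpy as np

def fraction_for_mass(activations: np.ndarray, mass: float = 0.9) -> float:
    """Smallest fraction of entries (by magnitude) whose cumulative absolute
    value reaches `mass` of the total, a rough proxy for exploitable sparsity."""
    mags = np.sort(np.abs(activations).ravel())[::-1]
    cum = np.cumsum(mags)
    k = int(np.searchsorted(cum, mass * cum[-1])) + 1
    return k / mags.size

# Heavy-tailed toy activations: a small fraction carries most of the mass.
acts = np.random.standard_cauchy(size=1_000_000)
print(f"{fraction_for_mass(acts):.1%} of entries hold 90% of the total magnitude")
```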