Dan Fu
@realDanFu
Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.
Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything MLSys/efficiency/etc., please reach out & put my name on your application! Until then…
I really enjoyed this talk from @bariskasikci at @ESFoMo - some really fine-grained analysis of the compute patterns of LLM serving in the throughput-bound regime, and how to schedule operations to push the boundaries (via a linear program)! Great work!
Next we have @bariskasikci with a talk on the quest for blazingly fast LLM inference!
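For anyone curious about the linear-programming angle mentioned above, here's a minimal sketch of the general idea (my own toy formulation with made-up constants, not the talk's actual model): pick a prefill/decode token mix for a batch that maximizes throughput subject to compute and memory-bandwidth budgets.

```python
# Toy LP: choose prefill vs. decode tokens per batch to maximize throughput.
# All costs and budgets below are illustrative, not measured numbers.
from scipy.optimize import linprog

c = [-1.0, -1.0]  # maximize prefill + decode tokens => minimize the negation

A_ub = [
    [2.0, 0.5],   # hypothetical FLOPs per token (prefill is compute-heavy)
    [0.2, 1.5],   # hypothetical bytes per token (decode is bandwidth-heavy)
]
b_ub = [4096.0, 2048.0]  # per-step compute and bandwidth budgets

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
prefill, decode = res.x
print(f"prefill tokens: {prefill:.0f}, decode tokens: {decode:.0f}")
```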
ES-FoMo is back tomorrow! Come join us in East Exhibition Hall A bright and early at 8:30 AM for a great slate of invited talks, orals, spotlight lightning talks, and 150 posters!
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you. Let's meet some of our great speakers! 1/
Fastest DeepSeek! Super proud of the amazing inference team at Together for pulling this off!
Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528. We’ve upgraded the Together Inference Engine to run on @NVIDIA Blackwell GPUs—and the results speak for themselves: 📈 Highest known serverless throughput: 334 tokens/sec 🏃 Fastest time to first answer token:…
Synthetics like associative recall and MQAR are a great guide for building models. Excited to see this work from @nick11roberts on creating new LMs!
🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to @COLM_conf 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
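For context on the synthetics mentioned a couple of tweets up: multi-query associative recall (MQAR) presents key-value pairs and then re-queries some of the keys, so a model has to recall the paired values. A minimal generator sketch (sizes and the vocab split are arbitrary choices of mine):

```python
import random

def make_mqar_example(num_pairs=8, num_queries=4, vocab=64, seed=0):
    """Emit [k1, v1, k2, v2, ..., q1, q2, ...] plus the values to recall."""
    rng = random.Random(seed)
    keys = rng.sample(range(vocab), num_pairs)                 # distinct keys
    values = [rng.randrange(vocab, 2 * vocab) for _ in keys]   # disjoint value ids
    kv = dict(zip(keys, values))
    context = [tok for pair in zip(keys, values) for tok in pair]
    queries = rng.sample(keys, num_queries)
    targets = [kv[q] for q in queries]                         # labels to predict
    return context + queries, targets

tokens, labels = make_mqar_example()
print(tokens, labels)
```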
This is really cool! There are a ton of places where a dynamic, differentiable hierarchy makes sense. Awesome to see progress here!
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
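To make the dynamic chunking idea concrete, here's a toy sketch in the spirit of the above (not H-Net's actual routing module; the fixed similarity threshold and mean-pooling are my simplifications): place a chunk boundary where adjacent hidden states disagree, then pool each chunk into one higher-level vector.

```python
import torch
import torch.nn.functional as F

def dynamic_chunk(h: torch.Tensor, threshold: float = 0.5):
    """h: (seq_len, dim) low-level states -> (num_chunks, dim) pooled chunks."""
    sim = F.cosine_similarity(h[:-1], h[1:], dim=-1)        # adjacent similarity
    boundary = torch.cat([torch.tensor([True]), sim < threshold])
    chunk_ids = torch.cumsum(boundary.long(), dim=0) - 1    # chunk index per position
    pooled = torch.stack([h[chunk_ids == i].mean(dim=0)
                          for i in range(int(chunk_ids.max()) + 1)])
    return pooled, chunk_ids

h = torch.randn(16, 32)          # e.g. 16 byte-level states
pooled, ids = dynamic_chunk(h)
print(pooled.shape, ids.tolist())
```

Per the thread above, the real model learns its chunk boundaries end-to-end inside the network rather than thresholding a fixed similarity.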
Happy to share that our HMAR code and pre-trained models are now publicly available. Please try them out! Code: github.com/NVlabs/HMAR, checkpoints: huggingface.co/nvidia/HMAR
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from…
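For a feel of next-scale prediction, here's a schematic toy loop (a stand-in model, not HMAR; see the repo linked above for the real thing): generate token maps coarse-to-fine, conditioning each scale on an upsampled version of the previous one.

```python
import torch
import torch.nn.functional as F

def predict_scale(cond, size, vocab=256):
    # Stand-in for a real autoregressive model: random tokens per position.
    return torch.randint(0, vocab, (size, size))

scales = [1, 2, 4, 8, 16]                     # side lengths of token maps
canvas = torch.zeros(1, 1, dtype=torch.long)  # coarsest 1x1 map
for s in scales:
    cond = F.interpolate(canvas.float()[None, None], size=(s, s), mode="nearest")
    canvas = predict_scale(cond, s)           # next-scale prediction step
print(canvas.shape)                           # final 16x16 token map
```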
Day zero support for Flux kontext dev on Chipmunk! Great work @austinsilveria!
🐿️ chipmunk ship! flux kontext supported for up to 30% faster cute chipmunks!
What a throwback to weak supervision! Great work @JonSaadFalcon @ekellbuch @MayeeChen!
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
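The gist of combining weak verifiers, in a minimal sketch (illustrative constants, not Weaver's actual estimator, which learns verifier reliabilities without labels via weak supervision): weight each verifier's scores by its estimated reliability and pick the top-scoring candidate.

```python
import numpy as np

verifier_scores = np.array([   # rows: weak verifiers, cols: candidate answers
    [0.2, 0.7, 0.6],           # reward model A
    [0.4, 0.9, 0.3],           # LM judge B
    [0.1, 0.6, 0.8],           # reward model C
])
weights = np.array([0.5, 1.5, 1.0])   # hypothetical reliability estimates

combined = weights @ verifier_scores  # weighted score per candidate
best = int(np.argmax(combined))
print(f"selected candidate {best} with combined score {combined[best]:.2f}")
```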
Chipmunks for everyone!
Chipmunks can now hop across multiple GPU architectures (sm_80, sm_89, sm_90). You can get a 1.4-3x lossless speedup when generating videos on A100s, 4090s, and H100s! Chipmunks also play with more open-source models: Mochi, Wan, & others (w/ tutorials for integration) 🐿️
Some updates to Chipmunk! 🐿️ Chipmunk now supports Wan 2.1, with up to 2.67x speedup - completely training-free! The paper is up on arXiv - take a look to see more in-depth analysis of sparsity in video models. Only 5-25% of activations account for >90% of the output!
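That last sparsity claim is easy to check on your own activations; a quick sketch of the measurement (illustrative, not Chipmunk's code): sort magnitudes and find the smallest fraction of entries whose cumulative mass reaches 90% of the total.

```python
import torch

acts = torch.randn(1_000_000).abs()  # stand-in for real activation magnitudes
mass = acts.sort(descending=True).values
cum = torch.cumsum(mass, dim=0) / mass.sum()
k = int((cum < 0.9).sum()) + 1       # entries needed to reach 90% of the mass
print(f"{k / acts.numel():.1%} of activations carry 90% of the mass")
```

On real video-model activations the paper reports this lands at 5-25%; Gaussian noise, as here, is much less concentrated.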