Albert Tseng
@tsengalb99
CS PhD Student @ Cornell
Excited to announce our #AISTATS 📜 on training LLMs with MXFP4! We use stoch. rounding and random Hadamard transforms (all fast on HW) to get low-variance, unbiased gradient estimates with MXFP4 GEMMs. We get a ~30% speedup over FP8 with almost no PPL gap! arxiv.org/abs/2502.20586
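A minimal sketch of the two ingredients named above, under my own assumptions: a generic sorted quantization grid stands in for MXFP4's shared-scale blocks, and stochastic_round / random_hadamard are illustrative names, not the paper's fused HW kernels.

import torch

def stochastic_round(x, grid):
    # grid: sorted 1-D tensor of representable values (same dtype/device as x).
    # Round each entry to one of its two nearest grid points with probability
    # proportional to proximity, so the rounding is unbiased: E[q] = x.
    x = x.clamp(min=float(grid[0]), max=float(grid[-1]))
    hi = torch.searchsorted(grid, x).clamp(max=grid.numel() - 1)
    lo = (hi - 1).clamp(min=0)
    g_lo, g_hi = grid[lo], grid[hi]
    p_hi = (x - g_lo) / (g_hi - g_lo).clamp(min=1e-12)
    return torch.where(torch.rand_like(x) < p_hi, g_hi, g_lo)

def random_hadamard(x, signs):
    # Orthonormal random Hadamard transform along the last dim (a power of two):
    # random sign flips followed by a fast Walsh-Hadamard transform. Spreading
    # outliers across the block is what keeps the quantization-error variance low.
    d = x.shape[-1]
    y = x * signs  # signs: random +/-1 vector of length d
    h = 1
    while h < d:
        y = y.reshape(*y.shape[:-1], d // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2).reshape(*x.shape[:-1], d)
        h *= 2
    return y / d ** 0.5

In an MX-style pipeline the rotation would be applied to GEMM operands before quantizing each block against a shared scale; stochastic rounding keeps the resulting gradient estimate unbiased, and the rotation keeps its variance low.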


❓ Are LLMs actually problem solvers or just good at regurgitating facts? 🚨 New Benchmark Alert! We built HeuriGym to benchmark whether LLMs can craft real heuristics for hard real-world combinatorial optimization problems. 🛞 We’re open-sourcing it all:
✅ 9 problems
✅ Iterative…
New paper: World models + Program synthesis by @topwasu
1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code
2. Learns new environments from minutes of experience
3. Positive score on Montezuma's Revenge
4. Compositional generalization to new environments…
I will be at #CVPR2025 presenting our work on differential operators for hybrid neural fields! Catch me at our poster:
🗓️ Fri, June 13, 10:30 AM–12:30 PM
📍 ExHall D, Poster #34
🔗 cvpr.thecvf.com/virtual/2025/p…
Details below ⬇️
📢 Excited to share our latest work on computing accurate differential operators for hybrid neural fields (like Instant NGP)! 🔗: justachetan.github.io/hnf-derivative… 🧵👇🏻 (1/n)
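The thread doesn't spell out the method, so as context only: "differential operators for a neural field" refers to derivatives of the learned function taken at query points, and the naive baseline is plain autograd, as in this hedged sketch (not the paper's approach; field is any callable mapping points to scalar values).

import torch

def field_gradient(field, x):
    # Naive spatial gradient of a scalar neural field at query points x
    # (shape [n, 3]) via autograd; the work above is about computing such
    # operators accurately for hybrid fields like Instant NGP, and this is
    # only the baseline. create_graph=True would allow second-order
    # operators (e.g. a Laplacian) on top.
    x = x.clone().requires_grad_(True)
    y = field(x).sum()  # each output depends only on its own query point
    (grad,) = torch.autograd.grad(y, x, create_graph=True)
    return grad          # shape [n, 3]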
Check out CARTRIDGES, scaling cache-time compute! An alternative to ICL for settings where many different user messages reference the same large corpus of text!
When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x…
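Reading the posts above, the core object is a small trainable KV cache optimized offline with the base model frozen. A prefix-tuning-style sketch of that object (the shapes, names, and toy attention call are my own illustration, not the CARTRIDGES code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TrainableKVCache(nn.Module):
    # A small set of learnable key/value vectors per layer that stands in for
    # the (much larger) KV cache of a long document.
    def __init__(self, n_layers, n_heads, cache_len, head_dim):
        super().__init__()
        shape = (n_layers, 1, n_heads, cache_len, head_dim)
        self.keys = nn.Parameter(0.02 * torch.randn(shape))
        self.values = nn.Parameter(0.02 * torch.randn(shape))

    def layer(self, i):
        return self.keys[i], self.values[i]

def attend_with_cache(q, k_new, v_new, k_cache, v_cache):
    # Queries from the user turn attend over [trained cache ; new tokens],
    # exactly as they would over a real document KV cache.
    b = q.shape[0]
    k = torch.cat([k_cache.expand(b, -1, -1, -1), k_new], dim=2)
    v = torch.cat([v_cache.expand(b, -1, -1, -1), v_new], dim=2)
    return F.scaled_dot_product_attention(q, k, v)

Offline, a self-study-style recipe would freeze the model and train only these key/value parameters (e.g. on generated questions about the corpus, matching the responses the model gives with the full document in context); at serving time the trained cache is loaded instead of re-prefilling the whole corpus.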
Albert and co. continue to do excellent work on quantization. This time the trick is to minimize KL w.r.t. the original model, with a clever Hessian factorization.
📣Introducing our latest work: Yet Another Quantization Algorithm! YAQA directly minimizes the KL divergence to the original model during rounding, cutting it by >30% over prior PTQ methods and giving an even closer model than Google’s QAT on Gemma! 🤯 arxiv.org/abs/2505.22988👇
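One way to read "minimizing the KL during rounding": round each layer against a quadratic proxy (w - q)^T H (w - q), where H approximates how sensitive the KL to the original model is to that layer's weights; the posts above say YAQA builds H from a factorized Hessian estimate. Below is only the generic error-compensated rounding idea in the style of OBQ/GPTQ, with illustrative names and a plain O(d^2) loop, not YAQA's actual estimator or rounding rule.

import torch

def round_row_against_proxy(w, H_inv, grid):
    # Greedily round one weight row w (shape [d]) so the quadratic proxy
    # (w - q)^T H (w - q) stays small. H_inv is the inverse of the proxy
    # Hessian H; grid is a sorted 1-D tensor of representable values.
    w = w.clone()
    q = torch.empty_like(w)
    d = w.numel()
    for j in range(d):
        # nearest grid point for the current (already-compensated) weight
        q[j] = grid[torch.argmin((grid - w[j]).abs())]
        # push the rounding error onto the not-yet-rounded weights
        err = (w[j] - q[j]) / H_inv[j, j]
        w[j + 1:] -= err * H_inv[j, j + 1:]
    return q

With H = I this degenerates to round-to-nearest; the better H captures which directions the original model's outputs are sensitive to, the smaller the KL gap after rounding.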
Apparently I chose the worst day to release a paper, so ICYMI, we made a post-training quantization algorithm that outperforms even @Google's quantization-aware training recipe. We beat the prior SOTA by >30%, meaning faster and smaller models. More details in the original 🧵👇
VideoPrism is now available at: github.com/google-deepmin… :)
Introducing VideoPrism, a single model for general-purpose video understanding that can handle a wide range of tasks, including classification, localization, retrieval, captioning and question answering. Learn how it works at goo.gle/49ltEXW
5/ Quantized models don't need to lose fidelity. Check out our paper and blog for details:
📝 Paper: arxiv.org/abs/2505.22988
📖 Blog: together.ai/blog/yaqa
💻 Code: github.com/Cornell-RelaxM…
Chipmunk is up on arXiv! Across HunyuanVideo and Flux.1-dev, 5-25% of the intermediate activation values in attention and MLPs account for 70-90% of the change in activations across steps. Caching + sparsity speeds up generation by only recomputing fast-changing activations.
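A toy illustration of the caching + sparsity idea for a token-wise (MLP-like) block; the names and the per-token granularity are mine, and the real kernels operate on sparse intermediate values inside the attention and MLP GEMMs themselves.

import torch

def delta_sparse_block(x_prev, x_cur, out_prev, block_fn, frac=0.2):
    # x_prev / x_cur: (tokens, dim) inputs to the block at consecutive diffusion
    # steps; out_prev: cached output from the previous step. Recompute block_fn
    # only for the `frac` of tokens whose inputs drifted the most and reuse the
    # cached output for everything else.
    drift = (x_cur - x_prev).norm(dim=-1)
    k = max(1, int(frac * x_cur.shape[0]))
    hot = drift.topk(k).indices          # the fast-changing tokens
    out = out_prev.clone()
    out[hot] = block_fn(x_cur[hot])      # sparse recompute
    return out

Because only a small slice of values (per the numbers above, 5-25%) accounts for most of the step-to-step change, most of the cached output can be reused, which is where the speedup comes from.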
🚀 New research: YAQA — Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit 🥭) Led by @tsengalb99, YAQA minimizes the KL divergence to the original model during quantization, cutting it by >30% vs. prior methods and outperforming even QAT on Gemma 3. 👇
🔥Thrilled to share that I’ll be joining the Computer Science Department at NYU Shanghai as an Assistant Professor starting Fall 2025! @nyushanghai 🎯 I’ll be recruiting PhD students across the entire NYU network—including @nyushanghai, @nyutandon, and @NYU_Courant—to build…