Sai Surya Duvvuri
@dvsaisurya
Visiting Researcher at FAIR, Meta and CS PhD student at UT Austin. Previously, SR at Google | Pre-Doctoral Research Fellow at MSR India | CS UG at IIT KGP
📢 Thrilled to share our new paper, LASER: Attention with Exponential Transformation, accepted at ICML2025, work done at Google. Come by our poster presentation! 🗓️ Thurs, July 17th, 4:30-7pm 📍 West Exhibition Hall B2-B3, # W-915 Read the full paper here: arxiv.org/abs/2411.03493
New AI model tweak could change how Transformers read text. Instead of just comparing pairs of words, this new approach looks at triples—capturing more context from each token. Here’s how triplet attention, called 2‑simplicial, could make models smarter while keeping data costs…
Top-k greedy inference for diffusion models can unlock better accuracy. Wondering whether finding the optimal order in which to unmask tokens can be automated across prompts/tasks.
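A rough sketch of what top-k greedy unmasking could look like, assuming a hypothetical `model` that returns per-position token logits for a partially masked sequence and a placeholder `MASK_ID`; this illustrates the confidence-ordered decoding idea, not any paper's exact sampler:

```python
import torch

MASK_ID = 0  # hypothetical mask token id

@torch.no_grad()
def topk_greedy_unmask(model, tokens, k=1):
    """Greedy top-k decoding sketch for a masked diffusion model.

    `model(tokens)` is assumed to return logits of shape
    (seq_len, vocab_size) for a partially masked sequence.
    At each step, unmask the k masked positions where the model
    is most confident, rather than a random subset.
    """
    tokens = tokens.clone()
    while (tokens == MASK_ID).any():
        logits = model(tokens)                       # (seq_len, vocab)
        logits[:, MASK_ID] = -float("inf")           # never predict the mask token itself
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)               # per-position confidence and argmax token
        conf[tokens != MASK_ID] = -float("inf")      # only consider still-masked slots
        n_masked = int((tokens == MASK_ID).sum())
        idx = conf.topk(min(k, n_masked)).indices    # most confident masked positions
        tokens[idx] = pred[idx]                      # commit their argmax tokens
    return tokens
```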
Excited about this new work where we dig into the role of token order in masked diffusions! MDMs train on some horribly hard tasks, but careful planning at inference can sidestep the hardest ones, dramatically improving over vanilla MDM sampling (e.g. 7%->90% acc on Sudoku) 1/
Thrilled to share that our work received the Outstanding Paper Award at ICML! I will be giving the oral presentation on Tuesday at 4:15 PM. @Jaeyeon_Kim_0 and I both will be at the poster session shortly after the oral presentation. Please attend if possible!
A team from #KempnerInstitute, @hseas & @UTCompSci has won a best paper award at #ICML2025 for work unlocking the potential of masked diffusion models. Congrats to @Jaeyeon_Kim_0, @shahkulin98, Vasilis Kontonis, @ShamKakade6 and @sitanch. kempnerinstitute.harvard.edu/news/kempner-i… #AI
If you liked CASPR, you will like LASER Attention! Check it out
MuonClip... so many tricks to keep the maximum logits bounded during training. Gets me wondering why people don't try LASER (and maybe z-loss?)
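For reference, a minimal sketch of the z-loss idea (the auxiliary log(Z)^2 penalty popularized by PaLM), which discourages logits from growing without bound; the 1e-4 coefficient is the commonly cited setting and a tunable assumption here:

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_z_loss(logits, targets, z_coef=1e-4):
    """Cross-entropy plus the z-loss regularizer.

    The z-loss penalizes log(Z)^2, where Z is the softmax normalizer,
    keeping the logits from drifting upward during training.
    logits: (batch, vocab), targets: (batch,) of class indices.
    """
    log_z = torch.logsumexp(logits, dim=-1)          # log of the softmax normalizer, (batch,)
    ce = F.cross_entropy(logits, targets)
    return ce + z_coef * (log_z ** 2).mean()
```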
Very interesting: standard attention causes vanishing gradients because most attention probabilities become very small after some training. LASER tackles this by pushing the attention operation into exponential space, i.e., exp(output) = softmax(QK^T) exp(V). They don't seem to exaggerate the performance…
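A minimal single-head sketch of the formula as quoted in the tweet, output = log(softmax(QK^T/√d) · exp(V)), computed via logsumexp so exp(V) is never materialized; causal masking and other details from the paper are omitted:

```python
import math
import torch

def laser_attention(q, k, v):
    """Single-head sketch of the LASER idea from the tweet:
    exp(output) = softmax(QK^T / sqrt(d)) @ exp(V), so
    output = log(softmax(QK^T / sqrt(d)) @ exp(V)).

    Computed stably as output[i, c] = logsumexp_j(log_p[i, j] + V[j, c]).
    Shapes: q, k are (n, d); v is (n, d_v).
    """
    logits = q @ k.transpose(-1, -2) / math.sqrt(q.shape[-1])              # (n, n)
    log_p = torch.log_softmax(logits, dim=-1)                              # (n, n)
    return torch.logsumexp(log_p.unsqueeze(-1) + v.unsqueeze(-3), dim=-2)  # (n, d_v)
```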
Sensitivity and Sharpness of n-Simplicial Attention. On the topic of stabilizing training, I got unreasonably nerdsniped by 2-simplicial attention and ended up deriving the sensitivity and sharpness bounds of n-simplicial attention more generally...
I want you all to read @Kimi_Moonshot's technical report on K2, then go back to this thread. Awesome work by @Jianlin_S and team! x.com/Yuchenj_UW/sta…
As we're running out of high-quality training data, changing models' architecture is an essential solution. 2-simplicial Transformer - @AIatMeta's new type of Transformer with special attention mechanism that: ➡️ Compares triplets of tokens (not pairs) to capture richer…
This week's top AI/ML research papers:
- 2-Simplicial Attention
- UMA
- Transition Matching
- GLM-4.1V-Thinking
- The Trilemma of Truth in LLMs
- Do Vision-Language Models Have Internal World Models?
- The Automated LLM Speedrunning Benchmark
- RoboScape
- Test-Time Scaling with…
Meta researchers just dropped a new twist on Transformers—“2-simplicial attention”—and the early results are wild. Instead of classic dot-product pairs, the model uses trilinear functions (think attention over 3-way interactions) via an optimized Triton kernel. The payoff?…
An explanation of Match3 functions and the motivation for 2-simplicial attention. You have a shelf of about 30 films, which include: Lord of the Rings, Independence Day, the Harry Potter series, Idiocracy, Mission: Impossible, The Social Network, the Die Hard series, and so on. Now say you…
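For context, a brute-force reference for the Match3 task as it is commonly defined (a triple-wise analogue of Match2: does some triple of tokens sum to 0 modulo M?); the exact setup in the thread may differ, but the shelf analogy is the same idea of a constraint that three items jointly satisfy and no pair can certify on its own:

```python
from itertools import combinations

def match3(xs, modulus):
    """Brute-force reference for Match3: return True iff some triple
    of elements of `xs` sums to 0 modulo `modulus`.
    """
    return any((a + b + c) % modulus == 0 for a, b, c in combinations(xs, 3))

# example: 4 + 7 + 9 = 20, and 20 % 10 == 0
print(match3([3, 4, 7, 9], modulus=10))  # True
```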
You just know they wanted to call this "2 Fast 2-Simplicial" so bad.
Excited to share what I worked on during my time at Meta.
- We introduce a Triton-accelerated Transformer with *2-simplicial attention*, a tri-linear generalization of dot-product attention
- We show how to adapt RoPE to tri-linear forms
- We show 2-simplicial attention scales…
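A rough, heavily hedged sketch of the trilinear idea described above: the score for query i and key pair (j, k) is a trilinear form, with the softmax taken over all (j, k) pairs. The value combination (elementwise product of two value streams), the 1/√d scaling, and the omission of RoPE, masking, and windowing are assumptions for illustration; see the paper for the actual method and its efficient Triton kernel:

```python
import math
import torch

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Naive single-head sketch of trilinear (2-simplicial) attention.

    Score for query i and key pair (j, k) is sum_d q[i,d]*k1[j,d]*k2[k,d];
    the softmax runs over all (j, k) pairs. Value aggregation here is an
    assumption (elementwise product v1[j] * v2[k]).
    Shapes: all inputs are (n, d). Cost is O(n^3) as written.
    """
    d = q.shape[-1]
    scores = torch.einsum("id,jd,kd->ijk", q, k1, k2) / math.sqrt(d)   # (n, n, n)
    n = scores.shape[0]
    probs = scores.reshape(n, -1).softmax(dim=-1).reshape(n, n, n)     # softmax over (j, k)
    pair_values = v1.unsqueeze(1) * v2.unsqueeze(0)                    # (n, n, d)
    return torch.einsum("ijk,jkd->id", probs, pair_values)             # (n, d)
```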
Many, many, many such cases. Lots of alpha in scaling up simple 2018-2021 ideas that didn't win the academia attention game.
Wait, what? This paper has 4 citations and these guys decided to scale it to billion-parameter scale with an efficient Triton implementation? Incredible. Huge respect...