Divyat Mahajan
@divyat09
Ph.D. Student @Mila_Quebec | Visiting Researcher @AIatMeta | Causality, Trustworthy ML | Former: @MSFTResearch @IITKanpur
Happy to share that Compositional Risk Minimization has been accepted at #ICML2025 📌Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions! 📜 arxiv.org/abs/2410.06303
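For a concrete (and heavily simplified) picture of the compositional idea, here is a minimal sketch assuming discrete attributes and a shared feature extractor: each attribute gets its own score head, and a combination is scored by summing per-attribute terms, so combinations never seen jointly during training still receive a well-defined score. This illustrates additive composition in general, not the paper's exact estimator.

```python
# Illustrative sketch (not the paper's exact method): additive per-attribute
# scoring so a classifier can rank attribute combinations never seen jointly
# during training. Names and shapes here are assumptions for the example.
import torch
import torch.nn as nn

class AdditiveCompositionalScorer(nn.Module):
    def __init__(self, feat_dim, n_values_per_attr):
        super().__init__()
        # one linear score head per attribute
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, n_vals) for n_vals in n_values_per_attr]
        )

    def forward(self, features):
        # list of [batch, n_vals_k] score tables, one per attribute
        return [head(features) for head in self.heads]

    def score_combination(self, features, combo):
        # additive score of a specific attribute combination (z_1, ..., z_K);
        # unseen combinations are scored by the same sum of per-attribute terms
        per_attr = self.forward(features)
        return sum(scores[:, z] for scores, z in zip(per_attr, combo))

# toy usage: 2 attributes with 3 and 4 values, scoring the (possibly unseen) combo (2, 1)
scorer = AdditiveCompositionalScorer(feat_dim=16, n_values_per_attr=[3, 4])
x = torch.randn(8, 16)
print(scorer.score_combination(x, (2, 1)).shape)  # torch.Size([8])
```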

The Llama Nemotron model just got super-charged ⚡️ We released Llama-Nemotron-Super-v1.5 today! The best open model that can be deployed on a single H100 🚀 Enhanced for reasoning, tool use, general chat, and instruction following. HF: huggingface.co/nvidia/Llama-3…
Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and a drop-in replacement for V1. Open weights, code, and data on HF: huggingface.co/nvidia/Llama-3…
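A hedged usage sketch for the release above with Hugging Face transformers. The model id below is a placeholder (take the exact one from the HF link), and the system-prompt reasoning toggle is an assumption here; the model card is the source of truth.

```python
# Usage sketch only: the model id is a placeholder and the reasoning toggle
# shown as a system prompt is an assumption -- check the HF model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "<nvidia/Llama-Nemotron-Super id from the HF link above>"  # placeholder

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "detailed thinking on"},  # assumed reasoning-on toggle
    {"role": "user", "content": "Plan a function-calling flow for a weather tool."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```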
I’m also excited to be presenting this work (openreview.net/forum?id=4ZX2a…) at ICCOPT at USC. Theory aside, there are some applications that may interest people working in RL, games, and performative prediction. Let me know if you are in the area and want to chat!
🚨 New paper drop! 🚨 🤔 When a transformer sees a sequence that could be explained by many rules, which rule does it pick? It chooses the simplest sufficient one! 🧵👇
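As a toy illustration of the "simplest sufficient rule" idea (not the paper's actual experiment): given several candidate rules that all explain an observed sequence, prefer the one with the lowest complexity. The rules and their complexities below are made up for the example.

```python
# Toy "simplest sufficient rule" selection: among candidate rules consistent
# with the observed prefix, pick the one with the shortest description.
seq = [2, 4, 6, 8]

candidates = {
    "add 2 each step":            (lambda s: all(b - a == 2 for a, b in zip(s, s[1:])), 1),
    "even numbers in order":      (lambda s: all(x % 2 == 0 for x in s) and s == sorted(s), 2),
    "digits of a fixed constant": (lambda s: s == [2, 4, 6, 8], 3),
}

consistent = {name: cost for name, (check, cost) in candidates.items() if check(seq)}
print(min(consistent, key=consistent.get))  # -> "add 2 each step"
```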
As the field moves towards agents doing science, the ability to understand novel environments through interaction becomes critical. AutumnBench is an attempt at measuring this abstract capability in both humans and current LLMs. Check out the blog post for more insights!
We’re proud to announce the launch of AutumnBench, an open-source benchmark developed on our Autumn platform. This benchmark, led by our MARA team, provides a novel platform for evaluating world modeling and causal reasoning in both human and artificial intelligence.
Thrilled to share that our work received the Outstanding Paper Award at ICML! I will be giving the oral presentation on Tuesday at 4:15 PM. Both @Jaeyeon_Kim_0 and I will be at the poster session shortly after the oral. Please drop by if you can!
Excited about this new work where we dig into the role of token order in masked diffusions! MDMs train on some horribly hard tasks, but careful planning at inference can sidestep the hardest ones, dramatically improving over vanilla MDM sampling (e.g. 7%->90% acc on Sudoku) 1/
Congrats on the award!! Great to see more work designing insightful tasks that bring out the role of token ordering & difficulty (should I say "indecipherability" ;-) ). I think the idea of learning token-level subproblems is broken for both diffusion and next-token learning
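For readers curious what "careful planning at inference" can look like for a masked diffusion model, here is a minimal sketch of one common adaptive-order heuristic (unmask the lowest-entropy position first). The denoiser is a stand-in, and this is not necessarily the paper's exact planner.

```python
# Sketch of adaptive-order unmasking for a masked diffusion model: at each step,
# unmask the position the denoiser is most confident about instead of a random
# one. The toy `denoiser` below is a stand-in; the real one is a trained MDM.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK, LENGTH = 5, -1, 8

def denoiser(tokens):
    # stand-in: returns a [LENGTH, VOCAB] probability table for every position
    logits = rng.normal(size=(LENGTH, VOCAB))
    return np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

tokens = np.full(LENGTH, MASK)
while (tokens == MASK).any():
    probs = denoiser(tokens)
    entropy = -(probs * np.log(probs + 1e-12)).sum(-1)
    entropy[tokens != MASK] = np.inf            # only consider still-masked positions
    pos = int(entropy.argmin())                 # "easiest" subproblem first
    tokens[pos] = int(probs[pos].argmax())      # commit the most confident token
print(tokens)
```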
“Apple loses key AI leaders to Meta” I discovered this while doing the live demo of Reka Research 😂 Go watch the video and play with our agent.
Reka Research is our AI agent that scours the web to answer your toughest questions. Ready to unlock its full potential? Learn directly from the team who built it!
I will be at the Actionable Interpretability Workshop (@ActInterp, #ICML) presenting *SSAEs* in the East Ballroom A from 1-2pm. Drop by (or send a DM) to chat about (actionable) interpretability, (actionable) identifiability, and everything in between!
1/ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
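A minimal sketch in the spirit of the tweet above, assuming paired embeddings and a standard reconstruction-plus-L1 objective on their differences; the actual SSAE objective and its identifiability guarantees are in the paper.

```python
# Minimal "shift" autoencoder sketch: fit a sparse code for *differences*
# between paired embeddings, so each latent tends to pick up one concept of
# variation. Loss and architecture details are assumptions, not the paper's.
import torch
import torch.nn as nn

dim, latents = 64, 16
enc = nn.Linear(dim, latents)
dec = nn.Linear(latents, dim, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x_a, x_b = torch.randn(256, dim), torch.randn(256, dim)   # paired embeddings (toy data)
shift = x_b - x_a                                          # representation shift

for _ in range(200):
    code = torch.relu(enc(shift))
    recon = dec(code)
    loss = ((recon - shift) ** 2).mean() + 1e-2 * code.abs().mean()  # reconstruction + sparsity
    opt.zero_grad(); loss.backward(); opt.step()

# steering sketch: move an embedding along one learned concept direction
steered = x_a[:1] + 2.0 * dec.weight[:, 3]
```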
Excited to present our work "Improving the scaling laws of synthetic data with deliberate practice", tomorrow at #ICML2025 📢 Oral: Wed. 10:45 AM 📍 West Ballroom B (Oral 3C Data-Centric ML) 🖼️ Poster: 🕚 11:00 AM – 1:30 PM 📍 East Exhibition Hall A-B (Poster Session 3 East)
🚀 New Paper Alert! Can we generate informative synthetic data that truly helps a downstream learner? Introducing Deliberate Practice for Synthetic Data (DP), a dynamic framework that generates useful synthetic training examples by focusing on where the model struggles most. 🔥…
Today at #ICML2025, we present Deliberate Practice: an approach to improve sample-efficiency by generating harder, not more, examples. - Oral talk at 10:45 - West Ballroom B | Orals 3C: Data-Centric ML Join us to discuss principled approaches to more efficient learning.
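Schematically, the loop looks something like the sketch below. The generator, learner, and selection rule are stand-ins; this illustrates the "generate where the learner struggles" idea rather than the paper's exact algorithm.

```python
# Schematic deliberate-practice loop (details assumed): propose candidates from
# a generator, keep the examples the current learner finds hardest, train on those.
import numpy as np

rng = np.random.default_rng(0)

def generator(n):                      # stand-in for a conditional data generator
    return rng.normal(size=(n, 10))

def learner_loss(model_w, x):          # stand-in per-example loss of the current learner
    return (x @ model_w) ** 2

w = rng.normal(size=10)
for round_ in range(5):
    candidates = generator(1000)
    losses = learner_loss(w, candidates)
    hard = candidates[np.argsort(losses)[-100:]]          # keep the hardest 10%
    grad = 2 * (hard * (hard @ w)[:, None]).mean(0)       # one crude training step on them
    w -= 0.01 * grad
    print(f"round {round_}: mean selected loss {losses[np.argsort(losses)[-100:]].mean():.3f}")
```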
Check out our new work on learning diffusion models with guidance from pretrained vision and language embeddings. Also a contributed talk at the #ICML2025 FM4LS workshop this Saturday! 💡 It yields a 23x speedup over SiT-XL on the class-conditional ImageNet 256×256 benchmark.
Excited to share: “Learning Diffusion Models with Flexible Representation Guidance” With my amazing coauthors @zhuci19, @sharut_gupta, @zy27962986, @StefanieJegelka, @stats_stephen, Tommi Jaakkola Paper: arxiv.org/pdf/2507.08980 Code: github.com/ChenyuWang-Mon…
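A minimal sketch of the general recipe of representation guidance, assuming the denoiser exposes intermediate features and a frozen pretrained encoder provides alignment targets. The corruption, modules, and loss weight are toy stand-ins, not the paper's objective.

```python
# Sketch: standard denoising loss plus alignment of the backbone's features
# with a frozen pretrained encoder's embedding of the clean input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    # stand-in backbone that returns a noise prediction and intermediate features
    def __init__(self, dim=32, feat=16):
        super().__init__()
        self.body = nn.Linear(dim + 1, feat)
        self.out = nn.Linear(feat, dim)
    def forward(self, x_t, t):
        h = torch.relu(self.body(torch.cat([x_t, t[:, None]], dim=-1)))
        return self.out(h), h

denoiser, frozen_encoder, proj = ToyDenoiser(), nn.Linear(32, 16), nn.Linear(16, 16)
frozen_encoder.requires_grad_(False)

x0 = torch.randn(8, 32)
t = torch.rand(8)
noise = torch.randn_like(x0)
x_t = x0 + t[:, None] * noise                      # toy corruption; real code follows the diffusion schedule

pred_noise, feats = denoiser(x_t, t)
denoise_loss = F.mse_loss(pred_noise, noise)
with torch.no_grad():
    target = frozen_encoder(x0)                    # frozen pretrained embedding (stand-in)
align_loss = 1 - F.cosine_similarity(proj(feats), target, dim=-1).mean()
loss = denoise_loss + 0.5 * align_loss             # 0.5 is an arbitrary weight for this sketch
loss.backward()
```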
This work delivers on both theory and practice: the sharpest provable compositionality guarantees I know of, alongside state-of-the-art performance on tough compositional distribution-shift benchmarks.
Presenting CRM at #ICML2025 📌 Wednesday, 16th July, 11 am 📍East Exhibition Hall A-B (E-2101) Let's chat about distribution shifts! I've been deep into causality- and invariance-based perspectives, and recently exploring robust LLM pretraining architectures.
Distributional diffusion models with scoring rules at #icml25 Fewer, larger denoising steps using distributional losses! Wednesday 11am poster E-1910 arxiv.org/pdf/2502.02483 @agalashov @ValentinDeBort1 Guntupalli @zhouguangyao @sirbayes @ArnaudDoucet1
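For the curious, here is a sketch of one standard distributional loss, the energy score, computed on multiple samples per target. How such a score is wired into the denoising steps follows the paper, not this sketch.

```python
# Energy-score sketch: score a set of model samples against a target; lower is
# better, and the score is proper in the limit of many samples.
import torch

def energy_score(samples, target):
    # samples: [m, batch, dim] draws from the model's predictive distribution
    # target:  [batch, dim] ground-truth clean (or less-noisy) state
    m, batch = samples.shape[0], samples.shape[1]
    term1 = (samples - target.unsqueeze(0)).norm(dim=-1).mean()
    pairwise = (samples.unsqueeze(0) - samples.unsqueeze(1)).norm(dim=-1)
    term2 = pairwise.sum() / (m * (m - 1) * batch)       # mean over off-diagonal pairs
    return term1 - 0.5 * term2

samples = torch.randn(8, 4, 32, requires_grad=True)      # stand-in model samples
target = torch.randn(4, 32)
energy_score(samples, target).backward()
```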
How to align your diffusion model with unseen objectives at inference time? Presenting Diffusion Tree Sampling/Search (DTS/DTS*) 🥳 Using MCTS-style search, DTS steadily improves sample quality with compute, matching the best baseline with 5× less compute!
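A compact MCTS-flavored sketch of the general idea of searching over denoising trajectories for a reward. The denoising step and reward below are stand-ins, and this is an illustration rather than the DTS/DTS* algorithm itself.

```python
# Tree search over denoising steps: nodes are partially denoised states,
# children are sampled denoising steps, terminal rewards are backed up to
# decide where to spend more compute.
import math, random
random.seed(0)

STEPS = 4

def denoise_step(x):                 # stand-in stochastic denoising step
    return x + random.gauss(0, 1)

def reward(x):                       # stand-in alignment objective on the final sample
    return -abs(x - 3.0)

class Node:
    def __init__(self, x, depth):
        self.x, self.depth, self.children, self.n, self.v = x, depth, [], 0, 0.0

def select(node):                    # UCB-style child selection
    return max(node.children,
               key=lambda c: c.v / (c.n + 1e-9) + math.sqrt(math.log(node.n + 1) / (c.n + 1e-9)))

root = Node(0.0, 0)
for _ in range(200):
    node, path = root, [root]
    while node.children and node.depth < STEPS:        # selection
        node = select(node); path.append(node)
    if node.depth < STEPS:                             # expansion
        child = Node(denoise_step(node.x), node.depth + 1)
        node.children.append(child); path.append(child); node = child
    x = node.x
    for _ in range(STEPS - node.depth):                # rollout to a clean sample
        x = denoise_step(x)
    r = reward(x)
    for n in path:                                     # backup
        n.n += 1; n.v += r

best = max(root.children, key=lambda c: c.v / c.n)
print("best first denoising step leads toward", best.x)
```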
My thesis is now online! umontreal.scholaris.ca/items/f8670d1c… This is more than just a list of publications. I invested a lot of time and passion writing this thesis in hope that it will make for an interesting read. Here's a summary of what you'll find in it.
I'm delighted to share that our paper has been accepted by #TMLR! We empirically observed signs of scaling laws in how the choice of pre-trained model affects OOD test errors and Expected Calibration Error on downstream tasks.
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Cali... Hiroki Naganuma, Ryuichiro Hataya, Kotaro Yoshida, Ioannis Mitliagkas. Action editor: Mingsheng Long. openreview.net/forum?id=tYjoH… #accuracy #trained #deep
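For reference, Expected Calibration Error, the calibration metric studied above, is typically computed like this; the bin count and toy data here are just for illustration.

```python
# Minimal ECE computation: bin predictions by confidence and average the
# |accuracy - confidence| gap, weighted by bin size.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = rng.uniform(size=1000) < conf * 0.9        # toy, slightly over-confident model
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```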
Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer! Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate! Details and links to paper & open source code below! 🧵1/9
A good language model should say “I don’t know” by reasoning about the limits of its knowledge. Our new work AbstentionBench carefully measures this overlooked skill in leading models in an open-codebase others can build on! We find frontier reasoning degrades models’ ability to…
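A rough sketch of what an abstention evaluation loop can look like. The data format and the phrase-matching abstention detector here are assumptions for illustration; the released codebase is the real harness.

```python
# Toy abstention evaluation: check whether the model abstains exactly on the
# questions it should abstain on.
ABSTAIN_MARKERS = ("i don't know", "i do not know", "cannot answer", "not enough information")

def is_abstention(answer: str) -> bool:
    return any(m in answer.lower() for m in ABSTAIN_MARKERS)

def evaluate(model_fn, dataset):
    # dataset: iterable of (question, should_abstain) pairs
    hits = 0
    for question, should_abstain in dataset:
        abstained = is_abstention(model_fn(question))
        hits += int(abstained == should_abstain)
    return hits / len(dataset)

# toy usage with a trivial "model"
toy_data = [("What is 2 + 2?", False), ("What am I thinking right now?", True)]
print(evaluate(lambda q: "4" if "2 + 2" in q else "I don't know.", toy_data))
```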
Excited to share our work on Transformer-PSMs: a neural sequence model with constant per-token inference time and log(seq-len) memory. It sits at a sweet spot between transformers (per-token cost that grows linearly with the KV cache) and RNNs/state-space models (constant). Check out the thread below 👇
Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
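To make "sequential-parallel duality" concrete with a toy example (a generic linear recurrence, not the Transformer-PSM architecture): the same computation can be run in parallel over the whole sequence at training time, and one token at a time with O(1) state at decode time.

```python
# Toy duality demo for h_t = a_t * h_{t-1} + b_t: a parallel closed form via
# cumulative products matches the sequential, constant-state recurrence.
import numpy as np

rng = np.random.default_rng(0)
T = 16
a, b = rng.uniform(0.5, 0.9, T), rng.normal(size=T)

# parallel form: h_t = sum_{s<=t} (prod_{r=s+1..t} a_r) * b_s
P = np.cumprod(a)                           # P_t = a_1 * ... * a_t
h_parallel = P * np.cumsum(b / P)

# sequential form: constant-size state, one step per decoded token
h, h_sequential = 0.0, []
for t in range(T):
    h = a[t] * h + b[t]
    h_sequential.append(h)

assert np.allclose(h_parallel, np.array(h_sequential))
```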