Max Zhdanov
@maxxxzdn
busy scaling on two GPUs at @amlabuva with @wellingmax and @jwvdm
🤹 New blog post! I write about our recent work on using hierarchical trees to enable sparse attention over irregular data (point clouds, meshes) - Erwin Transformer. blog: maxxxzdn.github.io/blog/erwin/ paper: arxiv.org/abs/2502.17019 Compressed version in the thread below:
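To make the "sparse attention via hierarchical trees" idea concrete, here is a minimal sketch of local attention inside fixed-size balls of a point cloud: points are ordered by recursive median splits and full attention is restricted to each ball. The function names, the leaf size of 64, and the missing Q/K/V projections are my simplifications, not the Erwin implementation.

```python
# Hedged sketch of ball-tree-style local attention over a point cloud.
# Minimal illustration of the idea in the post, not the Erwin code.
import torch
import torch.nn.functional as F

def median_split_order(points: torch.Tensor) -> torch.Tensor:
    """Recursively split points along their widest axis at the median,
    returning an ordering in which nearby points end up close together."""
    def rec(idx):
        if idx.numel() <= 64:                      # leaf size (assumed hyperparameter)
            return [idx]
        p = points[idx]
        axis = (p.max(0).values - p.min(0).values).argmax()
        order = p[:, axis].argsort()
        half = idx.numel() // 2
        return rec(idx[order[:half]]) + rec(idx[order[half:]])
    return torch.cat(rec(torch.arange(points.shape[0])))

def ball_attention(x: torch.Tensor, points: torch.Tensor, ball: int = 64) -> torch.Tensor:
    """Full attention inside each ball of `ball` points, none across balls."""
    order = median_split_order(points)
    n, d = x.shape
    pad = (-n) % ball                              # pad so the length divides evenly
    xo = torch.cat([x[order], x.new_zeros(pad, d)])
    q = k = v = xo.view(-1, ball, d)               # (num_balls, ball, d); a real model would project
    out = F.scaled_dot_product_attention(q, k, v).reshape(-1, d)[:n]
    inv = torch.empty_like(order)
    inv[order] = torch.arange(n)
    return out[inv]                                # undo the reordering
```

The cost is linear in the number of points for a fixed ball size, which is what makes the scheme attractive for large meshes and point clouds.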

Why do video models handle motion so poorly? It might be a lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: arxiv.org/abs/2507.14793 Blog: kempnerinstitute.harvard.edu/research/deepe… 1/🧵
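One operational reading of "flow equivariance": transporting the input frames along a constant-velocity flow should transport the output the same way. Below is a minimal property check I wrote for illustration, assuming a model that maps (T, C, H, W) to (T, C, H, W); it is not code from the paper.

```python
# Hedged sketch: a flow-equivariance check. A constant-velocity roll of the
# input frames should produce a correspondingly rolled output.
import torch

def flow(frames: torch.Tensor, v: int) -> torch.Tensor:
    """Translate frame t by t*v pixels along the width axis (periodic)."""
    return torch.stack([f.roll(shifts=t * v, dims=-1) for t, f in enumerate(frames)])

def is_flow_equivariant(model, frames: torch.Tensor, v: int = 1, tol: float = 1e-4) -> bool:
    """model: (T, C, H, W) -> (T, C, H, W); check model(flow(x)) == flow(model(x))."""
    with torch.no_grad():
        lhs = model(flow(frames, v))
        rhs = flow(model(frames), v)
    return torch.allclose(lhs, rhs, atol=tol)

# A purely frame-wise model passes this only for v = 0; a recurrence that
# tracks the flow should pass for v != 0 as well.
```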
From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design, from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... magazine.sebastianraschka.com/p/the-big-llm-…
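For concreteness, here is a minimal sketch of one of the listed ingredients, a causal sliding-window attention mask; the window size is purely illustrative.

```python
# Hedged sketch of a causal sliding-window attention mask.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query i may attend to key j: j <= i and i - j < window."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=8, window=4)
# Each token attends to itself and at most the 3 previous tokens,
# capping per-token attention cost at the window size.
```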
📢Presenting SDE Matching🔥🔥🔥 🚀We extend diffusion models to construct a simulation-free framework for training Latent SDEs. It enables sampling from the exact posterior process marginals without any numerical simulations. 📜: arxiv.org/abs/2502.02472 🧵1/8
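A toy illustration of what "simulation-free" sampling of marginals means: for a linear (Ornstein-Uhlenbeck) SDE the time-t marginal is Gaussian in closed form, so it can be sampled without stepping a numerical solver. This is a generic example I chose to make the notion concrete, not the SDE Matching construction itself.

```python
# Hedged illustration: for dz = -theta * z dt + sigma dW, the marginal of z_t
# given z_0 is N(z_0 * exp(-theta t), sigma^2 / (2 theta) * (1 - exp(-2 theta t))),
# so it can be sampled directly, with no SDE solver in the loop.
import torch

def ou_marginal_sample(z0: torch.Tensor, t: float, theta: float, sigma: float) -> torch.Tensor:
    mean = z0 * torch.exp(torch.tensor(-theta * t))
    var = sigma**2 / (2 * theta) * (1 - torch.exp(torch.tensor(-2 * theta * t)))
    return mean + var.sqrt() * torch.randn_like(z0)

z_t = ou_marginal_sample(torch.randn(128, 4), t=0.7, theta=1.5, sigma=0.5)
```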
General relativity 🤝 neural fields This simulation of a black hole is coming from our neural networks 🚀 We introduce Einstein Fields, a compact NN representation for 4D numerical relativity. EinFields are designed to handle the tensorial properties of GR and its derivatives.
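A minimal sketch of the "neural field for the metric" idea: an MLP maps a spacetime point to the 10 independent components of a symmetric 4x4 metric tensor, and the derivatives needed downstream come from autodiff. The layer sizes and the upper-triangular parameterisation are my assumptions, not details of EinFields.

```python
# Hedged sketch of a neural metric field; not the EinFields architecture.
import torch

class MetricField(torch.nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(4, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, 10),               # upper triangle of g_munu
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        vals = self.net(x)                             # x: one spacetime point (t, x, y, z)
        g = x.new_zeros(4, 4)
        iu = torch.triu_indices(4, 4)
        g[iu[0], iu[1]] = vals
        return g + g.triu(1).T                         # symmetrise

field = MetricField()
x = torch.randn(4, requires_grad=True)
g = field(x)                                           # metric at x
dg = torch.autograd.functional.jacobian(field, x)      # (4, 4, 4): d g_munu / d x^rho
```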
working on sub-quadratic architectures feels similar to geometric deep learning - very intellectually stimulating, but your research is constantly haunted by scale
We are presenting multiple papers at #ICML2025 -- come and meet the team in the next days!
Happening today! 🗓️Tue, July 15 @ 11 AM 📍East Exhibition Hall A-B #E-3512 Unfortunately I wasn't able to attend, so please DM me if you want to chat about hierarchical models, irregular geometries, or scalable physical modeling :) @FEijkelboom will present the poster for me on-site
Flow Matching (FM) is one of the hottest ideas in generative AI - and it's everywhere at #ICML2025. But what is it? And why is it so elegant? 🤔 This thread is an animated, intuitive intro to (Variational) Flow Matching - no dense math required. Let's dive in! 🧵👇
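A minimal training-step sketch of conditional flow matching with the common linear path x_t = (1 - t) x0 + t x1 and target velocity x1 - x0; the thread covers the variational formulation in more generality than this.

```python
# Hedged sketch of a conditional flow matching loss; illustrative only.
import torch

def flow_matching_loss(v_theta, x1: torch.Tensor) -> torch.Tensor:
    """v_theta(x_t, t) predicts the velocity that transports noise to data."""
    x0 = torch.randn_like(x1)                            # noise sample
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1))   # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * x1                           # point on the straight path
    target = x1 - x0                                     # its time derivative
    return ((v_theta(xt, t) - target) ** 2).mean()
```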
🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…
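A minimal sketch of the energy-based inference idea: treat the prediction as a variable and descend an energy E(x, y) for a few steps instead of emitting it in one forward pass. Step count and step size are illustrative, and this is not the EBT architecture itself.

```python
# Hedged sketch: "thinking" as iterative refinement of y under an energy.
import torch

def energy_inference(energy, x: torch.Tensor, y_init: torch.Tensor,
                     steps: int = 16, lr: float = 0.1) -> torch.Tensor:
    y = y_init.clone().requires_grad_(True)
    for _ in range(steps):                       # more steps = more "thinking"
        e = energy(x, y).sum()
        (grad,) = torch.autograd.grad(e, y)
        y = (y - lr * grad).detach().requires_grad_(True)
    return y.detach()
```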
Get ready for the PDE-Transformer: our new NN architecture tailored to scientific tasks. It combines hierarchical processing (UDiT), scalability (SWin) and flexible conditioning mechanisms. The paper tum-pbs.github.io/pde-transforme… shows it outperforming existing SOTA architectures 😁
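As a generic example of the kind of conditioning mechanism mentioned above, here is an AdaLN-style modulation block that rescales normalised features using a conditioning vector (e.g., PDE parameters or diffusion time). It is a common pattern, not necessarily the exact PDE-Transformer design.

```python
# Hedged sketch of AdaLN-style conditioning for a transformer block.
import torch

class ConditionedBlock(torch.nn.Module):
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.norm = torch.nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = torch.nn.Linear(cond_dim, 2 * dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```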
We release AB-UPT, a novel method to scale neural surrogates to CFD meshes beyond 100 million mesh cells. AB-UPT is extensively tested on the largest publicly available datasets. 📄 arxiv.org/abs/2502.09692 🤗 huggingface.co/EmmiAI/AB-UPT 💻 github.com/Emmi-AI/AB-UPT
Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work…until now. 🔍 We reverse-engineered them to uncover: - Novel attention patterns - Hidden "attention sinks" - Better performance - And more A 🧵… ~1/8~
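One simple way to quantify an "attention sink" is sketched below: measure the fraction of attention mass that queries place on the first token. This is an illustrative probe, not the paper's analysis code.

```python
# Hedged sketch of an attention-sink probe.
import torch

def attention_sink_score(attn: torch.Tensor) -> torch.Tensor:
    """attn: (heads, seq, seq) row-normalised attention weights.
    Returns the per-head average weight assigned to token 0."""
    return attn[..., 0].mean(dim=-1)

attn = torch.softmax(torch.randn(8, 128, 128), dim=-1)   # dummy attention maps
print(attention_sink_score(attn))                        # one score per head
```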
(1/n) Sampling from the Boltzmann density better than Molecular Dynamics (MD)? It is possible with PITA 🫓 Progressive Inference Time Annealing! A spotlight @genbio_workshop of @icmlconf 2025! PITA learns from "hot," easy-to-explore molecular states 🔥 and then cleverly "cools"…
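To illustrate the general idea of inference-time annealing on a Boltzmann density p_T(x) ∝ exp(-E(x)/T): sample at a high temperature where the landscape is easy to explore, then progressively cool towards T = 1. Plain annealed Langevin dynamics is shown below as a generic illustration; it is not the PITA algorithm.

```python
# Hedged sketch of annealed Langevin sampling on a tempered Boltzmann density.
import torch

def annealed_langevin(energy, x: torch.Tensor, temps=(4.0, 2.0, 1.0),
                      steps: int = 100, dt: float = 1e-3) -> torch.Tensor:
    for T in temps:                                  # cool progressively: hot -> target
        for _ in range(steps):
            x = x.detach().requires_grad_(True)
            (grad,) = torch.autograd.grad(energy(x).sum(), x)
            x = x - dt * grad / T + (2 * dt) ** 0.5 * torch.randn_like(x)
    return x.detach()
```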
In case there is any ambiguity: DINOv2 is 100% a product of dumb hill-climbing on ImageNet-1k kNN accuracy (and linear probing too). Overfitting an eval can be bad. But sometimes the reward signal is reliable and leads to truly good models. It's about finding a balance.
Oh, I am a big fan of self-supervised learning. Also, SSL has never been benchmark-maxing on ImageNet afaik. I am mainly complaining about the supervised classification ImageNet hill climb.