Maximilian Beck
@maxmbeck
ELLIS PhD Student @ JKU Linz Institute for Machine Learning & PhD Researcher @nx_ai_com, Research Scientist Intern @Meta FAIR
Yesterday, we shared the details of our xLSTM 7B architecture. Now, let's go one level deeper 🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA)⚡️, a new kernel algorithm for the mLSTM and other linear attention variants with gating. We find TFLA is really fast! 🧵(1/11)
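For intuition, here is a minimal numpy sketch of the chunkwise decomposition that tiled linear-attention kernels build on. This is my own ungated toy, not the TFLA kernel itself; the mLSTM gating and the GPU-level tiling are left out.

```python
import numpy as np

def chunkwise_linear_attention(Q, K, V, chunk=64):
    """Causal linear attention, O_t = sum_{s<=t} (q_t . k_s) * v_s, chunk by chunk.

    Past chunks contribute through a running state S = sum_s k_s v_s^T
    (one matmul per chunk); the current chunk contributes through a small
    masked attention matrix. This shows only the decomposition -- TFLA fuses
    and tiles it on the GPU and adds the mLSTM gates, omitted here.
    """
    T, d_k = Q.shape
    O = np.zeros_like(V)
    S = np.zeros((d_k, V.shape[1]))  # key-value state accumulated over past chunks
    for s in range(0, T, chunk):
        q, k, v = Q[s:s+chunk], K[s:s+chunk], V[s:s+chunk]
        mask = np.tril(np.ones((len(q), len(q))))      # causal mask within the chunk
        O[s:s+chunk] = q @ S + ((q @ k.T) * mask) @ v  # inter-chunk + intra-chunk part
        S += k.T @ v                                   # absorb this chunk into the state
    return O
```

Against the naive `((Q @ K.T) * np.tril(np.ones((T, T)))) @ V` this gives the same output (up to floating point) while never materializing the T×T attention matrix.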

NXAI has successfully demonstrated that its groundbreaking xLSTM (Extended Long Short-Term Memory) architecture achieves exceptional performance on AMD Instinct™ GPUs - a significant advancement in RNN technology for edge computing applications. amd.com/en/blogs/2025/…
Ever wondered how 'Composition over Inheritance' can be used effectively in ML experiment configuration (and beyond)? Check out the CompoConf library, enabling type-safe compositional configuration in Python: korbi.ai/blog/compoconf Based on ideas by @maxmbeck (and a bit of my own)
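To illustrate the pattern (hypothetical names throughout, not the actual CompoConf API): an experiment config *contains* swappable, typed sub-configs instead of growing a subclass per experiment variant.

```python
from dataclasses import dataclass, field

# All names here are illustrative, not the actual CompoConf API.

@dataclass
class AdamConfig:
    lr: float = 1e-3
    betas: tuple[float, float] = (0.9, 0.999)

@dataclass
class SGDConfig:
    lr: float = 1e-2
    momentum: float = 0.9

@dataclass
class ModelConfig:
    hidden_dim: int = 512
    num_layers: int = 8

@dataclass
class ExperimentConfig:
    # The experiment *contains* typed components; new variants come from
    # composing different sub-configs, not from subclassing a base experiment.
    model: ModelConfig = field(default_factory=ModelConfig)
    optimizer: AdamConfig | SGDConfig = field(default_factory=AdamConfig)

cfg = ExperimentConfig(model=ModelConfig(num_layers=12), optimizer=SGDConfig(lr=0.1))
```

Swapping the optimizer is then a one-line change to the config object, and a type checker can validate each component independently.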
MesaNet is beautiful! A great paper with an extensive benchmark of recent RNNs (including xLSTM) on synthetic tasks and language modeling
Super happy and proud to share our novel scalable RNN model - the MesaNet! This work builds on the beautiful idea of 𝗹𝗼𝗰𝗮𝗹𝗹𝘆 𝗼𝗽𝘁𝗶𝗺𝗮𝗹 𝘁𝗲𝘀𝘁-𝘁𝗶𝗺𝗲 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 (TTT) and combines it with in-context learning and mesa-optimization.
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997
My book "Was kann Künstliche Intelligenz?" ("What Can Artificial Intelligence Do?") has been published. It is an easily accessible introduction to artificial intelligence, explaining to readers, even those without a technical background, what AI actually is, what potential it holds, and what impact it has.
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381
Excited to share that two of our papers on efficient inference with #xLSTM have been accepted at #ICML25: A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks (arxiv.org/abs/2410.22391) and xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference:
📢🔔I am excited to share the details of the optimized xLSTM architecture behind our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind:
- Efficiency (in training and inference)
- Stability
🧵(1/7)
Come by our posters at the Open Science for Foundation Models workshop today at 3pm (Hall4#5) at #ICLR25 if you want to know more about Tiled Flash Linear Attention and xLSTM 7B!
xLSTM for Multi-label ECG Classification: arxiv.org/abs/2504.16101 "This approach significantly improves ECG classification accuracy, thereby advancing clinical diagnostics and patient care." Cool.
Hope to see you around at #ICLR2025 in #Singapore! I'm happy to present our work on xLSTM kernels, applications and scaling up to 7B parameters!
I will talk about our xLSTM 7B today! Tune in 💫
🚀 Join us for an exclusive discussion on xLSTM 7B! To the future of fast and efficient LLMs w/ Maximilian Beck, PhD researcher at Johannes Kepler University & protégé of Mr. LSTM himself, Sepp Hochreiter. Hosted by @ceciletamura of @ploutosai app.ploutos.dev/streams/noctur…
Does SSMax in Llama4 avoid attention sinks?
Great question! I imagine that temperature scaling should actually make sinks stronger (as it should help sharpen attention patterns over long context) -- although we have not checked yet. Worth noting that we proposed something similar to SSMax here arxiv.org/abs/2410.01104
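For readers: as I understand SSMax, it rescales attention logits by s·log(n) before the softmax, where n is the number of attended positions, so the distribution stays sharp as the context grows. A minimal numpy sketch, with a fixed scalar s standing in for the learned per-head parameter:

```python
import numpy as np

def ssmax(logits, s=1.0):
    """Scalable-Softmax over the last axis (my reading of SSMax).

    Logits are multiplied by s * log(n), with n the number of attended
    positions, before the usual softmax, so attention stays sharp as the
    context grows instead of flattening out. Here s is a fixed scalar for
    illustration; in the paper it is a learned parameter.
    """
    n = logits.shape[-1]
    z = logits * (s * np.log(n))
    z = z - z.max(axis=-1, keepdims=True)  # standard numerical stabilization
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Whether this sharpening strengthens or weakens attention sinks is exactly the open question raised above.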
1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
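A minimal numpy sketch of the mechanism as I read it, with my own variable names and shapes: DeltaProduct takes several delta-rule steps per token, so the effective state transition becomes a product of generalized Householder matrices rather than a single rank-1 update.

```python
import numpy as np

def delta_product_step(S, ks, vs, betas):
    """One token update in a DeltaProduct-style linear RNN (my own sketch).

    Each token carries n_h (key, value, beta) triples and the delta rule is
    applied n_h times, so the effective state transition is a product of
    generalized Householder matrices (I - beta * k k^T) instead of a single
    rank-1 update. Shapes: S (d_v, d_k), ks (n_h, d_k), vs (n_h, d_v), betas (n_h,).
    """
    for k, v, beta in zip(ks, vs, betas):
        # Delta rule: S <- S (I - beta k k^T) + beta v k^T, one Householder-style step
        S = S - beta * np.outer(S @ k - v, k)
    return S
```

With n_h = 1 this reduces to a single delta-rule (DeltaNet-style) update; larger n_h trades some parallelism for more expressive per-token transitions, which is the tradeoff the thread refers to.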