Günter Klambauer
@gklambauer
Deep Learning researcher known for self-normalizing neural networks and applications of Machine Learning in the Life Sciences; ELLIS Program Director
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences. xLSTM also shines for DNA, proteins, and small molecules -- it can handle long-range interactions and huge contexts! Paper: arxiv.org/abs/2411.04165

Since 1990, we have worked on artificial curiosity & measuring "interestingness." Our new ICML paper uses a "Prediction of Hidden Units" loss to quantify in-context computational complexity in sequence models. It can tell boring tasks from interesting ones and predict correct reasoning.
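Roughly, the idea as I read it from the abstract, as a minimal sketch: fit a small auxiliary predictor to the model's own hidden-state trajectory and use its prediction error as the complexity score. The choice of a single linear predictor and the squared-error objective here are my assumptions; the exact formulation is in the paper.

    # Sketch (not the paper's code): how hard are the model's hidden states to predict?
    # Higher prediction loss ~ more "new" in-context computation per step.
    import torch, torch.nn as nn

    hidden = torch.randn(1, 128, 512)     # placeholder for [batch, seq_len, d] states from a frozen sequence model
    predictor = nn.Linear(512, 512)       # small auxiliary predictor (assumption: one linear layer)
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    for _ in range(100):                  # fit the predictor on h_t -> h_{t+1}
        pred = predictor(hidden[:, :-1])
        loss = ((pred - hidden[:, 1:]) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    phi_like_score = loss.item()          # used as a per-sequence "interestingness" score in this sketch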
Excited to share our new ICML paper, with co-authors @robert_csordas and @SchmidhuberAI! How can we tell if an LLM is actually "thinking" versus just spitting out memorized or trivial text? Can we detect when a model is doing anything interesting? (Thread below👇)
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization on sequences of up to 256k tokens, with no need to change the architectures!
xLSTM for Aspect-based Sentiment Analysis: arxiv.org/abs/2507.01213 Another success story of xLSTM. MEGA: xLSTM with Multihead Exponential Gated Fusion. “Experiments on 3 benchmarks show that MEGA outperforms state-of-the-art baselines with superior accuracy and efficiency”
10 years ago, in May 2015, we published the first working very deep gradient-based feedforward neural networks (FNNs) with hundreds of layers (previous FNNs had a maximum of a few dozen layers). To overcome the vanishing gradient problem, our Highway Networks used the residual…
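For context, the core highway-layer gating in a minimal numpy sketch: a sigmoid transform gate T decides how much of the transformed signal versus the unchanged input passes through, which keeps gradients flowing through very deep stacks. Variable names and initialization here are illustrative, not the paper's code.

    # Minimal highway-layer sketch: y = H(x) * T(x) + x * (1 - T(x))
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def highway_layer(x, W_H, b_H, W_T, b_T):
        H = np.tanh(x @ W_H + b_H)        # candidate transformation
        T = sigmoid(x @ W_T + b_T)        # transform gate in (0, 1)
        return H * T + x * (1.0 - T)      # gated mix of transform and identity path

    d = 64
    x = np.random.randn(8, d)
    y = highway_layer(x, 0.1 * np.random.randn(d, d), np.zeros(d),
                      0.1 * np.random.randn(d, d), np.full(d, -2.0))  # negative gate bias favors the identity path at first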
xLSTM for multivariate time series anomaly detection: arxiv.org/abs/2506.22837 “In our results, xLSTM showcases state-of-the-art accuracy, outperforming 23 popular anomaly detection baselines.” Again, xLSTM excels in time series analysis.
Great application, but built on the wrong model architecture... We've already shown that the Transformer is inferior to xLSTM on DNA: arxiv.org/abs/2411.04165
Happy to introduce AlphaGenome, @GoogleDeepMind's new AI model for genomics. AlphaGenome offers a comprehensive view of the human non-coding genome by predicting the impact of DNA variations. It will deepen our understanding of disease biology and open new avenues of research.
Really cool new work with amazing students and collaborators.
[1/9]🚀Excited to share our new work, RNE! A plug-and-play framework for everything about diffusion model density and control: density estimation, inference-time control & scaling, energy regularisation. More details👇 Joint work with @jmhernandez233 @YuanqiD, Francisco Vargas
NXAI has successfully demonstrated that their groundbreaking xLSTM (Extended Long Short-Term Memory) architecture achieves exceptional performance on AMD Instinct™ GPUs, a significant advancement in RNN technology for edge computing applications. amd.com/en/blogs/2025/…
🚀 After two+ years of intense research, we’re thrilled to introduce Skala — a scalable deep learning density functional that hits chemical accuracy on atomization energies and matches hybrid-level accuracy on main group chemistry — all at the cost of semi-local DFT. ⚛️🔥🧪🧬
Chemical accuracy with Deep Learning based DFT - #compchem
Parallelizable, with state tracking and learnable information flow. Wowww. Super work by Korbinian and team.
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997
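My hedged reading of the Source/Transition/Mark naming, reduced to a plain 1-D sequence: Source injects the input into the state, Transition propagates the state, and Mark reads it out. This is a sketch of a generic linear recurrence, not the paper's code; pLSTM's contribution is making this parallelizable and extending it to images and DAGs.

    # Linear source-transition-mark recurrence on a sequence (sketch, assumptions as above):
    #   h_t = T_t * h_{t-1} + S_t * x_t   (state update)
    #   y_t = M_t * h_t                   (readout)
    import numpy as np

    def linear_stm_scan(x, S, T, M):
        """x, S, T, M: arrays of shape [seq_len, d] (elementwise/diagonal case for simplicity)."""
        h = np.zeros(x.shape[1])
        ys = []
        for t in range(x.shape[0]):
            h = T[t] * h + S[t] * x[t]
            ys.append(M[t] * h)
        return np.stack(ys)

    L, d = 16, 8
    y = linear_stm_scan(np.random.randn(L, d), np.random.rand(L, d),
                        0.9 * np.random.rand(L, d), np.random.randn(L, d))

Because the recurrence is linear in h, it can be computed with a parallel scan instead of the sequential loop above.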
The European-developed TiRex is leading the field, significantly ahead of U.S. competitors like Amazon, Datadog, Salesforce, and Google, as well as Chinese models from companies such as Alibaba.
GIFT-Eval Time Series Forecasting Leaderboard: evaluates time-series forecasting methods. Now leading: TiRex (arxiv.org/abs/2505.23719). Link: huggingface.co/spaces/Salesfo…

Europe is winning the AI race!! Best foundation model for time-series!
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
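If you want to try it, the usage pattern is roughly as below. Treat this as an assumed sketch: the package name, the load_model/forecast calls, and the "NX-AI/TiRex" checkpoint id are my guesses rather than a verified API, so check the official TiRex repository.

    # Assumed usage sketch -- names and signatures are not verified here.
    import torch
    from tirex import load_model          # package and function name assumed

    model = load_model("NX-AI/TiRex")     # pre-trained xLSTM-based forecaster (checkpoint id assumed)
    context = torch.rand(5, 256)          # 5 series, 256 past time steps each
    quantiles, mean = model.forecast(context=context, prediction_length=64)  # zero-shot forecast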
MHNfs: Prompting In-Context Bioactivity Predictions for Low-Data Drug Discovery #DrugDiscovery pubs.acs.org/doi/10.1021/ac… @JSchimunek @sohvi_luukkonen @gklambauer @LITAILab #JCIM Vol65 Issue9 #ApplicationNote
Finally out!!!
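For readers new to the setting, a schematic sketch of few-shot bioactivity prediction: this is not the MHNfs architecture itself (MHNfs builds on modern Hopfield networks); here the query molecule's activity is simply a similarity-weighted vote over a small labeled support set, which is the low-data "prompting" idea in its most basic form.

    # Schematic few-shot bioactivity prediction: softmax-similarity vote over a support set.
    import numpy as np

    def few_shot_activity(query_emb, support_embs, support_labels, beta=4.0):
        """query_emb: [d]; support_embs: [n, d]; support_labels: [n] in {0, 1}."""
        sims = support_embs @ query_emb / (
            np.linalg.norm(support_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
        weights = np.exp(beta * sims)
        weights /= weights.sum()              # softmax retrieval weights over the support set
        return float(weights @ support_labels)  # predicted probability of activity

    d, n = 32, 8                              # toy molecule embeddings and labels
    p = few_shot_activity(np.random.randn(d), np.random.randn(n, d),
                          np.random.randint(0, 2, size=n))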
Indeed... he gives some thoughts on that in the book.
Just pre-ordered Hochreiter's book. His LSTM work changed AI - keen to see where he thinks it's headed next.