Irina Rish
@irinarish
prof UdeM/Mila; Canada Excellence Research Chair; AAI Lab head http://www.irina-lab.ai; INCITE project PI http://tinyurl.com/yc3jzudt; CSO http://nolano.ai
Can one achieve SOTA LLM performance at a much lower bitsize (=> lower memory/inference costs) than current (post-training) quantization? YES! - by training ternary LLMs - a "sweet spot" between underperforming binary and costly full-precision ones. Happy to announce our recently…
🚀 SpectraSuite of Ternary and FP16 LLMs 🚀 We’re thrilled to release the Spectra Suite of open ternary (TriLMs) and FP16 (FloatLMs) language models from 99M to 3.9B parameters. At billion+ parameter scale, TriLMs up to 10x smaller can match the performance of FloatLMs. 1/5
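For intuition, here's a minimal PyTorch sketch of the kind of ternary weight projection used when training ternary LLMs - an absmean-style quantizer with a straight-through estimator. This is an illustrative sketch, not the exact TriLM training recipe from the Spectra paper.

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    """Project a float weight matrix onto {-1, 0, +1} * scale.

    Illustrative absmean-style quantizer: the scale is the mean absolute
    value of the weights, and entries are rounded to the nearest ternary
    level. Not the exact TriLM recipe.
    """
    scale = w.abs().mean().clamp(min=eps)
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

def ste_ternary(w: torch.Tensor) -> torch.Tensor:
    """Straight-through estimator: the forward pass uses the ternary weights,
    while gradients flow unchanged to the latent full-precision weights."""
    w_t, scale = ternarize(w)
    return w + (w_t * scale - w).detach()
```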
Awesome! My colleagues @Tommaso_Tosato, @introspection, and Guillermo Cecchi's "comp psych" team at IBM and I have also been looking into LLM psychiatry for some time (neurips.cc/virtual/2024/1… etc.) - it's a truly fascinating topic, indeed!
In a joint paper with @OwainEvans_UK as part of the Anthropic Fellows Program, we study a surprising phenomenon: subliminal learning. Language models can transmit their traits to other models, even in what appears to be meaningless data. x.com/OwainEvans_UK/…
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Truly exciting achievements - current frontier AI models would probably have been considered AGI 10 years ago, but AI goalposts always keep moving, and critics always downplay the achievements and emphasize imperfections (same old, same old :)
We can finally share this now: A Gemini model trained with new RL techniques and scaled-up inference-time compute has achieved gold-medal level performance at IMO 2025! 🥇
It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI. Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
A very interesting recent work on distributed Muon (Dion): share.google/lfZ46PQPSXmRIC…
Come see our poster and talk if you are at ICML!
If you're at ICML, come tomorrow (Tuesday) to Oscar's talk, where he will present our paper "Layer by layer: Uncovering hidden representations in language models" at 10am (West Ballroom D) and for the poster session at 11am (East Exhibition Hall A-B #E-2607).
Check out our work arxiv.org/abs/2503.02844 on the advantages of using an infinite LR schedule for continual pretraining of foundation models (July 19, ES-FOMO workshop)! Many thanks to amazing coauthors - Vaibhav Singh, @janson002 @PMehrbod3864 @ai_phd @ebelilov and @benjamintherien!
🗓️ July 19 (ES-FOMO): "Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training" - Using Infinite LR to reduce forgetting in continual pretraining of vision (MAE) and language (LLM) foundation models. 📄 arxiv.org/abs/2503.02844
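For readers wondering what an "infinite" schedule looks like: it has a constant phase that can run indefinitely across continual-pretraining stages, so you never have to restart a decayed schedule. A minimal sketch with warmup, cooldown, constant, and optional final annealing phases; the phase shapes and hyperparameter values here are illustrative assumptions, not the paper's exact settings.

```python
import math

def infinite_lr(step, warmup=1000, cooldown=10000,
                lr_max=3e-4, lr_const=3e-5, lr_min=3e-6,
                anneal_start=None, anneal_steps=2000):
    """Illustrative 'infinite' LR schedule:
    1) linear warmup to lr_max,
    2) cosine cooldown from lr_max to lr_const,
    3) constant lr_const for as long as (continual) pretraining runs,
    4) optional short annealing to lr_min before releasing a checkpoint.
    """
    if step < warmup:
        return lr_max * step / max(warmup, 1)
    if step < warmup + cooldown:
        t = (step - warmup) / cooldown
        return lr_const + 0.5 * (lr_max - lr_const) * (1 + math.cos(math.pi * t))
    if anneal_start is not None and step >= anneal_start:
        t = min((step - anneal_start) / anneal_steps, 1.0)
        return lr_min + (lr_const - lr_min) * (1 - t)
    return lr_const
```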
Thanks to my collaborators and mentors @KateLobacheva @irinarish, Supriyo Chakraborty, and Nima Chitsazan. Also to @PandaAshwinee for coining "zero-sum learning", which is honestly a pretty great name.
LAION proudly presents 2 state-of-the-art emotion detection models for voice and face, surpassing Gemini 2.5 Pro and Hume API. They are completely open under a CC BY 4.0 license, alongside a ~5,000-hour voice-acting dataset & 2 expert-annotated benchmarks. laion.ai/blog/do-they-s…
MuLoCo: Muon x DiLoCo = ❤️ arxiv.org/abs/2505.23725 from @benjamintherien, Xiaolong Huang, @irinarish, @ebelilov
* Use Muon as the inner optimizer
* Add quantization of the outer gradient to 2 bits (!)
* Add error feedback
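Roughly, the outer loop looks like the single-process sketch below: after the inner Muon steps on a worker, the pseudo-gradient (global minus local weights, plus the carried residual) is quantized to 2 bits before communication, and the quantization residual is kept locally as error feedback. The toy quantizer, heavy-ball outer momentum (DiLoCo uses Nesterov), and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch

def quantize_2bit(x: torch.Tensor) -> torch.Tensor:
    """Toy uniform 2-bit quantizer: 4 levels per tensor with a single scale."""
    scale = x.abs().max().clamp(min=1e-8)
    levels = torch.tensor([-1.0, -1 / 3, 1 / 3, 1.0], device=x.device)
    idx = torch.bucketize(x / scale, (levels[:-1] + levels[1:]) / 2)
    return levels[idx] * scale

def outer_step(global_params, local_params, error, momentum_buf,
               outer_lr=0.7, momentum=0.9):
    """DiLoCo-style outer update with a quantized pseudo-gradient + error feedback.
    All arguments are lists of tensors of matching shapes."""
    for g, l, e, m in zip(global_params, local_params, error, momentum_buf):
        pseudo_grad = g - l + e          # pseudo-gradient after the inner steps, plus carried error
        q = quantize_2bit(pseudo_grad)   # what would actually be communicated (2 bits/entry)
        e.copy_(pseudo_grad - q)         # error feedback: keep the quantization residual locally
        m.mul_(momentum).add_(q)         # heavy-ball-style outer momentum (simplified)
        g.sub_(outer_lr * m)             # outer SGD step on the global weights
        l.copy_(g)                       # workers restart from the new global weights
```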
Huh. Looks like Plato was right. A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text. Implications for philosophy and vector databases alike.
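The paper's method is unsupervised, but the underlying idea is easy to see in the simpler supervised setting: given paired embeddings of the same texts from two models (of equal dimension), an orthogonal Procrustes rotation already maps one space onto the other. A toy NumPy sketch of that supervised variant - not the paper's algorithm:

```python
import numpy as np

def procrustes_map(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: find the rotation W minimizing ||A @ W - B||_F.

    A, B: (n, d) paired embeddings of the same n texts from two models,
    assumed here to share the same dimension d. The paper referenced in
    the tweet does the harder thing: translation without paired data.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Usage: W = procrustes_map(emb_model1, emb_model2); emb_model1 @ W ~ emb_model2
```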
🥳Nice! Our project “Towards a Quantum #NeuroIA” just got seed funding from @ai_UNIQUE! After a year in stealth w/@AnnemarieWolff, our benchmarks show #quantum speedups for brain data simulation & analysis using @qiskit + @IBM QS1 → Next: #OpenSource tools & intl. collab 🇯🇵🔄🇨🇦
You can already check our recent works on this topic:
- LLMs and Personalities: Inconsistencies Across Scales openreview.net/forum?id=vBg3O…
- Lost in Translation: The Algorithmic Gap Between LMs and the Brain arxiv.org/abs/2407.04680
Grateful for the @IVADO_Qc Exploratory Grant with @IrinaRish & @Tommaso_Tosato on how #LLMs express personality traits & socio-emotional responses—toward safer #AI in Health & Education ivado.ca/en/2025/04/09/…
MatFormers are a very powerful alternative to the standard transformer. They train like a regular transformer, but after training you can split the model up to any size you like and get very strong performance that scales just like a regular transformer. So train once, get models of all sizes!
Pocket powerhouse amidst I/O awesomeness! Gemma 3n E4B & E2B are insane models, optimized for on-device while rivaling frontier models. It's a 🪆Matryoshka Transformer (MatFormer)🪆: Natively elastic b/w 4B & 2B pareto-optimally! ⭐️: free models with ZERO training cost! 🧵👇
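For a rough picture of the Matryoshka idea, here's a simplified PyTorch sketch of a nested FFN whose hidden width can be sliced after training. This is an illustration, not the Gemma 3n or MatFormer code; the module and parameter names are made up.

```python
import torch
import torch.nn as nn

class MatFFN(nn.Module):
    """Matryoshka-style FFN: the same weight matrices can be run at several
    nested hidden widths, so one trained block yields sub-models of different
    sizes. Simplified sketch; attention/other blocks are omitted."""
    def __init__(self, d_model=512, d_hidden=2048, widths=(512, 1024, 2048)):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.widths = widths  # nested sub-FFN sizes, smallest first

    def forward(self, x, width=None):
        m = width or self.widths[-1]  # pick one of the nested widths
        # Use only the first m hidden neurons of the shared weight matrices.
        h = torch.relu(x @ self.up.weight[:m].T + self.up.bias[:m])
        return h @ self.down.weight[:, :m].T + self.down.bias
```

During training, one would sample (or jointly optimize) the nested width per step so every sub-network stays strong; at deployment, you just pick the width that fits the device.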
Quite impressive!
❄️Introducing Absolute Zero Reasoner: Our reasoner learns both to propose tasks that maximize learnability and to improve reasoning by solving them, entirely through self-play - with no external data! Overall, it outperforms other "zero" models in math & coding domains. 🧵 1/
totally! I love vibe-coding, the efficiency is unreal
The hottest new programming language is English