Irina Rish
@irinarish
prof UdeM/Mila; Canada Excellence Research Chair; AAI Lab head http://www.irina-lab.ai; INCITE project PI http://tinyurl.com/yc3jzudt; CSO http://nolano.ai
Can one achieve SOTA LLM performance at a much lower bitsize (=> lower memory/inference costs) than current (post-training) quantization? YES! - by training ternary LLMs - a "sweet spot" between underperforming binary and costly full-precision ones. Happy to announce our recently…
🚀 SpectraSuite of Ternary and FP16 LLMs 🚀 We’re thrilled to release the Spectra Suite of open ternary (TriLMs) and FP16 (FloatLMs) language models from 99M to 3.9B parameters. At billion+ parameter scale, TriLMs up to 10x smaller can match the performance of FloatLMs. 1/5
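For intuition, here's a minimal PyTorch sketch of the kind of ternary weight projection used when training ternary LLMs - an absmean-style quantizer with a straight-through estimator. This is an illustrative sketch, not the exact TriLM training recipe from the Spectra paper.

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    """Project a float weight matrix onto {-1, 0, +1} * scale.

    Illustrative absmean-style quantizer: the scale is the mean absolute
    value of the weights, and entries are rounded to the nearest ternary
    level. Not the exact TriLM recipe.
    """
    scale = w.abs().mean().clamp(min=eps)
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

def ste_ternary(w: torch.Tensor) -> torch.Tensor:
    """Straight-through estimator: the forward pass uses the ternary weights,
    while gradients flow unchanged to the latent full-precision weights."""
    w_t, scale = ternarize(w)
    return w + (w_t * scale - w).detach()
```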
Awesome! My colleagues @Tommaso_Tosato, @introspection, and Guillermo Cecchi's "comp psych" team at IBM and I have also been looking into LLM psychiatry for some time (neurips.cc/virtual/2024/1… etc.) - it's a truly fascinating topic, indeed!
In a joint paper with @OwainEvans_UK as part of the Anthropic Fellows Program, we study a surprising phenomenon: subliminal learning. Language models can transmit their traits to other models, even in what appears to be meaningless data. x.com/OwainEvans_UK/…
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Truly exciting achievements - current frontier AI models would probably have been considered AGI 10 years ago, but AI goalposts always keep moving, and critics always downplay the achievements and emphasize imperfections (same old, same old :)
We can finally share this now: A Gemini model trained with new RL techniques and scaled-up inference-time compute has achieved gold-medal level performance at IMO 2025! 🥇
It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI. Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
A very interesting recent work on distributed Muon (Dion): share.google/lfZ46PQPSXmRIC…
Come see our poster and talk if you are at ICML!
If you're at ICML, come tomorrow (Tuesday) to Oscar's talk, where he will present our paper "Layer by layer: Uncovering hidden representations in language models" at 10am (West Ballroom D) and for the poster session at 11am (East Exhibition Hall A-B #E-2607).
Check out our work arxiv.org/abs/2503.02844 on the advantages of using an infinite LR schedule for continual pretraining of foundation models (July 19, ES-FOMO workshop)! Many thanks to amazing coauthors - Vaibhav Singh, @janson002 @PMehrbod3864 @ai_phd @ebelilov and @benjamintherien!
🗓️ July 19 (ES-FOMO): "Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training" - Using Infinite LR to reduce forgetting in continual pretraining of vision (MAE) and language (LLM) foundation models. 📄 arxiv.org/abs/2503.02844
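For readers wondering what an "infinite" schedule looks like: it has a constant phase that can run indefinitely across continual-pretraining stages, so you never have to restart a decayed schedule. A minimal sketch with warmup, cooldown, constant, and optional final annealing phases; the phase shapes and hyperparameter values here are illustrative assumptions, not the paper's exact settings.

```python
import math

def infinite_lr(step, warmup=1000, cooldown=10000,
                lr_max=3e-4, lr_const=3e-5, lr_min=3e-6,
                anneal_start=None, anneal_steps=2000):
    """Illustrative 'infinite' LR schedule:
    1) linear warmup to lr_max,
    2) cosine cooldown from lr_max to lr_const,
    3) constant lr_const for as long as (continual) pretraining runs,
    4) optional short annealing to lr_min before releasing a checkpoint.
    """
    if step < warmup:
        return lr_max * step / max(warmup, 1)
    if step < warmup + cooldown:
        t = (step - warmup) / cooldown
        return lr_const + 0.5 * (lr_max - lr_const) * (1 + math.cos(math.pi * t))
    if anneal_start is not None and step >= anneal_start:
        t = min((step - anneal_start) / anneal_steps, 1.0)
        return lr_min + (lr_const - lr_min) * (1 - t)
    return lr_const
```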
Thanks to my collaborators and mentors @KateLobacheva @irinarish, Supriyo Chakraborty, and Nima Chitsazan. Also to @PandaAshwinee for coining "zero-sum learning", which is honestly a pretty great name.
LAION proudly presents 2 state-of-the-art emotion detection models for voice and face, surpassing Gemini 2.5 Pro and Hume API. They are completely open under a CC BY 4.0 license, alongside a ~5,000-hour voice-acting dataset & 2 expert-annotated benchmarks. laion.ai/blog/do-they-s…
MuLoCo: Muon x DiLoCo = ❤️ arxiv.org/abs/2505.23725 from @benjamintherien, Xiaolong Huang, @irinarish, @ebelilov
* Use Muon as the inner optimizer
* Add quantization of the outer gradient to 2 bits (!)
* Add error feedback
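Roughly, the outer loop looks like the single-process sketch below: after the inner Muon steps on a worker, the pseudo-gradient (global minus local weights, plus the carried residual) is quantized to 2 bits before communication, and the quantization residual is kept locally as error feedback. The toy quantizer, heavy-ball outer momentum (DiLoCo uses Nesterov), and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch

def quantize_2bit(x: torch.Tensor) -> torch.Tensor:
    """Toy uniform 2-bit quantizer: 4 levels per tensor with a single scale."""
    scale = x.abs().max().clamp(min=1e-8)
    levels = torch.tensor([-1.0, -1 / 3, 1 / 3, 1.0], device=x.device)
    idx = torch.bucketize(x / scale, (levels[:-1] + levels[1:]) / 2)
    return levels[idx] * scale

def outer_step(global_params, local_params, error, momentum_buf,
               outer_lr=0.7, momentum=0.9):
    """DiLoCo-style outer update with a quantized pseudo-gradient + error feedback.
    All arguments are lists of tensors of matching shapes."""
    for g, l, e, m in zip(global_params, local_params, error, momentum_buf):
        pseudo_grad = g - l + e          # pseudo-gradient after the inner steps, plus carried error
        q = quantize_2bit(pseudo_grad)   # what would actually be communicated (2 bits/entry)
        e.copy_(pseudo_grad - q)         # error feedback: keep the quantization residual locally
        m.mul_(momentum).add_(q)         # heavy-ball-style outer momentum (simplified)
        g.sub_(outer_lr * m)             # outer SGD step on the global weights
        l.copy_(g)                       # workers restart from the new global weights
```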
Huh. Looks like Plato was right. A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text. Implications for philosophy and vector databases alike.
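The paper's method is unsupervised, but the underlying idea is easy to see in the simpler supervised setting: given paired embeddings of the same texts from two models (of equal dimension), an orthogonal Procrustes rotation already maps one space onto the other. A toy NumPy sketch of that supervised variant - not the paper's algorithm:

```python
import numpy as np

def procrustes_map(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: find the rotation W minimizing ||A @ W - B||_F.

    A, B: (n, d) paired embeddings of the same n texts from two models,
    assumed here to share the same dimension d. The paper referenced in
    the tweet does the harder thing: translation without paired data.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Usage: W = procrustes_map(emb_model1, emb_model2); emb_model1 @ W ~ emb_model2
```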
🥳Nice! Our project “Towards a Quantum #NeuroIA” just got seed funding from @ai_UNIQUE! After a year in stealth w/@AnnemarieWolff, our benchmarks show #quantum speedups for brain data simulation & analysis using @qiskit + @IBM QS1 → Next: #OpenSource tools & intl. collab 🇯🇵🔄🇨🇦
You can already check our recent works on this topic:
- LLMs and Personalities: Inconsistencies Across Scales openreview.net/forum?id=vBg3O…
- Lost in Translation: The Algorithmic Gap Between LMs and the Brain arxiv.org/abs/2407.04680
Grateful for the @IVADO_Qc Exploratory Grant with @IrinaRish & @Tommaso_Tosato on how #LLMs express personality traits & socio-emotional responses—toward safer #AI in Health & Education ivado.ca/en/2025/04/09/…
MatFormers are a very powerful alternative to the standard transformer. They train like a regular transformer, but after training you can split the model up to any size you like and get very strong performance that scales just like a regular transformer. So train once, get models of all sizes!
Pocket powerhouse amidst I/O awesomeness! Gemma 3n E4B & E2B are insane models, optimized for on-device while rivaling frontier models. It's a 🪆Matryoshka Transformer (MatFormer)🪆: Natively elastic b/w 4B & 2B pareto-optimally! ⭐️: free models with ZERO training cost! 🧵👇
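For a rough picture of the Matryoshka idea, here's a simplified PyTorch sketch of a nested FFN whose hidden width can be sliced after training. This is an illustration, not the Gemma 3n or MatFormer code; the module and parameter names are made up.

```python
import torch
import torch.nn as nn

class MatFFN(nn.Module):
    """Matryoshka-style FFN: the same weight matrices can be run at several
    nested hidden widths, so one trained block yields sub-models of different
    sizes. Simplified sketch; attention/other blocks are omitted."""
    def __init__(self, d_model=512, d_hidden=2048, widths=(512, 1024, 2048)):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.widths = widths  # nested sub-FFN sizes, smallest first

    def forward(self, x, width=None):
        m = width or self.widths[-1]  # pick one of the nested widths
        # Use only the first m hidden neurons of the shared weight matrices.
        h = torch.relu(x @ self.up.weight[:m].T + self.up.bias[:m])
        return h @ self.down.weight[:, :m].T + self.down.bias
```

During training, one would sample (or jointly optimize) the nested width per step so every sub-network stays strong; at deployment, you just pick the width that fits the device.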
Quite impressive!
❄️Introducing Absolute Zero Reasoner: Our reasoner learns both to propose tasks that maximize learnability and to improve reasoning by solving them, entirely through self-play - with no external data! Overall, it outperforms other "zero" models in math & coding domains. 🧵 1/
totally! I love vibe-coding, the efficiency is unreal
The hottest new programming language is English