🇺🇦 Dzmitry Bahdanau
@DBahdanau
Team member at something young. Adjunct Prof @ McGill. Member of Mila, Quebec AI Institute. Stream of consciousness is my own.
I really don't care about reviewers' opinions. But the last few weeks before NeurIPS are the only time when it is socially acceptable to neglect almost all your responsibilities and focus on research. 🤪🤪🤪
Did you know that nature chooses the microstate with a softmax layer?
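There is real physics behind the joke: the Boltzmann distribution over microstates is exactly a softmax of negative energies divided by temperature. A minimal sketch in NumPy (the energy values are made up for illustration):

```python
import numpy as np

def boltzmann(energies, kT=1.0):
    """Probability of each microstate: softmax(-E / kT)."""
    logits = -np.asarray(energies, dtype=float) / kT
    logits -= logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Lower-energy microstates are exponentially more likely;
# raising kT flattens the distribution toward uniform.
p = boltzmann([0.0, 1.0, 2.0], kT=1.0)
```

At low temperature the softmax sharpens onto the ground state, just like a low-temperature LLM sampler sharpens onto the argmax token.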
Great comment. LLMs feel like a 1000 year old robot who worked all jobs, talked to everyone, learned everything, and yet can't cross the uncanny valley of actually *getting* what I want, having agency, being creative. But OTOH that's enough to shake the economy.
7. However, LLMs will become exceedingly powerful for problems that *someone* knows how to solve (in-distribution, in training data). In math research, you combine existing techniques with new creative ideas. LLMs will significantly accelerate the former part. (7/10)
As #ICML2025 kicks off in Vancouver, our AI talent is being quietly pushed out. 🇨🇦 We've been waiting 28 months for permanent residency, but @CitImmCanada won’t budge. Please read and share our story facebook.com/share/p/1AwU2f… linkedin.com/posts/gbxhuang… #IRCC #AI #Immigration
There's one idea as old as humanity. If I am in control, I can make it better. Hence, I should fight for control and win. At any cost. Ends justify the means. And this is how people with big egos abandon all principles and become evil.
So long, @ServiceNowRSRCH ! It's been a great 4 years. I look forward to cheering for more great open-source AI releases from the talented ServiceNow AI people! I will tell you what's next in due time 😉

so nice to have a few actual Scientists in our community who respect empirical results even when they run counter to their intuition!
This study surprised me! The conclusion is opposite to what I would expect. It is tempting to try to find a reason it's bogus but I think it's well executed and solid work. As the authors say, there are a number of potential caveats for this setting that may not generalize…
exactly what I feel about using AI to make advanced modifications in RL code: a lot of busy and chaotic activity, but slower than thinking carefully on one's own
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
knowledge for the sake of knowledge is useless. we need knowledge that informs action and creates impact. scaling up NeurIPS to 20K submissions will only further lower the ratio of impactful knowledge in the proceedings. you can guess, I'm reviewing now...
A lot of great ideas on how to remedy training instabilities in @MiniMax__AI tech report. Check it out!
Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning. - World’s longest context window: 1M-token input, 80k-token output - State-of-the-art agentic use among open-source models - RL at unmatched efficiency:…
finally a voice of reason about mechanistic interpretability!
I wrote about why efforts to understand the inner workings of AI keep falling short.
200% grit grit grit and 50 w&b curves and everything will work eventually
Someone passed this wisdom to me today. Deep learning techniques working vs not working comes down to two devils: - your prior about the technique - your attention to detail in implementing the technique. Need both to make it work.
ask Claude 3.7 if your code has any obvious bugs. watch it invent issues in your code when there aren't any, in 90% of cases, leading you astray. make your judgement about how soon decent software engineers become useless
I'm embarrassed to admit that I have only just grokked how amazing Python coroutines and asyncio are. I want to rewrite every single piece of code with threads I have ever written! But the learning curve is steep. This great blog opened my eyes: tenthousandmeters.com/blog/python-be…
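For readers who haven't had the same epiphany: a single thread can interleave many concurrent I/O-bound tasks with coroutines, no locks needed. A minimal sketch (the task names and delays are illustrative):

```python
import asyncio

async def fetch(name, delay):
    # Simulate a slow I/O call; `await` yields control to the
    # event loop so other coroutines can run in the meantime.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Three "requests" run concurrently on one thread: the total
    # wall time is roughly max(delays), not their sum, which is
    # what sequential thread-free code would cost you.
    return await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )

results = asyncio.run(main())
```

Unlike threads, the switch points are explicit (`await`), which makes the interleaving much easier to reason about.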
nicely done, team!!
🚨🤯 Today Jensen Huang announced SLAM Lab's newest model on the @HelloKnowledge stage: Apriel‑Nemotron‑15B‑Thinker 🚨 A lean, mean reasoning machine punching way above its weight class 👊 Built by SLAM × NVIDIA. Smaller models, bigger impact. 🧵👇
I think system 3 is where AI will get stuck
System 1 = fast, implicit reasoning System 2 = slow, explicit reasoning System 3 = slow, implicit reasoning For me, system 3 is the real genius of the lot.
Adam deserves the award, but in Singapore everyone still uses SGD