Michal Wilinski
@inverse_hessian
member of technical staff @ stealth · incoming cs phd student @SCSatCMU · bsc from @PUT_Poznan
spent a few hours integrating this into TRL for online methods 🤝🏻 the code itself isn't much, but testing took time 🥲 it works when the training part is on a single GPU, or when training and vLLM are colocated on a single GPU 🙇🏻♀️
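A minimal sketch of what the colocated setup described above can look like in TRL. This is not the actual integration code; the parameter names (`use_vllm`, `vllm_mode`, `vllm_gpu_memory_utilization`) follow recent TRL releases, so check your version's docs:

```python
# Hedged sketch: run vLLM generation inside the training process so that
# online methods (e.g. GRPO) share one GPU between training and sampling.
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="grpo-colocated",
    use_vllm=True,                    # use vLLM for rollout generation
    vllm_mode="colocate",             # vLLM lives in the training process, same GPU
    vllm_gpu_memory_utilization=0.3,  # leave most of the VRAM for training
)
```

The alternative is server mode, where vLLM runs as a separate process on its own GPU; colocation is the single-GPU case the tweet refers to.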
transformers 🤝 vLLM for VLM serving is out 🔥 you can now serve many vision-language models in vLLM, and it makes a huuuge difference for Qwen VLs 😍
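A minimal sketch of serving a VLM through vLLM's transformers backend; the model name is just an example, and the `--model-impl` flag assumes a recent vLLM release:

```shell
# Serve a Qwen VL model using the transformers modeling code as the backend.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --model-impl transformers
```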
We’ve updated Qwen3 and made excellent progress. The non‑reasoning model now delivers significant improvements across a wide range of tasks and many of its capabilities already rival those of reasoning models. It’s truly remarkable, and we hope you enjoy it!
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…
Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆 It solved five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…
Super thrilled to share that our AI has now reached silver-medalist level in Math at #imo2024 (1 point away from 🥇)! Since Jan, we not only have a much stronger version of #AlphaGeometry, but also an entirely new system called #AlphaProof, capable of solving many more…
🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony. According to a Coordinator on Problem 6, the one problem OpenAI…
If you're at #ICML2025, come say hi and learn about teacher hacking in distillation. See you at poster E-2706!
1/ If you’re familiar with RLHF, you’ve likely heard of reward hacking, where over-optimizing an imperfect reward model leads to unintended behaviors. But what about teacher hacking in knowledge distillation: can the teacher be hacked, like rewards in RLHF?
Today, come and see us at the poster session in the East Exhibition Hall:
- Joint MoE Scaling Laws (E-2609); tl;dr: MoE can be memory efficient
- Since Faithfulness Fails (E-2101); tl;dr: inferring causal relationships turns out to be surprisingly hard
Let’s chat more, my great…
Come see our poster!
Wednesday, 4:30 PM PDT: Presenting our work Exploring Representations and Interventions in Time Series Foundation Models (arxiv.org/abs/2409.12915) along with @inverse_hessian in West Exhibition Hall B2-B3 (#W-507)
blog.ml.cmu.edu/2025/07/08/car… Check out our latest post on CMU @ ICML 2025!
We are excited to announce the second edition of the Robot Air Hockey Challenge! A challenging benchmark to test your robotics and robot learning abilities! Another collaboration between @ias_tudarmstadt and @Huawei Noah's Ark lab, to push the limits of robotics research!
Excited to highlight @WPotosnak et al.'s work: a novel hybrid global-local architecture + model-agnostic pharmacokinetic encoder that enables patient-specific treatment effect modeling—significantly improving blood glucose forecasting on large-scale datasets. #CHIL2025 @AutonLab
the mech interp team v. the safety team at Anthropic in a nutshell
the more i think about that "agentic misalignment" research, the more frustrated i get. it is deeply, *offensively* unserious work. if you really think you're in a position of unprecedented leverage over the human future, then -- start acting like it!! nostalgebraist.tumblr.com/post/787119374…
In case there is any ambiguity: DINOv2 is 100% a product of dumb hill-climbing on ImageNet-1k kNN accuracy (and linear probing too). Overfitting an eval can be bad. But sometimes the reward signal is reliable and leads to truly good models. It's about finding a balance
Oh, I am a big fan of self-supervised learning. Also, SSL has never been benchmark-maxing on ImageNet afaik. I am mainly complaining about the supervised-classification ImageNet hill climb
i was finally convinced by @_sungmin_cha and @beopst to work on unlearning. the first thing we did (or i learned) together with Sungjin and Dason was to study how people evaluate unlearning, and, horror 😱, here is our short writeup and report on what we found and what we propose as a…