Michael Hassid
@MichaelHassid
PhD candidate @HebrewU; Research Assistant @AIatMeta (FAIR)
The longer a reasoning LLM thinks, the more likely it is to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n
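To make the preference concrete, here is a minimal illustrative sketch (not the paper's exact procedure): sample several thinking chains, keep the shortest ones, and majority-vote their answers. `generate_chain` is a hypothetical helper returning (thinking_tokens, final_answer).

```python
# Illustrative sketch only - not the paper's exact method.
# generate_chain(prompt) is assumed to return (thinking_tokens, final_answer)
# for one independently sampled reasoning chain.
from collections import Counter

def answer_preferring_short_chains(prompt, generate_chain, k=8, m=3):
    """Sample k chains, keep the m shortest, majority-vote their answers."""
    chains = [generate_chain(prompt) for _ in range(k)]   # k independent samples
    chains.sort(key=lambda c: len(c[0]))                  # shortest thinking first
    short_answers = [answer for _, answer in chains[:m]]  # keep m shortest chains
    return Counter(short_answers).most_common(1)[0][0]    # majority vote among them
```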

Nice to see that TOVA is still one of the leading KV compression methods, even after more than a year and a half, especially for high-compression regimes. Paper: arxiv.org/abs/2401.06104
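For context, a rough sketch of the general attention-based eviction idea behind TOVA-style KV compression (keep the cached tokens the newest query attends to most, under a fixed cache budget); see the paper for the actual algorithm. Shapes and names below are illustrative.

```python
# Rough sketch of attention-based KV cache eviction (the general idea behind
# TOVA-style policies); see arxiv.org/abs/2401.06104 for the real algorithm.
import torch

def evict_kv(keys, values, attn_weights, budget):
    """Keep at most `budget` cached tokens for one head.

    keys, values: [seq_len, head_dim] cached states
    attn_weights: [seq_len] attention of the newest token over the cache
    """
    if keys.shape[0] <= budget:
        return keys, values
    keep = torch.topk(attn_weights, k=budget).indices.sort().values  # preserve order
    return keys[keep], values[keep]
```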
🏆 Our @nvidia KV Cache Compression Leaderboard is now live! Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading in efficiency and performance. 🥇 huggingface.co/spaces/nvidia/…
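A minimal usage sketch of plugging a compression method into KVPress, based on the library's documented text-generation pipeline pattern; the model name and compression ratio are placeholders, and the exact interface may differ from what is shown here.

```python
# Sketch based on the KVPress text-generation pipeline pattern;
# check https://github.com/NVIDIA/kvpress for the current interface.
from transformers import pipeline
from kvpress import ExpectedAttentionPress

pipe = pipeline(
    "kv-press-text-generation",                      # pipeline registered by kvpress
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",   # placeholder model
    device="cuda:0",
)

context = "A long document whose KV cache we want to compress."
question = "What is this document about?"

press = ExpectedAttentionPress(compression_ratio=0.5)  # drop ~50% of the cache
answer = pipe(context, question=question, press=press)["answer"]
print(answer)
```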
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
🎉Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025! It leaves room for optimism when scaling SpeechLMs *right* - with large TextLMs (in place of more data), interleaving, and synthetic training data💪
Excited to share our recent work on corrector sampling in language models! A new sampling method that mitigates error accumulation by iteratively revisiting tokens in a window of previously generated text. With: @shaulneta @urielsinger @lipmanya Link: arxiv.org/abs/2506.06215
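A toy sketch of the window-revisiting idea described above (not the exact corrector-sampling algorithm from the paper); `sample_next` and `resample_at` are hypothetical model interfaces.

```python
# Illustrative sketch of revisiting tokens in a window of recent text,
# not the exact corrector-sampling algorithm (arxiv.org/abs/2506.06215).
# sample_next(prefix) -> token and resample_at(prefix, position) -> token
# are assumed model interfaces.
import random

def generate_with_revisits(prompt, sample_next, resample_at,
                           max_new_tokens=128, window=16, p_revisit=0.25):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(sample_next(tokens))               # usual left-to-right step
        if random.random() < p_revisit:
            lo = max(len(prompt), len(tokens) - window)  # only revisit recent tokens
            pos = random.randrange(lo, len(tokens))
            tokens[pos] = resample_at(tokens, pos)       # revisit a token in the window
    return tokens
```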
Which modeling to choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture - autoregressive (AR) vs. flow matching (FM). 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗huggingface.co/spaces/ortal16… 📄arxiv.org/abs/2506.08570 1/6
Thanks @_akhaliq for sharing our work!
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
Great work led by @GallilMaimon about scaling Speech LMs. Check it out!
Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻
"Scaling Analysis of Interleaved Speech-Text Language Models," Gallil Maimon, Michael Hassid, Amit Roth, Yossi Adi, ift.tt/gvRXTOe
Care about LLM evaluation? 🤖 🤔 We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs on different prompts, domains, tokens, models... Join our community effort to expand it with YOUR model predictions & become a co-author!
While TAing the advanced NLP course at HUJI last year, I mentored students on their projects. I'm excited to share that one of those projects has developed into a (preprint) paper. Check it out :)
Can RAG performance get *worse* with more relevant documents?📄 We put the number of retrieved documents in RAG to the test! 💥Preprint💥: arxiv.org/abs/2503.04388 1/3
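Roughly, this kind of study comes down to a sweep like the sketch below; `retrieve` and `answer` are hypothetical retriever/reader interfaces, not the paper's setup.

```python
# Sketch of sweeping the number of retrieved documents in a RAG pipeline.
# retrieve(query, k) and answer(query, docs) are hypothetical interfaces.
def sweep_num_docs(eval_set, retrieve, answer, ks=(1, 2, 5, 10, 20)):
    results = {}
    for k in ks:
        correct = 0
        for query, gold in eval_set:
            docs = retrieve(query, k)             # top-k retrieved documents
            correct += int(answer(query, docs) == gold)
        results[k] = correct / len(eval_set)      # accuracy as a function of k
    return results
```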