Andrew Rouditchenko 🇺🇦
@arouditchenko
PhD student at MIT working on multi-modal and multilingual speech. I was an intern at @AIatMeta and @Apple MLR.
Do you really need audio to fine-tune your Audio LLM? 🤔 Answer below: Introducing Omni-R1, a simple GRPO fine‑tuning method for Qwen2.5‑Omni on audio question answering. It sets new state‑of‑the‑art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
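For context, a minimal sketch of the GRPO idea as it applies to multiple-choice audio QA (e.g. MMAU-style questions): sample several completions per question, score them with a rule-based reward, and normalize rewards within the group to get advantages. Function names below are illustrative placeholders, not the Omni-R1 code.

```python
# Illustrative sketch of GRPO-style rewards/advantages for multiple-choice audio QA.
# Not the Omni-R1 implementation; names are made up for illustration.
import re
import torch

def answer_reward(completion: str, gold_choice: str) -> float:
    """Rule-based reward: 1.0 if the model's stated answer letter matches the gold choice."""
    match = re.search(r"answer\s*[:is]*\s*([A-D])", completion, flags=re.IGNORECASE)
    return 1.0 if match and match.group(1).upper() == gold_choice.upper() else 0.0

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO normalizes rewards within the group of completions sampled for one prompt:
    A_i = (r_i - mean(r)) / (std(r) + eps). No learned value network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-4)

# Example: 4 sampled completions for one audio question whose gold answer is "B".
completions = ["The answer is B", "Answer: A", "I think the answer is B", "Answer: D"]
rewards = torch.tensor([answer_reward(c, "B") for c in completions])
print(grpo_advantages(rewards))  # positive for correct samples, negative otherwise
```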
mWhisper-Flamingo was accepted to IEEE Signal Processing Letters! To celebrate, I uploaded my presentation about it: youtu.be/NjeEZWO7m9I I would have submitted to Interspeech, but I couldn't travel during those dates. I'm hoping to present this at ICASSP 2026 in Spain!
If your PhD advisor dressed like this, you probably didn't use neural nets in your thesis
Finally, after all these years of being mocked, ffmpeg enthusiasts win!
💡Bridging speech, sound, & music representations with one universal model? We introduce USAD ✅ 📚 Distills knowledge from domain-specific SSL models 🎯 Matches expert models across speech/audio/music tasks 📄 arxiv.org/abs/2506.18843 🧑‍💻 huggingface.co/MIT-SLS/USAD-B…
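A rough sketch of the general multi-teacher feature-distillation idea (illustrative only, not the USAD implementation): a single student encoder is trained so that per-teacher heads on its features regress the features of frozen speech/audio/music teachers.

```python
# Minimal sketch of multi-teacher feature distillation; names and sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillStudent(nn.Module):
    def __init__(self, dim=768, teacher_dims=(768, 768, 768)):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # One prediction head per teacher (e.g. speech, general audio, music).
        self.heads = nn.ModuleList(nn.Linear(dim, d) for d in teacher_dims)

    def forward(self, x):
        h = self.encoder(x)
        return [head(h) for head in self.heads]

def distill_loss(preds, teacher_feats):
    """Sum of per-teacher losses; L1 plus cosine distance is a common choice."""
    total = 0.0
    for p, t in zip(preds, teacher_feats):
        total = total + F.l1_loss(p, t) + (1 - F.cosine_similarity(p, t, dim=-1).mean())
    return total

# Toy usage: batch of 2 clips, 100 frames, 768-dim input features.
student = DistillStudent()
x = torch.randn(2, 100, 768)
teacher_feats = [torch.randn(2, 100, 768) for _ in range(3)]  # frozen teacher outputs
loss = distill_loss(student(x), teacher_feats)
loss.backward()
```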
Learn to figure out what is worth figuring out: kamperh.com/2025/06/20/kno…
Congrats to Edson for leading our Contrastive Audio-Visual Masked Autoencoders 2.0 Project (CAV-MAE Sync), accepted at #CVPR2025! Check out Edson's thread for more details ⬇️
🚀 Excited to announce our #CVPR2025 paper: CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment! We introduce a simple yet effective method for improved audio-visual learning. 🔗 Project: edsonroteia.github.io/cav-mae-sync/ 🧵 (1/7)👇
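For readers unfamiliar with the objective family: a generic sketch of a symmetric audio-visual contrastive (InfoNCE) loss, the kind of objective CAV-MAE-style models build on. This is illustrative only; see the project page for the actual CAV-MAE Sync formulation and its fine-grained alignment.

```python
# Generic symmetric audio-visual InfoNCE loss (illustrative sketch).
import torch
import torch.nn.functional as F

def av_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """audio_emb, video_emb: (batch, dim) embeddings of paired clips.
    Matching pairs (the diagonal) are pulled together, mismatched pairs pushed apart."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature                  # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    loss_a2v = F.cross_entropy(logits, targets)       # audio -> video direction
    loss_v2a = F.cross_entropy(logits.t(), targets)   # video -> audio direction
    return 0.5 * (loss_a2v + loss_v2a)

# Toy usage with random embeddings.
loss = av_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```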
Granite-speech audio LLM from IBM. The level of data detail here is great, especially compared to, e.g., the Whisper paper
``Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities,'' George Saon, Avihu Dekel, Alexander Brooks, Tohru Nagano, Abraham Daniels, Aharon Satt, Ashish Mittal, Brian Kingsbury, David Haws, Edmilson Morais, Gakuto Kurata, Ha… ift.tt/QPsxkH2
``CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment,'' Edson Araujo, Andrew Rouditchenko, Yuan Gong, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Leonid Karlinsky, Rogerio Feris, James R. Glass, ift.tt/USw7Px0
"Has there been any case of theft on the world's largest barbecue?" 😆
Have you enjoyed talking to 🟢Moshi? Have you dreamt of making your own speech-to-speech chat experience 🧑‍🔬🤖? It's now possible with the moshi-finetune codebase! Plug in your own dataset and change the voice, the tone, and the personality of Moshi 💚🔌💿. Here's an example after…
I'm curious about how OpenAI used RL for training their ASR models ("This methodology dramatically improves precision and reduces hallucination, making our speech-to-text solutions exceptionally competitive in complex speech recognition scenarios.") openai.com/index/introduc…
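OpenAI hasn't published the details, so purely as an illustration of one common way to frame ASR as an RL problem: reward sampled transcripts by negative word error rate, so that hallucinated extra words (insertions) are directly penalized. This is NOT their method, just a sketch of the concept.

```python
# Purely illustrative: a WER-based reward for RL-style ASR fine-tuning.
def wer(ref: str, hyp: str) -> float:
    """Word error rate via Levenshtein distance over words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / max(len(r), 1)

def asr_reward(ref: str, hyp: str) -> float:
    """Higher is better; hallucinated extra words raise WER via insertions."""
    return -wer(ref, hyp)

print(asr_reward("turn the lights off", "turn the lights off please please"))  # penalized
```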
I'll present a dive into Moshi 🟢 and our translation model Hibiki 🇫🇷♻️🇬🇧 in the next @convAI2024 reading group 👨‍🏫📗. 📅 13/03 🕰️ 11am ET, 4pm in Paris. I'll discuss Mimi 🗜️ and multistream audio modeling 🔊. Join on Zoom, replay on YT. ⬛⬛🟧🟧🟨🟨🟩🟩🟩⬛ ⬛🟧🟧🟨🟨🟩🟩🟩⬛⬛
📢 Join our Conversational AI Reading Group! 📅 Thursday, March 13 | 11 AM - 12 PM EST 🎙Speaker: Alexandre Defossez @honualx 📖 Topic: "Moshi: a speech-text foundation model for real-time dialogue" 🔗 Details: (poonehmousavi.github.io/rg)
Looking for 1 intern on audio-visual generation (potentially video2audio generation)! We have the largest computational resources in Japan, and we do serious industrial research (and development). DM if interested; you can find out more about me on my homepage.
Follow our initiative to boost Ukrainian speech technologies! huggingface.co/speech-uk