Or Tal
@Or__Tal
PhD candidate @HebrewU; Research Assistant @MetaAI (FAIR)
Which modeling paradigm to choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture - AR vs FM. 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗huggingface.co/spaces/ortal16… 📄arxiv.org/abs/2506.08570 1/6

Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs and show that it matters - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻
🎉Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025 It gives room for optimism when scaling SpeechLMs *right* - with large TextLMs (in place of more data), interleaving, and synthetic training data💪
Happy to announce that our paper “EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits” was accepted to #ACL2025 🎉 📄 arxiv.org/abs/2506.09988 🌐 editinspector.github.io
💣Introducing PAST: a speech tokenizer that jointly models phonetics and acoustics (no SSL involved). PAST demonstrates strong reconstruction as well as semantic capabilities, as measured by ABX and sWUGGY. 🤗 huggingface.co/slprl/PAST Check out Nadav's post👇@NadavHarTuv @adiyossLC
🚨 New paper alert! PAST: phonetic-acoustic speech tokenizer – just got accepted to Interspeech 2025 🎉 It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. 🔗pages.cs.huji.ac.il/adiyoss-lab/PA… 👇 If you’re into speech LMs, keep reading!
Auto-Regressive vs Flow-Matching: A Comparative Study of Modeling Paradigms for Text-to-Music Generation. arxiv.org/abs/2506.08570
🎵💬 If you are interested in Audio Tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle - reconstruction, downstream tasks, LMs and more. Grab yourself a ☕/🍺 and sit down for a read!
🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7
We’ve been exploring the trade-offs between Autoregressive and Flow-Matching models for music generation. We share our findings in this new paper led by @Or__Tal. Many interesting takeaways and practical advice on training generative models for music! 🎶🧠
🚨 Happy to share our #Interspeech2025 paper! "WhiStress: Enriching Transcriptions with Sentence Stress Detection" Sentence stress is a word-level prosodic cue that marks contrast or intent. WhiStress detects it alongside transcription—no alignment needed. Paper, code, demo 👇
This work was done during my internship at @AIatMeta 🎉 Huge thanks to my amazing collaborators @urielsinger @amit_zhr @YKirstain @adam_polyak90 Yaniv Taigman @liorwolf and @ShellySheynin Check out the project page for many more results and details: hila-chefer.github.io/videojam-paper…
[1/7] 📜 I can finally share that our recent @NVIDIA project DiffUHaul - A Training-Free Method for Object Dragging in Images - has been accepted to #SIGGRAPHAsia2024 🎉. Project Page: omriavrahami.com/diffuhaul/