Royi Rassin
@RoyiRassin
PhD candidate @biunlp researching multimodality. Intern @GoogleAI
How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure based on LLMs and Visual-QA (VQA), and show NONE of the 12 models we experiment with are diverse. 🧵 1/11
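To make the LLM+VQA idea concrete, here is a minimal sketch (not the paper's exact metric): generate several images per prompt, probe each image with attribute questions through a VQA model, and score diversity as the normalized entropy of the answers. `generate` and `vqa` are hypothetical stand-ins for whatever model backends you plug in.

```python
import math
from collections import Counter
from typing import Callable, List

def answer_entropy(answers: List[str]) -> float:
    """Normalized Shannon entropy of VQA answers over the observed
    answer set: 0 = all images give the same answer, 1 = uniform."""
    counts = Counter(a.strip().lower() for a in answers)
    n = sum(counts.values())
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))

def prompt_diversity(prompt: str,
                     questions: List[str],
                     generate: Callable[[str, int], list],
                     vqa: Callable[[object, str], str],
                     n_images: int = 10) -> float:
    """Generate n images for one prompt, probe each with VQA questions
    (e.g. "What color is the car?"), and average per-question entropy."""
    images = generate(prompt, n_images)
    scores = []
    for q in questions:
        answers = [vqa(img, q) for img in images]
        scores.append(answer_entropy(answers))
    return sum(scores) / len(scores)
```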

Here is my short and practical thread on how to teach your Text-to-Image model to generate readable text: [1/n]
IRGC, no matter what you do, please do not attack the compute cluster at Bar-Ilan University. It is priceless and impossible to replace. We will be devastated if it is destroyed. (it also has tons of super sensitive and irreplaceable military stuff!!!!)
1/2) It's finally out on arXiv: Feedback guidance of generative diffusion models! We derived an adaptive guidance method from first principles that regulates the amount of guidance based on the model's current state. Complex prompts are highly guided while simple ones are almost guidance-free.
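A generic sketch of state-dependent classifier-free guidance, for intuition only: the paper derives its rule from first principles, whereas the saturating tanh schedule below is purely an illustrative assumption. The idea is that the guidance weight grows with the conditional/unconditional gap, so "hard" prompts at "hard" states get more guidance.

```python
import torch

def adaptive_cfg_step(eps_cond: torch.Tensor,
                      eps_uncond: torch.Tensor,
                      w_max: float = 7.5,
                      tau: float = 1.0) -> torch.Tensor:
    """One denoising step's noise estimate with state-dependent guidance.
    Plain CFG uses a fixed weight w; here w adapts per sample to the
    magnitude of the conditional/unconditional gap. The tanh schedule
    is an illustrative choice, not the paper's derived rule."""
    gap = eps_cond - eps_uncond
    # Per-sample RMS magnitude of the gap, reduced over non-batch dims.
    g = gap.flatten(1).norm(dim=1) / gap[0].numel() ** 0.5
    w = 1.0 + (w_max - 1.0) * torch.tanh(g / tau)
    w = w.view(-1, *[1] * (gap.dim() - 1))  # broadcast back to gap's shape
    return eps_uncond + w * gap
```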
🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: users can highlight any output fact (top) and get the input snippet it is attributed to (bottom). This reduces the amount of text the user has to read by 2…
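Purely to make "attribution" concrete, here is a trivial embedding-similarity baseline (this is not LAQuer's method): embed the highlighted fact and each source sentence, and return the closest match. `embed` is a hypothetical stand-in for any sentence encoder.

```python
import math
from typing import Callable, List, Tuple

def attribute_fact(fact: str,
                   source_sentences: List[str],
                   embed: Callable[[str], List[float]]) -> Tuple[str, float]:
    """Baseline attribution: the source sentence whose embedding is
    most cosine-similar to the highlighted output fact, with its score."""
    def cos(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    f = embed(fact)
    return max(((s, cos(f, embed(s))) for s in source_sentences),
               key=lambda t: t[1])
```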
Video generative models hold the promise of being general-purpose simulators of the physical world 🤖 How far are we from this goal❓ 📢Excited to announce VideoPhy-2, the next edition in the series, testing the physical plausibility of generated videos of real-world actions. 🧵
🎉 I'm happy to share that our paper, Make It Count, has been accepted to #CVPR2025! A huge thanks to my amazing collaborators - @YoadTewel, @SegevHilit , @hirscheran, @RoyiRassin, and @GalChechik! 🔗 Paper page: make-it-count-paper.github.io Excited to share our key findings!
Diffusion models are the current go-to for image generation, but they often fail miserably in generating an accurate count of objects. Our new #CVPR paper proposes a method on top of such models to *enforce* the correct number.
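For contrast, the naive way to "enforce" a count from outside the model is a reject-and-resample loop; a minimal sketch (this is a baseline only, not the paper's method). `generate` and `count_objects` are hypothetical generator and detector hooks.

```python
from typing import Callable

def generate_with_count(prompt: str,
                        target_count: int,
                        generate: Callable[[str, int], object],
                        count_objects: Callable[[object], int],
                        max_tries: int = 8):
    """Rejection sampling on the object count: resample with a new seed
    until a detector reports the requested number of objects. Wasteful
    and not guaranteed to succeed, which is why a method that enforces
    the count during generation is the more interesting route."""
    for seed in range(max_tries):
        image = generate(prompt, seed)
        if count_objects(image) == target_count:
            return image
    return image  # best effort after max_tries
```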
Our paper "A Practical Method for Generating String Counterfactuals" has been accepted to the findings of NAACL 2025! a joint work with @matan_avitan_ , @yoavgo and Ryan Cotterell. We propose "Coutnerfactual Lens", a technique to explain intervention in natural language. (1/6)
Top 3 papers submitted today on Hugging Face
1. SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
2. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
3. Exploring the Limit of Outcome Reward for Learning Mathematical…
I'm excited to announce that my nonfiction book, "Lost in Automatic Translation: Navigating Life in English in the Age of Language Technologies", will be published this summer by Cambridge University Press. I can't wait to share it with you! 📖🤖 cambridge.org/core/books/los…
VideoJAM is our new framework for improved motion generation from @AIatMeta We show that video generators struggle with motion because the training objective favors appearance over dynamics. VideoJAM directly addresses this **without any extra data or scaling** 👇🧵
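A hedged sketch of what "the objective favors appearance over dynamics" means, and the obvious remedy of adding an explicit motion term to the loss. The optical-flow targets and the 50/50 weighting below are assumptions for illustration, not VideoJAM's actual recipe.

```python
import torch
import torch.nn.functional as F

def joint_appearance_motion_loss(pred_pixels: torch.Tensor,
                                 target_pixels: torch.Tensor,
                                 pred_flow: torch.Tensor,
                                 target_flow: torch.Tensor,
                                 motion_weight: float = 0.5) -> torch.Tensor:
    """Illustrative joint objective: a standard reconstruction term on
    pixels plus a motion term on (e.g.) optical-flow targets, so that
    dynamics are optimized explicitly rather than being an afterthought
    of the appearance loss."""
    appearance = F.mse_loss(pred_pixels, target_pixels)
    motion = F.mse_loss(pred_flow, target_flow)
    return (1 - motion_weight) * appearance + motion_weight * motion
```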
Exciting to see Gemini 2.0 and Gemini-2.0-thinking taking on the Visual Riddles challenge! The leaderboard is heating up, with open-ended auto-rating accuracy currently around the mid-50s. Lots of room for improvement across all models!
🚀🚀🚀 OpenAI O1, Gemini-2.0 and Gemini-2.0-thinking are on the #VisualRiddles leaderboard! Multiple Choice: Gemini-2.0-thinking hits 60% accuracy (84% with hints!) Open-Ended (Auto-Rating): O1 leads with 58% accuracy. Check it out: 🔗 visual-riddles.github.io @YonatanBitton
i was annoyed that my many chrome tabs with PDF papers had uninformative titles, so i created a small chrome extension to fix it. i've been using it for a while now, works well. today i put it on github. enjoy. github.com/yoavg/pdf-tab-…
Excited to finally share this work w/ @SuryaGanguli. TL;DR: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. arxiv.org/abs/2412.20292
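For context, the standard closed-form "ideal denoiser" for a finite training set under Gaussian noise is the usual analytical starting point for the very simplest diffusion models: a softmax-weighted average of training points. Whether this matches the paper's exact theory is not claimed here; it is the textbook baseline such analyses build on.

```python
import torch

def ideal_denoiser(x_t: torch.Tensor,
                   sigma: float,
                   train_set: torch.Tensor) -> torch.Tensor:
    """Optimal denoiser E[x_0 | x_t] for the Gaussian-smoothed empirical
    distribution of a finite training set (variance-exploding setup,
    x_t = x_0 + sigma * noise). Closed form: softmax-weighted average
    of the training points.

    x_t: (B, D) noisy inputs; train_set: (N, D) training points."""
    d2 = torch.cdist(x_t, train_set).pow(2)           # (B, N) squared distances
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=1)  # posterior weights
    return w @ train_set                              # (B, D) denoised estimates
```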
If you care about image understanding, you will love Visual Riddles
🚀 Big news for #VisualRiddles! We’re excited to announce that Visual Riddles has been accepted to the Creative AI Track at NeurIPS 2024! 🎉 Come explore our Visual Riddles Gallery—a showcase of cognitive and visual challenges for multimodal AI. 🧵
I received feedback that my post about reviews not being "random" caused stress for some students. I'm sorry for that. It was meant to be empowering. Personally, I find the idea that I don't have some control over the destiny of my papers to be disheartening. If the process is…
Super excited to be awarded the 2024 Google PhD Fellowship in Natural Language Processing! Huge thanks to my advisor @JonathanBerant, my collaborators, and @GoogleAI for supporting our research - exciting things ahead! blog.google/technology/res…
there are many smart speakers and thinkers around AI/ML and/or NLP. but i find almost everything to be kinda predictable by now, minor stylistic variations on the same story. who are some *interesting* speakers i should listen/read? i want things that may surprise or inspire me.
So honored that SUPER has received an Outstanding Paper Award! Huge thanks to everyone involved in this work!
Announcing the 20 **Outstanding Papers** for #EMNLP2024