Adi Haviv
@adihaviv
CS Ph.D. Candidate at @TelAvivUni. Researching #NLProc and Computer Vision.
Excited to present our latest work at #GenLaw #ICML2024! Interested in whether T2I Stable Diffusion models can create original content, how to measure originality, and how this relates to copyright infringement? Join me at the poster session today at 2pm in Lehar 2! 👩🏫🧵

{1/8} 🧵 When you click a link, have you ever wondered: “Which webpage is actually important?” Google answered that with PageRank—treating the web as a Markov chain. Now imagine doing the same… but for transformer attention.👇 🔗 yoterel.github.io/attention_chai…
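To make the analogy concrete, here is a minimal sketch (my own toy code, not the project's implementation) of PageRank-style power iteration over a row-stochastic attention matrix, treating each attention weight as a Markov transition probability between tokens:

```python
import numpy as np

def attention_pagerank(attn: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """attn: [seq_len, seq_len] attention weights, each row summing to 1."""
    n = attn.shape[0]
    rank = np.full(n, 1.0 / n)       # start from the uniform distribution
    teleport = np.full(n, 1.0 / n)   # uniform "teleport" term, as in classic PageRank
    for _ in range(iters):
        rank = damping * (rank @ attn) + (1 - damping) * teleport
    return rank                      # stationary importance score per token

# Toy example: 3 tokens, most attention mass flows to the last token.
attn = np.array([[0.1, 0.2, 0.7],
                 [0.3, 0.1, 0.6],
                 [0.2, 0.3, 0.5]])
print(attention_pagerank(attn))
```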
1/ Can we teach a motion model to "dance like a chicken"? Or better: can LoRA help motion diffusion models learn expressive, editable styles without forgetting how to move? Led by @HSawdayee and @chuan_guo92603, we explore this in our latest work. 🎥 haimsaw.github.io/LoRA-MDM/ 🧵👇
A Vision-Language Model can answer questions about Robin Williams. It can also recognize him in a photo. So why does it FAIL when asked the same questions using his photo instead of his name? A thread on our new #acl2025 paper that explores this puzzle 🧵
Really impressive results for human-object interaction. They use a two-phase process where they optimize the diffusion noise, instead of the motion itself, to reach sub-centimeter precision while staying on the motion manifold 🧠 HOIDiNi - hoidini.github.io
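For intuition, a hedged sketch of what noise-space optimization generally looks like (toy code under my own assumptions; `denoiser` and `objective` are placeholders, not HOIDiNi's API): the noise is the optimization variable, and the frozen denoiser keeps the resulting motion on the learned manifold.

```python
import torch

def optimize_noise(denoiser, objective, noise_shape, steps=200, lr=1e-2):
    """Optimize the diffusion noise rather than the motion itself."""
    noise = torch.randn(noise_shape, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        motion = denoiser(noise)      # frozen model maps noise onto the motion manifold
        loss = objective(motion)      # e.g., a hypothetical hand-object contact loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return noise.detach()

# Toy stand-ins so the sketch runs end to end.
denoiser = torch.nn.Linear(8, 8).requires_grad_(False)
objective = lambda motion: motion.pow(2).mean()
z = optimize_noise(denoiser, objective, (1, 8))
```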
Excited to share that our new work, Be Decisive, has been accepted to SIGGRAPH! We improve multi-subject generation by extracting a layout directly from noise, resulting in more diverse and accurate compositions. Website: omer11a.github.io/be-decisive/ Paper: arxiv.org/abs/2505.21488
🔔Excited to announce that #AnyTop has been accepted to #SIGGRAPH2025!🥳 ✅ A diffusion model that generates motion for arbitrary skeletons ✅ Using only a skeletal structure as input ✅ Learns semantic correspondences across diverse skeletons 🌐 Project: anytop2025.github.io/Anytop-page
Excited to share that "TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space" got accepted to SIGGRAPH 2025! It tackles disentangling complex visual concepts from as little as a single image and re-composing concepts across multiple images into a coherent…
🔔just landed: IP Composer🎨 Semantically mix & match visual concepts from images ❌ text prompts can't always capture visual nuances ❌ visual-input-based methods often need training / don't allow fine-grained control over *which* concepts to extract from our input images So👇
pretty mind-blowing fact I just learned about transformer language models: the positional embeddings don't really do anything. you can just get rid of them and the model still works just as well. sounds impossible, doesn't it? turns out standard LLMs aren't actually…
Transformers can work without using positional embeddings at all. Llama 4 uses positional embs for local attn but not globally. Our paper from 2022 shows why this works: the causal mask allows transformers to infer positions. arxiv.org/pdf/2203.16634
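A minimal sketch illustrating the setup (toy code, not the paper's): causal self-attention with no positional embedding added to the inputs, so the causal mask is the only source of order information.

```python
import torch
import torch.nn.functional as F

def causal_attention_no_pos(x, w_q, w_k, w_v):
    """x: [seq, dim] token embeddings with no positional term added."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))  # token i only attends to tokens <= i
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(5, 16)                                # 5 tokens, no positional embedding
w = [torch.randn(16, 16) * 0.1 for _ in range(3)]
print(causal_attention_no_pos(x, *w).shape)           # torch.Size([5, 16])
```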
Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯
New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
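A toy illustration of the idea (my own hypothetical example, not from the paper): the best "compression" of a sequence is the shortest program that reproduces it, so a code LM that emits a short generator beats storing the raw data.

```python
# A sequence and two encodings of it: the literal list vs. a short program.
seq = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

literal = repr(seq)                           # store every element verbatim
program = "print([2**i for i in range(10)])"  # a short generator a code LM could emit

print(len(literal), len(program))             # the program encoding is shorter
```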
Ever stared at a set of shapes and thought: 'These could be something… but what?' Designed for visual ideation, PiT takes a set of concepts and interprets them as parts within a target domain, assembling them together while also sampling missing parts. eladrich.github.io/PiT/
🚀 New preprint! 🚀 Check out AnyTop 🤩 ✅ A diffusion model that generates motion for arbitrary skeletons 🦴 ✅ Using only a skeletal structure as input ✅ Learns semantic correspondences across diverse skeletons 🦅🐒🪲 🔗 Arxiv: arxiv.org/abs/2502.17327
Excited to introduce our new work: ImageRAG 🖼️✨ rotem-shalev.github.io/ImageRAG We enhance off-the-shelf generative models with Retrieval-Augmented Generation (RAG) for unknown concept generation, using a VLM-based approach that’s easy to integrate with new & existing models! [1/3]
🚀 Meet DiP: our newest text-to-motion diffusion model! ✨ Ultra-fast generation ♾️ Creates endless, dynamic motions 🔄 Seamlessly switch prompts on the fly Best of all, it's now available in the MDM codebase: github.com/GuyTevet/motio… [1/3]
VideoJAM is our new framework for improved motion generation from @AIatMeta. We show that video generators struggle with motion because the training objective favors appearance over dynamics. VideoJAM directly addresses this **without any extra data or scaling** 👇🧵
What if you could compose videos: merging multiple clips, even capturing complex athletic moves where video models struggle, all while preserving motion and context? And yes, you can still edit them with text after! Stay tuned for more results. #AI #VideoGeneration #SnapResearch
How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New…
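For context, a hedged sketch of the generic feature-steering operation the tweet refers to (toy code with placeholder tensors, not the paper's method): add a scaled feature direction to a hidden state and measure how the output logits move.

```python
import torch

def steer(hidden, direction, scale=5.0):
    """Add a scaled, normalized feature direction to the hidden states."""
    return hidden + scale * direction / direction.norm()

d = 32
hidden = torch.randn(4, d)       # toy hidden states for 4 tokens
direction = torch.randn(d)       # a hypothetical feature direction
unembed = torch.randn(d, 100)    # toy unembedding to a 100-token vocabulary

logits_before = hidden[-1] @ unembed
logits_after = steer(hidden, direction)[-1] @ unembed
print((logits_after - logits_before).abs().mean())  # how much the output shifted
```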
Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…
[1/4] Ever wondered what it would be like to use images—rather than text—to generate object and background compositions? We introduce VisualComposer, a method for compositional image generation with object-level visual prompts.