Yael Vinker🎗
@YVinker
Postdoctoral Associate at @MIT_csail
Excited to share that I officially completed my PhD in Computer Science at Tel Aviv University and have joined Prof. Antonio Torralba’s lab at MIT for my postdoc! Looking forward to researching visual communication, and surviving Boston winters without turning into a popsicle!❄️

Tim Rocktäschel’s keynote talk at #ICLR2025 about Open-Endedness and AI. “Almost no prerequisite to any major invention was invented with that invention in mind.” “Basically almost everybody in my lab at UCL and at DeepMind have read this book: Why Greatness Cannot Be Planned.”
Introducing Act-Two, our next-generation motion capture model with major improvements in generation quality and support for head, face, body and hand tracking. Act-Two only requires a driving performance video and reference character. Available now to all our Enterprise…
Excited to take part in @TEDAISF this fall!
📣 Speaker Announcement: @judyefan, Asst Prof of Psychology at @Stanford, joins @TEDAISF 2025! At Stanford’s Cognitive Tools Lab, she studies how our mental tools shape thought, and what that means for designing smarter AI. 📅 October 21–22, 2025 📍 San Francisco Link below ⬇️
I threw together a short paper outlining our tool Diffusion Explorer, which lets you explore the geometric properties of diffusion and flow-based generative models through interactive animations.
Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, a token budget T, and a target quality 𝜖, KARL finds the smallest t ≤ T to reconstruct it within 𝜖 🧵
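The tweet describes the core search: the smallest t ≤ T whose reconstruction meets the quality target. A toy sketch of that idea (the function names and the assumption that error is non-increasing in t are mine, not from the paper):

```python
# Illustrative sketch, not KARL's actual code: given a budget T and a
# tolerance eps, find the smallest token count t <= T whose reconstruction
# error is within eps. Assuming error is non-increasing in t, binary
# search over t suffices.

def smallest_token_count(recon_error, T, eps):
    """Return the smallest t in [1, T] with recon_error(t) <= eps, else T."""
    if recon_error(T) > eps:      # even the full budget misses the target
        return T
    lo, hi = 1, T
    while lo < hi:
        mid = (lo + hi) // 2
        if recon_error(mid) <= eps:
            hi = mid              # mid is feasible; try fewer tokens
        else:
            lo = mid + 1          # mid is infeasible; need more tokens
    return lo

# Toy stand-in: error halves with each extra token.
toy_error = lambda t: 1.0 / (2 ** t)
print(smallest_token_count(toy_error, T=16, eps=0.01))  # -> 7
```

With a monotone error curve this costs O(log T) reconstructions instead of T.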
We open-sourced the codebase of FlexTok, an image tokenizer that produces flexible-length token sequences and represents image content in a compressed, coarse-to-fine way. Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd…
Have you ever been bothered by the constraints of fixed-sized 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. flextok.epfl.ch arxiv.org/abs/2502.13967 🧵 1/n
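The PCA analogy above can be made concrete with a toy example (entirely illustrative, not FlexTok's code): if tokens are ordered by importance, any prefix of the 1D sequence is itself a valid, coarser reconstruction.

```python
# Toy illustration of a flexible-length, coarse-to-fine representation:
# each "token" contributes a progressively finer correction, so decoding
# from a longer prefix monotonically reduces reconstruction error.

def reconstruct(tokens, k):
    """Rebuild a (scalar) signal from the first k of an ordered token list."""
    return sum(tokens[:k])

# Importance-ordered contributions, like PCA components.
tokens = [0.9, 0.06, 0.03, 0.01]
target = sum(tokens)            # the "full" signal

for k in range(1, len(tokens) + 1):
    err = abs(target - reconstruct(tokens, k))
    print(f"prefix length {k}: error {err:.4f}")
```

A fixed 2D grid forces the same token count for every image; an ordered 1D sequence lets the decoder stop at whatever prefix length the content warrants.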
Thanks for the thorough work! From my experience using multimodal LLMs for sketching and SVG generation, I’ve also found that they just aren’t yet good enough for tasks requiring precise spatial understanding. Great to finally have something this comprehensive to cite!
We benchmarked leading multimodal foundation models (GPT-4o, Claude 3.5 Sonnet, Gemini, Llama, etc.) on standard computer vision tasks—from segmentation to surface normal estimation—using standard datasets like COCO and ImageNet. These models have made remarkable progress;…
Following #CVPR2025, #ICCV2025 implemented a new policy targeting accountability and integrity. PCs identified 25 highly irresponsible reviewers, resulting in the desk rejection of 29 associated papers, including 12 submissions that otherwise would have been accepted.
#ICCV2025 decisions have been released! They are going out in batches to manage server load, so please be patient :) A huge thank you to our entire community—authors, reviewers, and area chairs—for your hard work. Congratulations to everyone whose paper was accepted!
Finding it hard to focus on anything else? We get it. The #ICCV2025 decision notifications are just around the corner. Sending positive vibes to the entire community! 🤙🌺
How do language models track the mental states of each character in a story, an ability often referred to as Theory of Mind? Our recent work takes a step toward demystifying it by reverse-engineering how Llama-3-70B-Instruct solves a simple belief-tracking task, and surprisingly found that it…
Thrilled to present my "Agents" collection at @CVPR 2025's AI Art Gallery alongside incredible researchers and artists pushing the frontiers of AI creativity. Thanks to @elluba for curating an exhibition that perfectly captures this transformative moment - a testament to how AI…
Really impressive results for human-object interaction. They use a two-phase process where they optimize the diffusion noise, instead of the motion itself, to reach sub-centimeter precision while staying on the manifold 🧠 HOIDiNi - hoidini.github.io
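The key idea, optimizing the noise rather than the output, can be sketched in miniature (the linear stand-in generator and names are my assumptions, not the paper's method): since the edited sample is always produced by the generator, it stays on the generator's output manifold.

```python
# Minimal sketch of noise-space optimization: rather than editing the
# generated sample x directly, optimize the noise z that produces it,
# so x = g(z) is always a valid generator output.

def g(z):
    """Stand-in for a frozen generator/denoiser (here just a linear map)."""
    return 2.0 * z + 1.0

def optimize_noise(z, target, lr=0.1, steps=200):
    """Gradient descent on (g(z) - target)**2 with respect to z."""
    for _ in range(steps):
        x = g(z)
        grad = 2.0 * (x - target) * 2.0   # chain rule: dL/dx * dg/dz
        z -= lr * grad
    return z

z = optimize_noise(0.0, target=5.0)
print(g(z))   # the constraint is met while x never leaves the range of g
```

In the real setting g would be the (differentiable) diffusion sampling chain and the loss would encode contact constraints, but the structure is the same.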
Just one day left to submit your archival paper to #AI4VA at #ICCV2025! Deadline: 22 June, 11:59 PM (AoE). ✍️ Don’t miss your chance to be part of the conversation on AI and the visual arts. @ICCVConference
📣 Deadline Extended: The full paper submission deadline for the #ICCV2025 AI for Visual Arts (AI4VA) Workshop @ICCVConference is now June 22, 2025 at 11:59 PM AoE. 🌐 Info: sites.google.com/view/ai4vaiccv…
Excited to be showing prints at Art Basel with AUTOMATA! They are visualizations of complete genealogical networks of individual ArtBreeder images, revealing the shapes of large-scale, indirect collaborative evolution over time. 🧬🌌🌺
AUTOMATA debuts today at The Digital Art Mile.
📣 Reminder: Full paper deadline for the AI for Visual Arts (AI4VA) Workshop @ICCVConference #ICCV2025 is June 19! 🎨 AI + creativity, culture & storytelling 📝 Accepted papers published in ICCV proceedings 🔗 Submit: openreview.net/group?id=thecv… 🌐 Info: sites.google.com/view/ai4vaiccv…
🎨 The AI for Visual Arts (AI4VA) Workshop is back for its 2nd edition at #ICCV2025 in Honolulu, HI, USA! 📢 Now accepting full papers (archival) & extended abstracts (non-archival). 🌐 Topics & CFP: sites.google.com/view/ai4vaiccv… 📝 Submit: openreview.net/group?id=thecv… @ICCVConference