Bill Psomas
@bill_psomas
Postdoctoral researcher @ VisualRecognitionGroup, @CVUTPraha. PhD @ntua. Former IARAI, @Inria, @athenaRICinfo intern. Photographer. Crossfit freak.
Our paper "Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations" has been accepted to @ICCVConference! 🧵 TL;DR: Masked image models (like MAE) underperform not just because of weak features, but because they aggregate them poorly. [1/7]
New paper out, accepted at @ICCVConference! We introduce MoSiC, a self-supervised learning framework that learns temporally consistent representations from video using motion cues. Key idea: leverage long-range point tracks to enforce dense feature coherence across time. 🧵
Can we build multimodal models by simply aligning pretrained unimodal models with limited paired data? We introduce STRUCTURE 🏗️: a lightweight, plug-and-play regularizer that preserves latent geometry to align frozen unimodal models using <1% of paired data typically used in…
Deep search and satellite data is an interesting mix 🔥🛰️ The demo shows the AI Reporter doing deep investigation into the Palisades fire early in 2025. In ~2 minutes you get a concise report, based on deep search and satellite data. #geoAI #AI4EO #PalisadesFire #AgenticAI
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
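A Product of Experts combines several models by multiplying their (weighted) densities and renormalizing, so a candidate must score well under every expert. A minimal sketch of that composition over a shared discrete candidate set (the function name, weights, and toy scores are illustrative, not the paper's implementation):

```python
import numpy as np

def product_of_experts(logps, weights=None):
    """Combine expert log-densities over a shared candidate set.

    logps: (n_experts, n_candidates) per-expert log-probabilities.
    Returns the normalized probabilities of the weighted product.
    """
    logps = np.asarray(logps, dtype=float)
    if weights is None:
        weights = np.ones(logps.shape[0])
    combined = (np.asarray(weights)[:, None] * logps).sum(axis=0)
    combined -= combined.max()  # numerical stability before exp
    p = np.exp(combined)
    return p / p.sum()

# Two hypothetical "experts" (e.g. a generator and a VLM scorer)
# rating three candidate generations:
expert_a = np.log([0.7, 0.2, 0.1])
expert_b = np.log([0.3, 0.6, 0.1])
p = product_of_experts([expert_a, expert_b])
```

The product sharpens agreement: candidate 0 wins here because both experts give it non-trivial mass, while candidate 2 is vetoed by both.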
Thank god that nobody submits papers to both #ICCV2025 and #NeurIPS2025. Writing rebuttals for one while working on the deadline for the other would be a total nightmare.
🚨 Call for Papers! 7th Instance-Level Recognition and Generation Workshop (ILR+G) at @ICCVConference 📍 Honolulu, Hawaii 🌺 📅 October 19–20, 2025 🌐 ilr-workshop.github.io/ICCVW2025/ in-proceedings deadline: June 7 out-of-proceedings deadline: June 30 #ICCV2025
🚀 Greeks in AI is booming! 200+ sign-ups, 30+ OpenReview submissions, and 🔥 sponsors joining daily. 📍Limited seats at Serafeio — register now: 👉 greeksin.ai Stay tuned for speakers, program, and abstract previews! #GreeksInAI #AI #ML #Research #Greece
👏 Huge congrats to our research scientist Elias Ramzi @EliasRamzi for winning the AFRIF 2024 PhD award for his thesis "Robust image retrieval with deep learning", conducted at CNAM. Well deserved recognition for amazing work! 🏆 🔗 afrif.irisa.fr/?page_id=54
Colleagues in Europe are running this poll about #NeurIPS2025 participation. If you are in Europe, I highly recommend participating.
ILIAS is a large-scale dataset for evaluation of Instance-Level Image retrieval At Scale. It is designed to support research in image-to-image and text-to-image retrieval for particular objects, and serves as a benchmark for evaluating foundation models and retrieval techniques.
🧵 Excited to share our latest work: FUTURIST, a unified transformer architecture for multimodal semantic future prediction, accepted to #CVPR2025! Here's how it works (1/n) 👇 Links to the arXiv and GitHub below.
Excited to share that the recordings and slides of our SSLBIG tutorial are now online! If you notice any missing reference or have feedback, feel free to reach out. @eccvconf Stay tuned for future editions! webpage: shashankvkt.github.io/eccv2024-SSLBI… Youtube: youtube.com/@SSLBiG_tutori…
Incredibly excited to announce the 1st edition of our tutorial at @eccvconf w/ the amazing @y_m_asano and @MrzSalehi! "Time is precious: Self-Supervised Learning Beyond Images" on 30th Sept. from 09:00 to 13:00 at Amber 7+ 8 Catch the details here⬇️ shashankvkt.github.io/eccv2024-SSLBI…
1/n🚀If you’re working on generative image modeling, check out our latest work! We introduce EQ-VAE, a simple yet powerful regularization approach that makes latent representations equivariant to spatial transformations, leading to smoother latents and better generative models.👇
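One hedged reading of such an equivariance regularizer: decoding a spatially transformed latent should match the same transform applied to the image, i.e. decode(τ(encode(x))) ≈ τ(x). A toy sketch under that assumption (function names and the flip transform are illustrative; this is not the paper's actual loss):

```python
import numpy as np

def hflip(x):
    """Horizontal flip along the last axis (a simple spatial transform)."""
    return x[..., ::-1]

def eq_reg_loss(encode, decode, x, transform=hflip):
    """MSE between decoding a transformed latent and transforming the
    image directly: zero iff the autoencoder is equivariant to `transform`."""
    z = encode(x)
    recon = decode(transform(z))
    target = transform(x)
    return float(np.mean((recon - target) ** 2))

# Identity autoencoder: trivially equivariant to flips, so the loss vanishes.
x = np.arange(12.0).reshape(3, 4)
loss = eq_reg_loss(lambda a: a, lambda a: a, x)
```

Intuitively, penalizing this mismatch during training pushes the latent space to respect spatial structure, which is the "smoother latents" the tweet alludes to.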
🚀Exciting news🚀 I’ve been awarded the Marie Skłodowska-Curie Postdoctoral Fellowship (#MSCA-PF) 2024 with 98/100!🎉 🥟My project, RAVIOLI, hosted at @CVUTPraha, integrates retrieval-augmented predictions into vision-language models for open-vocabulary segmentation.
1/n 🚀 Excited to share our latest work: DINO-Foresight, a new framework for predicting the future states of scenes using Vision Foundation Model features! Links to the arXiv and GitHub 👇
Self-supervised Learning with Masked Autoencoders (MAE) is known to produce worse image representations than Joint-Embedding approaches (e.g. DINO). In our new paper, we identify new reasons for why that is and point towards solutions: arxiv.org/abs/2412.03215 🧵
🚀New paper alert: FREEDOM is here! Check out “Composed Image Retrieval for Training-FREE DOMain Conversion,” our training-free method for domain conversion with VLMs.🎯 📜WACV 2025 💡Retrieve images using image+text queries! 📖arxiv.org/abs/2412.03297 🔗github.com/NikosEfth/free…
At @naverlabseurope in Grenoble, France, we are searching for talented PhD interns for work on Spatial AI, geometric and robotic foundation models for navigation and manipulation. If you have experience in Embodied AI and are interested, DM me.