Mark Ibrahim
@marksibrahim
Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab
Open weights for our Llip multimodal vision-language model, led by @lavoiems, are public! Llip proposes a new pre-training objective that captures the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks. x.com/lavoiems/statu…
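For intuition only, here is a rough sketch of the kind of objective described above: the image is encoded as several candidate visual features, the caption selects a weighting over them, and the caption-conditioned image embedding is trained with a standard contrastive loss. The function name, shapes, and attention-based pooling below are illustrative assumptions, not Llip's actual implementation (see the linked code for that).

```python
import torch
import torch.nn.functional as F

def caption_conditioned_contrastive_loss(mixture_tokens, text_emb, temperature=0.07):
    """Illustrative sketch (assumed shapes, not the paper's code):
    mixture_tokens: (N, K, D) candidate visual features per image
    text_emb:       (N, D)    caption embeddings
    """
    # attention of every caption j over the K candidates of every image i
    attn = torch.einsum("jd,ikd->ijk", text_emb, mixture_tokens)      # (N, N, K)
    weights = attn.softmax(dim=-1)
    cond = torch.einsum("ijk,ikd->ijd", weights, mixture_tokens)      # (N, N, D)

    cond = F.normalize(cond, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = torch.einsum("ijd,jd->ij", cond, txt) / temperature      # (N, N)

    # CLIP-style symmetric contrastive loss over matched image-caption pairs
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

# toy check with random tensors
loss = caption_conditioned_contrastive_loss(torch.randn(4, 8, 32), torch.randn(4, 32))
```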
The code and model weights for this paper are finally open! Despite being a little late in releasing them, I hope you find them useful! Code: github.com/facebookresear… Models: - (ViT-G): huggingface.co/lavoies/llip-v… - (ViT-B): huggingface.co/lavoies/llip-v…
How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.
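The paper describes the actual method; purely as a hypothetical illustration of the general recipe (score attention modules against a concept, then dampen the top-scoring ones), a PyTorch-style sketch might look like the following. Everything here, including how modules are scored and the Hugging Face-style call signature, is an assumption, not SAMD/SAMI itself.

```python
import torch

@torch.no_grad()
def concept_score(model, attn_module, concept_batch, control_batch):
    """Hypothetical scoring: how much more strongly does this attention module
    activate on concept prompts (e.g. about dogs) than on control prompts?"""
    norms = []
    handle = attn_module.register_forward_hook(
        lambda mod, inputs, out: norms.append(
            (out[0] if isinstance(out, tuple) else out).float().norm(dim=-1).mean()
        )
    )
    model(**concept_batch)   # assumes a Hugging Face-style keyword call
    model(**control_batch)
    handle.remove()
    return (norms[0] - norms[1]).item()

def damp_module(attn_module, alpha=0.1):
    """Scale a module's output by alpha (alpha=0 silences it); returns the hook
    handle so the intervention can be undone with handle.remove()."""
    def hook(mod, inputs, out):
        if isinstance(out, tuple):
            return (alpha * out[0],) + out[1:]
        return alpha * out
    return attn_module.register_forward_hook(hook)
```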
Join us at #CVPR2025 Demographic Diversity in Computer Vision workshop tomorrow! 📅 Wednesday, June 11, 9am-6pm 📍 room 213 (main session) + Hall D (poster sessions), the Music City Center We have an amazing lineup of speakers and panelists! Can't wait to meet you all there :)
Join us as a PhD research intern at FAIR w/ @polkirichenko @kamalikac, starting this summer or fall, with a focus on open-science research into multimodal models, agents, and beyond! Email [email protected] with the title [Prospective Intern 2025] and attach your CV if interested!
The last paper of my PhD is finally out! Introducing "Intuitive physics understanding emerges from self-supervised pretraining on natural videos." We show that, without any prior, V-JEPA, a self-supervised video model, develops an understanding of intuitive physics!
𝕏-CLR got accepted to ICLR 2025 @iclr_conf! See you in Singapore! It was also recently mentioned in The Batch by @DeepLearningAI (issue 284) Thank you again to my collaborators: @marksibrahim @randall_balestr @CabannesVivien @D_Bouchacourt @Piovrasca @kchonyc @ylecun
Representation learning is often done by considering samples to be either identical (same class, positive pairs) or not–with no middle ground. We propose 𝕏-CLR to learn from soft inter-sample relationships, and get better accuracy & improved robustness. arxiv.org/abs/2407.18134
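Conceptually (a minimal sketch, not the paper's code): replace the one-hot targets of standard contrastive learning with a soft target distribution over other samples, e.g. derived from caption similarity. Shapes and the way targets are built below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(z, soft_targets, temperature=0.1):
    """Cross-entropy between the learned similarity distribution and soft
    inter-sample targets, instead of one-hot positive/negative labels."""
    z = F.normalize(z, dim=-1)                              # (N, D) embeddings
    logits = z @ z.T / temperature                          # pairwise similarities
    mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(mask, -1e9)                 # ignore self-similarity
    log_probs = F.log_softmax(logits, dim=-1)
    # soft_targets: (N, N), zero diagonal, rows sum to 1
    # (e.g. a softmax over caption similarities)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# toy usage with random soft targets
z = torch.randn(8, 16, requires_grad=True)
sim = torch.rand(8, 8).masked_fill(torch.eye(8, dtype=torch.bool), 0.0)
targets = sim / sim.sum(dim=-1, keepdim=True)
soft_contrastive_loss(z, targets).backward()
```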
🚀 Excited to share our work at #NeurIPS2024! We show how billion-parameter VLMs lose to a two-layer MLP on MNIST. Come by our poster presentation at West Ballroom A-D #5211, today from 4:30–7:30 PM PST. A 🧵:
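For reference, a two-layer MLP baseline of this kind is about as simple as it sounds; the width, optimizer, and schedule below are placeholders, not necessarily the paper's exact settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# A plain two-layer MLP: 784 -> 512 -> 10
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

model.train()
for epoch in range(3):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```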
Work done w/ amazing collaborators @oscmansan, @ReyhaneAskari, @marksibrahim, @candacerossio, @Piovrasca, Tariq Berrada, @HavasiMarton, @BenchetritYoha1, @karen_ullrich, Carolina Braga, Abhishek Charnalia, Maeve Ryan, Mike Rabbat, @michal_drozdzal, @JakobVerbeek, @adri_romsor
New research from Meta FAIR: UniBench is a unified implementation of 50+ VLM benchmarks spanning a comprehensive range of carefully categorized capabilities from object recognition to spatial awareness, counting and much more. Research paper ➡️ go.fb.me/fa97z9
Meta announces UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling. Discuss: huggingface.co/papers/2408.04… Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of…
A soft similarity graph improves contrastive learning for image recognition. By @vlad_is_ai and a cast of characters from Meta FAIR, NYU, and Brown.