Afshin Dehghan
@afshin_dn
We'll present at NeurIPS, today at 5pm CST. Spotlight #1022. Effectively bringing sensory modalities to large models is one way to make them more grounded and, ultimately, to build a more complete World Model. Hopefully this is a step in that direction, and more will come.
4M shows signs of having learned a solid cross-modal representation. We can probe how it reconciles unusual inputs by manipulating one modality while keeping the rest fixed. (8/n)
Yesterday we shared our latest work on pretraining data curation. What if we stop guessing which data is “good” and directly match pretraining data to the benchmarks we care about? 📄 arxiv.org/abs/2507.12466 #AIResearch #llm #DataCuration #Pretraining #ScalingLaws
Excited to share our new work: “Language Models Improve When Pretraining Data Matches Target Tasks” Yes, it sounds obvious (and it is!), but typically this only happens implicitly and indirectly: intuitively select data → benchmark → refine → repeat. We wondered: what…
Incredibly proud of the work across teams in delivering the latest version of Visual Intelligence. Visual Intelligence makes it faster to do more with what’s right in front of you. #WWDC25 #visualintelligence #AppleIntelligence
Very excited to announce our final line-up of fantastic speakers at this year's @CVPR workshop on Open-World 3D Scene Understanding with Foundation Models ✨ #OpenSUN3D #cvpr2025 📆 June 12, 2pm-6pm 🏡 opensun3d.github.io
Singapore can get you off a plane, through immigration, and into a cab in under 30 minutes. But at #ICLR25, you’ll need over 2 hours and a 0.5 mile hike just to get your badge. Congrats to #ICLR for breaking the record for most academic patience ever tested. #ICLR25 #ConfLife
Excited to share that we have recently released the source code for FlexTok, bringing a fresh perspective to tokenization. Code on GitHub: lnkd.in/g4iNJFmU. Project Page: flextok.epfl.ch #FlexTok #Tokenization #MachineLearning #MLResearch #OpenSource #AI
🚀 Model and data for our CubifyAnything project are now released! 🔗 github.com/apple/ml-cubif… #SpatialReasoning #3DObjectDetection #transformers #detection #ai #genai
We are releasing the 1st version of 4M, a framework for training multimodal foundation models across tens of modalities & tasks, based on scalable masked modeling. Joint effort by @EPFL_en & @Apple. 4M: Massively Multimodal Masked Modeling 🌐4m.epfl.ch 🧵1/n
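The masked-modeling idea underlying 4M can be illustrated with a toy sketch (this is not the 4M code, just the generic pattern it builds on): tokens from any modality are partially masked, and the model is trained to predict the hidden ones from the visible rest. The function name and mask value below are hypothetical.

```python
import random

def mask_tokens(tokens, mask_ratio=0.5, mask_id=-1, seed=0):
    """Toy masking step: hide a fraction of tokens; a model would
    then be trained to predict the hidden targets from the inputs."""
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_ratio)
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    # Visible sequence: masked slots replaced by a special mask id.
    inputs = [mask_id if i in masked_positions else t
              for i, t in enumerate(tokens)]
    # Training targets: original values at the masked positions.
    targets = {i: tokens[i] for i in masked_positions}
    return inputs, targets

inputs, targets = mask_tokens(list(range(10)), mask_ratio=0.3)
```

In the actual framework this masking is applied jointly across token sequences from many modalities, which is what lets any subset of modalities be predicted from any other.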