Federico Baldassarre
@BaldassarreFe
Postdoctoral Researcher @AIatMeta: DINOv3 and world models. PhD @kth_rpl: deep learning explainability, concept-based visual representations and reasoning.
Highlighting great work from my colleagues @mathuvu_ and @byoubii! On my Christmas wish list is a version of this that works for images, doing automatic "tokenization" based on parts/objects. Apart from grid-based pooling and token merging, what else is out there?
We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning. Joint work with @byoubii 1/8
Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model,…
You get what you optimize for ;)
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
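For readers who want the pass@1 vs pass@k distinction made concrete: a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), computed from n samples per problem of which c are correct. The function name `pass_at_k` and the two toy policies below are illustrative assumptions, not taken from the thread, and this only shows the evaluation metric, not the training objective the thread goes on to describe.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct (hypothetical helper)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the correct count for each problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Two toy policies on two problems, 10 samples each (made-up numbers).
# "Sharp" always solves problem A and never solves problem B.
# "Diverse" solves each problem 3 times out of 10.
sharp = [10, 0]
diverse = [3, 3]

for name, counts in [("sharp", sharp), ("diverse", diverse)]:
    p1 = mean_pass_at_k(counts, n=10, k=1)
    p5 = mean_pass_at_k(counts, n=10, k=5)
    print(f"{name}: pass@1={p1:.3f} pass@5={p5:.3f}")
```

In this toy setup the "sharp" policy has the higher pass@1 while the "diverse" policy has the higher pass@5, which is one way to see why improving one metric need not improve the other.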
CAPI was a lot of fun to work on! In-depth research to make patch-based latent SSL work. And plenty of opportunities to hack deep into PyTorch to squeeze the most FLOPs out of the GPU. Thanks @TimDarcet for leading this journey!
Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
As the current postdoc in residence in the DINO team, I absolutely recommend this position! Such a fun team and interesting research directions 🔥
🔥 The DINO team is looking for a PostDoc! 🔥 If you are about to graduate, and want to be part of what’s next for SSL, don’t hesitate to reach out! Link to job offer : metacareers.com/jobs/502476149…
Awesome work on vision-language SSL! Can't wait to dig into the code and test out some models ;)
𝗗𝗼𝗲𝘀 𝗮𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗽𝗿𝗲-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝘄𝗼𝗿𝗸 𝗳𝗼𝗿 𝘃𝗶𝘀𝗶𝗼𝗻? 🤔 Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding. github.com/apple/ml-aim (🧵)
Hey @BMVCconf, can you give a rough estimate of the registration fee for students with an accepted paper? I would like to apply for a travel grant whose deadline is today.
My paper "Probabilistic Regression with Huber Distributions" will be presented at BMVC tomorrow. In this paper we estimate mean-covariance parameterized probability distributions using neural networks.
Proud of Federico Baldassarre for being selected as one of the top 12 reviewers among the 2830 reviewers of #ECCV2020! Keep up the good work @BaldassarreFe!
First meeting of the Naiads team: monitoring water pollution using satellite data as part of the #CopernicusHackathon powered by #InnovatumStartup #ArcticBusiness #VentureCup

Our work "Probabilistic orientation estimation with matrix Fisher distributions" is now publicly available. arxiv.org/abs/2006.09740 A challenge for estimating orientations is the presence of cavities in the output space. We remove this problem by using a probabilistic method.