Pietro Astolfi
@Piovrasca
As I'm heading out of Vancouver, I'm also wrapping up my postdoc at FAIR. It was a fun week with my friends/collabs and I made many great connections. But, hey! I'm still looking for full-time positions. Reach out if you work on multimodal generation/understanding #NeurIPS2024


The code and model weights for this paper are finally open! Although the release comes a little late, I hope you will find them useful! Code: github.com/facebookresear… Models: - (ViT-G): huggingface.co/lavoies/llip-v… - (ViT-B): huggingface.co/lavoies/llip-v…
Should we account for the diverse ways that an image can be captioned? In our #ICML2024 paper, we propose Llip — a Vision-Language Pretraining method that models the diverse ways in which an image can be captioned! 📜arxiv.org/abs/2405.00740 🧵👄
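To make the idea a bit more concrete, here is a rough, unofficial sketch of caption-conditioned image embeddings in that spirit: the image encoder emits several candidate "mixture" tokens and the caption decides how to combine them before a standard CLIP-style contrastive loss. The shapes, names, and attention form below are my assumptions, not the paper's exact implementation.

```python
# Unofficial sketch: caption-conditioned mixing of K visual "mixture" tokens,
# followed by a standard symmetric contrastive loss. Not the Llip codebase.
import torch
import torch.nn.functional as F

def contextualized_image_embedding(mixture_tokens, text_emb):
    # mixture_tokens: (batch, K, dim) candidate visual components from the image encoder
    # text_emb:       (batch, dim) caption embedding from the text encoder
    attn = torch.einsum("bkd,bd->bk", mixture_tokens, text_emb) / mixture_tokens.shape[-1] ** 0.5
    weights = attn.softmax(dim=-1)                        # caption-dependent mixing weights
    img_emb = torch.einsum("bk,bkd->bd", weights, mixture_tokens)
    return F.normalize(img_emb, dim=-1)

def clip_style_loss(img_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE over the batch, as in CLIP-style pretraining.
    logits = img_emb @ F.normalize(text_emb, dim=-1).T / temperature
    targets = torch.arange(logits.shape[0], device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```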
Excited to present our work "Improving the scaling laws of synthetic data with deliberate practice", tomorrow at #ICML2025 📢 Oral: Wed. 10:45 AM 📍 West Ballroom B (Oral 3C Data-Centric ML) 🖼️ Poster: 🕚 11:00 AM – 1:30 PM 📍 East Exhibition Hall A-B (Poster Session 3 East)
🚀 New Paper Alert! Can we generate informative synthetic data that truly helps a downstream learner? Introducing Deliberate Practice for Synthetic Data (DP)—a dynamic framework that focuses on where the model struggles most to generate useful synthetic training examples. 🔥…
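For intuition, here is an illustrative loop in that spirit (not the paper's code): generate a pool of synthetic candidates, keep the ones the current learner is most uncertain about, train on them, and repeat. The helpers `generate_candidates` and `train_step` are assumed placeholders.

```python
# Illustrative sketch only: focus synthetic data on examples the learner finds hard.
import torch

def prediction_entropy(logits):
    # Higher entropy = the learner is less certain about the example.
    probs = logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def deliberate_practice_round(learner, generate_candidates, train_step,
                              n_candidates=1024, n_keep=128):
    # One round: generate a synthetic pool, keep the highest-entropy examples,
    # and update the learner on them.
    images, labels = generate_candidates(n_candidates)
    with torch.no_grad():
        entropy = prediction_entropy(learner(images))
    hardest = entropy.topk(n_keep).indices
    train_step(learner, images[hardest], labels[hardest])
    return learner
```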
Deliberate practice is accepted to #ICML2025 as a spotlight (top 2.6%!) 🚀
🚀 New Paper Alert! Can we generate informative synthetic data that truly helps a downstream learner? Introducing Deliberate Practice for Synthetic Data (DP)—a dynamic framework that focuses on where the model struggles most to generate useful synthetic training examples. 🔥…
It's interesting to see how, in order to increase prompt-image consistency, the model had to decrease image quality/realism
The model sometimes enters a self-critique loop by itself, but you can also trigger it manually, and the model then tunes the prompt for itself through self-conversation (e.g., add: "Verify the image; if it's incorrect, write your own prompt, try again, and repeat the process.")
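A hypothetical sketch of that manual trigger, with the generation and critique calls passed in as placeholders for whatever multimodal API is being used (nothing below is a real library call):

```python
# Hypothetical generate -> verify -> rewrite-prompt -> retry loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Critique:
    is_match: bool          # did the model judge the image consistent with the prompt?
    revised_prompt: str     # the model's own rewritten prompt if not

def self_critique_loop(prompt: str, generate_image: Callable, critique: Callable, max_rounds: int = 3):
    image = None
    for _ in range(max_rounds):
        image = generate_image(prompt)
        result: Critique = critique(image, prompt)   # e.g. "Verify the image; if incorrect, write your own prompt."
        if result.is_match:
            return image
        prompt = result.revised_prompt               # retry with the model's self-revised prompt
    return image
```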
🚀 New Paper Alert! Can we generate informative synthetic data that truly helps a downstream learner? Introducing Deliberate Practice for Synthetic Data (DP)—a dynamic framework that focuses on where the model struggles most to generate useful synthetic training examples. 🔥…
𝕏-CLR got accepted to ICLR 2025 @iclr_conf! See you in Singapore! It was also recently mentioned in The Batch by @DeepLearningAI (issue 284) Thank you again to my collaborators: @marksibrahim @randall_balestr @CabannesVivien @D_Bouchacourt @Piovrasca @kchonyc @ylecun
Representation learning is often done by considering samples to be either identical (same class, positive pairs) or not–with no middle ground. We propose 𝕏-CLR to learn from soft inter-sample relationships, and get better accuracy & improved robustness. arxiv.org/abs/2407.18134
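For readers curious about the mechanics, here is a rough sketch (not the official implementation) of a soft contrastive objective in that spirit: the per-batch targets are graded inter-sample similarities, e.g. derived from caption embeddings, rather than 0/1 positive/negative labels.

```python
# Rough sketch of a soft contrastive objective; masking of the diagonal and the
# exact choice of similarity graph are omitted for brevity.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(features, soft_sim, temperature=0.1, target_temperature=0.1):
    # features: (batch, dim) learned embeddings of the augmented views
    # soft_sim: (batch, batch) precomputed inter-sample similarity, e.g. caption similarity
    features = F.normalize(features, dim=-1)
    logits = features @ features.T / temperature
    targets = (soft_sim / target_temperature).softmax(dim=-1)   # soft relationship graph
    # Cross-entropy between the learned similarity distribution and the soft targets.
    return -(targets * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
```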
Excited to release EvalGIM, an easy-to-use evaluation library for generative image models. EvalGIM ("EvalGym") unifies metrics, datasets, & visualizations, is customizable & extensible to new benchmarks, & provides actionable insights. Check it out! github.com/facebookresear…
We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥 Paper: arxiv.org/abs/2411.14402 Repo: github.com/apple/ml-aim Model Gallery: huggingface.co/collections/ap…
I've been a postdoc with this team and now I'm about to finish after almost 2y (+6m of a previous internship). They are amazing people and researchers! I couldn't recommend them more
We're looking for a postdoc to work with us in FAIR Montreal @AIatMeta. Interested in building generative visual models of the world and leveraging them to train downstream ML models? Apply: metacareers.com/jobs/376087892… cc:@hall__melissa @ReyhaneAskari @JakobVerbeek @michal_drozdzal
Can't say enough about how much I have enjoyed working with Adriana and Michal and the FAIR team in the past 2.5 years. If you have a background in generative modeling/diffusion/flows, I definitely recommend applying.
We're looking for a postdoc to work with us in FAIR Montreal @AIatMeta. Interested in building generative visual models of the world and leveraging them to train downstream ML models? Apply: metacareers.com/jobs/376087892… cc:@hall__melissa @ReyhaneAskari @JakobVerbeek @michal_drozdzal
We just dropped our new paper “Visual Scratchpads: Enabling Global Reasoning in Vision” on ArXiv. A great collaboration with @aryolotfi, Samy Bengio, @moin_nabi and Emmanuel Abbé at Apple. 📜arxiv.org/abs/2410.08165 🧵 ⬇️
If you're in Milan for #ECCV2024, come by our poster. Melissa (@hall__melissa) and I will be presenting c-vsg tomorrow from 10:30 to 12:00 in the Exhibition Area, poster 108.
🚨Excited to share our recent work on improving the diversity of generations in diffusion models. TL;DR: In this work we apply additional guidance by computing the Vendi Score of previously generated images and enforcing the new generations to be different from previous ones (1/4)
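As a rough illustration (not the paper's code): the Vendi Score of a set of embeddings is the exponential of the entropy of the eigenvalues of a normalized similarity kernel, and a simple diversity guidance term can follow its gradient with respect to the sample currently being generated. The embedding model and the exact way guidance enters the sampler are left out here and are my assumptions.

```python
# Sketch: Vendi Score of a set of embeddings and a simple gradient-based
# diversity guidance term that pushes a candidate away from past generations.
import torch
import torch.nn.functional as F

def vendi_score(embeddings):
    # embeddings: (n, dim) features of previously generated images (+ the current candidate)
    x = F.normalize(embeddings, dim=-1)
    kernel = x @ x.T / x.shape[0]                     # normalized similarity kernel K / n
    eigvals = torch.linalg.eigvalsh(kernel).clamp_min(1e-12)
    entropy = -(eigvals * eigvals.log()).sum()
    return entropy.exp()                              # "effective number" of distinct samples

def diversity_guidance(candidate_emb, previous_embs, scale=1.0):
    # Gradient of the Vendi Score w.r.t. the candidate embedding.
    candidate = candidate_emb.detach().requires_grad_(True)
    score = vendi_score(torch.cat([previous_embs, candidate[None]], dim=0))
    (grad,) = torch.autograd.grad(score, candidate)
    return scale * grad
```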
The call for papers for the 7th edition of the Montreal AI Symposium is now open! Accepted contributions will be presented at the event, either as a contributed talk or as a poster. You have until August 22 to apply! Full details here ow.ly/Uc9P50SKqcb
Representation learning is often done by considering samples to be either identical (same class, positive pairs) or not–with no middle ground. We propose 𝕏-CLR to learn from soft inter-sample relationships, and get better accuracy & improved robustness. arxiv.org/abs/2407.18134
Can generative image models be good world models? This work from @Meta FAIR shows that there is a tradeoff between realism and diversity. The more realistic a generative model becomes, the less diverse it becomes. Realism comes at the cost of coverage. In other words, the most…
Are SOTA image generative models effective world models? Consistency-diversity-realism Pareto fronts show they're not (yet): - No model dominates others as a world model - Improvements in quality and consistency have come at the expense of diversity 🔗 arxiv.org/abs/2406.10429
📝 New from FAIR: An Introduction to Vision-Language Modeling. Vision-language models (VLMs) are an area of research that holds a lot of potential to change our interactions with technology; however, there are many challenges in building these types of models. Together with a set…
Should we account for the diverse ways that an image can be captioned? In our #ICML2024 paper, we propose Llip — a Vision-Language Pretraining method that models the diverse ways in which an image can be captioned! 📜arxiv.org/abs/2405.00740 🧵👄