Samuel Lavoie
@lavoiems
PhD candidate @Mila_quebec, @UMontreal. Ex: FAIR @AIatMeta. Learning representations and minimizing free energy.
🧵 Everyone is chasing new diffusion models—but what about the representations they model from? We introduce Discrete Latent Codes (DLCs): - Discrete representation for diffusion models - Uncond. gen. SOTA FID (1.59 on ImageNet) - Compositional generation - Integrates with LLM 🧱

As I'm heading out of Vancouver, I'm also wrapping up my postdoc at FAIR. It was a fun week with my friends/collabs and I made many great connections. But, hey! I'm still looking for full-time positions. Reach out if you work on multimodal generation/understanding #NeurIPS2024
We're releasing a cool paper! DLCs are image tokens that enable better diffusion modelling. For now, we show this is the right representation. But in the future, this can allow LLMs to "speak in images"🤯to enable visual reasoning and more powerful text-image generalization. ⬇️
The code and model weights for this paper are finally open! Though a little late in releasing them, I hope you will find them useful! Code: github.com/facebookresear… Models: - (ViT-G): huggingface.co/lavoies/llip-v… - (ViT-B): huggingface.co/lavoies/llip-v…
Should we account for the diverse ways that an image can be captioned? In our #ICML2024 paper, we propose Llip — a Vision-Language Pretraining method that models the diverse ways in which an image can be captioned! 📜arxiv.org/abs/2405.00740 🧵👄
Happy to share our latest work on #diffusion models without data: building theoretical bridges between existing methods, analysing their continuous-time asymptotics, and showing some cool practical implications. arxiv.org/abs/2501.06148 #MachineLearning 1/9
Stick-Breaking Attention: Out-of-the-box length extrapolation, thanks to removing the position embedding; Better performance than Softmax+RoPE on almost every task; An efficient implementation similar to Flash Attention. Do we still need Softmax+RoPE for Language Models?…
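A minimal sketch of the stick-breaking idea described above, assuming the standard formulation: each query allocates attention mass to keys from nearest to farthest, taking a sigmoid-gated fraction β of the remaining "stick" at each step, so no position embedding is needed (order enters through the product itself). This is an illustrative NumPy loop, not the paper's efficient implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stick_breaking_attention(q, k, v):
    """Stick-breaking attention for one head (illustrative sketch).

    q, k, v: arrays of shape (T, d). Causal: token i attends to j < i.
    A[i, j] = beta[i, j] * prod_{j < m < i} (1 - beta[i, m]),
    where beta = sigmoid(q @ k.T / sqrt(d)). Nearby keys claim their
    share of the attention "stick" first; no position embedding is used.
    """
    T, d = q.shape
    beta = sigmoid(q @ k.T / np.sqrt(d))
    out = np.zeros_like(v)
    for i in range(T):
        stick = 1.0                      # remaining attention mass
        weights = np.zeros(T)
        for j in range(i - 1, -1, -1):   # walk from nearest key backwards
            weights[j] = beta[i, j] * stick
            stick *= 1.0 - beta[i, j]    # break off what token j took
        out[i] = weights @ v
    return out
```

Note the weights sum to at most 1 rather than exactly 1 (unlike softmax), which is part of what lets the mechanism extrapolate to longer contexts.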
I can't believe that no one has done that yet!
Is your RLHF training too slow? 🦥 Does inefficient LLM generation get you down? We propose a new paradigm: Asynchronous RLHF! It's faster, more efficient, and matches the performance of SOTA methods, with gains that only grow with scale! arxiv.org/abs/2410.18252 We release code too.
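A toy sketch of the asynchronous idea above, under my own simplifying assumptions: a generator thread samples from a slightly stale copy of the policy while the learner updates in parallel, instead of the usual generate-then-train lockstep. All names here are illustrative, not the paper's code.

```python
import queue
import threading

def async_rlhf_sketch(num_updates=5):
    """Toy async RLHF loop (illustrative): generation and learning
    overlap, so training consumes data that is off-policy by a step.
    """
    samples = queue.Queue(maxsize=2)   # small buffer of generated batches
    policy_version = [0]               # shared, mutated by the learner
    stop = threading.Event()

    def generator():
        # Keeps generating with whatever policy version is current,
        # which may lag behind the learner (the "asynchronous" part).
        while not stop.is_set():
            batch = {"version": policy_version[0], "text": "sample"}
            try:
                samples.put(batch, timeout=0.1)
            except queue.Full:
                continue

    t = threading.Thread(target=generator)
    t.start()
    versions_used = []
    for _ in range(num_updates):
        batch = samples.get()          # consume a (possibly stale) batch
        versions_used.append(batch["version"])
        policy_version[0] += 1         # stand-in for a policy update
    stop.set()
    t.join()
    return versions_used
```

The staleness visible in `versions_used` is exactly the off-policyness the method has to tolerate in exchange for keeping the generator busy.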
We're looking for a postdoc to work with us at FAIR Montreal @AIatMeta. Interested in building generative visual models of the world and leveraging them to train downstream ML models? Apply: metacareers.com/jobs/376087892… cc:@hall__melissa @ReyhaneAskari @JakobVerbeek @michal_drozdzal
Some thoughts on how to think about "world models" in language models and beyond: lingo.csail.mit.edu/blog/world_mod…
This is happening today! I will be presenting at 1:30pm CEST, poster #317. Feel free to drop by to say hi!
I will be at #ICML2024 next week to present WebLINX, a benchmark for training and evaluating web agents on 150+ real-world websites. It was used in the @webllama project to train Llama-3-8B-Web! DM me if you are interested in connecting. Poster page: icml.cc/virtual/2024/p…