Manuel Faysse
@ManuelFaysse
NLP (LLMs) & ML Privacy - ColPali 👀 - PhD Candidate @CentraleSupelec Prev: @imperialcollege, @epfl
🚨 Introducing "ColPali: Efficient Document Retrieval with Vision Language Models"! We use Vision LLMs + late interaction to improve document retrieval (RAG, search engines, etc.), solely using the image representation of document pages! arxiv.org/abs/2407.01449 🧵(1/N)
AudioRAG is becoming real! Just built a demo with ColQwen-Omni that does semantic search on raw audio, no transcription needed. Drop in a podcast, ask your question, and it finds the exact chunks where it happens. You can also get a written answer. What’s exciting: it skips…
All-modality RAG 🔥 ColQwen-Omni is a new multimodal retrieval model that can retrieve anything (videos, audio, documents and more!). Use it with transformers 🤗 Here's a smol demo on video retrieval ↙️
Amazing progress for Omni retrieval; this is really something that all the emerging notetaker apps need.
Introducing ColQwen-Omni, a 3B omnimodal retriever that extends the ColPali concept of multimodal retrieval with late interaction to audio chunks and short videos, with no performance degradation on visual document retrieval wrt our best models! (1/N)
For which today is a good day to explore, as Manuel and team make omni late interaction a thing… x.com/manuelfaysse/s…
DIY Robotics + 3D printing + RL + GPT-4o + a sprinkle of poetry... My amazing friend @matthieulc has the mind of an artist in the body of one of the most complete engineers I know, and he recently documented his latest project in an amazing blogpost that everyone can learn a bit from!
Releasing Shoggoth Mini!🐙 Soft tentacle robot meets GPT-4o & RL. I built it to explore the boundaries of weird: expressiveness, aliveness, and AI embodiment. Blogpost: matthieulc.com/posts/shoggoth…
Awesome new lib from @bclavie to make MaxSim computation quicker on CPU! In the same vibe, I also need to highlight pylate-rs by @raphaelsrty! Both libs go a long way toward making late interaction (ColPali, ColBERT) ever more accessible!
New blog post & new library are out now! The BP is about MaxSim, why it's *orders of magnitude* more demanding than normal cosine similarity, and why GPUs don't care, but CPUs do! The library is maxsim-cpu, which makes it so CPUs can be fast and play it cool, too.
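To make the cost gap concrete, here is a minimal pure-Python sketch of the ColBERT-style MaxSim score used by late-interaction models: for each query token vector, take its best dot product over all document vectors, then sum those maxima. It assumes L2-normalized token embeddings (so dot products equal cosine similarities). It is not the maxsim-cpu or pylate-rs API; the function names and the 32/1,024 token counts are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize a vector so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token vector, keep its maximum
    similarity over all document token vectors, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def maxsim_dot_products(n_query: int, n_doc: int) -> int:
    """Dot products needed by MaxSim for one query/document pair.
    Single-vector cosine similarity needs exactly one."""
    return n_query * n_doc


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical values).
    query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
    doc = [normalize([1.0, 0.0]), normalize([1.0, 1.0])]
    print(maxsim(query, doc))  # 1 + 1/sqrt(2), roughly 1.707

    # Hypothetical sizes: 32 query tokens against 1,024 document vectors.
    print(maxsim_dot_products(32, 1024))  # 32768
```

With those hypothetical sizes, scoring a single page takes 32,768 dot products instead of one, which is the kind of gap that a GPU absorbs through parallelism but a naive CPU loop does not.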
More proprietary model providers should have to release OS models: since reputation is at stake, it forces the org to put work into safety alignment that it would never "need" to do if the public only had API access to a model/system that can be live-patched a posteriori.
we planned to launch our open-weight model next week. we are delaying it; we need time to run additional safety tests and review high-risk areas. we are not yet sure how long it will take us. while we trust the community will build great things with this model, once weights are…
Finding the smallest subset of a corpus that enables strong RAG performance with little priors over the future query distribution is a super impactful research problem. This cool work focuses on the somewhat relaxed task of optimizing retrieval to boost benchmarks that are well…
Reasoning benchmarks (e.g., MMLU Pro and GPQA) have seen little benefit from naive RAG. But can we flip this? 🔥Introducing CompactDS: ✅Web-scale coverage ✅Runs with just 100GB RAM ✅Matches search engines The simplest RAG pipeline can even compete with agentic…