Jonas Pfeiffer
@PfeiffJo
Research Scientist @GoogleDeepMind | @AdapterHub | previously @nyuniversity @TUDarmstadt @UKPLab @MetaAI @spotify | http://pfeiffer.ai | (he/him)
Today we finally release our survey paper spanning multiple axes of modular deep learning, covering topics like - parameter-efficient architectures - routing functions (MoEs, fixed routing, ...) - aggregation functions combining information from different modules - and many applications! 👇
In our new survey “Modular Deep Learning”, we provide a unified taxonomy of the building blocks of modular neural nets and connect disparate threads of research. 📄 arxiv.org/abs/2302.11529 📢 ruder.io/modular-deep-l… 🌐 modulardeeplearning.com w/ @PfeiffJo @licwu @PontiEdoardo
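Two of the survey's building blocks, routing and aggregation, can be illustrated with a toy top-k mixture-of-experts step. This is a minimal sketch, not code from the paper; all names and sizes are invented for illustration.

```python
import torch

torch.manual_seed(0)
d, n_experts, k = 8, 4, 2

experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]  # the "modules"
router = torch.nn.Linear(d, n_experts)                       # routing function

x = torch.randn(1, d)  # a single token representation

# Routing: score every expert, keep only the top-k.
scores = router(x)
topk_vals, topk_idx = scores.topk(k, dim=-1)
weights = torch.softmax(topk_vals, dim=-1)

# Aggregation: weighted sum of the selected experts' outputs.
y = sum(weights[0, i] * experts[topk_idx[0, i]](x) for i in range(k))
```

The survey's taxonomy covers many other choices for each step (fixed routing, soft routing over all modules, attention-based aggregation, etc.); this shows just one common combination.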
Thrilled about our new Adapters release!🎉I had a blast working on this version, especially contributing to the new plugin interface (like adding ModernBERT) and helping with the VeRA adapter method. Have a look at the full thread for all the awesome updates from our team 👇
🚀Adapters v1.2 is out!🚀 We've made Adapters incredibly flexible: Add adapter support to ANY Transformer architecture with minimal code! We used this to add 8 new models out-of-the-box, incl. ModernBERT, Gemma3 & Qwen3! Explore these, plus 2 new adapter methods, in this thread👇(1/5)
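The core idea the library builds on, the bottleneck adapter, is easy to sketch in plain PyTorch: a small down/up projection with a residual connection, inserted after a sub-layer of any Transformer. This is a hedged illustration of the concept, not the Adapters library's actual API.

```python
import torch
from torch import nn

class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, residual add."""

    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        # Zero-init the up-projection so the adapter starts as identity.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))

d_model = 32
adapter = BottleneckAdapter(d_model)
h = torch.randn(2, 5, d_model)   # (batch, seq, hidden)
out = adapter(h)                 # same shape; identity at initialization
```

Because only the tiny adapter is trained while the backbone stays frozen, the same wrapping trick can be applied to any architecture, which is what the new plugin interface makes easy.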
@GoogleDeepMind India 🇮🇳 & Japan 🇯🇵 are looking for strong candidates in multilinguality, multicultural, & multimodality areas. RS Bangalore: job-boards.greenhouse.io/deepmind/jobs/… RS Tokyo: job-boards.greenhouse.io/deepmind/jobs/… RE Tokyo: job-boards.greenhouse.io/deepmind/jobs/…
Check it out! The newest version of AdapterHub is on 🔥
Hiring two student researchers for the Gemma post-training team at @GoogleDeepMind Paris! First topic is about diversity in RL for LLMs (merging, generalization, exploration & creativity), second is about distillation (with @nino_vieillard). Ideal if you're finishing your PhD. DMs open!
Highly recommend Jonas and his team!
I am hiring a Student Researcher for our Modularity team at the Google DeepMind office in Zurich🇨🇭 Please fill out the interest form if you would like to work with us! The role would start mid/end 2025 and would be in-person in Zurich with 80-100% at GDM forms.gle/N94ViTmKHCCAcv…
We've got plenty of exciting ideas flying around, so consider applying to carve them further with us!
Jonas and the Zurich modularity team have been working on super exciting topics; I'd strongly recommend applying!
Put together a small demo with some fun examples of how you can use Gemma3’s new vision capability with multilinguality and reasoning!
Check out how to use Gemma as your own travel assistant with @GoogleDeepMind’s @ashkamath20, who led the multimodal effort on Gemma 3.
Congratulations to the whole Gemma team for the launch and especially @ashkamath20, who did an amazing job pushing the multimodal capabilities of the model 🚀. Give the model a try 🔥
Super excited to announce what I’ve been working on for the past few months 💃 GEMMA 3 is out today! It supports 140+ languages, has a context length of 128k tokens and the best part? It’s natively multimodal! 📸
🎁 A new update of the Adapters library is out! Check out all the novelties, changes & fixes here: github.com/adapter-hub/ad…
Workshop alert 🚨 We'll host in ICLR 2025 a workshop on modularity, encompassing collaborative + decentralized + continual learning. Those topics are on the critical path to building better AIs. Interested? submit a paper and join us in Singapore! sites.google.com/corp/view/mcdc…
Thanks to our invited speakers @PfeiffJo @alisawuffles, who delivered inspiring talks on Modular Deep Learning and decoding-time experts for language model adaptation. A heartfelt thank you to our sponsors @huggingface @SakanaAILabs @arcee_ai for making the competition possible!
I’m really sad that my dear friend @FelixHill84 is no longer with us. He had many friends and colleagues all over the world - to try to ensure we reach them, his family have asked to share this webpage for the celebration of his life: pp.events/felix
Paper #2: Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization Link: aclanthology.org/2024.mrl-1.7/ Ever wondered how we can use LLM weight arithmetic to enable models to handle tasks in new languages zero-shot? The authors have a solution.
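The weight-arithmetic idea can be sketched in a few lines: combine two sets of parameter-efficient weights element-wise, one trained for the task (e.g. summarization in a high-resource language) and one for the target language. This is a hedged toy sketch; the key names, shapes, and weighting scheme here are illustrative, not the paper's exact recipe.

```python
import torch

def combine(task_weights, lang_weights, lam=0.5):
    """Element-wise interpolation of two PEFT state dicts."""
    return {k: lam * task_weights[k] + (1 - lam) * lang_weights[k]
            for k in task_weights}

torch.manual_seed(0)
# Pretend these are the trained parameter-efficient layers.
task = {"adapter.down.weight": torch.randn(4, 8)}
lang = {"adapter.down.weight": torch.randn(4, 8)}

merged = combine(task, lang, lam=0.5)
# The merged weights can be loaded into the same adapter slot and used
# zero-shot for the task in the target language.
```

The appeal is that no gradient step is needed at combination time: the arithmetic happens purely in weight space.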
I've left out several technical details, so please read the paper; it contains some interesting nuggets. Overall, another cool work by @alexandraxron, @PfeiffJo, @maynez_joshua, @cindyxinyiwang, @seb_ruder, and @priyanka_17. Thanks for the cool paper!
Google presents Deliberation in Latent Space via Differentiable Cache Augmentation
Google DeepMind Introduces Differentiable Cache Augmentation: A Coprocessor-Enhanced Approach to Boost LLM Reasoning and Efficiency Researchers from Google DeepMind have introduced a method called Differentiable Cache Augmentation. This technique uses a trained coprocessor to…
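The high-level mechanism, a trainable "coprocessor" that reads a frozen model's cached states and emits latent embeddings appended back to the context before further decoding, can be sketched as a toy. All shapes and module names below are invented; the actual method operates on the kv-cache of a real LLM.

```python
import torch
from torch import nn

d_model, seq_len, n_latents = 16, 10, 4

# Stand-in for the frozen model's cached hidden states.
cache = torch.randn(1, seq_len, d_model)

# The coprocessor is the only trained component; the base model stays frozen.
coprocessor = nn.Sequential(
    nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_latents * d_model)
)

# Pool the cache, then emit n_latents latent embeddings to "deliberate" with.
pooled = cache.mean(dim=1)                              # (1, d_model)
latents = coprocessor(pooled).view(1, n_latents, d_model)

# Append the latents to the context for subsequent decoding.
augmented = torch.cat([cache, latents], dim=1)          # (1, seq_len + n_latents, d_model)
```

Because the augmentation is differentiable end-to-end, the coprocessor can be trained to produce whatever latent "deliberation" most improves the frozen model's later predictions.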