Gregor Bachmann
@GregorBachmann1
I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊. http://gregorbachmann.github.io
Very thrilled to announce that our work "Scaling MLPs" has been accepted at NeurIPS 🥳 Check out our new arXiv version arxiv.org/abs/2306.13575! @SAnagnostidis and I managed to push performance even further 🔥
Scaling MLPs: A Tale of Inductive Bias paper page: huggingface.co/papers/2306.13… In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are…
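A minimal sketch of the kind of model being scaled here: a plain MLP on flattened pixels. The width, depth, resolution, and class count below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch (PyTorch) of an image-classification MLP in the spirit of
# "Scaling MLPs". Hyperparameters are illustrative, not the paper's setup.
import torch
import torch.nn as nn

class VisionMLP(nn.Module):
    def __init__(self, image_size=64, channels=3, width=1024, depth=6, num_classes=1000):
        super().__init__()
        in_dim = channels * image_size * image_size
        layers = [nn.Flatten(), nn.Linear(in_dim, width), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.GELU()]
        layers += [nn.Linear(width, num_classes)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channels, H, W)
        return self.net(x)

model = VisionMLP()
logits = model(torch.randn(8, 3, 64, 64))  # -> (8, 1000)
```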
🚀 Excited to share our preprint LoRACLR! TL;DR: LoRACLR merges multiple LoRA models into a unified diffusion model for seamless, high-fidelity multi-concept image synthesis with minimal interference. Thanks to @THofmann2017, @fedassa, and @PINguAR! 🙌
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C), 10-11 am PT. Or please stop by our poster right after @ East Exhibition Hall A-B, #E-2505, 11am-1:30pm. (Hope you enjoy some silly human drawings!)
Can we learn to complete anything in Lidar without any manual supervision? Excited to share our #ICML2025 paper “Towards Learning to Complete Anything in Lidar” from my time at @nvidia with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @AljosaOsep @lealtaixe! Thread🧵👇
What's some "must-read" literature on generalisation in neural networks? I keep thinking about this paper and it really makes me want to better understand the link between optimisation and generalisation. arxiv.org/abs/2302.12091
Our workshop on open-world 3D scene understanding OpenSUN3D is taking place this afternoon at @CVPR!
Join us at OpenSUN3D☀️ workshop this afternoon @CVPR 🚀
📍: Room 105 A
🕰️: 2:00-6:00 pm
🌍: opensun3d.github.io
@afshin_dn @leto__jean @lealtaixe
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:
→ LLMs are limited in creativity since they learn to predict the next token
→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱)
1/ 🧵
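A toy reading of the "seed-conditioning" idea above: diversity comes from a random seed prepended to the prompt rather than from sampling temperature. The `<seed>` tag format is a placeholder, not the paper's exact recipe.

```python
# Toy sketch of "seed-conditioning": prepend a random seed string to the
# prompt so that diversity comes from the seed, not from temperature.
# The tag format below is a made-up placeholder.
import random
import string

def seed_conditioned_prompt(prompt, seed_len=8):
    seed = "".join(random.choices(string.ascii_lowercase, k=seed_len))
    return f"<seed>{seed}</seed> {prompt}"

# Each call yields a differently seeded prompt; the model is then decoded
# greedily, letting the seed carry the randomness.
print(seed_conditioned_prompt("Draw a novel connection between two concepts:"))
```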
Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-token prediction training (thanks @FabianGloeckle). We use register tokens: minimal architecture changes & scalable prediction horizons x.com/NasosGer/statu…
1/n Multi-token prediction boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token setup:
• Short-term focus
• Struggles with long-range decisions
• Weaker supervision
Prior methods add complexity (extra layers). 🔑 Our fix? Register tokens—elegant and powerful
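A hedged sketch of the register-token idea described above: append k learnable register embeddings to the input and train their outputs to predict the k upcoming tokens. The tiny bidirectional encoder and shared head are illustrative assumptions, not the paper's recipe.

```python
# Sketch of multi-token prediction via register tokens (illustrative reading,
# not the paper's exact training setup).
import torch
import torch.nn as nn

class RegisterMTP(nn.Module):
    def __init__(self, vocab=1000, dim=128, horizon=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # k learnable register tokens appended to every input
        self.registers = nn.Parameter(torch.randn(horizon, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        # Toy backbone with bidirectional attention; a real LM would use a
        # causal mask over the prefix.
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)
        self.horizon = horizon

    def forward(self, prefix):                                # prefix: (batch, T) ids
        b = prefix.size(0)
        x = self.embed(prefix)                                # (b, T, dim)
        regs = self.registers.unsqueeze(0).expand(b, -1, -1)  # (b, k, dim)
        h = self.backbone(torch.cat([x, regs], dim=1))        # (b, T+k, dim)
        return self.head(h[:, -self.horizon:])                # (b, k, vocab)

model = RegisterMTP()
prefix = torch.randint(0, 1000, (2, 16))
targets = torch.randint(0, 1000, (2, 4))   # the 4 tokens after each prefix
logits = model(prefix)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), targets.reshape(-1))
```

Note the appeal of this design: the prediction horizon scales by adding register embeddings, with no extra decoder layers or heads per offset.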
Hey @francoisfleuret, we formalized this very intuition in this late-2023 work, which you may find interesting :-) arxiv.org/abs/2403.06963
Thanks @_akhaliq for sharing! During my internship at @NVIDIAAI, we explored zero-shot panoptic completion of Lidar scans — together with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @lealtaixe @AljosaOsep!
Nvidia just announced Towards Learning to Complete Anything in Lidar
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
I will be giving a talk on open-vocabulary 3D scene understanding at the next ZurichCV meetup! 🗓️ Date: Thursday, January 23rd 18:00 📍Location: @ETH_AI_Center, please see zurichai.ch/events/zurichc… for additional details!
Join us for the 4th edition of ☀️OpenSUN3D🌎 workshop on open-world 3D scene understanding at #CVPR2025! We will explore emerging trends in 3D scene understanding, and applications of language models in 3D vision. We're also hosting a challenge! 📚 opensun3d.github.io
Get ready for the next @CVPR workshop on OpenWorld 3D Scene Understanding ➡️ opensun3d.github.io
We will be hosting:
- prized challenge 🏆 (see scenefun3d.github.io)
- paper track 🗞️
- exciting keynote speakers 👩🏫
#CVPR2025
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem! 🤯 New paper w/ P. Whittington & @GregorBachmann1 :) arxiv.org/abs/2412.15210
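For reference, a minimal greedy BPE trainer, which makes the "greedy" point concrete: each step merges the currently most frequent adjacent pair, with no guarantee that the resulting tokeniser is globally optimal.

```python
# Minimal greedy BPE trainer: repeatedly merge the most frequent adjacent
# pair. Greedy w.r.t. per-step compression — finding a globally optimal
# tokeniser is the NP-complete problem the paper studies.
from collections import Counter

def bpe_train(corpus, num_merges):
    seqs = [list(word) for word in corpus]        # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))           # count adjacent pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]       # greedy: most frequent pair
        merges.append((a, b))
        for i, s in enumerate(seqs):              # apply the merge everywhere
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and s[j] == a and s[j + 1] == b:
                    out.append(a + b); j += 2
                else:
                    out.append(s[j]); j += 1
            seqs[i] = out
    return merges

print(bpe_train(["low", "lower", "lowest", "newer", "wider"], 3))
```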
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
Updated camera ready arxiv.org/abs/2405.19279. New results include:
- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal (Adam/AdaFactor)
- scaling to 7B params
- showing that our methods for reducing OFs translate to easier PTQ int8 quantisation
Check it out!
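A hedged sketch of one common way to quantify outlier features (OFs): kurtosis of the per-dimension activation scales. A few unusually large hidden dimensions inflate this statistic and are what make int8 PTQ hard. The metric choice here is an assumption, not necessarily the paper's exact definition.

```python
# Hedged sketch: kurtosis of per-dimension activation scales as an OF proxy.
# A handful of dimensions with much larger magnitude than the rest produce
# a heavy tail and a large kurtosis.
import torch

def activation_kurtosis(h):
    """h: (tokens, hidden_dim) activations; kurtosis of per-dim scales."""
    s = h.pow(2).mean(dim=0)                 # per-dimension second moment
    m, v = s.mean(), s.var(unbiased=False)
    return ((s - m) ** 4).mean() / (v ** 2 + 1e-12)

h_normal = torch.randn(4096, 512)
h_outlier = h_normal.clone()
h_outlier[:, :4] *= 30                       # inject a few outlier dimensions
print(activation_kurtosis(h_normal))         # ~3 (Gaussian-like)
print(activation_kurtosis(h_outlier))        # much larger
```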
We have an exciting line-up of keynote speakers at our workshop for open-vocabulary 3D scene understanding, OpenSUN3D☀️ at #ECCV2024! 🗓️Sept 29, Sunday 14:00-17:30 ✍️opensun3d.github.io @meinhardt_tim @orlitany @AlexBewleyAI @_krishna_murthy
Introducing our Keynote Speakers at this edition of the OpenSUN3D workshop #ECCV2024 (Sept 29, Sunday 14:00-15:30, Room: Amber 4) in Milano🇮🇹 Full schedule: opensun3d.github.io 🚀 @eccvconf @ETH_en @ETH_AI_Center @Stanford
This is really nice! But the proof is very general and thus complicated. A simpler proof, together with a proof of what can go wrong when learning these next-token predictors with MLE, is given in this (IMHO underrated) paper arxiv.org/pdf/2403.06963 @GregorBachmann1 @_vaishnavh
Come to the poster session at 12pm and our spotlight presentation at 3pm, both in Straus 3!
I'm also at ICML -- excited to present our paper on training + LR schedules as a spotlight (!) at the workshop on the next gen of seq. models as well as ES-FOMO on Fri🤙 Reach out to discuss methods for training open models, scaling, efficiency, or the future of architectures :)
We’re presenting our work on concept guidance today at the 13:30 poster session (#706). Come by and say hi! #ICML #ICML2024
🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵 arxiv.org/abs/2402.14433
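A minimal sketch of this kind of concept guidance: add a fixed concept vector to one layer's hidden states via a forward hook. The layer choice, the strength, and the hypothetical `humor_vector` (e.g., a difference of mean activations on concept-positive vs. neutral prompts) are illustrative assumptions, not the paper's exact method.

```python
# Sketch of activation steering with a concept vector (illustrative, not the
# paper's exact procedure): a forward hook shifts one layer's hidden states.
import torch

def add_concept_hook(layer, concept_vector, strength=5.0):
    def hook(module, inputs, output):
        # Some transformer blocks return tuples; steer the hidden states only.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * concept_vector.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage with a HuggingFace-style causal LM:
# handle = add_concept_hook(model.transformer.h[12], humor_vector)
# ... generate text with the concept amplified ...
# handle.remove()   # restore the unmodified model
```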
Join us today at 13:30 at #ICML to learn how to navigate across scaling laws and how to accelerate your training! Poster #1007
Scaling laws predict the minimum required amount of compute to reach a given performance, but can we do better? Yes, if we allow for a flexible "shape" of the model! 🤸
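To make the baseline concrete, here is an illustrative Chinchilla-style compute-optimal calculation under a fixed model family; the point above is that a flexible model "shape" can beat this optimum. The coefficients are Hoffmann-et-al.-style values used purely for illustration.

```python
# Illustrative Chinchilla-style scaling law: L(N, D) = E + A/N^a + B/D^b,
# with training compute C ≈ 6*N*D. Coefficients are illustrative values in
# the style of Hoffmann et al., not fits from this paper.
import numpy as np

E, A, B, a, b = 1.69, 406.4, 410.7, 0.34, 0.28
C = 1e21                                 # fixed compute budget (FLOPs)

N = np.logspace(7, 11, 400)              # candidate parameter counts
D = C / (6 * N)                          # tokens implied by the budget
L = E + A / N**a + B / D**b              # predicted loss for each allocation
i = L.argmin()                           # compute-optimal allocation
print(f"optimal N ≈ {N[i]:.2e}, D ≈ {D[i]:.2e}, loss ≈ {L[i]:.3f}")
```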