Gregor Bachmann
@GregorBachmann1
I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊. http://gregorbachmann.github.io
Very thrilled to announce that our work "Scaling MLPs" has been accepted at NeurIPS 🥳 Check out our new arXiv version arxiv.org/abs/2306.13575! @SAnagnostidis and I managed to push performance even further 🔥
Scaling MLPs: A Tale of Inductive Bias paper page: huggingface.co/papers/2306.13… In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are…
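A minimal sketch of the kind of model being scaled here: a plain MLP on flattened pixels. The width, depth, resolution, and class count below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch (PyTorch) of an image-classification MLP in the spirit of
# "Scaling MLPs". Hyperparameters are illustrative, not the paper's setup.
import torch
import torch.nn as nn

class VisionMLP(nn.Module):
    def __init__(self, image_size=64, channels=3, width=1024, depth=6, num_classes=1000):
        super().__init__()
        in_dim = channels * image_size * image_size
        layers = [nn.Flatten(), nn.Linear(in_dim, width), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.GELU()]
        layers += [nn.Linear(width, num_classes)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channels, H, W)
        return self.net(x)

model = VisionMLP()
logits = model(torch.randn(8, 3, 64, 64))  # -> (8, 1000)
```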
🚀 Excited to share our preprint LoRACLR! TL;DR: LoRACLR merges multiple LoRA models into a unified diffusion model for seamless, high-fidelity multi-concept image synthesis with minimal interference. Thanks to @THofmann2017, @fedassa, and @PINguAR! 🙌
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C), 10-11 am PT. Or please stop by our poster right after @ East Exhibition Hall A-B, #E-2505, 11am-1:30pm. (Hope you enjoy some silly human drawings!)
Can we learn to complete anything in Lidar without any manual supervision? Excited to share our #ICML2025 paper “Towards Learning to Complete Anything in Lidar” from my time at @nvidia with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @AljosaOsep @lealtaixe! Thread🧵👇
What's some "must-read" literature on generalisation in neural networks? I keep thinking about this paper and it really makes me want to better understand the link between optimisation and generalisation. arxiv.org/abs/2302.12091
Our workshop on open-world 3D scene understanding OpenSUN3D is taking place this afternoon at @CVPR!
Join us at OpenSUN3D☀️ workshop this afternoon @CVPR 🚀
📍: Room 105 A
🕰️: 2:00-6:00 pm
🌍: opensun3d.github.io
@afshin_dn @leto__jean @lealtaixe
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:
→ LLMs are limited in creativity since they learn to predict the next token
→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱)
1/ 🧵
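A toy reading of the "seed-conditioning" idea above: diversity comes from a random seed prepended to the prompt rather than from sampling temperature. The `<seed>` tag format is a placeholder, not the paper's exact recipe.

```python
# Toy sketch of "seed-conditioning": prepend a random seed string to the
# prompt so that diversity comes from the seed, not from temperature.
# The tag format below is a made-up placeholder.
import random
import string

def seed_conditioned_prompt(prompt, seed_len=8):
    seed = "".join(random.choices(string.ascii_lowercase, k=seed_len))
    return f"<seed>{seed}</seed> {prompt}"

# Each call yields a differently seeded prompt; the model is then decoded
# greedily, letting the seed carry the randomness.
print(seed_conditioned_prompt("Draw a novel connection between two concepts:"))
```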
Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-token prediction training (thanks @FabianGloeckle). We use register tokens: minimal architecture changes & scalable prediction horizons x.com/NasosGer/statu…
1/n Multi-token prediction boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token setup:
• Short-term focus
• Struggles with long-range decisions
• Weaker supervision
Prior methods add complexity (extra layers). 🔑 Our fix? Register tokens—elegant and powerful
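A hedged sketch of the register-token idea described above: append k learnable register embeddings to the input and train their outputs to predict the k upcoming tokens. The tiny bidirectional encoder and shared head are illustrative assumptions, not the paper's recipe.

```python
# Sketch of multi-token prediction via register tokens (illustrative reading,
# not the paper's exact training setup).
import torch
import torch.nn as nn

class RegisterMTP(nn.Module):
    def __init__(self, vocab=1000, dim=128, horizon=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # k learnable register tokens appended to every input
        self.registers = nn.Parameter(torch.randn(horizon, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        # Toy backbone with bidirectional attention; a real LM would use a
        # causal mask over the prefix.
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)
        self.horizon = horizon

    def forward(self, prefix):                                # prefix: (batch, T) ids
        b = prefix.size(0)
        x = self.embed(prefix)                                # (b, T, dim)
        regs = self.registers.unsqueeze(0).expand(b, -1, -1)  # (b, k, dim)
        h = self.backbone(torch.cat([x, regs], dim=1))        # (b, T+k, dim)
        return self.head(h[:, -self.horizon:])                # (b, k, vocab)

model = RegisterMTP()
prefix = torch.randint(0, 1000, (2, 16))
targets = torch.randint(0, 1000, (2, 4))   # the 4 tokens after each prefix
logits = model(prefix)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), targets.reshape(-1))
```

Note the appeal of this design: the prediction horizon scales by adding register embeddings, with no extra decoder layers or heads per offset.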
Hey @francoisfleuret, we formalized this very intuition in this late-2023 work, which you may find interesting :-) arxiv.org/abs/2403.06963
Thanks @_akhaliq for sharing! During my internship at @NVIDIAAI, we explored zero-shot panoptic completion of Lidar scans — together with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @lealtaixe @AljosaOsep!
Nvidia just announced Towards Learning to Complete Anything in Lidar
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
I will be giving a talk on open-vocabulary 3D scene understanding at the next ZurichCV meetup! 🗓️ Date: Thursday, January 23rd 18:00 📍Location: @ETH_AI_Center, please see zurichai.ch/events/zurichc… for additional details!
Join us for the 4th edition of ☀️OpenSUN3D🌎 workshop on open-world 3D scene understanding at #CVPR2025! We will explore emerging trends in 3D scene understanding, and applications of language models in 3D vision. We're also hosting a challenge! 📚 opensun3d.github.io
Get ready for the next @CVPR workshop on OpenWorld 3D Scene Understanding ➡️ opensun3d.github.io
We will be hosting:
- prized challenge 🏆 (see scenefun3d.github.io)
- paper track 🗞️
- exciting keynote speakers 👩🏫
#CVPR2025
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem! 🤯 New paper w/ P. Whittington & @GregorBachmann1 :) arxiv.org/abs/2412.15210
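For reference, a minimal greedy BPE trainer, which makes the "greedy" point concrete: each step merges the currently most frequent adjacent pair, with no guarantee that the resulting tokeniser is globally optimal.

```python
# Minimal greedy BPE trainer: repeatedly merge the most frequent adjacent
# pair. Greedy w.r.t. per-step compression — finding a globally optimal
# tokeniser is the NP-complete problem the paper studies.
from collections import Counter

def bpe_train(corpus, num_merges):
    seqs = [list(word) for word in corpus]        # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))           # count adjacent pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]       # greedy: most frequent pair
        merges.append((a, b))
        for i, s in enumerate(seqs):              # apply the merge everywhere
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and s[j] == a and s[j + 1] == b:
                    out.append(a + b); j += 2
                else:
                    out.append(s[j]); j += 1
            seqs[i] = out
    return merges

print(bpe_train(["low", "lower", "lowest", "newer", "wider"], 3))
```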
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
Updated camera ready arxiv.org/abs/2405.19279. New results include:
- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal (Adam/AdaFactor)
- scaling to 7B params
- showing that our methods for reducing OFs translate to easier PTQ int8 quantisation
Check it out!
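A hedged sketch of one common way to quantify outlier features (OFs): kurtosis of the per-dimension activation scales. A few unusually large hidden dimensions inflate this statistic and are what make int8 PTQ hard. The metric choice here is an assumption, not necessarily the paper's exact definition.

```python
# Hedged sketch: kurtosis of per-dimension activation scales as an OF proxy.
# A handful of dimensions with much larger magnitude than the rest produce
# a heavy tail and a large kurtosis.
import torch

def activation_kurtosis(h):
    """h: (tokens, hidden_dim) activations; kurtosis of per-dim scales."""
    s = h.pow(2).mean(dim=0)                 # per-dimension second moment
    m, v = s.mean(), s.var(unbiased=False)
    return ((s - m) ** 4).mean() / (v ** 2 + 1e-12)

h_normal = torch.randn(4096, 512)
h_outlier = h_normal.clone()
h_outlier[:, :4] *= 30                       # inject a few outlier dimensions
print(activation_kurtosis(h_normal))         # ~3 (Gaussian-like)
print(activation_kurtosis(h_outlier))        # much larger
```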
We have an exciting line-up of keynote speakers at our workshop for open-vocabulary 3D scene understanding, OpenSUN3D☀️ at #ECCV2024! 🗓️Sept 29, Sunday 14:00-17:30 ✍️opensun3d.github.io @meinhardt_tim @orlitany @AlexBewleyAI @_krishna_murthy
Introducing our Keynote Speakers at this edition of the OpenSUN3D workshop #ECCV2024 (Sept 29, Sunday 14:00-15:30, Room: Amber 4) in Milano🇮🇹 Full schedule: opensun3d.github.io 🚀 @eccvconf @ETH_en @ETH_AI_Center @Stanford
This is really nice! But the proof is very general and thus complicated. A simpler proof, together with a proof of what can go wrong when learning these next-token predictors with MLE, is given in this (IMHO underrated) paper arxiv.org/pdf/2403.06963 @GregorBachmann1 @_vaishnavh
Come to the poster session at 12pm and our spotlight presentation at 3pm, both in Straus 3!
I'm also at ICML -- excited to present our paper on training + LR schedules as a spotlight (!) at the workshop on the next gen of seq. models as well as ES-FOMO on Fri🤙 Reach out to discuss methods for training open models, scaling, efficiency, or the future of architectures :)
We’re presenting our work on concept guidance today at the 13:30 poster session (#706). Come by and say hi! #ICML #ICML2024
🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵 arxiv.org/abs/2402.14433
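A minimal sketch of this kind of concept guidance: add a fixed concept vector to one layer's hidden states via a forward hook. The layer choice, the strength, and the hypothetical `humor_vector` (e.g., a difference of mean activations on concept-positive vs. neutral prompts) are illustrative assumptions, not the paper's exact method.

```python
# Sketch of activation steering with a concept vector (illustrative, not the
# paper's exact procedure): a forward hook shifts one layer's hidden states.
import torch

def add_concept_hook(layer, concept_vector, strength=5.0):
    def hook(module, inputs, output):
        # Some transformer blocks return tuples; steer the hidden states only.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * concept_vector.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage with a HuggingFace-style causal LM:
# handle = add_concept_hook(model.transformer.h[12], humor_vector)
# ... generate text with the concept amplified ...
# handle.remove()   # restore the unmodified model
```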
Join us today at 13:30 at #ICML to learn how to navigate across scaling laws and how to accelerate your training! Poster #1007
Scaling laws predict the minimum required amount of compute to reach a given performance, but can we do better? Yes, if we allow for a flexible "shape" of the model! 🤸
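To make the baseline concrete, here is an illustrative Chinchilla-style compute-optimal calculation under a fixed model family; the point above is that a flexible model "shape" can beat this optimum. The coefficients are Hoffmann-et-al.-style values used purely for illustration.

```python
# Illustrative Chinchilla-style scaling law: L(N, D) = E + A/N^a + B/D^b,
# with training compute C ≈ 6*N*D. Coefficients are illustrative values in
# the style of Hoffmann et al., not fits from this paper.
import numpy as np

E, A, B, a, b = 1.69, 406.4, 410.7, 0.34, 0.28
C = 1e21                                 # fixed compute budget (FLOPs)

N = np.logspace(7, 11, 400)              # candidate parameter counts
D = C / (6 * N)                          # tokens implied by the budget
L = E + A / N**a + B / D**b              # predicted loss for each allocation
i = L.argmin()                           # compute-optimal allocation
print(f"optimal N ≈ {N[i]:.2e}, D ≈ {D[i]:.2e}, loss ≈ {L[i]:.3f}")
```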