Avi Schwarzschild
@A_v_i__S
Trying to learn about deep learning faster than deep learning can learn about me.
At #ICML2025, I am super excited to introduce STAMP. This is a marriage b/w dataset inference & watermarking that finally(!) lets creators PROVE their content was used to train LLMs🔍 It's a MAJOR push taking this academic problem into the real world. w/Saksham Rastogi @danish037 🧵
I will talk about how to train agents with decision making capabilities that generalize to completely new environments: x.com/FahimTajwar10/…
Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited. Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot. 🧵 1/n
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
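For concreteness, here is a minimal PyTorch sketch of the two setups being contrasted; the hyperparameters are illustrative placeholders, not the settings from the referenced work:

```python
# A minimal sketch of the tweet's comparison: plain SGD (no momentum) vs. AdamW.
# All hyperparameter values below are assumed for illustration only.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for a transformer block

# Vanilla small-batch SGD: no momentum, no weight decay -- the "intro ML" optimizer.
sgd = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.0)

# A typical AdamW configuration for LLM pretraining (values assumed).
adamw = torch.optim.AdamW(model.parameters(), lr=3e-4,
                          betas=(0.9, 0.95), weight_decay=0.1)

# A per-FLOP comparison then amounts to running both under a matched token budget,
# with the SGD run using smaller batches and correspondingly more steps.
```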
Introducing ARGUS 👁️ A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.
We're now merging code edits at 4300 tok/s, over 2x faster than the Llama 70b deployment on Cerebras. docs.relace.ai/docs/instant-a…
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a nutshell, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,…
Mean Flows for One-step Generative Modeling "We introduce the notion of average velocity to characterize flow fields, in contrast to instantaneous velocity modeled by Flow Matching methods. A well-defined identity between average and instantaneous velocities is derived and…
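For readers who want the relation spelled out: the following is a sketch of the identity implied by the definitions in the quoted abstract. The notation (u for average velocity, v for instantaneous velocity) is ours and may differ from the paper's.

```latex
% Average velocity over [r, t], following the plain meaning of "average velocity":
\[
  u(z_t, r, t) \;\triangleq\; \frac{1}{t - r}\int_{r}^{t} v(z_\tau, \tau)\, d\tau .
\]
% Multiplying by (t - r) and differentiating in t along the trajectory gives an
% identity relating the two velocities,
\[
  v(z_t, t) \;=\; u(z_t, r, t) \;+\; (t - r)\,\frac{d}{dt}\, u(z_t, r, t),
\]
% which a network predicting u can be trained to satisfy.
```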
I find it interesting that people who believe LLMs/autoregressive models are a dead end base their arguments either on philosophical hypotheses that are hard to test or rebut, or on micro failures (e.g., 9.11 vs 9.9) to predict paradigm-level macro failures. All the while the…
📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!
Looking forward to giving a talk this Friday @OpenAI with @zhilifeng on some of our privacy & memorization research + how it applies to production LLMs! We've been gaining momentum on detecting, quantifying & erasing memorization; excited to explore its real-world impact!
I'm very excited to talk about compression-based memorization with @pratyushmaini this Friday at the @OpenAI Security Research Conference! Let's chat about compression, memorization, and also our new antidistillation sampling antidistillation.com!
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
Why do larger language models generalize better? In our new ICLR paper, we derive an interpretable generalization bound showing that compute-optimal LLMs provably generalize better with scale! 📄arxiv.org/abs/2504.15208 1/7🧵
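As context, bounds of this kind typically take the generic shape sketched below; this is a standard template, not the specific bound derived in the linked paper:

```latex
\[
  \mathbb{E}\!\left[\mathcal{L}_{\mathrm{test}}\right]
  \;\le\;
  \hat{\mathcal{L}}_{\mathrm{train}}
  \;+\;
  \sqrt{\frac{C(h) + \log(1/\delta)}{2n}}
\]
% C(h): a complexity / compression term for the trained model h,
% n: number of independent training samples, delta: failure probability.
% Roughly, the thread's claim is that for compute-optimal LLMs the right-hand
% side shrinks as models are scaled up.
```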