Dmitry Krotov
@DimaKrotov
I am a physicist working on neural networks and machine learning, @MITIBMLab @IBMResearch. Formerly: @the_IAS, @Princeton
Recent advances in Hopfield networks of associative memory may be the guiding theoretical principle for designing novel large-scale neural architectures. I explain my enthusiasm for these ideas in the article ⬇️⬇️⬇️. Please let me know what you think. nature.com/articles/s4225…
I am frequently asked about the difference between binary and continuous Hopfield networks. Binary networks operate on discrete spins that are flipped in random order, while continuous ones are described by differential equations and continuous state vectors. What is the right way to…
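A minimal sketch of this distinction in code (the Hebbian weight matrix W, step sizes, and iteration counts below are illustrative, not taken from any specific paper):

```python
import numpy as np

# Binary Hopfield network: +/-1 spins flipped one at a time in random order.
def binary_update(W, x, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    x = x.copy()
    for _ in range(n_steps):
        i = rng.integers(len(x))          # pick a random spin
        h = W[i] @ x                      # its local field
        x[i] = 1 if h >= 0 else -1        # align the spin with the field
    return x

# Continuous Hopfield network: real-valued states evolving under an ODE,
# integrated here with simple forward-Euler steps.
def continuous_update(W, x, dt=0.1, n_steps=100):
    x = x.astype(float).copy()
    for _ in range(n_steps):
        x += dt * (-x + np.tanh(W @ x))   # relax toward a fixed point
    return x
```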

📢 𝐂𝐚𝐥𝐥 𝐟𝐨𝐫 𝐏𝐚𝐩𝐞𝐫𝐬 – 𝐌𝐞𝐦𝐕𝐢𝐬 @ ICCV | 𝐇𝐨𝐧𝐨𝐥𝐮𝐥𝐮, 𝐇𝐚𝐰𝐚𝐢𝐢 🌺 Topics: 🧠 Memory-augmented models 🎥 Temporal & long-context vision 🤖 Multimodal & scalable systems and more on 𝐦𝐞𝐦𝐨𝐫𝐲 + 𝐯𝐢𝐬𝐢𝐨𝐧 ... 👉 OpenReview Submission:…
How do you build a system that is factual yet creative? This question sits at the heart of memory and creativity in modern ML systems. My colleagues from @IBMResearch and @MITIBMLab are hosting the @MemVis_ICCV25 workshop at #ICCV2025, which explores the intersection between memory and generative…
Consistency Variational Autoencoders (CoVAE) follow naturally from β-VAEs. A family of β-VAEs (with increasing β) can be organized as a sequence of latent encodings with decreasing SNR. This implicit definition of a 'forward process' is used to define a consistency-style loss!
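A hedged sketch of the underlying idea (this is not the CoVAE paper's exact objective; the names and β schedule are illustrative): as β grows, the KL term pushes the posterior toward the prior, so the latent code carries less signal, playing the role of a forward process.

```python
import torch
import torch.nn.functional as F

# Standard beta-VAE objective for a Gaussian posterior q(z|x) = N(mu, exp(logvar)).
def beta_vae_loss(x, x_recon, mu, logvar, beta):
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Illustrative schedule: increasing beta ~ decreasing latent SNR,
# i.e. an implicit 'forward process' over the family of beta-VAEs.
betas = [0.1, 1.0, 4.0, 16.0]
```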
🔥 M+ is at #ICML2025 now! We combine long-term memory (on CPU) with short-term memory (on GPU) for LLMs, pushing efficient long-context modeling to 160k+ tokens. 📍 My co-author @YzhuML will present M+ in person tomorrow at 4:30pm. 👋 Come chat about scaling memory for LLMs!
🎉 Our paper “M+: Extending MemoryLLM with Scalable Long-Term Memory” is accepted to ICML 2025! 🔹 Co-trained retriever + latent memory 🔹 Retains info across 160k+ tokens 🔹 Much lower GPU cost than the backbone LLM arxiv.org/abs/2502.00592
Thanks to everyone who came to our tutorial yesterday. It was fun! I will host an additional Q&A session at the IBM Research booth in West Exhibition Hall A today between 4:30pm and 6pm. If you want to chat about Associative Memory, Energy Transformers, diffusion models, AI &…

During training, diffusion models are taught to be effective denoisers, much like Associative Memory systems. At what point do these models stop being denoisers and start behaving like data generators? To learn how these models evolve from Associative Memory systems to…
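For context, this is the standard denoising objective such models are trained on (a hedged sketch; the model signature and the noise-schedule argument are placeholders):

```python
import torch

def denoising_loss(model, x0, t, alpha_bar):
    # alpha_bar: cumulative signal level in [0, 1] at step t
    eps = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps
    eps_hat = model(x_t, t)               # the network acts as a denoiser
    return ((eps_hat - eps) ** 2).mean()
```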
Energy-based modeling typically aims at learning a target data distribution. But by using the rules of modern Hopfield networks, it can also be used to design novel dynamical neural systems whose dynamics are dictated by a global energy function operating in a latent space. For example, we…
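A minimal sketch of such energy-descent dynamics, using the standard modern Hopfield (log-sum-exp) energy over stored patterns Xi; the patterns, β, and step size here are illustrative:

```python
import torch

def hopfield_energy(x, Xi, beta=1.0):
    # E(x) = -(1/beta) * logsumexp(beta * Xi @ x) + 0.5 * ||x||^2
    return -torch.logsumexp(beta * (Xi @ x), dim=0) / beta + 0.5 * (x @ x)

def relax(x, Xi, beta=1.0, lr=0.1, n_steps=50):
    # Gradient descent on the global energy defines the system's dynamics.
    x = x.clone().requires_grad_(True)
    for _ in range(n_steps):
        (g,) = torch.autograd.grad(hopfield_energy(x, Xi, beta), x)
        x = (x - lr * g).detach().requires_grad_(True)
    return x.detach()
```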
Memory is a fundamental aspect of human cognition, yet current state-of-the-art AI models use only its rudimentary forms. Join us at the ICCV 2025 workshop on Memory & Vision in sunny Honolulu, where we will explore the intersection of memory and visual AI. We invite you to…
MemVis @ #ICCV2025 -- 1st Workshop on Memory & Vision! 🧠👁️ Call for papers now open: Hopfield & energy nets, state-space + diffusion models, retrieval & lifelong learning, long-context FMs, multimodal memory, & more. 🗓️ Submit by 1 Aug 2025 → sites.google.com/view/memvis-ic… 🌺 #MemVis
Much has been said about memorization and generalization in diffusion models. What remains overlooked is that there is an entirely new phase in addition to those two: spurious states. The existence of this phase is a distinctive prediction of the energy-based associative…
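A toy illustration of what a spurious state is, in the simplest classical setting (a binary Hopfield network with Hebbian weights, not the paper's diffusion setup): an element-wise majority mixture of three stored patterns can itself be stable, even though it was never stored.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
patterns = rng.choice([-1, 1], size=(3, N))     # three stored memories
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

mixture = np.sign(patterns.sum(axis=0))         # majority vote of the three
updated = np.sign(W @ mixture)                  # one synchronous update
print("fraction of stable spins:", np.mean(updated == mixture))  # typically ~1.0
```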
Diffusion models create novel images, but they can also memorize samples from the training set. How do they blend stored features to synthesize novel patterns? Our new work shows that diffusion models behave like Dense Associative Memory: in the low training data regime (number…
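A hedged illustration of the memorization end of this spectrum (a generic Dense Associative Memory readout, not the paper's model): with only a handful of stored samples and a sharp enough softmax, retrieval returns an almost exact copy of the nearest training sample rather than a novel blend.

```python
import numpy as np

def dense_am_readout(x, patterns, beta=0.5):
    logits = beta * (patterns @ x)               # similarity to each stored pattern
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ patterns                          # convex blend; sharp beta -> pure recall

rng = np.random.default_rng(1)
patterns = rng.standard_normal((5, 64))          # low-data regime: only 5 samples
query = patterns[2] + 0.3 * rng.standard_normal(64)
out = dense_am_readout(query, patterns)
print(np.corrcoef(out, patterns[2])[0, 1])       # close to 1: memorized, not novel
```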
1/2) It's finally out on arXiv: Feedback guidance of generative diffusion models! We derived an adaptive guidance method from first principles that regulates the amount of guidance based on the current state of the generation. Complex prompts are highly guided, while simple ones are almost guidance-free.
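A hedged sketch of the general idea (the paper derives its feedback rule from first principles; the weighting function below is a made-up stand-in, built on standard classifier-free guidance):

```python
import torch

def feedback_guided_eps(model, x_t, t, cond, w_max=8.0):
    eps_c = model(x_t, t, cond)                  # conditional noise prediction
    eps_u = model(x_t, t, None)                  # unconditional noise prediction
    gap = (eps_c - eps_u).pow(2).mean().sqrt()   # how much the condition matters here
    w = 1.0 + w_max * torch.tanh(gap)            # illustrative state-dependent weight
    return eps_u + w * (eps_c - eps_u)           # complex prompts get more guidance
```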