Bhuwan Dhingra
@bhuwandhingra
Natural Language Processing / Machine Learning research. Assistant Professor @dukecompsci, @duke_nlp; Research Scientist @Apple
Want to train LLMs at lower cost? We introduce BiClip, a clipping-based method that "approximates" adaptive optimizers without maintaining expensive preconditioners.
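For intuition only, here is a minimal PyTorch sketch of what a bidirectional-clipping update could look like, assuming each gradient coordinate's magnitude is clamped into [clip_low, clip_high] before a plain SGD step; the function name, thresholds, and exact rule are illustrative assumptions, not the paper's algorithm.

```python
import torch

def biclip_sgd_step(params, lr=1e-2, clip_low=1e-4, clip_high=1.0):
    # Illustrative update: clamp each gradient coordinate's magnitude into
    # [clip_low, clip_high], then take a plain SGD step. Unlike Adam-style
    # optimizers, no per-parameter second-moment state (preconditioner) is kept.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            mag = p.grad.abs().clamp(min=clip_low, max=clip_high)  # two-sided clip
            p -= lr * p.grad.sign() * mag
```

In this sketch the lower threshold puts a floor on the step size of tiny-gradient coordinates and the upper threshold caps the large ones, loosely imitating an adaptive step without storing any optimizer state.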
The technical report for the second generation of Apple Foundation Models is out. It's been a great year contributing to this effort and being part of an amazing team!
In this report we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model. machinelearning.apple.com/research/apple…
At WWDC we introduce a new generation of LLMs developed to enhance the Apple Intelligence features. We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device foundation language model. machinelearning.apple.com/research/apple…
Backtracking allows reasoning models to go back and correct mistakes in their solution attempts. What sorts of tasks benefit from this behavior? And can we boost it using SFT? @Hongyicai2002 's new preprint answers these questions and more -- check it out!
🚀Excited to share our new paper: How Much Backtracking is Enough? 🤔 How many backtracks should your LLM learn to reason better? Turns out: the harder the task, the more backtracking you need!
🚀 Introducing Mixture-of-Agents Alignment (MoAA), a new method to "distill" the collective intelligence of open-source LLMs into a single, efficient model. MoAA outperforms GPT-4o as a teacher, boosting smaller models like Llama3.1-8B to rival models 10x their size!
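In spirit, and only as a hypothetical sketch (the callables and prompt template below are placeholders, not MoAA's actual pipeline), the distillation data could be built by having several open-source "proposer" models draft answers and an aggregator model synthesize them into the SFT target for the student:

```python
from typing import Callable

def build_moa_sft_example(
    prompt: str,
    proposers: list[Callable[[str], str]],   # each maps a prompt to a completion
    aggregator: Callable[[str], str],
) -> dict:
    """Build one distillation example: proposer models draft answers, the
    aggregator synthesizes them, and the synthesis becomes the SFT target."""
    drafts = [propose(prompt) for propose in proposers]
    agg_prompt = (
        "Synthesize the candidate answers below into one high-quality answer.\n\n"
        + "\n\n".join(f"Candidate {i + 1}:\n{d}" for i, d in enumerate(drafts))
        + f"\n\nQuestion: {prompt}"
    )
    return {"prompt": prompt, "response": aggregator(agg_prompt)}
```

The student (e.g., Llama3.1-8B) would then be fine-tuned on a collection of such prompt/response pairs instead of on outputs from a single large teacher.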
Looking forward to visiting Stanford this Thursday! Check out my talk at the NLP seminar if you’re around :)
For this week’s NLP Seminar, we are thrilled to host @bhuwandhingra to talk about "Certainty in the face of ambiguity"! When: 5/29 Thurs 11am PT. Non-Stanford affiliates registration form (closes at 9am PT on the day of the talk): forms.gle/t7HtZ9fvWccEKF…
Happy to share my first paper at Apple, led by @RoyXie_. TL;DR: Interleaving <think> and <answer> blocks during reasoning reduces the time-to-first-token *and* improves accuracy.
Can we train reasoning LLMs to generate answers as they think? Introducing 𝐈𝐧𝐭𝐞𝐫𝐥𝐞𝐚𝐯𝐞𝐝 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠! We train LLMs to alternate between thinking & answering 🚀 Reducing Time-to-First-Token (TTFT) by over 80% ⚡ AND improving Pass@1 accuracy by up to 19.3%! 📈 🧵 1/n
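As a toy illustration (the alternating <think>/<answer> format follows the description above, but the exact template and parsing below are assumptions, not the paper's code), intermediate answers can be surfaced as soon as each block closes, which is what cuts the time-to-first-token:

```python
import re

# Illustrative interleaved trace: the model alternates <think> and <answer>
# blocks instead of emitting one long <think> block followed by a final answer.
interleaved_output = (
    "<think>Break the problem into two sub-questions.</think>"
    "<answer>Sub-answer 1: 42</answer>"
    "<think>Use that result for the second part.</think>"
    "<answer>Final answer: 84</answer>"
)

# Each </answer> can be streamed to the user the moment it closes,
# long before the full reasoning trace finishes.
answers = re.findall(r"<answer>(.*?)</answer>", interleaved_output, flags=re.S)
print(answers[0])   # first user-visible content arrives early
print(answers[-1])  # final answer
```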
Tagging @maxiholsman who led this great work!
Glad to share a new ACL Findings paper from @MaxHolsman and @YukunHuang9! We introduce Fuzzy Speculative Decoding (FSD), which extends speculative decoding to allow a tunable trade-off between generation quality and inference acceleration. Paper: arxiv.org/abs/2502.20704
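A rough sketch of the idea, under the assumption that the usual exact acceptance rule of speculative decoding is relaxed to a tunable distance check between the draft and target next-token distributions; the threshold name and the choice of total variation distance are illustrative, not the paper's exact criterion.

```python
import numpy as np

def fuzzy_accept(p_target: np.ndarray, p_draft: np.ndarray, threshold: float) -> bool:
    """Accept the draft model's token whenever the target and draft next-token
    distributions are close enough. threshold = 0 means deferring to the target
    model everywhere (highest quality, slowest); larger thresholds accept more
    draft tokens (faster generation, lower fidelity)."""
    tvd = 0.5 * np.abs(p_target - p_draft).sum()  # total variation distance
    return tvd <= threshold

# Example: a near-identical draft distribution passes, a very different one does not.
p_t = np.array([0.70, 0.20, 0.10])
print(fuzzy_accept(p_t, np.array([0.68, 0.22, 0.10]), threshold=0.05))  # True
print(fuzzy_accept(p_t, np.array([0.10, 0.10, 0.80]), threshold=0.05))  # False
```

Sweeping the threshold is what gives the tunable quality/speed exchange described in the tweet.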