Bhuwan Dhingra
@bhuwandhingra
Natural Language Processing / Machine Learning research. Assistant Professor @dukecompsci, @duke_nlp; Research Scientist @Apple
Want to train LLMs at lower cost? We introduce BiClip, a clipping-based method that "approximates" adaptive optimizers without maintaining expensive preconditioners.
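For intuition only, here is a minimal PyTorch sketch of what a bidirectional-clipping update could look like, assuming each gradient coordinate's magnitude is clamped into [clip_low, clip_high] before a plain SGD step; the function name, thresholds, and exact rule are illustrative assumptions, not the paper's algorithm.

```python
import torch

def biclip_sgd_step(params, lr=1e-2, clip_low=1e-4, clip_high=1.0):
    # Illustrative update: clamp each gradient coordinate's magnitude into
    # [clip_low, clip_high], then take a plain SGD step. Unlike Adam-style
    # optimizers, no per-parameter second-moment state (preconditioner) is kept.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            mag = p.grad.abs().clamp(min=clip_low, max=clip_high)  # two-sided clip
            p -= lr * p.grad.sign() * mag
```

In this sketch the lower threshold puts a floor on the step size of tiny-gradient coordinates and the upper threshold caps the large ones, loosely imitating an adaptive step without storing any optimizer state.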
The technical report for the second generation of Apple Foundation Models is out. It's been a great year contributing to this effort and being part of an amazing team!
In this report we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model. machinelearning.apple.com/research/apple…
At WWDC we introduce a new generation of LLMs developed to enhance the Apple Intelligence features. We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device foundation language model. machinelearning.apple.com/research/apple…
Backtracking allows reasoning models to go back and correct mistakes in their solution attempts. What sorts of tasks benefit from this behavior? And can we boost it using SFT? @Hongyicai2002 's new preprint answers these questions and more -- check it out!
🚀Excited to share our new paper: How Much Backtracking is Enough? 🤔 How many backtracks should your LLM learn to reason better? Turns out: the harder the task, the more backtracking you need!
🚀 Introducing Mixture-of-Agents Alignment (MoAA), a new method to "distill" the collective intelligence of open-source LLMs into a single, efficient model. MoAA outperforms GPT-4o as a teacher, boosting smaller models like Llama3.1-8B to rival models 10x their size!
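In spirit, and only as a hypothetical sketch (the callables and prompt template below are placeholders, not MoAA's actual pipeline), the distillation data could be built by having several open-source "proposer" models draft answers and an aggregator model synthesize them into the SFT target for the student:

```python
from typing import Callable

def build_moa_sft_example(
    prompt: str,
    proposers: list[Callable[[str], str]],   # each maps a prompt to a completion
    aggregator: Callable[[str], str],
) -> dict:
    """Build one distillation example: proposer models draft answers, the
    aggregator synthesizes them, and the synthesis becomes the SFT target."""
    drafts = [propose(prompt) for propose in proposers]
    agg_prompt = (
        "Synthesize the candidate answers below into one high-quality answer.\n\n"
        + "\n\n".join(f"Candidate {i + 1}:\n{d}" for i, d in enumerate(drafts))
        + f"\n\nQuestion: {prompt}"
    )
    return {"prompt": prompt, "response": aggregator(agg_prompt)}
```

The student (e.g., Llama3.1-8B) would then be fine-tuned on a collection of such prompt/response pairs instead of on outputs from a single large teacher.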
Looking forward to visiting Stanford this Thursday! Check out my talk at the NLP seminar if you’re around :)
For this week’s NLP Seminar, we are thrilled to host @bhuwandhingra to talk about "Certainty in the face of ambiguity"! When: 5/29 Thurs 11am PT. Non-Stanford affiliates registration form (closes at 9am PT on the day of the talk): forms.gle/t7HtZ9fvWccEKF…
Happy to share my first paper at Apple, led by @RoyXie_. TL;DR: Interleaving <think> and <answer> blocks during reasoning reduces the time-to-first-token *and* improves accuracy.
Can we train reasoning LLMs to generate answers as they think? Introducing 𝐈𝐧𝐭𝐞𝐫𝐥𝐞𝐚𝐯𝐞𝐝 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠! We train LLMs to alternate between thinking & answering 🚀 Reducing Time-to-First-Token (TTFT) by over 80% ⚡ AND improving Pass@1 accuracy by up to 19.3%! 📈 🧵 1/n
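As a toy illustration (the alternating <think>/<answer> format follows the description above, but the exact template and parsing below are assumptions, not the paper's code), intermediate answers can be surfaced as soon as each block closes, which is what cuts the time-to-first-token:

```python
import re

# Illustrative interleaved trace: the model alternates <think> and <answer>
# blocks instead of emitting one long <think> block followed by a final answer.
interleaved_output = (
    "<think>Break the problem into two sub-questions.</think>"
    "<answer>Sub-answer 1: 42</answer>"
    "<think>Use that result for the second part.</think>"
    "<answer>Final answer: 84</answer>"
)

# Each </answer> can be streamed to the user the moment it closes,
# long before the full reasoning trace finishes.
answers = re.findall(r"<answer>(.*?)</answer>", interleaved_output, flags=re.S)
print(answers[0])   # first user-visible content arrives early
print(answers[-1])  # final answer
```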
Tagging @maxiholsman who led this great work!
Glad to share a new ACL Findings paper from @MaxHolsman and @YukunHuang9! We introduce Fuzzy Speculative Decoding (FSD), which extends speculative decoding to allow a tunable trade-off between generation quality and inference acceleration. Paper: arxiv.org/abs/2502.20704
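A rough sketch of the idea, under the assumption that the usual exact acceptance rule of speculative decoding is relaxed to a tunable distance check between the draft and target next-token distributions; the threshold name and the choice of total variation distance are illustrative, not the paper's exact criterion.

```python
import numpy as np

def fuzzy_accept(p_target: np.ndarray, p_draft: np.ndarray, threshold: float) -> bool:
    """Accept the draft model's token whenever the target and draft next-token
    distributions are close enough. threshold = 0 means deferring to the target
    model everywhere (highest quality, slowest); larger thresholds accept more
    draft tokens (faster generation, lower fidelity)."""
    tvd = 0.5 * np.abs(p_target - p_draft).sum()  # total variation distance
    return tvd <= threshold

# Example: a near-identical draft distribution passes, a very different one does not.
p_t = np.array([0.70, 0.20, 0.10])
print(fuzzy_accept(p_t, np.array([0.68, 0.22, 0.10]), threshold=0.05))  # True
print(fuzzy_accept(p_t, np.array([0.10, 0.10, 0.80]), threshold=0.05))  # False
```

Sweeping the threshold is what gives the tunable quality/speed exchange described in the tweet.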