Ritvik Kapila
@RitvikKapila
ML Research @Essential_AI, MS CS @UCSanDiego, B. Tech. @iitdelhi
#1 trending on @huggingface letsgoooo! @essential_ai 🥇

Exciting work by @gauri__gupta and team — pushing the boundaries of agentic AI with a grounded, evidence-based system that attributes sources and quantifies confidence.
Excited to share what we've been building at Parallel Web Systems with @paraga - an evidence-driven response grounding system that sets new standards for agentic proof of work. [1/n]
Check out our optimizer team's findings on grokking with Muon vs Adam. Our experiments show that different settings favour different optimizers, with no clear wins for either. Blog: essential.ai/blog/grokking @essential_ai @ssingla17 @ishaankshah @ashVaswani
We didn't see clear wins on grokking with Muon. Curious if others have observed similar behaviors.
[1/5] We have a quick update to share, which contradicts our earlier hypothesis about the relative abilities of Muon and Adam vis-à-vis grokking.
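For anyone who wants to poke at this themselves, here is a minimal grokking-style setup in PyTorch. It is a sketch, not the team's code: the modulus, model width, train fraction, and hyperparameters are illustrative choices, and you would swap AdamW for a Muon implementation to reproduce the comparison.

```python
# Minimal grokking-style experiment: train a small model on modular addition
# with heavy weight decay and watch validation accuracy rise long after
# training accuracy saturates. All settings below are illustrative.
import torch
import torch.nn as nn

p = 97  # modulus for the synthetic task a + b (mod p)
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2  # 50% train fraction, a common choice in grokking studies
train_x, train_y = pairs[perm[:split]], labels[perm[:split]]
val_x, val_y = pairs[perm[split:]], labels[perm[split:]]

model = nn.Sequential(
    nn.Embedding(p, 128),        # embed each operand, then flatten the pair
    nn.Flatten(start_dim=1),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),
)
# Strong weight decay is the usual ingredient for grokking; swap in a Muon
# implementation here to run the optimizer comparison.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20_000):
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            val_acc = (model(val_x).argmax(-1) == val_y).float().mean().item()
        print(f"step {step}: train loss {loss.item():.3f}, val acc {val_acc:.3f}")
```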
Why run the same race when we can pioneer our own path? That's how we approach AI: by taking big bets and pushing on the foundations of AI 💥 Check out @ashVaswani's recent interview with @EconomicTimes
[1/5] 🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
Check out our recent work at @essential_ai: Essential-Web v1.0, a web-scale corpus of 24T tokens that we find useful for curating high-performing, domain-specific datasets for LLM pre-training. Paper link: arxiv.org/abs/2506.14111 cc @AndrewHojel @timr1126 @YashVanjani @ashVaswani
Check out our latest research on data. We're releasing 24T tokens of richly labelled web data. We found it very useful for our internal data curation efforts. Excited to see what you build using Essential-Web v1.0!
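If you want to try the curation workflow, a sketch like the one below streams the corpus and filters on document metadata. The dataset ID, field names ("metadata", "domain", "text"), and label values are assumptions for illustration; check the dataset card on Hugging Face for the actual schema.

```python
from itertools import islice
from datasets import load_dataset

# Stream the corpus rather than downloading 24T tokens up front.
# Dataset ID is an assumption; see the Hugging Face dataset card.
ds = load_dataset("EssentialAI/essential-web-v1.0", split="train", streaming=True)

def keep(doc):
    # Illustrative predicate: the real schema exposes richer taxonomy labels.
    meta = doc.get("metadata") or {}
    return meta.get("domain") in {"science", "technology"}

# Preview a few documents from the curated subset.
for doc in islice(filter(keep, ds), 3):
    print(doc["text"][:200])
```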
Check out our infrastructure team’s work at @essential_ai on parallelizing Muon on large-scale distributed clusters. Blog link: essential.ai/blog/infra @YashVanjani @pcmonk @ishaankshah @karlstratos @ashVaswani
New blog post out on evaluating the benefits of the second-order optimizer Muon over AdamW. Do check it out! @essential_ai @ishaankshah @ampolloreno @karlstratos @pcmonk @ashVaswani
🗞️Check out our latest blog post, where we evaluate Muon, a second-order optimizer that delivers strong compute-time efficiency at large batch sizes. Combined with muP, Muon offers a simple path to faster, more scalable LLM pretraining. 🧠Link to blog: essential.ai
Check out our work at @essential_ai on second-order optimizers! We show that Muon reaches the target loss with 10–15% fewer tokens than AdamW, and also enables faster training with more devices. Paper: arxiv.org/abs/2505.02222
Please check out our thorough study on the advantages of Muon. Second-order optimization is a promising path to more efficient LLM pretraining.
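For context on what Muon actually does, here is a minimal sketch of its core update, following the open-source reference implementation rather than Essential AI's internal code: heavy-ball momentum whose 2D weight updates are orthogonalized with a few Newton-Schulz iterations. The learning rate, momentum, and shape-dependent scale below are illustrative choices.

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Quintic Newton-Schulz iteration that pushes G's singular values toward 1;
    # coefficients follow the public reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    if G.size(0) > G.size(1):
        X = X.T  # work with the wide orientation so A below is small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X

@torch.no_grad()
def muon_step(params, momentum_buffers, lr=0.02, momentum=0.95):
    # One Muon step over a list of 2D weight matrices with grads populated.
    for p, buf in zip(params, momentum_buffers):
        buf.mul_(momentum).add_(p.grad)            # heavy-ball momentum
        update = newton_schulz_orthogonalize(buf)  # orthogonalized direction
        # One common shape-dependent scale; implementations differ here.
        p.add_(update, alpha=-lr * max(1.0, p.size(0) / p.size(1)) ** 0.5)
```

As in the reference implementation, Muon is typically applied only to the 2D hidden-layer matrices, with embeddings, norms, and the output head left to AdamW.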
At @essential_ai, we investigated the emergence of reflection capabilities in LLMs during pre-training. Check out our new paper at arxiv.org/abs/2504.04022.
We have been studying how pre-training and reasoning interact. Check out our latest results!!
Check out our work on "Domain Generalization in Robust Invariant Representation," which was accepted at the ICLR'23 PML4DC workshop! Open-source code: github.com/GauriGupta19/D… Arxiv link: arxiv.org/abs/2304.03431 #MachineLearning #DomainGeneralization #InvariantRepresentation