Ritvik Kapila
@RitvikKapila
ML Research @Essential_AI, MS CS @UCSanDiego, B. Tech. @iitdelhi
#1 trending on @huggingface letsgoooo! @essential_ai 🥇

Exciting work by @gauri__gupta and team — pushing the boundaries of agentic AI with a grounded, evidence-based system that attributes sources and quantifies confidence.
Excited to share what we've been building at Parallel Web Systems with @paraga - an evidence-driven response grounding system that sets new standards for agentic proof of work. [1/n]
Check out our optimizer team's findings on grokking with Muon vs Adam. Our experiments show that different settings favour different optimizers, with no clear wins for either. Blog: essential.ai/blog/grokking @essential_ai @ssingla17 @ishaankshah @ashVaswani
We didn't see clear wins on grokking with Muon. Curious if others have observed similar behaviors.
[1/5] We have a quick update to share, which contradicts our earlier hypothesis about the relative abilities of Muon and Adam vis-à-vis grokking.
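For anyone who wants to poke at this themselves, here is a minimal grokking-style setup in PyTorch. It is a sketch, not the team's code: the modulus, model width, train fraction, and hyperparameters are illustrative choices, and you would swap AdamW for a Muon implementation to reproduce the comparison.

```python
# Minimal grokking-style experiment: train a small model on modular addition
# with heavy weight decay and watch validation accuracy rise long after
# training accuracy saturates. All settings below are illustrative.
import torch
import torch.nn as nn

p = 97  # modulus for the synthetic task a + b (mod p)
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2  # 50% train fraction, a common choice in grokking studies
train_x, train_y = pairs[perm[:split]], labels[perm[:split]]
val_x, val_y = pairs[perm[split:]], labels[perm[split:]]

model = nn.Sequential(
    nn.Embedding(p, 128),        # embed each operand, then flatten the pair
    nn.Flatten(start_dim=1),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),
)
# Strong weight decay is the usual ingredient for grokking; swap in a Muon
# implementation here to run the optimizer comparison.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20_000):
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            val_acc = (model(val_x).argmax(-1) == val_y).float().mean().item()
        print(f"step {step}: train loss {loss.item():.3f}, val acc {val_acc:.3f}")
```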
Why run the same race when we can pioneer our own path? That's how we approach AI: by taking big bets and pushing on the foundations of AI 💥 Check out @ashVaswani's recent interview with @EconomicTimes
[1/5] 🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
Check out our recent work at @essential_ai: Essential-Web v1.0, a web-scale corpus of 24T tokens that we find useful for curating high-performing, domain-specific datasets for LLM pre-training. Paper link: arxiv.org/abs/2506.14111 cc @AndrewHojel @timr1126 @YashVanjani @ashVaswani
Check out our latest research on data. We're releasing 24T tokens of richly labelled web data. We found it very useful for our internal data curation efforts. Excited to see what you build using Essential-Web v1.0!
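If you want to try the curation workflow, a sketch like the one below streams the corpus and filters on document metadata. The dataset ID, field names ("metadata", "domain", "text"), and label values are assumptions for illustration; check the dataset card on Hugging Face for the actual schema.

```python
from itertools import islice
from datasets import load_dataset

# Stream the corpus rather than downloading 24T tokens up front.
# Dataset ID is an assumption; see the Hugging Face dataset card.
ds = load_dataset("EssentialAI/essential-web-v1.0", split="train", streaming=True)

def keep(doc):
    # Illustrative predicate: the real schema exposes richer taxonomy labels.
    meta = doc.get("metadata") or {}
    return meta.get("domain") in {"science", "technology"}

# Preview a few documents from the curated subset.
for doc in islice(filter(keep, ds), 3):
    print(doc["text"][:200])
```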
Check out our infrastructure team’s work at @essential_ai on parallelizing Muon on large-scale distributed clusters. Blog link: essential.ai/blog/infra @YashVanjani @pcmonk @ishaankshah @karlstratos @ashVaswani
New blog post out on evaluating the benefits of the second-order optimizer Muon over AdamW. Do check it out! @essential_ai @ishaankshah @ampolloreno @karlstratos @pcmonk @ashVaswani
🗞️Check out our latest blog post, where we evaluate Muon, a second-order optimizer that delivers strong compute-time efficiency at large batch sizes. Combined with muP, Muon offers a simple path to faster, more scalable LLM pretraining. 🧠Link to blog: essential.ai
Check out our work at @essential_ai on second-order optimizers! We show that Muon reaches the target loss with 10–15% fewer tokens than AdamW, and also enables faster training with more devices. Paper: arxiv.org/abs/2505.02222
Please check out our thorough study on the advantages of Muon. Second-order optimization is a promising path to more efficient LLM pretraining.
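For context on what Muon actually does, here is a minimal sketch of its core update, following the open-source reference implementation rather than Essential AI's internal code: heavy-ball momentum whose 2D weight updates are orthogonalized with a few Newton-Schulz iterations. The learning rate, momentum, and shape-dependent scale below are illustrative choices.

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Quintic Newton-Schulz iteration that pushes G's singular values toward 1;
    # coefficients follow the public reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    if G.size(0) > G.size(1):
        X = X.T  # work with the wide orientation so A below is small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X

@torch.no_grad()
def muon_step(params, momentum_buffers, lr=0.02, momentum=0.95):
    # One Muon step over a list of 2D weight matrices with grads populated.
    for p, buf in zip(params, momentum_buffers):
        buf.mul_(momentum).add_(p.grad)            # heavy-ball momentum
        update = newton_schulz_orthogonalize(buf)  # orthogonalized direction
        # One common shape-dependent scale; implementations differ here.
        p.add_(update, alpha=-lr * max(1.0, p.size(0) / p.size(1)) ** 0.5)
```

As in the reference implementation, Muon is typically applied only to the 2D hidden-layer matrices, with embeddings, norms, and the output head left to AdamW.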
At @essential_ai, we investigated the emergence of reflection capabilities in LLMs during pre-training. Check out our new paper at arxiv.org/abs/2504.04022.
We have been studying how pre-training and reasoning interact. Check out our latest results!!
Check out our work on "Domain Generalization in Robust Invariant Representation," which was accepted at the ICLR'23 PML4DC workshop! Open-source code: github.com/GauriGupta19/D… Arxiv link: arxiv.org/abs/2304.03431 #MachineLearning #DomainGeneralization #InvariantRepresentation