Parth Asawa
@pgasawa
CS PhD student @Berkeley_EECS
Instruct-tuned models are getting better at following instructions and ‘reasoning’ every day, but they’re shockingly poor at generating diverse responses. Diversity is crucial to many tasks like synthetic data generation. We tackle this with a new approach, BARE 🐻! (1/n)
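For readers who want a concrete picture of the "base model for diversity, instruct model for quality" idea the thread describes, here is a minimal sketch under stated assumptions: gpt2 stands in for a real base/instruct model pair, and the prompts and sampling settings are illustrative placeholders, not the configuration used in the BARE paper.

```python
# Minimal sketch of a two-stage "base generates, instruct refines" pipeline.
# Illustrative only: gpt2 is a stand-in for a real base/instruct pair, and
# the prompts are placeholders, not the ones used in BARE.
from transformers import pipeline

base = pipeline("text-generation", model="gpt2")     # stand-in for a base model
refiner = pipeline("text-generation", model="gpt2")  # in practice, an instruct-tuned model

seed_prompt = "Q: Write a short math word problem.\nA:"

# Stage 1: sample several diverse drafts from the base model (high temperature).
drafts = base(
    seed_prompt,
    max_new_tokens=60,
    do_sample=True,
    temperature=1.0,
    num_return_sequences=4,
)

# Stage 2: ask the (instruct-tuned) refiner to clean up each draft,
# keeping its content while improving quality.
refined = []
for d in drafts:
    draft_text = d["generated_text"][len(seed_prompt):].strip()
    refine_prompt = (
        "Improve the following math word problem so it is clear and well-posed, "
        "without changing what it is about:\n" + draft_text + "\nImproved problem:"
    )
    out = refiner(refine_prompt, max_new_tokens=60, do_sample=False)
    refined.append(out[0]["generated_text"][len(refine_prompt):].strip())

print(refined)  # diverse drafts from the base model, polished by the refiner
```

The split of responsibilities is the point: the base model supplies spread across the output space, and the refinement pass is only asked to fix quality, not to invent new content.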

How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
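To make the "learn from a few rollouts by reflecting on what worked and what didn't" idea concrete, here is a rough sketch of a generic reflection-driven prompt-optimization loop. This is not GEPA's actual algorithm or API: `call_llm`, the scoring rule, and the example format are all placeholders you would replace with your own model call and metric.

```python
# Generic reflection-based prompt optimization loop (illustrative sketch,
# not GEPA's actual algorithm). call_llm and score are placeholders.
import random

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your actual model call here.
    return "stub response"

def score(example: dict, response: str) -> float:
    # Placeholder metric: 1.0 if the expected answer appears in the response.
    return 1.0 if example["answer"].lower() in response.lower() else 0.0

def reflective_optimize(task_prompt: str, examples: list, rounds: int = 3, batch: int = 4) -> str:
    best_prompt, best_score = task_prompt, -1.0
    for _ in range(rounds):
        sample = random.sample(examples, min(batch, len(examples)))

        # A handful of rollouts with the current best prompt.
        rollouts = [(ex, call_llm(best_prompt + "\n\n" + ex["question"])) for ex in sample]
        avg = sum(score(ex, resp) for ex, resp in rollouts) / len(rollouts)
        best_score = max(best_score, avg)

        # Reflection step: show the model its own rollouts and scores, and ask
        # for a revised instruction based on what worked and what didn't.
        transcript = "\n".join(
            f"Q: {ex['question']}\nA: {resp}\nScore: {score(ex, resp)}" for ex, resp in rollouts
        )
        candidate = call_llm(
            "Here is the current instruction:\n" + best_prompt +
            "\n\nHere are a few rollouts and their scores:\n" + transcript +
            "\n\nReflect on what worked and what didn't, then write an improved instruction."
        )

        # Keep the revised prompt only if it does at least as well on the same batch.
        cand_rollouts = [(ex, call_llm(candidate + "\n\n" + ex["question"])) for ex in sample]
        cand_avg = sum(score(ex, resp) for ex, resp in cand_rollouts) / len(cand_rollouts)
        if cand_avg >= best_score:
            best_prompt, best_score = candidate, cand_avg
    return best_prompt
```

The contrast with GRPO-style RL is that each round here consumes only a small batch of rollouts, because the reflection text (not gradient signal from thousands of samples) carries the learning.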
Ember: a framework for inference-time scaling architectures 🧵 (1/8)
If you’re an undergrad at Berkeley, this is one of the best communities of inspiring, kind, and generally awesome people you can join!
A simple idea to build the @UCBerkeley startup alumni network has grown beyond my wildest dreams into #AccelScholars, a tight-knit community of the most ambitious, talented, kind-hearted people, whose individual stories we’ve been fortunate to support for the past eight years
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
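The comparison against the PyTorch Eager baseline boils down to two checks on each generated kernel: does it produce the same outputs, and is it faster. The sketch below illustrates that kind of harness under assumptions; it is not KernelBench's actual evaluation code, and `torch.compile` stands in for an LLM-generated kernel, with the op and timing setup chosen only for illustration.

```python
# Illustrative correctness + speed check against the PyTorch Eager baseline.
# Not the KernelBench harness: torch.compile is a stand-in for a generated kernel.
import time
import torch

def reference(x: torch.Tensor) -> torch.Tensor:
    # Eager-mode baseline op.
    return torch.nn.functional.gelu(x) * 2.0

candidate = torch.compile(reference)  # stand-in for a model-generated kernel

x = torch.randn(4096, 4096)

# Correctness: candidate must match the eager output within tolerance.
assert torch.allclose(reference(x), candidate(x), atol=1e-4)

def time_fn(fn, n=20):
    fn(x)  # warm-up (also triggers compilation for the candidate)
    start = time.perf_counter()
    for _ in range(n):
        fn(x)
    return (time.perf_counter() - start) / n

print(f"eager:     {time_fn(reference) * 1e3:.2f} ms")
print(f"candidate: {time_fn(candidate) * 1e3:.2f} ms")
```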
We have a new article in the IEEE Data Engineering bulletin! Current data systems approach LLMs passively—treating them as costly black boxes operating on inputs as given, similar to expensive UDFs alongside relational operators. These systems execute commands but miss…
We asked Stanford students their best ML pickup lines 💌 for Valentine's Day 💝: Check out the full dataset 🔍: ml-valentines.github.io (wonderfully compiled by my friend @michelllepan) Here are some of the best ones (along with some doodles I made to match!)
Some of the most exciting AI apps require LLM reasoning over large datasets at test time. For these types of NL questions, RAG or Text2SQL + your favorite LLM are simply not enough. Excited to announce our new leaderboard, from the TAG team at Stanford and Berkeley, to…
This was a really fun collab. I think data diversity is a really fundamental problem! Big shout out to my amazing coauthors @pgasawa @aczhu1326 @jaredq_ @ChenLingjiao @matei_zaharia @profjoeyg and Ion Stoica
Congrats Parth!! Creating good synthetic data is an important problem for people fine-tuning LLMs... cool insight that you can simply sample from base models and refine the samples with instruct-tuned models. Also, this is an exciting moment for me: I can now say that I've mentored a…
BARE is a useful procedure for both practitioners working on distillation and those working to expand the state-of-the-art capabilities frontier! Increasingly, compute is being employed not only to consume data but also to produce it. Synthetic data generation, including the…
Excited to share my first work of graduate school! BARE is a novel method for generating diverse, high-quality synthetic datasets, leveraging the diversity of base models and the quality of instruct-tuned models. Check out the thread and feel free to reach out to @pgasawa and me!