Aparna Dhinakaran
@aparnadhinak
AI Founder: building @arizeai & @arizephoenix 💙 I post about LLMs, LLMOps, Generative AI, ML and occasionally Amazing Race
Reinforcement Learning in English – Prompt Learning Beyond Just Optimization. @karpathy tweeted something this week that I think many of us have been feeling: the resurgence of RL is great, but it's missing the big picture. We believe the industry chasing traditional RL is…
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
It's soooo important to actually *look at your data* before jumping to solutions and evals. One of the questions we ask ourselves as an evals platform is: how do we best enable teams to look at and review their data? Annotation tools are great, but sometimes building your own…
Just wrapped up a tutorial - I use a custom annotations tool to build an end-to-end evaluation & experimentation pipeline🚀 Inspired by an article from @eugeneyan, I explore how to leverage annotations to construct evals, design thoughtful experiments, and systematically improve…
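A minimal sketch of the annotations-to-evals idea from the tutorial above. The annotation schema (`input`, `output`, `verdict`, `note`) is an assumption for illustration, not the actual tool's format: reviewed examples that pass become golden cases, and ones that fail become targeted cases the next revision must fix.

```python
# Hypothetical annotation schema: each record has input, output, verdict, note.
def annotations_to_evals(annotations):
    """Promote human-reviewed annotations into a regression-style eval set.

    Passing annotations become golden cases (expected output preserved);
    failing ones become must-fix cases for future prompt/model revisions.
    """
    evals = []
    for a in annotations:
        evals.append({
            "input": a["input"],
            "expected": a["output"] if a["verdict"] == "pass" else None,
            "must_fix": a["verdict"] == "fail",
            "reviewer_note": a.get("note", ""),
        })
    return evals

# Toy usage with two reviewed annotations.
sample = [
    {"input": "q1", "output": "a1", "verdict": "pass"},
    {"input": "q2", "output": "bad answer", "verdict": "fail", "note": "wrong tone"},
]
eval_set = annotations_to_evals(sample)
```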
evals are all you need
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…
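In the spirit of the RaR framing above, a checklist rubric can be reduced to a scalar reward: each rubric item is a weighted yes/no check, and the reward is the weighted fraction of checks the response passes. This is an illustrative sketch, not the paper's implementation; the rubric items and the keyword "judge" below are placeholders for an LLM judge.

```python
def rubric_reward(response, rubric, judge):
    """Weighted fraction of rubric checks the response passes, in [0, 1].

    rubric: list of (criterion, weight) pairs.
    judge(criterion, response) -> bool decides whether a check passes.
    The returned scalar can feed on-policy RL training as a reward signal.
    """
    total = sum(w for _, w in rubric)
    earned = sum(w for crit, w in rubric if judge(crit, response))
    return earned / total if total else 0.0

# Toy usage: a keyword lookup stands in for an LLM judge.
rubric = [("mentions a dosage", 2.0), ("cites a source", 1.0)]
keywords = {"mentions a dosage": "mg", "cites a source": "http"}
judge = lambda crit, resp: keywords[crit] in resp
score = rubric_reward("Take 200 mg daily.", rubric, judge)  # 2.0 / 3.0
```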
This was such a great episode! I love how YouTube surfaces a gem like this now and then. Loved listening to both @_amankhan and @lennysan
I open sourced Sniffly, a tool that analyzes Claude Code logs to help me understand my usage patterns and errors. Key learnings: 1. The biggest type of error Claude Code made is Content Not Found (20-30%). It tries to find files or functions that don't exist. So I…
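The kind of log analysis described here can be sketched as a simple categorization pass: bucket error entries by category and report each category's share. The log record shape (`is_error`, `category`) is an assumption for illustration, not Sniffly's actual format.

```python
from collections import Counter

def error_breakdown(log_entries):
    """Share of all errors per category, as fractions summing to 1."""
    errors = [e["category"] for e in log_entries if e.get("is_error")]
    counts = Counter(errors)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

# Toy log: "Content Not Found" dominates, as in the tweet's 20-30% finding.
logs = [
    {"is_error": True, "category": "Content Not Found"},
    {"is_error": True, "category": "Content Not Found"},
    {"is_error": True, "category": "Tool Misuse"},
    {"is_error": False, "category": None},
]
shares = error_breakdown(logs)
```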
Most important tech blog this year: an OpenAI engineer and ex-founder of $3.5B Segment wrote a tell-all post about how OpenAI works internally. From obsession with X and devout use of Slack to engineering culture and tech stack. A peek under the hood of a generational company.
The secret to prompt optimization is evals. Saw this tweet by Jason Liu and it got me thinking about the future of prompt optimization. Most of us are in Cursor/Claude Code, and it makes a ton of sense to keep prompts close to code and iterate on them with AI code editors. The hard…
holy shit lmfao claude code has been writing a prompt, looking at 200 failures and updating the prompt, it went from v1 recall@1 60 -> 80 v2 recall@1 6 -> 52
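The loop described above can be sketched as a tiny eval harness: score each prompt version with recall@1 over a fixed test set, then collect the failures as raw material for the next revision. The `model` callable and dataset below are toy placeholders, not a real API.

```python
def recall_at_1(model, prompt, dataset):
    """Fraction of (query, label) pairs where the model's top answer matches."""
    hits = sum(1 for query, label in dataset if model(prompt, query) == label)
    return hits / len(dataset)

def failures(model, prompt, dataset):
    """The examples a prompt version gets wrong -- input for the next revision."""
    return [(q, y) for q, y in dataset if model(prompt, q) != y]

# Toy stand-in model whose behavior depends on the prompt text.
model = lambda prompt, query: query.upper() if "uppercase" in prompt else query
dataset = [("a", "A"), ("b", "B"), ("c", "c")]
v1 = recall_at_1(model, "echo the input", dataset)       # only ("c", "c") hits
v2 = recall_at_1(model, "uppercase the input", dataset)  # ("a", "A") and ("b", "B") hit
```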
One of our most enthusiastic students Pawel took the evals FAQ and upgraded it 😍 check it out
I got permission to publish this massive AI Evals FAQ (PDF). It's like a bible for AI engineers and AI PMs. @HamelHusain and @sh_reya answer the most common questions they got while teaching 700+ students. And share 30+ free videos, posts, and resources: 🧵
Trying to come up with the manifesto for an OSS evals library. Initial thoughts: • Speed - Speed should be a distinct advantage of using these evals over others. This may come at the cost of some accuracy at times and should be weighed, but in general speed of iteration should be…
Knowledge makes the world so much more beautiful.
+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window…
I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.