Shreya Shankar
@sh_reya
doing a PhD @Berkeley_EECS, building http://docetl.org | teaching http://bit.ly/evals-ai | formerly ML eng & undergrad @Stanford CS
LLMs have made exciting progress on hard tasks! But they still struggle to analyze complex, unstructured documents (including today's Gemini 1.5 Pro 002). We (UC Berkeley) built 📜DocETL, an open-source, low-code system for LLM-powered data processing: data-people-group.github.io/blogs/2024/09/…

If you want ONE place to keep up with AI coding agents, you should pay attention to what @isaac_flath and @intellectronica are putting together: bit.ly/coding-ai. I've worked with both, and they have phenomenal taste. Isaac: 7+ years working on dev tools in both open…
To what extent can you automate or delegate evals? Is there a way to make it "not your problem"? 😅 Part 1 of 1: You should absolutely automate parts of it, as long as a human is in the loop. Many people are a bit too aggressive here, so you have to be careful. Some guidelines…
the biggest bottleneck in my workflow is communicating my intentions clearly, and no amount of increased model intelligence will solve that for me unless we are truly symbiotic, which i don't really want anyway. or i can cede my agency to its choices, which i don't want either
We’ve extended enrollment in our **last** live cohort on AI Evals until the end of this week! Here’s the syllabus (2 lessons per week):
Week 1: Fundamentals & Lifecycle of LLM Application Evaluation, Systematic Error Analysis
Week 2: Implementing Effective Evaluations,…
"failure mode taxonomy" is a good abstraction
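A minimal sketch of what that abstraction can look like in code. The failure mode names here are hypothetical placeholders; in practice, a taxonomy emerges from open-coding your own traces during error analysis, not from a predefined list.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical failure modes; replace with labels discovered in your own error analysis.
TAXONOMY = {
    "retrieval_miss": "Relevant context was not retrieved",
    "hallucination": "Output asserts facts absent from the context",
    "format_error": "Output violates the requested schema",
}

@dataclass
class Trace:
    trace_id: str
    failure_modes: list[str]  # labels assigned while reviewing this trace

def failure_mode_counts(traces: list[Trace]) -> Counter:
    """Aggregate labels so the most common failure modes surface first."""
    counts: Counter = Counter()
    for t in traces:
        counts.update(m for m in t.failure_modes if m in TAXONOMY)
    return counts

traces = [
    Trace("t1", ["hallucination"]),
    Trace("t2", ["retrieval_miss", "format_error"]),
    Trace("t3", ["hallucination"]),
]
print(failure_mode_counts(traces).most_common())
```

The payoff of the abstraction is exactly this aggregation step: once failures share a vocabulary, you can rank them by frequency and fix the biggest bucket first.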
Stop wasting time guessing why your AI fails. The most valuable skill I learned recently: error analysis maven.com/parlance-labs/… Hamel & Shreya teach you how to diagnose what's going wrong with your pipeline, and build evals you can trust at scale. Error analysis is just the…
Really like this set of standout ideas. We say a million things in the course reader and I love hearing what sticks / what's practical
Just published a blog post where I highlight 10 ideas that stood out to me from the first lesson and first three chapters of the course reader from the AI evals course taught by @HamelHusain and @sh_reya. vishalbakshi.github.io/blog/posts/202…
Today we restarted the @HamelHusain / @sh_reya evals course and to accompany this first week's class I'm publishing the first part of a series of annotated posts to accompany the course textbook. (Link in the 🧵) The aim was to give more examples from the @zenml_io LLMOps…
This is a great post by @sanjanayed and aligns well with what @HamelHusain and @sh_reya pitch in their evals course as well. You don't want to outsource your annotations. It makes a lot of sense to use tools that let you build your own annotation tools (using @v0, @lovable_dev…
Just wrapped up a tutorial - I use a custom annotations tool to build an end-to-end evaluation & experimentation pipeline🚀 Inspired by an article from @eugeneyan, I explore how to leverage annotations to construct evals, design thoughtful experiments, and systematically improve…
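One hedged sketch of the annotations-to-evals step described above. The record format and field names are assumptions, not the tutorial's actual schema; the idea is just that failed, annotated examples become regression cases for future runs.

```python
# Hypothetical annotation records exported from a custom annotation tool;
# adapt the field names to whatever your tool actually emits.
annotations = [
    {"input": "q1", "output": "a1", "label": "pass", "note": ""},
    {"input": "q2", "output": "a2", "label": "fail", "note": "missed constraint"},
]

def to_eval_set(records: list[dict]) -> list[dict]:
    """Turn failed annotations into regression cases to re-run on each new pipeline version."""
    return [
        {"input": r["input"], "known_issue": r["note"]}
        for r in records
        if r["label"] == "fail"
    ]

eval_set = to_eval_set(annotations)
```

Each entry keeps the original input plus the annotator's note, so a later run can check whether the previously observed issue has actually been fixed.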
ngl I'm most excited about this cage match between Eval vendors. They are going to solve the homework assignments, side-by-side. @hwchase17 (Langsmith) vs @mikeldking (Phoenix) vs @waydegilliam (Braintrust) maven.com/parlance-labs/…
Excited to kick off a much improved version of our AI evals course tomorrow (link in replies). 💫 We've added dedicated homework sessions, plus an updated course reader & lectures that incorporate 100s of questions from cohort 1. There’s more hands-on/live error analysis, plus…
They tell you 2025 is the year of AI agents, and yes, that’s true in many ways. But it’s also becoming the year of evaluation. We’ve got startling models and tooling, but now we’re asking what’s working, what’s not, and how do we measure it? I recently took @HamelHusain and…
Best AI teams obsess over measurement and iteration. If you streamline your AI evals, all other activities become easy. But we can't simply take CI/CD from traditional software or ML. Why?
"Vibe checks" are great—until you need to scale. In this clip, @HamelHusain and @sh_reya break down why relying on human intuition isn’t enough when it comes to evaluating product or model quality at scale. Instead, they explain how to codify those gut checks into scalable,…
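A minimal sketch of what "codifying gut checks" can mean in practice. The specific checks below are invented examples; the point is converting intuitions ("answers should cite a source", "no apologetic filler") into functions you can run over every output at scale.

```python
# Hypothetical assertion-style checks; swap in the gut checks your own reviewers apply.
def cites_source(output: str) -> bool:
    """Gut check: a good answer links to a source."""
    return "http://" in output or "https://" in output

def no_filler(output: str) -> bool:
    """Gut check: no apologetic AI boilerplate."""
    banned = ("as an ai", "i apologize")
    return not any(phrase in output.lower() for phrase in banned)

CHECKS = {"cites_source": cites_source, "no_filler": no_filler}

def run_checks(output: str) -> dict[str, bool]:
    """Run every codified check over one output; aggregate across a dataset to track quality."""
    return {name: fn(output) for name, fn in CHECKS.items()}

print(run_checks("See https://example.com for details."))
```

Simple boolean checks like these won't capture every intuition (subjective quality usually needs a human or LLM judge), but they make the objective part of the vibe check cheap and repeatable.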
A step-by-step guide to diffusion models: bit.ly/4kw0uKo v/@goyal__pramod
The eval space is the most intense battle for AI market share I have seen, second only to coding agents. This is why we will have Arize & Braintrust go head-to-head. They will each show how to complete our 5 homework assignments using their tools. Over 1k students learning about…
If you like the FAQ, it pales in comparison to the textbook she wrote (not kidding), humbly named “course notes” 😅 Yes, we will release a book at some point, but the best way to learn evals is interactive practice, examples, and different perspectives. We bring that all…
just gonna leave this here if anyone is wondering how it feels to write the curriculum for a course on a hot topic
Just published summaries + a brief analysis of 287 LLMOps case studies from the past few months over on the @zenml_io blog. Some observations about what's actually happening in production AI:
- Agents are real now, but not what we expected
Most successful production agents are…