Hamel Husain

@HamelHusain

Evals evals evals http://bit.ly/evals-ai About Me: https://hamel.dev

Looking at the data

Joined September 2012

2K Following

37K Followers

Hamel Husain Retweeted

Greg Ceccarelli @gregce10 · 5 h

I built an API-first link shortener for agents. Because I was tired of paying @Bitly $40/month for 3 damn links and spend all my time in @claude_code. Meet tny.dev 🎥 Video demo below 👇

Hamel Husain @HamelHusain · 11 h

evals are all you need

Tanishq Abraham is at ICML @iScienceLuvr · 23 h

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…

Hamel Husain @HamelHusain · 19 h

I was also pleasantly surprised the other day. The only frustrating thing is that I can't really put my finger on what makes it better. It seems faster, but there's something else too ...

Hamel Husain @HamelHusain · 21 h

Y’all, @AmpCode is good. Try it. Thank me later. It’s expensive, but not as expensive as your time.

Hamel Husain @HamelHusain · Jul 23

in case you are wondering, this is academia now

hardmaru @hardmaru · Jul 23

ICML’s Statement about subversive hidden LLM prompts We live in a weird timeline…

Hamel Husain Retweeted

Isaac Flath @isaac_flath · Jul 23

Most people don't realize how good OSS tools for AI coding are. Join @intellectronica and me as she summarizes the open-source landscape and highlights the top ones you should incorporate into your workflow. maven.com/p/2cb739/oss-i…

Hamel Husain Retweeted

vishal @vishal_learner · Jul 23

Just published a blog post where I highlight 10 ideas that stood out to me from the first lesson and first three chapters of the course reader from the AI evals course taught by @HamelHusain and @sh_reya. vishalbakshi.github.io/blog/posts/202…

Hamel Husain Retweeted

Charles 🎉 Frye @charles_irl · Jul 22

My talk for @aiDotEngineer on what I think every person working with language models needs to know about GPUs is now available! - Latency lags bandwidth. - GPUs embrace bandwidth. - Don't be scared of N squared. - Use the Tensor Cores, Luke! youtube.com/watch?v=y-UGrY…

Hamel Husain Retweeted

Alex Strick van Linschoten @strickvl · Jul 22

Today we restarted the @HamelHusain / @sh_reya evals course and to accompany this first week's class I'm publishing the first part of a series of annotated posts to accompany the course textbook. (Link in the 🧵) The aim was to give more examples from the @zenml_io LLMOps…

Hamel Husain Retweeted

Isaac Flath @isaac_flath · Jul 22

IMO the hardest part of web dev is getting social cards to work properly

Hamel Husain Retweeted

Isaac Flath @isaac_flath · Jul 22

I'm launching Context Engineering For Coding to make AI assisted coding more efficient, and it works with Cursor, Claude Code, Copilot, all of them. Here's a 30% off discount link for early enrollers maven.com/kentro/context…

Hamel Husain @HamelHusain · Jul 22

This is a great post by @sanjanayed and aligns well with what @HamelHusain and @sh_reya pitch in their evals course as well. You don't want to outsource your annotations. It makes a lot of sense to use tools that let you build your own annotation tools (using @v0, @lovable_dev…

sanjana @sanjanayed · Jul 22

Just wrapped up a tutorial - I use a custom annotations tool to build an end-to-end evaluation & experimentation pipeline🚀 Inspired by an article from @eugeneyan, I explore how to leverage annotations to construct evals, design thoughtful experiments, and systematically improve…

Hamel Husain @HamelHusain · Jul 21

ngl I'm most excited about this cage match between Eval vendors. They are going to solve the homework assignments, side-by-side. @hwchase17 (Langsmith) vs @mikeldking (Phoenix) vs @waydegilliam (Braintrust) maven.com/parlance-labs/…

Shreya Shankar @sh_reya · Jul 21

Excited to kick off a much improved version of our AI evals course tomorrow (link in replies). 💫 We've added dedicated homework sessions, an updated course reader & lectures that incorporate 100s of questions from cohort 1. There’s more hands-on/live error analysis, plus…

Hamel Husain @HamelHusain · Jul 22

Fairly convincing phishing attempt ... watch out folks, don't fall for this (email was from x-dev4415@social.mg.gov.br)

Hamel Husain Retweeted

Eleanor Berger @intellectronica · May 25

🎯 Benchmarks vs. Evals: How I learned to tell the difference by remembering my dating days Picture this: Your friends set you up on a blind date. 💑 They tell you everything: • Tall ✓ • Deep blue eyes ✓ • Shiny brown hair ✓ • Economics PhD ✓ • Volleyball enthusiast ✓…

Hamel Husain Retweeted

Shreya Shankar @sh_reya · Jul 21

Hamel Husain Retweeted

Hugo Bowne-Anderson @hugobowne · Jul 20

They tell you 2025 is the year of AI agents, and yes, that’s true in many ways. But it’s also becoming the year of evaluation. We’ve got startling models and tooling, but now we’re asking what’s working, what’s not, and how do we measure it? I recently took @HamelHusain and…