Hrishi

@hrishioa

Building artificially intelligent bridges at Southbridge, prev-CTO Greywing (YC W21). Chop wood carry water.

Long form thoughts 🫱

Joined June 2013

2K Following

11K Followers

Pinned

Hrishi@hrishioa · Jun 20

I'm sorry I have to leave early I have two Claude Codes at home

Hrishi@hrishioa · Jul 19

Mixture of four idiots is perfect

gerred@devgerred · Jul 19

"The user wasn't happy with the initial response" no shit, I got the mixture of 4 idiots on that one.

Hrishi Retweeted

Eugene Yaroslavtsev@exhaze · Jul 19

xkcd predicted the subagent interaction bike shedding problem @hrishioa "our main problem is our unclear definition of value" - even the verification part!

Hrishi@hrishioa · Jul 16

I see the secrets are out

Hrishi@hrishioa · Jul 15

this is awesome - lots of new patterns

Thariq@trq212 · Jul 14

Claude Code is All You Need

When I first joined Anthropic I was surprised to learn that lots of the team used Claude Code as a general agent, not just for code. I’ve since become a convert! I use Claude Code to help me with almost all the work I do now, here’s how:

Hrishi@hrishioa · Jul 14

Benefit #17 of being in Singapore: You get practically unthrottled Claude at blazing speeds while the sun is still up

Problem #17 of being in Singapore: You get amazing fast models while it's a beautiful day outside - I don't know personally I haven't checked

Hrishi@hrishioa · Jul 14

I will be so glad to be rid of the messages weirdness (Probably @minune29 too)

Matt Pocock@mattpocockuk · Jul 7

useAssistant

A bunch of annoying little methods

Message.content (now purely based on Message.parts)

Hrishi Retweeted

Sebastian Raschka@rasbt · Jul 12

Kimi K2 is basically DeepSeek V3 but with fewer heads and more experts:

Hrishi@hrishioa · Jul 12

HrishiBench is all you need (not even kidding): LiveCodeBench v6 (Pass@1) Kimi K2: 53.7, DeepSeek-V3-0324: 46.9, etc <- how about real examples? HrishiBench "genuinely impressive, beats Grok 4, doesn't seem to use CoT or thinking tokens" -> next tweet has examples.

Hrishi@hrishioa · Jul 12

Kimi K2 is genuinely impressive. On the same tasks and the same agentic harness, it beats Grok 4 one on one. It also looks like it does this without CoT or thinking tokens. github.com/MoonshotAI/Kim…

Hrishi Retweeted

Kyle Corbitt@corbtt · Jul 11

Big news: we've figured out how to make a *universal* reward function that lets you apply RL to any agent with: - no labeled data - no hand-crafted reward functions - no human feedback! A 🧵 on RULER

Hrishi Retweeted

Justin Lee@Justin01805921 · Jul 8

We timelapsed the whole of YC.

Hrishi@hrishioa · Jul 8

I too like to live dangerously

Hrishi@hrishioa · Jul 3

Things are going to change - faster than we think, with massive downstream effects. blog.cloudflare.com/introducing-pa…

Pay per crawl is a new feature to allow content creators to charge AI crawlers for access to their content.

Hrishi Retweeted

Mario Zechner@badlogicgames · Jul 2

Don't build MCP servers. Build CLI tools with a --llm flag the LLM can invoke to get an LLM compatible description of what the tool does and how to use it. Benefit: you don't have a gazillion MCP server tools in your context. You pull in just the tools you need ad-hoc.
