Hrishi
@hrishioa
Building artificially intelligent bridges at Southbridge, prev-CTO Greywing (YC W21). Chop wood carry water.
I'm sorry I have to leave early I have two Claude Codes at home
xkcd predicted the subagent interaction bike shedding problem @hrishioa "our main problem is our unclear definition of value" - even the verification part!
this is awesome - lots of new patterns
Claude Code is All You Need
When I first joined Anthropic I was surprised to learn that lots of the team used Claude Code as a general agent, not just for code. I’ve since become a convert! I use Claude Code to help me with almost all the work I do now, here’s how:
Benefit #17 of being in Singapore: You get practically unthrottled Claude at blazing speeds while the sun is still up
Problem #17 of being in Singapore: You get amazing fast models while it's a beautiful day outside - I don't know personally I haven't checked
I will be so glad to be rid of the messages weirdness (Probably @minune29 too)
useAssistant A bunch of annoying little methods Message.content (now purely based on Message.parts)
Kimi K2 is basically DeepSeek V3 but with fewer heads and more experts:
HrishiBench is all you need (not even kidding): LiveCodeBench v6 (Pass@1) Kimi K2: 53.7, DeepSeek-V3-0324: 46.9, etc <- how about real examples? HrishiBench "genuinely impressive, beats Grok 4, doesn't seem to use CoT or thinking tokens" -> next tweet has examples.
Kimi K2 is genuinely impressive. On the same tasks and the same agentic harness, it beats Grok 4 one on one. It also seems to do it without CoT or thinking tokens. github.com/MoonshotAI/Kim…
Big news: we've figured out how to make a *universal* reward function that lets you apply RL to any agent with:
- no labeled data
- no hand-crafted reward functions
- no human feedback!
A 🧵 on RULER
Things are going to change - faster than we think, with massive downstream effects. blog.cloudflare.com/introducing-pa…
Don't build MCP servers. Build CLI tools with a --llm flag that the LLM can invoke to get an LLM-compatible description of what the tool does and how to use it. Benefit: you don't have a gazillion MCP server tools in your context. You pull in just the tools you need ad-hoc.
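A rough sketch of what that could look like, in Python with argparse. Everything here is an assumption for illustration: `wordcount` is a made-up tool, `--llm` is not a standard flag, and the JSON shape of the description is just one plausible choice, not an established format.

```python
import argparse
import json
import sys

# Hypothetical description format: plain JSON the model can read on demand.
TOOL_DESCRIPTION = {
    "name": "wordcount",
    "summary": "Count the words in the text passed as arguments.",
    "usage": "wordcount TEXT [TEXT ...]",
    "arguments": [
        {"name": "TEXT", "description": "One or more strings to count words in."}
    ],
    "output": "A single integer on stdout.",
}


def build_parser():
    parser = argparse.ArgumentParser(prog="wordcount")
    # The assumed --llm flag: print a description for the model, do no work.
    parser.add_argument(
        "--llm",
        action="store_true",
        help="Print an LLM-oriented description of this tool and exit.",
    )
    parser.add_argument("text", nargs="*", help="Text to count words in.")
    return parser


def run(argv):
    """Return the text the CLI would print for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.llm:
        return json.dumps(TOOL_DESCRIPTION, indent=2)
    # Normal mode: count whitespace-separated words across all arguments.
    return str(sum(len(chunk.split()) for chunk in args.text))


if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

The agent would run `wordcount --llm` only when it decides it needs the tool, read the description, then call `wordcount "some text"` as usual. Nothing about the tool sits in context until that first call.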