Jay

@jayendra_ram

using computers @hud_evals, prev cs+physics @columbia, @ycombinator

SF, NYC

Joined September 2022

827 Following

2K Followers

Pinned

Jay @jayendra_ram · Jan 24

I've been working with a small team to evaluate agentic models for computer use agents. Today, we're thrilled to introduce Autonomy, our comprehensive eval for AI agents. We aim to create an eval that's rigorous, tests agency, and moves toward general intelligence. 1/ 🧵

Jay Retweeted

Sam Altman @sama · Jul 17

watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.

Jay @jayendra_ram · Jul 10

If you're in NYC you should 100% go to this. x.com/aaron_epstein/…

Aaron Epstein @aaron_epstein · Jul 10

Looking forward to meeting builders in NYC next week. RSVP for the event and reply below with what you’re working on.

Jay @jayendra_ram · Jul 9

Everyone wants to make an eval but no one wants to write some heavy ass tasks

Jay @jayendra_ram · Jul 7

Is there a good iOS app that I can use to interact with MCP servers? Would pay for this.

Jay @jayendra_ram · Jul 2

Soham should be let into the next @ycombinator batch. He has a great answer to the "time I hacked a non-computer system" question. Plus he's already worked at a few YC companies. x.com/garrytan/statu…

Garry Tan @garrytan · Jul 2

Without the YC community this guy would still be operating and would have maybe never been caught. The startup guild of YC is a necessary invention to help founders be more successful than they would be alone.

Jay Retweeted

Minh Nhat Nguyen @menhguin · Jun 19

Good news: Some places let you iterate just as fast! Currently at @hud_evals we are using the same eval stack as what many frontier labs use for their AI agents! We're hiring part-time/intern eval makers, and fulltime devs across p much the entire stack :) DM if interested.

Jay @jayendra_ram · May 29

Proud to announce that my anthropic API key has hit tier 3!

Jay Retweeted

Theo - t3.gg @theo · May 23

I’ll cover this more in my video, but tl;dr:
- Anthropic tests the ways that the model will try to “disobey” because safety (everyone does this)
- they came up with a compelling test, giving the model a fake set of tools + a fake scenario that would affect public health
- they…

Jay @jayendra_ram · May 23

we switched out the underlying Operator model to o3 today. we think it's a step jump improvement from the previous 4o-based model. give it a try with some of your old prompts that failed! it was fun to work on this, the o-series paradigm makes everything so much better!

OpenAI @OpenAI · May 23

Operator 🤝 OpenAI o3

Operator in ChatGPT has been updated with our latest reasoning model. operator.chatgpt.com

Jay Retweeted

Y Combinator @ycombinator · May 21

Jon Xu and Andrew Miklas both went through YC in 2010. Fifteen years and two iconic companies later— FutureAdvisor and PagerDuty— they're back, this time as YC's newest General Partners. Welcome, @jonxu and Andrew! ycombinator.com/blog/welcome-j…

Jay Retweeted

Jason Wei @_jasonwei · May 18

Discriminator-generator gap seems to be the most important idea in AI for scientific innovation. With compute + clever search, anything that we can measure will be optimized. First up will be environments that can be verified quickly, with continuous reward, and at scale.…

Jay @jayendra_ram · May 17

hud ❤️ mangoes

Andrew @AndrewYatzkan · May 17

PULL UP TO MANGO TANGO @ DOLORES!!