Rayan Krishnan

@RayanKrishnan

ceo @_valsai | solve evals, solve intelligence prev @stanford @PalantirTech

Joined April 2019

226 Following

255 Followers

Pinned

Rayan Krishnan @RayanKrishnan · Jul 13

Hasn't changed much since Grok 2 x.com/RayanKrishnan/…

Vals AI @_valsai · Jul 13

In the livestream, Elon Musk called Grok 4 “partially blind”. We tested this claim on our two multimodal benchmarks (Mortgage Tax and MMMU) and found a bigger gap between public (pink) and private (purple) benchmarks.

Rayan Krishnan @RayanKrishnan · Jul 22

Chinese open-source developers have now far outpaced their western counterparts. Of course OAI's open-weight model is coming any day now, right?

Vals AI @_valsai · Jul 22

Does @Kimi_Moonshot's Kimi K2 live up to the hype? We found that it is indeed the new state-of-the-art open-source model according to our evaluations. The model cracks the top 10 on Math500 and LiveCodeBench, narrowly beating out DeepSeek R1 on both. (1/4)

Rayan Krishnan @RayanKrishnan · Jul 18

SF now pioneering AI stop signs!

Rayan Krishnan @RayanKrishnan · Jul 15

Finally a leaderboard that shows you which LLM is the best gambler. Now give Claude your banking info :)

Vals AI @_valsai · Jul 15

We evaluated @AnthropicAI and @OpenAI models on our Finance Agent Benchmark, compiling results from the best each lab had to offer across question categories. Both labs are pushing the boundaries on financial agentic capabilities. Financial institutions are increasingly relying…

Rayan Krishnan Retweeted

Vals AI @_valsai · Jul 11

@grok 4 struggles on our private benchmarks, in contrast to SOTA performance on AIME, Math 500, and GPQA… We had high hopes after Wednesday’s livestream 😔 (🧵1/3)

Rayan Krishnan @RayanKrishnan · Jul 10

Very capable model based on our initial testing. Remains to be seen how it does on our held-out sets

Vals AI @_valsai · Jul 10

Grok 4 is the new state-of-the-art on our academic math and science benchmarks (AIME, GPQA, MATH 500) 🚀 Congrats @xai @elonmusk @Yuhu_ai_ @belce_dogru

Rayan Krishnan Retweeted

Eric Zelikman @ericzelikman · Jul 10

Rayan Krishnan @RayanKrishnan · Jul 10

wen grok 5 solve millennium prize problem tho? @ericzelikman

Rayan Krishnan @RayanKrishnan · Jul 8

Batch API is a win-win-win (providers, builders, users), and I'm glad more providers are reaching the scale to enable it. We worked with the Google team to beta-test their Gemini Batch API in our evaluations for 2.5! Well done @divy93t @OfficialLoganK

Vals AI @_valsai · Jul 8

Grateful to the @GeminiApp team for the shoutout on our Batch API integration! We’ve added batching support on our platform as part of our ongoing efforts to improve cost efficiency for running increasingly large benchmarks (along with similar offerings from OpenAI, Anthropic,…

Rayan Krishnan @RayanKrishnan · Jul 2

Well that was close

Suhail @Suhail · Jul 2

PSA: there’s a guy named Soham Parekh (in India) who works at 3-4 startups at the same time. He’s been preying on YC companies and more. Beware. I fired this guy in his first week and told him to stop lying / scamming people. He hasn’t stopped a year later. No more excuses.

Rayan Krishnan Retweeted

Vals AI @_valsai · Jul 1

Another Vals AI Game Night in the books! Thanks to everyone who came out last Friday for Za's Pizza and board games! We always appreciate seeing friends and making new ones. Interested in coming to the next one? DM us!

Rayan Krishnan @RayanKrishnan · Jul 1

The duality of SF

Rayan Krishnan @RayanKrishnan · Jun 30

How's OAI supposed to reduce churn if its "leaders" are so obviously not using their own product to write memos?

Rayan Krishnan @RayanKrishnan · Jun 30

Meta AI is nothing without its people??

Kylie Robison @kyliebytes · Jun 30

BREAKING: Mark Zuckerberg notified Meta staff today to introduce them to the new superintelligence team. The memo, which WIRED obtained, lists names and bios for the recently hired employees, many of whom came from rival AI firms like OpenAI, Anthropic, and Google.
