Aarush Sah

@AarushSah_

Head of Evals @GroqInc

Cupertino, CA

Joined September 2022

506 Following

9K Followers

Pinned

Aarush Sah@AarushSah_ · Jul 5

I’m hiring an Evals Engineer to join us at @GroqInc! You’ll own foundational eval infrastructure, launch impactful open-source tooling, and shape how we ship new models. Your opportunity for scope and autonomy will be large - perfect for someone driven, eager to apply what they…

130

24.0K

Aarush Sah Retweeted

Dylan Mitic@DylanMitic · 16 h

Developers prefer to build on open infrastructure, which is why we've seen MCP become so successful. The AI era will be an era where we return to the open internet and open-source generative technology.

585

Aarush Sah@AarushSah_ · Jul 19

I’m gonna make him an offer he can’t refuse

sunny madra@sundeep · Jul 19

Groq 🤝 Humain

4.0K

Aarush Sah@AarushSah_ · Jul 19

I’m in SF who’s down to meet around 2?

3.0K

Aarush Sah@AarushSah_ · Jul 16

Super cool eval results here, always excited to see this stuff from @kylejeong21 and co :)

Kyle Jeong@kylejeong21 · Jul 16

Kimi K2 performs only slightly worse than Grok 4 in terms of accuracy on @Stagehanddev, but has 7 times faster average inference speed. The battle of @GroqInc vs @grok

2.0K

Aarush Sah@AarushSah_ · Jul 16

Theo - t3.gg@theo · Jul 15

Kimi just got WAY faster on T3 Chat. Supports tools and search as well. This might be my new favorite model.

293

12.0K

Aarush Sah@AarushSah_ · Jul 15

Going to be talking to @RayFernando1337 about Kimi K2 on @GroqInc in 5 minutes :) youtube.com/live/5ahGxcBxJ…

15.0K

Aarush Sah@AarushSah_ · Jul 15

We’ve deployed the tool call template update to @GroqInc. Looks good. Thanks @Kimi_Moonshot 🫡

Kimi.ai@Kimi_Moonshot · Jul 15

We've just fixed 2 bugs in Kimi-K2-Instruct huggingface repo. Please update the following files to apply the fix:
- tokenizer_config.json: update chat-template so that it works for multi-turn tool calls.
- tokenization_kimi.py: update encode method to enable encoding special…

9.0K

Aarush Sah@AarushSah_ · Jul 15

Huge credit to @omarkilani for being a MONSTER engineer and pushing this to completion

Omar Kilani@omarkilani · Jul 15

Getting emotional seeing this (haven’t slept in 72 hours, YOLO launched a 1T param model) Huge credit to the incredible teams at @GroqInc for the relentless grind that made this possible.

3.0K

Aarush Sah@AarushSah_ · Jul 15

Sure ✅

Hatice Ozen@ozenhati · Jul 13

anyone else want kimi on groq or

142

20.0K

Aarush Sah@AarushSah_ · Jul 14

excited to share my latest technical report @trychroma! we evaluated 18 LLMs, including state-of-the-art models, and observed model performance degradation with increasing input length

Chroma@trychroma · Jul 14

Introducing our latest technical report: Context Rot - How Increasing Input Tokens Impacts LLM Performance. Our results reveal that models do not use their context uniformly. Full report in replies.

364

136

48.0K

Aarush Sah@AarushSah_ · Jul 14

I’d pay good money for an AI sentinel that I can ask to watch for an email, and will send me a text if it flags a match. Has anybody built this? Good hackathon project if not, cc @LucasVogel_dev @thekrishdesai

1.0K

Aarush Sah@AarushSah_ · Jul 13

Has anyone run Kimi-K2 on B200s yet?

21.0K

Aarush Sah@AarushSah_ · Jul 13

Okay B200s are pretty cool ngl

780

Aarush Sah@AarushSah_ · Jul 13

If any of the great folks at Windsurf are job hunting, I’d love to talk. We’re looking for someone with deep experience working on evals, post-training and/or synthetic data generation to join the evals team at @GroqInc. We move fast and care deeply about our work - if you…

2.0K