Aarush Sah
@AarushSah_
Head of Evals @GroqInc
I’m hiring an Evals Engineer to join us at @GroqInc! You’ll own foundational eval infrastructure, launch impactful open-source tooling, and shape how we ship new models. Your opportunity for scope and autonomy will be large - perfect for someone driven, eager to apply what they…
Developers prefer to build on open infrastructure, which is why we've seen MCP become so successful. The AI era will be an era where we return to the open internet and open-source generative technology.
Super cool eval results here, always excited to see this stuff from @kylejeong21 and co :)
Kimi K2 performs only slightly worse than Grok 4 in terms of accuracy on @Stagehanddev, but its average inference speed is 7 times faster. The battle of @GroqInc vs @grok
Powered by @GroqInc :)
Kimi just got WAY faster on T3 Chat. Supports tools and search as well. This might be my new favorite model.
Going to be talking to @RayFernando1337 about Kimi K2 on @GroqInc in 5 minutes :) youtube.com/live/5ahGxcBxJ…
We’ve deployed the tool call template update to @GroqInc. Looks good. Thanks @Kimi_Moonshot 🫡
We've just fixed 2 bugs in the Kimi-K2-Instruct Hugging Face repo. Please update the following files to apply the fix:
- tokenizer_config.json: update chat-template so that it works for multi-turn tool calls.
- tokenization_kimi.py: update encode method to enable encoding special…
Huge credit to @omarkilani for being a MONSTER engineer and pushing this to completion
Getting emotional seeing this (haven’t slept in 72 hours, YOLO launched a 1T param model). Huge credit to the incredible teams at @GroqInc for the relentless grind that made this possible.
excited to share my latest technical report @trychroma! we evaluated 18 LLMs, including state-of-the-art models, and observed model performance degradation with increasing input length
Introducing our latest technical report: Context Rot - How Increasing Input Tokens Impacts LLM Performance. Our results reveal that models do not use their context uniformly. Full report in replies.
I’d pay good money for an AI sentinel that I can ask to watch for an email, and will send me a text if it flags a match. Has anybody built this? Good hackathon project if not, cc @LucasVogel_dev @thekrishdesai
If any of the great folks at Windsurf are job hunting, I’d love to talk. We’re looking for someone with deep experience working on evals, post-training and/or synthetic data generation to join the evals team at @GroqInc. We move fast and care deeply about our work - if you…