Aman Arora
@amaarora
Building AI Agents 🤖 | Blog: http://amaarora.github.io | Previously: @weights_biases; @Harrison.ai
When we're able to delegate something, to have some of our work done by an automation process or someone else, we automatically *feel* more productive. "Well, that was easy! At the very least it saved me a bunch of typing!" But the relationship between task delegation and…
Is it malpractice to report SOTA with pass@8 without evaluating other models at pass@8, or is it just standard practice at this point? It's clearly not SOTA if it's behind Devstral at pass@1
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
Claude Code is getting a brand new feature: custom subagents. Type `/agents` to get started.
The Toad is out of the bag! 🛍🐸 Announcing Toad - a universal UI for agentic coding in the terminal willmcgugan.github.io/announcing-toa…
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let's go open-source AI!
Introducing SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context, up to 128k > multilingual: en, fr, es, de, it, pt > fully open source (data, code, recipes) huggingface.co/blog/smollm3
Google Search is the largest AI product in the world
Been refining my ML-project workflow and loving every minute 🙂 4 tools I'm using in unexpected ways:
Ep 1 🙂 x.com/radekosmulski/…
As it turns out, I’m quite GPU poor in my life outside of work. But I love using @LambdaAPI to the point where I created this tiny CLI tool to make the experience even better 🙂
🆕 Building voice agents with OpenAI: @DKundel launches the @openai Agents SDK in TypeScript -AND- gives the ultimate state of the art in best practices for building voice agents with the latest S2S models! youtube.com/watch?v=iXhba3…
Gemini 2.5 Pro is back in the free tier of the API, have a great weekend : )
Since it's summer, and more or less internship and tech interview season, I made all 30 chapters of my Machine Learning Q and AI book freely available for the summer: sebastianraschka.com/books/ml-q-and… Hope it’s helpful! Happy reading, and good luck if you are interviewing!
If you're interested in the nascent field of context engineering (the new, more cromulent alternative to prompt engineering), Drew's piece here provides some excellent nomenclature for some of the challenges you might face...
As your context bloats, you hit different failure modes. These failures hit agents hardest because they operate in exactly the scenarios where contexts balloon: gathering information, making sequential tool calls, engaging in multi-turn reasoning, & accumulating histories.
You can now add @huggingface to @cursor_ai to find models, datasets, papers, apps,... Vibe coding a website is cool but imagine if the new AI powered code editors would turn everyone into an AI builder able to train AI themselves? How cool would that be?
Key to research success: ambition in vision, but pragmatism in execution. You must be guided by a long-term, ambitious goal that addresses a fundamental problem, rather than chasing incremental gains on established benchmarks. Yet, your progress should be grounded by tractable…