Christopher
@communicating
Optimist, Geek, Building @AgletsAI. DotConnector, ToolBuilder, InfoHacker & Coder. Into Agents, Graphs, LLMs especially SLMs, NLProc & making hard things easier
Hoping the awesome team at @basetenco adds Kimi K2 soon! Pretty Please. 🙏 😊
Claude Code is the proving ground for what Anthropic is really planning. This much became clear today.
Rust-based Zed Industries partnered with Baseten to achieve 2x faster AI code completions through custom optimization. thenewstack.io/how-rust-based…
The coming week will be tough but exciting and hopefully big! 🔥
Am I the only one who wishes they had full control of the K/V cache in Claude Code? I’ve been doing spec-driven development for a while (actually from the start of using these tools) & I want more granular (ideally complete) control. The /clear command is a hammer when a chisel is needed.
One of the big things holding coding agents back from taking another step forward is the use of too much irrelevant context & jumbo tasks. This is going to sound obvious, but context size on its own is not enough. Instead, look to tight, highly specific, task-by-task context blocks.
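Here’s a rough sketch of what I mean by a task-by-task context block (hypothetical helper, not from any particular agent framework): each task gets only the spec excerpt, snippets & carried-over decisions it needs, with a hard size cap.

```python
# Minimal sketch of per-task context assembly (hypothetical names, illustrative only).
# The idea: build a small, fresh block per task instead of one jumbo prompt.
from dataclasses import dataclass, field

@dataclass
class ContextBlock:
    task_id: str
    spec_excerpt: str                                       # the few spec lines this task implements
    files: dict[str, str] = field(default_factory=dict)     # path -> relevant snippet only
    notes: list[str] = field(default_factory=list)          # decisions carried over from earlier tasks

    def render(self, budget_chars: int = 12_000) -> str:
        """Render a compact prompt section, trimmed to a hard budget."""
        parts = [f"## Task {self.task_id}", self.spec_excerpt]
        parts += [f"### {path}\n{snippet}" for path, snippet in self.files.items()]
        parts += [f"- {note}" for note in self.notes]
        return "\n\n".join(parts)[:budget_chars]             # irrelevant overflow never reaches the model

# Usage: one block per task, built fresh, nothing inherited by default.
block = ContextBlock(
    task_id="auth-07",
    spec_excerpt="Add refresh-token rotation; reuse the existing Session model.",
    files={"app/session.py": "class Session: ...  # only the class, not the whole file"},
    notes=["Tokens are opaque strings, not JWTs (decided in task auth-03)."],
)
print(block.render())
```

The hard cap is the point: a too-small block fails loudly, while a jumbo prompt fails silently by diluting the task.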
A highly accurate (benchmarked) SLM for evaluating tool use issues.
new model suite just dropped.
limbic-tool-use-0.5B → 88.6% accuracy
limbic-tool-use-3B → 94.6% accuracy
limbic-tool-use-7B → 96.2% accuracy
outperforms gpt-4.1 (74.0%) and claude-sonnet-4 (71.1%) on tool use evaluation.
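A hedged sketch of how you might run one of these as a tool-use judge with transformers; the repo id, prompt format & expected output below are placeholders, so check the actual model card before relying on any of it.

```python
# Sketch: small tool-use evaluation model as a judge (placeholder repo id, not verified).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ORG/limbic-tool-use-0.5B"  # placeholder path; see the model card for the real one

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Assumed judge-style prompt: show the tool schema and the agent's call, ask for a verdict.
prompt = (
    "Tool schema: get_weather(city: str, unit: 'C'|'F')\n"
    "Agent call: get_weather(city='Paris', unit='K')\n"
    "Question: is this tool call valid for the schema? Answer 'valid' or 'invalid' with a reason.\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```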
Beast Mode 3.1 from Burke Holland is out with further improvements. If you’re stuck having to use GPT-4.1 in VS Code instead of Opus, Sonnet, o3, or another more capable coding model, then this did improve the results in a lot of the tests where I tried it. burkeholland.github.io/posts/beast-mo…
Nice.
@gradio is now pre-installed in @GoogleColab! It’s easier than ever to include demos and visualizations in your notebooks.
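A minimal example of what that looks like in a Colab cell now that gradio ships preinstalled (no !pip install needed):

```python
import gradio as gr

def greet(name: str) -> str:
    return f"Hello, {name}!"

# launch() renders the demo inline in the notebook output cell.
gr.Interface(fn=greet, inputs="text", outputs="text").launch()
```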
You have to love when the gourmet chef publishes the recipe. Thanks @NovaSkyAI! SkyRL + SearchR1: A Guide for Multi-Turn Tool-Use RL for Search novasky-ai.notion.site/skyrl-searchr1
While I’m stoked by this release, I’m a little annoyed that the benchmarks they’re using don’t include Opus 4. If you want to give us a complete picture, please show us the numbers against the best coding model (Opus), not the 2nd best model (Sonnet). I just prefer the full picture.
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Interesting use of VibeTunnel… It’s such an interesting project. I’ll give it a run-through and see if it can fit into & maybe improve my workflow.
Even when you don’t use the remote feature much, here are some tricks you wanna have! steipete.me/posts/command-…
Interesting Utility… I take my laptop everywhere so I don’t really have a huge need for this but it caught my eye. I suppose if I wanted to code from my phone or tablet this would be a way to do it… Vibetunnel: Your Mac Terminal in Any Browser vibetunnel.sh
Awesome! Thank You Moonshot! 👍🔥
CONFIRMED: Kimi K2’s “modified-MIT” license does NOT apply to synthetic data or models trained on synthetic data. “Text data generated by the model is NOT considered as a derivative work.” Hopefully this will lead to more open source agentic models!
Another awesome release from Nous Research. 🔥 With all the dataset drops lately we’re getting a pretty extensive collection of clean, highly relevant data for model post-training. The value of these open source datasets can’t be overstated. Exciting times.
huggingface.co/datasets/NousR…
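The link above is truncated, so the repo id in this sketch is a placeholder; the loading pattern is just the standard datasets one.

```python
# Sketch: pulling an open post-training dataset from the Hugging Face Hub.
from datasets import load_dataset

DATASET_ID = "NousResearch/<dataset-name>"  # placeholder; substitute the real repo id from the link above

ds = load_dataset(DATASET_ID, split="train")
print(ds)       # schema and row count
print(ds[0])    # inspect one example before wiring it into a post-training run
```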
Just did a cartwheel. Baseten just added Kimi K2 as an endpoint and if it’s on Baseten it’ll be fast. There’s a reason (well many) @basetenco is my inference (etc) partner of choice!
Confession. Kimi K2 is one of our new favorite models for agentic use cases. Baseten is powering the fastest Kimi K2 available on Openrouter. Test it through our Model APIs today. Also…say Kimi K2 10x fast. Thanks, @Madisonkanna.
While this isn’t my area of interest, this paper describes a useful dataset for Embodied Reasoning w/ VLMs. It includes benchmark results & shows improvement from fine-tuning w/ the dataset. EmbRACE-3K: Embodied Reasoning and Action in Complex Environments huggingface.co/papers/2507.10…
Now with full tool use support
Since @GroqInc added support for Kimi K2 at 185 tps, I created a simple proxy server to use it with Claude Code. The server lets you use Kimi K2 within Claude Code on a free Groq account as a replacement for Claude.
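I haven’t read the linked proxy’s code, so this is only a rough sketch of the idea, under a few assumptions: Claude Code can be pointed at another backend via ANTHROPIC_BASE_URL, Groq exposes an OpenAI-compatible chat completions endpoint, "moonshotai/kimi-k2-instruct" is the Kimi K2 model id there, and only the non-streaming path is handled.

```python
# Rough sketch: Anthropic-Messages -> OpenAI-chat translation proxy (non-streaming only).
# Assumptions to verify: the Groq model id, and that your Claude Code version honors ANTHROPIC_BASE_URL.
# The real project presumably also handles streaming, tool calls, token accounting, etc.
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
GROQ_KEY = os.environ["GROQ_API_KEY"]
MODEL = "moonshotai/kimi-k2-instruct"  # assumed Groq model id

def _flatten(content):
    # Anthropic content may be a string or a list of blocks; keep only text blocks here.
    if isinstance(content, str):
        return content
    return "\n".join(b.get("text", "") for b in content if b.get("type") == "text")

@app.post("/v1/messages")
def messages():
    body = request.get_json()
    msgs = []
    if body.get("system"):
        msgs.append({"role": "system", "content": _flatten(body["system"])})
    for m in body.get("messages", []):
        msgs.append({"role": m["role"], "content": _flatten(m["content"])})

    r = requests.post(
        GROQ_URL,
        headers={"Authorization": f"Bearer {GROQ_KEY}"},
        json={"model": MODEL, "messages": msgs, "max_tokens": body.get("max_tokens", 1024)},
        timeout=120,
    )
    r.raise_for_status()
    text = r.json()["choices"][0]["message"]["content"]

    # Reply in a simplified Anthropic Messages shape so the client can read it.
    return jsonify({
        "id": "msg_proxy",
        "type": "message",
        "role": "assistant",
        "model": MODEL,
        "content": [{"type": "text", "text": text}],
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 0, "output_tokens": 0},
    })

# Run with: flask --app proxy run --port 8080
# Then point Claude Code at it, e.g. ANTHROPIC_BASE_URL=http://localhost:8080
```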
This is a really interesting paper. The benchmarks are mostly math-oriented, so I wonder how it will generalize to other domains; if it does, it could have lots of applications. Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
discuss with author: huggingface.co/papers/2507.05…