Xeophon
@xeophon_
AI, LLMs
A recurrent depth/Huginn-3.5B update: I originally wanted to post these more often, but I guess time is a river, and I just don't like posting all that much yet... The most interesting finding about the depth-recurrent model has been this unassuming chart, actually:
There really is a renaissance of those models lately
GLiClass-V3: A family of encoder-only models that match or exceed DeBERTa-v3-Large in zero-shot accuracy, while delivering up to 50× faster inference.
Core Design:
- Single-pass inference: No cross-encoder pairing needed. One forward pass handles all labels.
- LoRA adapters:…
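The single-pass trick is the interesting bit: instead of running one text/label pair through the encoder per label, as an NLI-style cross-encoder does, all candidate labels are packed into the input alongside the text so one forward pass scores everything. A minimal sketch of that idea, assuming a generic encoder checkpoint and a crude dot-product scorer; the packing scheme and scoring here are my guesses at the shape of it, not the actual GLiClass API:

```python
import torch
from transformers import AutoModel, AutoTokenizer

NAME = "microsoft/deberta-v3-base"  # stand-in encoder, not a GLiClass checkpoint
tok = AutoTokenizer.from_pretrained(NAME)
enc = AutoModel.from_pretrained(NAME)

@torch.no_grad()
def single_pass_classify(text: str, labels: list[str]) -> dict[str, float]:
    # Pack [CLS] label1 [SEP] label2 [SEP] ... text [SEP] into ONE sequence,
    # remembering which positions belong to which label, so a single encoder
    # forward pass scores every label at once (a cross-encoder would need one
    # text/label pair, i.e. one pass, per label).
    ids, spans = [tok.cls_token_id], []
    for lab in labels:
        lab_ids = tok.encode(lab, add_special_tokens=False)
        spans.append((len(ids), len(ids) + len(lab_ids)))
        ids += lab_ids + [tok.sep_token_id]
    text_start = len(ids)
    ids += tok.encode(text, add_special_tokens=False) + [tok.sep_token_id]

    hidden = enc(input_ids=torch.tensor([ids])).last_hidden_state[0]
    text_vec = hidden[text_start:-1].mean(dim=0)  # pooled text representation
    # GLiClass trains a proper scoring head; a dot product is a crude stand-in.
    return {lab: torch.sigmoid(hidden[s:e].mean(dim=0) @ text_vec).item()
            for lab, (s, e) in zip(labels, spans)}

print(single_pass_classify("Bitcoin dropped 5% overnight.",
                           ["finance", "sports", "weather"]))
```

The inference-cost argument follows directly: with k labels, the cross-encoder pays for k forward passes while the packed sequence pays for one (slightly longer) pass.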
Me when switching from personal to corpo environment:
The gap between Claude Code, Cursor Agents, and GitHub Copilot for basic scripting, while using the same underlying model, is bonkers. Copilot barely works. Cursor is okay but frustrating (and slower). Claude Code usually just works, fast.
You are laughing? There are multiple open models threatening Sonnet and you are laughing?

the whale is such a perfectionist that it worked hard to get the perfect elo
Three weeks ago, we started building an AI game engine. But some models kept making things look... sloppy. So we turned finding the best one into a game. In three weeks, that game grew to 35K+ users across 135 countries. Introducing @designarena_ai, the fastest-growing…
Anthropic's history has been a series of ideological decisions later defeated by business realities
SCOOP: Leaked memo from Anthropic CEO Dario Amodei outlines the startup's plans to seek investment from the United Arab Emirates and Qatar. “Unfortunately, I think ‘no bad person should ever benefit from our success’ is a pretty difficult principle to run a business on.”
Kudos to Mistral for being this transparent! Would love to see other labs follow, finally putting to rest all those "but a single message in ChatGPT slurps down the entire ocean" arguments
Environmental impacts of @MistralAI LLMs We have conducted a first-of-its-kind comprehensive study to quantify the environmental impacts of our LLMs. With this study, we are not only addressing our own impacts but also hope to contribute to a global environmental standard for…
Qwen3-235B-2507 has an unusual degradation mode: in longform writing, it devolves into super short one-sentence paragraphs. It's coherent & not repeating. Wouldn't even be bad writing as a one-off stylistic choice, except, it does this for every piece and clearly is unintended.
It is an interesting paper, worth a read:
- Upcycled from Qwen2.5 32B
- Heavy synthetic data gen
- **Used internally** by the team (so hopefully not a benchmaxxer)
The majority of the paper is them teaching the model to learn when to use thinking vs. non-thinking mode, and they succeeded; a rough sketch of the inference-side switch follows the quoted announcement below.
🚀 Excited to introduce KAT-V1 (Kwaipilot-AutoThink) – a breakthrough 40B large language model from the Kwaipilot team! KAT-V1 dynamically switches between reasoning and non-reasoning modes to address the “overthinking” problem in complex reasoning tasks. Key Highlights: 📌 40B…
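The auto-think behavior is easy to picture at inference time: the model itself decides per query whether to open a reasoning trace. A hypothetical sketch, assuming the Kwaipilot/KAT-V1-40B repo id and a Qwen-style <think> opening tag; both are assumptions, and the paper's actual special tokens and learned gating differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

NAME = "Kwaipilot/KAT-V1-40B"  # assumed repo id
tok = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForCausalLM.from_pretrained(NAME, device_map="auto")

def auto_think_generate(question: str) -> str:
    # The model picks the mode on its own: probe the first few generated
    # tokens, and if it opens a reasoning block ("<think>" is illustrative),
    # grant a large token budget for the trace; otherwise decode a short
    # direct answer. The switch is learned; the harness only sizes the budget.
    ids = tok(question, return_tensors="pt").input_ids.to(model.device)
    probe = model.generate(ids, max_new_tokens=8)
    thinking = "<think>" in tok.decode(probe[0, ids.shape[1]:])
    out = model.generate(ids, max_new_tokens=4096 if thinking else 256)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
```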
Fascinating! It seems the base model with an inference-time harness already gets gold, without Deep Think.
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
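For context on what such a harness can look like: a best-of-n sample-and-verify loop is the usual shape of "careful prompting and pipeline design". A sketch under assumptions; the llm() stub, the prompts, and n=8 are mine, not the authors' actual pipeline:

```python
import re

def llm(prompt: str, temperature: float = 1.0) -> str:
    """Stand-in for a Gemini 2.5 Pro call; wire in a real client here."""
    raise NotImplementedError

def solve_with_harness(problem: str, n: int = 8) -> str:
    # Sample several independent proof attempts at high temperature...
    candidates = [llm(f"Solve this IMO problem with a rigorous proof:\n{problem}")
                  for _ in range(n)]

    # ...then have the model grade each attempt strictly and keep the best.
    def score(proof: str) -> int:
        verdict = llm("Grade this proof on a 0-7 IMO scale; reply with the "
                      f"number only.\nProblem:\n{problem}\n\nProof:\n{proof}",
                      temperature=0.0)
        m = re.search(r"\d", verdict)
        return int(m.group()) if m else 0

    return max(candidates, key=score)
```

The design choice that matters is spending compute on verification rather than only on generation: a strict grading pass filters out the confident-but-wrong proofs that a single sample would happily return.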