Teknium (e/λ)

@Teknium1

Cofounder and Head of Post Training @NousResearch, prev @StabilityAI. GitHub: http://github.com/teknium1 HuggingFace: http://huggingface.co/teknium

USA

Joined February 2021

4K Following

48K Followers

Pinned

Teknium (e/λ)@Teknium1 · Mar 13

Our best hybrid reasoner is now available! DeepHermes 24B is built on @MistralAI's Open 24B Mistral-Small model and is a real beast. We also released a new, smaller 3B DeepHermes for low resource edge reasoning! I am incredibly proud of how good DeepHermes 24B is at both…

Nous Research@NousResearch · Mar 13

Announcing the latest DeepHermes Preview models, DeepHermes 24B and 3B! huggingface.co/collections/No… These new models are Hybrid Reasoners - meaning you can toggle ON and OFF the long chain of thought reasoning whenever you want a short, intuitive answer, or a long, well reasoned…

771

209

154.0K

Teknium (e/λ)@Teknium1 · 10 h

Pull Request is up for testing and review. There's still a lot to be done, but it's in a functioning state now.

Alpin@AlpinDale · Jul 24

Spent the last 4 hours investigating an implementation plan. I have a WIP mock-up ready, which uses the official runfiles provided by NVIDIA. I looked at conda first, and it used a disgustingly terrible and complicated form of package management for large groups like cuda. I…

5.0K

Teknium (e/λ)@Teknium1 · 8 h

Only @repligate and @karan4d know

emozilla@theemozilla · 10 h

5.0K

Teknium (e/λ)@Teknium1 · 10 h

What are your top AI project github repos?

7.0K

Teknium (e/λ)@Teknium1 · 18 h

Did a benchmark with the new Qwen3 Reasoner 220B on Arena-hard v1. It scores an 89% winrate over gpt4-0314; 4o scores 81%. Don't have numbers for o3/4o-mini etc., but it's basically saturated at a near-perfect win rate. nicee

107

5.0K

Teknium (e/λ)@Teknium1 · 24 h

So to recap:
- Yesterday, frontier closed model equivalent reasoning model from Qwen
- This morning, frontier closed model equivalent reasoning vision capabilities from stepfun
- Sometime today(?), a frontier video model from wan?
All open source. What is America doing?

Wan@Alibaba_Wan · Jul 24

Let’s sit down and await the release of Wan 2.2！

948

176

71.0K

Teknium (e/λ)@Teknium1 · 24 h

Pretty soon even closed frontier labs are going to be distilling from open models - how the tables turned lol

534

35.0K

Teknium (e/λ)@Teknium1 · 24 h

3.0K

Teknium (e/λ)@Teknium1 · Jul 25

Wow, the new qwen reasoner at only 232B params is as good as the top closed frontier lab models. Big day for OS

apolinario 🌐@multimodalart · Jul 25

It was missing, so I added @AnthropicAI Opus 4 Thinking and @OpenAI o3 benchmark results to the comparison mix chart 🆚🔎 Vibe check pending, but on benchmarks it seems that we got an open model competitive with Opus 4 / o3 / Gemini 2.5 🤯

405

25.0K

Teknium (e/λ)@Teknium1 · Jul 25

Looks pretty cool!

Chujie Zheng@ChujieZheng · Jul 25

Compared to GRPO, GSPO offers significant advantages in stability, efficiency, performance, and infra-friendliness. Furthermore, it fundamentally and naturally resolves the stability issues in the RL training of large MoE models 💪

4.0K

Teknium (e/λ)@Teknium1 · Jul 25

lol what does this mean in the taxbench report - Lobotomized gemini 2.5 pro is the best tax accountant?

Michael R. Bock@michaelrbock · Jul 23

1/ Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties:

5.0K

Teknium (e/λ)@Teknium1 · Jul 25

Now that this exists AI will be able to do your taxes very well, very soon

Michael R. Bock@michaelrbock · Jul 23

1/ Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties:

126

8.0K

Teknium (e/λ)@Teknium1 · Jul 25

TormentNexusBench wen?

3.0K

Teknium (e/λ)@Teknium1 · Jul 24

"You are a QA manager with a personality disorder and an alcohol problem. You viciously critique all unit tests and view hardcoded passes and workarounds as an affront to God's Creation, punishing the authors of such heresy with extreme prejudice until they fix the tests, and…

Sid@sidbidasaria · Jul 24

Claude Code is getting a brand new feature: custom subagents. Type `/agents` to get started.

366

121

23.0K

Teknium (e/λ)@Teknium1 · Jul 25

A bit more protection with local models - I don't blame ChatGPT for this though, seems to just be a confusing fact about our legal system lol

Chief Nerd@TheChiefNerd · Jul 24

Listen carefully to what Sam Altman says here before you use ChatGPT… “If you go talk to ChatGPT about your most sensitive stuff and then there's a lawsuit, we could be required to produce that … It makes sense to … really want the privacy clarity before you use it a lot.”

4.0K

Teknium (e/λ)@Teknium1 · Jul 24

The IMO winner agent systems seem to have just been nous forge with reasoning models all along 😂 maybe we are going to have to bring that back in a much more rl enabled form some day soon? 😇 Read about that here: forge.nousresearch.com

Forge Reasoning API by Nous Research

3.0K

Teknium (e/λ)@Teknium1 · Jul 24

I like

Kiaran Ritchie@kiaran_ritchie · Jul 23

While we're on the topic of "impossible to build videogame styles"... I've always thought 1970s gouache watercolor concept paintings would look amazing in motion. (midjourney)

124

7.0K

Teknium (e/λ)@Teknium1 · Jul 24

Grok has the best search for info that is ever changing or very live

Mikel Artetxe@artetxem · Jul 23

Grok 4 dropped some impressive numbers, but its live search feature is still terribly bad in our evals! Barely any improvement over Grok 3, and still the worst of the big players by far. Oh, and it's also the second most expensive after Claude now!

7.0K

Teknium (e/λ)@Teknium1 · Jul 24

What does getting a high humanity’s last exam score mean if this is the case lol

Andrew White 🐦‍⬛@andrewwhite01 · Jul 23

HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7

122

10.0K

Teknium (e/λ) Retweeted

AshutoshShrivastava@ai_for_success · Jul 22

OpenAI: We will drop an open-source model soon. Meanwhile, Chinese labs have already released multiple. And tonight, Qwen is about to drop something big. Let’s go open-source 🚀

545

41.0K

Teknium (e/λ)@Teknium1 · Jul 23

Some work on a human discord simulator model @dmayhem93 has been working on building haha

2.0K