Charles Goddard
@chargoddard
Chief of Frontier Research @arcee_ai · MergeKit author · GitHub: https://github.com/cg123
(trying to convince my friends to hang out at wells fargo and drink the free coffee instead of going to bars to save money) it's popping at the farg tonight!
Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run. Powered by our P2P inference stack and DeepSeek-R1-0528, it verifies traces for the hardest RL tasks. Contribute towards AGI via open, permissionless compute.
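For the "verifies traces" part, a minimal sketch of what answer-level verification can look like on a math-style task, assuming a \boxed{} final-answer convention; the helper names and the format are illustrative assumptions, not SYNTHETIC-2's actual pipeline:

```python
import re

def extract_final_answer(trace: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a reasoning trace."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", trace)
    return matches[-1].strip() if matches else None

def verify_trace(trace: str, reference: str) -> bool:
    """Accept a trace only if its final answer matches the reference."""
    answer = extract_final_answer(trace)
    return answer is not None and answer == reference

# Illustrative usage: keep only traces whose answers check out.
traces = ["... so the total is \\boxed{42}", "... giving \\boxed{41}"]
verified = [t for t in traces if verify_trace(t, "42")]
```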
Want to read me rant about context extension experiments for our foundation models? You’re in luck
Last week, we launched AFM-4.5B, our first foundation model. In this post by @chargoddard, you will learn how we extended the context length of AFM-4.5B from 4k to 64k through aggressive experimentation, model merging, distillation, and a concerning amount of soup. Bon…
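The "soup" is a nod to model soups: averaging the weights of several fine-tuned checkpoints of the same architecture. A minimal PyTorch sketch of uniform souping; the file names and the uniform weighting are illustrative assumptions, not the exact AFM-4.5B recipe:

```python
import torch

def soup(state_dicts: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Uniformly average a list of architecture-compatible state dicts."""
    keys = state_dicts[0].keys()
    return {
        k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
        for k in keys
    }

# Illustrative usage: average three long-context fine-tuning runs.
paths = ("run_a.pt", "run_b.pt", "run_c.pt")  # hypothetical checkpoints
checkpoints = [torch.load(p, map_location="cpu") for p in paths]
torch.save(soup(checkpoints), "souped.pt")
```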
📢 Big News! We're thrilled to announce the Arcee Foundation Model (AFM) Family, starting with AFM-4.5B - our *first* foundation model! 🚀 ⚙️ Built for real-world performance — GPU-tier results, CPU-efficient 📜 Enterprise-ready — privacy, compliance & Western regulatory focus…
Our customers needed a better base model with <10B parameters. We spent the last 5 months building one. I'm delighted to share a preview of our first Arcee Foundation Model: AFM-4.5B-Preview.
reasoning, which we define as “that thing you do when you’re reasoning”, [citation needed] is a phenomenon with wide-ranging applications in fields such as medicine, B2B software sales, and finance.
slightly more serious take: systematic research into what llms are capable of, how close to "thought" they get, and where they fail is cool and good. what's not is the incurious, borderline solipsistic dismissal of even the possibility of non-human reasoning through word games
🤯 MIND-BLOWN! A new paper just SHATTERED everything we thought we knew about AI reasoning! This is paradigm-shifting. A MUST-READ. Full breakdown below 👇 🧵 1/23

Going to be fun. (I'll be there as a judge.)
Announcing the Nous RL Environments Hackathon in SF! Create with Atropos, Nous' RL environments framework, and claim your stake of a $50,000 prize pool. Partners - @xai @nvidia @nebiusai @SHACK15sf @akashnet_ @LambdaAPI @tensorstax and @runpod_io May 18th. Sign up below 👇👇
o3 is an odd combination of impressively capable and straight-up incoherent. makes connections i haven't seen a model make before, but also says stuff like "editorial nit: version string says v0.4 in the header, but the title says v0.4" and conflates authorship of turns
Exactly the kind of shenanigans we should all be up to
❗️Attention is NOT all you need ❗️ Using only 8 GPUs (not a cluster), we trained Qwerky-72B (and 32B) without any transformer attention. With evals far surpassing GPT-3.5 Turbo and closing in on 4o-mini. All with 100x++ lower inference cost, via RWKV linear scaling
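The linear-scaling claim comes from replacing softmax attention with a recurrence that carries a fixed-size state, so per-token cost stays constant as context grows. A toy sketch of generic linear attention (not RWKV's exact formulation, which uses its own time-mixing and decay scheme):

```python
import torch

def linear_attention_step(state, norm, q, k, v):
    """One recurrent step: O(d^2) work per token, independent of sequence length.

    state: (d, d) running sum of outer products phi(k) v^T
    norm:  (d,)  running sum of feature-mapped keys, for normalization
    """
    phi_k = torch.nn.functional.elu(k) + 1  # positive feature map
    phi_q = torch.nn.functional.elu(q) + 1
    state = state + torch.outer(phi_k, v)
    norm = norm + phi_k
    out = (phi_q @ state) / (phi_q @ norm + 1e-6)
    return out, state, norm

d = 64
state, norm = torch.zeros(d, d), torch.zeros(d)
for _ in range(1000):  # constant cost per token => linear cost in sequence length
    q, k, v = torch.randn(3, d)
    out, state, norm = linear_attention_step(state, norm, q, k, v)
```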
okay is it really necessary to dunk on the western ml software stack this hard
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min…
you've finished researching and evangelizing model merging and get to appreciate the result. it became mainstream, a key tool in the toolbox of any post-training pipeline
you've finished model merging and get to appreciate the result
Fun release today. If you want to read some rambling about how we built this cutting-edge 14B model, check out the article.
First came Arcee AI's flagship 70B model, 𝗦𝘂𝗽𝗲𝗿𝗡𝗼𝘃𝗮, followed by the 𝟴𝗕 𝗦𝘂𝗽𝗲𝗿𝗡𝗼𝘃𝗮-𝗟𝗶𝘁𝗲. Today we add to this family of superpowered Small Language Models (SLMs) with the release of the 14B 𝗦𝘂𝗽𝗲𝗿𝗡𝗼𝘃𝗮-𝗠𝗲𝗱𝗶𝘂𝘀. SuperNova-Medius represents a…