Kris Cao
@kroscoo
When lava pours out near the sea's surface, tremendous volcanic explosions sometimes occur | pretraining @cohere
🎤 Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.
Our team is hiring! Please consider applying if you care deeply about data, and want to train sick base models :)) jobs.ashbyhq.com/cohere/859e2e4…
Our paper on the best way to add error bars to LLM evals is on arXiv! TL;DR: Avoid the Central Limit Theorem -- there are better, simple Bayesian (and frequentist!) methods you should be using instead. Super lightweight library: github.com/sambowyer/baye… 🧵👇
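A minimal sketch of the distinction the thread is drawing (not the linked library's API): comparing a CLT-style normal-approximation interval with a simple Beta-Binomial credible interval for an eval accuracy. The per-question scores and the flat Beta(1, 1) prior are made-up assumptions for illustration.

```python
# Sketch only: CLT interval vs. a simple Bayesian credible interval
# for accuracy estimated from n pass/fail eval scores.
import numpy as np
from scipy import stats

scores = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])  # hypothetical per-question correctness
n, k = len(scores), int(scores.sum())
acc = k / n

# CLT / normal approximation: can misbehave for small n or near-0/1 accuracy.
se = np.sqrt(acc * (1 - acc) / n)
clt_interval = (acc - 1.96 * se, acc + 1.96 * se)

# Bayesian alternative: Beta(1, 1) prior + Binomial likelihood gives a
# Beta(k + 1, n - k + 1) posterior; take its central 95% credible interval.
posterior = stats.beta(k + 1, n - k + 1)
bayes_interval = posterior.interval(0.95)

print(f"accuracy = {acc:.2f}")
print(f"CLT 95% interval:      ({clt_interval[0]:.2f}, {clt_interval[1]:.2f})")
print(f"Bayesian 95% interval: ({bayes_interval[0]:.2f}, {bayes_interval[1]:.2f})")
```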
Full house at the @tokshop2025 tokenization workshop at #ICML2025 today!

How do Americans know what being stuck in second gear feels like when they mostly drive automatic.
Going to #ICML2025 next week to be a panellist at @tokshop2025, looking forward to catching up with old friends and meeting new ones. My spicy take for the conference is that I think code generation has a lot to learn from the (syntactic) parsing literature.
🚨New pretraining paper on multilingual tokenizers 🚨 Super excited to share my work with @Cohere_Labs: One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
How can we make language models more flexible to adapt to new languages after pretraining? 🌏 🧠 Our latest work investigates whether a tokenizer trained on more languages than the pretraining target can improve language plasticity without compromising pretraining performance.
rewinding model loss spikes is like saving before a difficult boss and reloading if you start losing
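For anyone outside pretraining, the practice being joked about is periodic checkpointing plus rewind-and-skip when the loss blows up. A minimal sketch, assuming a PyTorch-style model/optimizer interface; `step_fn`, the smoothing constant, and the spike threshold are invented for illustration.

```python
import copy

def train_with_rewind(model, optimizer, batches, step_fn,
                      save_every=100, spike_factor=3.0):
    """Checkpoint periodically; on a loss spike, reload the last save and skip the bad batch."""
    checkpoint = None      # the "save file": model + optimizer state
    running_loss = None    # smoothed loss used to detect spikes
    i = 0
    while i < len(batches):
        loss = step_fn(model, optimizer, batches[i])  # one training step, returns its loss

        if (running_loss is not None and checkpoint is not None
                and loss > spike_factor * running_loss):
            # Spike detected: "reload the save" and resume past the offending batch.
            model.load_state_dict(checkpoint["model"])
            optimizer.load_state_dict(checkpoint["optimizer"])
            running_loss = checkpoint["running_loss"]
            i += 1
            continue

        running_loss = loss if running_loss is None else 0.99 * running_loss + 0.01 * loss

        if i % save_every == 0:
            # "Save before the boss fight."
            checkpoint = {
                "model": copy.deepcopy(model.state_dict()),
                "optimizer": copy.deepcopy(optimizer.state_dict()),
                "running_loss": running_loss,
            }
        i += 1
```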
If you are based in Zurich (or anywhere rly) and write code for ML accelerators (including cuda/rocm) HMU
Cohere also has a CUDA team
the acl paper template will long outlive acl as an attractive destination for NLP papers
Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯
in case you are looking for the best model for COBOL, we might be able to help you here... cohere.com/research/paper…
Social Security systems contain tens of millions of lines of code written in COBOL, an archaic programming language. Safely rewriting that code would take years—DOGE wants it done in months. wired.com/story/doge-reb…
I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised…
My team recently launched a best-in-class LLM specializing in English and Arabic. We just published a tech report explaining our methods. Check it out on arxiv: arxiv.org/abs/2503.14603
we have a new model, it's pretty good and we like it, we think you'll like it too. (as an aside this is the first model i contributed to at cohere!)
We’re excited to introduce our newest state-of-the-art model: Command A! Command A provides enterprises maximum performance across agentic tasks with minimal compute requirements.
Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿
Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies…Between Circuits and Chomsky. 🧵1/6👇
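To make the idea concrete, here is a hedged sketch of what a "formal language first" curriculum could look like: a small synthetic corpus of nested-bracket (Dyck-style) strings served before the natural-language data. The generator, corpus size, and handoff are illustrative assumptions, not the paper's recipe.

```python
# Illustration only: a tiny formal-language warm-up phase before natural text.
import random

BRACKETS = [("(", ")"), ("[", "]"), ("{", "}")]

def sample_dyck(max_depth=8, p_open=0.6):
    """Sample one well-nested bracket string."""
    out, stack = [], []
    while True:
        if stack and (len(stack) >= max_depth or random.random() > p_open):
            out.append(stack.pop())                 # close the most recent bracket
        else:
            open_b, close_b = random.choice(BRACKETS)
            out.append(open_b)
            stack.append(close_b)
        if not stack and random.random() < 0.3:     # occasionally stop once balanced
            return "".join(out)

def curriculum(natural_docs, n_formal=1000):
    """Yield formal-language documents first, then the natural-language corpus."""
    for _ in range(n_formal):
        yield sample_dyck()
    yield from natural_docs
```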
I'm hiring performance engineers for the pre-training team at Cohere. If you enjoy writing efficient kernels, hardware-aligned architecture design and optimisations, do reach out! Check out the live job posting here: jobs.ashbyhq.com/cohere/d42f5fd…