Kris Cao
@kroscoo
When lava pours out near the sea's surface, tremendous volcanic explosions sometimes occur | pretraining @cohere
🎤 Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.
Our team is hiring! Please consider applying if you care deeply about data, and want to train sick base models :)) jobs.ashbyhq.com/cohere/859e2e4…
Our paper on the best way to add error bars to LLM evals is on arXiv! TL;DR: Avoid the Central Limit Theorem -- there are better, simple Bayesian (and frequentist!) methods you should be using instead. Super lightweight library: github.com/sambowyer/baye… 🧵👇
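A minimal sketch of the distinction the thread is drawing (not the linked library's API): comparing a CLT-style normal-approximation interval with a simple Beta-Binomial credible interval for an eval accuracy. The per-question scores and the flat Beta(1, 1) prior are made-up assumptions for illustration.

```python
# Sketch only: CLT interval vs. a simple Bayesian credible interval
# for accuracy estimated from n pass/fail eval scores.
import numpy as np
from scipy import stats

scores = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])  # hypothetical per-question correctness
n, k = len(scores), int(scores.sum())
acc = k / n

# CLT / normal approximation: can misbehave for small n or near-0/1 accuracy.
se = np.sqrt(acc * (1 - acc) / n)
clt_interval = (acc - 1.96 * se, acc + 1.96 * se)

# Bayesian alternative: Beta(1, 1) prior + Binomial likelihood gives a
# Beta(k + 1, n - k + 1) posterior; take its central 95% credible interval.
posterior = stats.beta(k + 1, n - k + 1)
bayes_interval = posterior.interval(0.95)

print(f"accuracy = {acc:.2f}")
print(f"CLT 95% interval:      ({clt_interval[0]:.2f}, {clt_interval[1]:.2f})")
print(f"Bayesian 95% interval: ({bayes_interval[0]:.2f}, {bayes_interval[1]:.2f})")
```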
Full house at the @tokshop2025 tokenization workshop at #ICML2025 today!

How do Americans know what being stuck in second gear feels like when they mostly drive automatic.
Going to #ICML2025 next week to be a panellist at @tokshop2025, looking forward to catching up with old friends and meeting new ones. My spicy take for the conference is that I think code generation has a lot to learn from the (syntactic) parsing literature.
🚨New pretraining paper on multilingual tokenizers 🚨 Super excited to share my work with @Cohere_Labs: One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
How can we make language models more flexible to adapt to new languages after pretraining? 🌏 🧠 Our latest work investigates whether a tokenizer trained on more languages than the pretraining target can improve language plasticity without compromising pretraining performance.
rewinding model loss spikes is like saving before a difficult boss and reloading if you start losing
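For anyone outside pretraining, the practice being joked about is periodic checkpointing plus rewind-and-skip when the loss blows up. A minimal sketch, assuming a PyTorch-style model/optimizer interface; `step_fn`, the smoothing constant, and the spike threshold are invented for illustration.

```python
import copy

def train_with_rewind(model, optimizer, batches, step_fn,
                      save_every=100, spike_factor=3.0):
    """Checkpoint periodically; on a loss spike, reload the last save and skip the bad batch."""
    checkpoint = None      # the "save file": model + optimizer state
    running_loss = None    # smoothed loss used to detect spikes
    i = 0
    while i < len(batches):
        loss = step_fn(model, optimizer, batches[i])  # one training step, returns its loss

        if (running_loss is not None and checkpoint is not None
                and loss > spike_factor * running_loss):
            # Spike detected: "reload the save" and resume past the offending batch.
            model.load_state_dict(checkpoint["model"])
            optimizer.load_state_dict(checkpoint["optimizer"])
            running_loss = checkpoint["running_loss"]
            i += 1
            continue

        running_loss = loss if running_loss is None else 0.99 * running_loss + 0.01 * loss

        if i % save_every == 0:
            # "Save before the boss fight."
            checkpoint = {
                "model": copy.deepcopy(model.state_dict()),
                "optimizer": copy.deepcopy(optimizer.state_dict()),
                "running_loss": running_loss,
            }
        i += 1
```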
If you are based in Zurich (or anywhere rly) and write code for ML accelerators (including cuda/rocm) HMU
Cohere also has a CUDA team
the acl paper template will long outlive acl as an attractive destination for NLP papers
Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯
in case you are looking for the best model for COBOL, we might be able to help you here... cohere.com/research/paper…
Social Security systems contain tens of millions of lines of code written in COBOL, an archaic programming language. Safely rewriting that code would take years—DOGE wants it done in months. wired.com/story/doge-reb…
I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised…
My team recently launched a best-in-class LLM specializing in English and Arabic. We just published a tech report explaining our methods. Check it out on arxiv: arxiv.org/abs/2503.14603
we have a new model, it's pretty good and we like it, we think you'll like it too. (as an aside this is the first model i contributed to at cohere!)
We’re excited to introduce our newest state-of-the-art model: Command A! Command A provides enterprises maximum performance across agentic tasks with minimal compute requirements.
Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿
Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies…Between Circuits and Chomsky. 🧵1/6👇
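To make the idea concrete, here is a hedged sketch of what a "formal language first" curriculum could look like: a small synthetic corpus of nested-bracket (Dyck-style) strings served before the natural-language data. The generator, corpus size, and handoff are illustrative assumptions, not the paper's recipe.

```python
# Illustration only: a tiny formal-language warm-up phase before natural text.
import random

BRACKETS = [("(", ")"), ("[", "]"), ("{", "}")]

def sample_dyck(max_depth=8, p_open=0.6):
    """Sample one well-nested bracket string."""
    out, stack = [], []
    while True:
        if stack and (len(stack) >= max_depth or random.random() > p_open):
            out.append(stack.pop())                 # close the most recent bracket
        else:
            open_b, close_b = random.choice(BRACKETS)
            out.append(open_b)
            stack.append(close_b)
        if not stack and random.random() < 0.3:     # occasionally stop once balanced
            return "".join(out)

def curriculum(natural_docs, n_formal=1000):
    """Yield formal-language documents first, then the natural-language corpus."""
    for _ in range(n_formal):
        yield sample_dyck()
    yield from natural_docs
```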
I'm hiring performance engineers for the pre-training team at Cohere. If you enjoy writing efficient kernels, hardware-aligned architecture design and optimisations, do reach out! Check out the live job posting here: jobs.ashbyhq.com/cohere/d42f5fd…