Acyr Locatelli
@acyr_l
Lead pre-training @Cohere
Dropping tomorrow on MLST - the serious problems with Chatbot Arena. We will talk about the recent investment and the explosive paper from Cohere researchers which identified several significant problems with the benchmark.
Our team is hiring! Please consider applying if you care deeply about data, and want to train sick base models :)) jobs.ashbyhq.com/cohere/859e2e4…
We've just released 100+ intermediate checkpoints and our training logs from SmolLM3-3B training. We hope this can be useful to researchers working on mech interp, training dynamics, RL and other topics :) Training logs: -> Usual training loss (the gaps in the loss are due…
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
Excited to introduce LLM-First Search (LFS) - a new paradigm where the language model takes the lead in reasoning and search! LFS is a self-directed search method that empowers LLMs to guide the exploration process themselves, without relying on predefined heuristics or fixed…
We are running our first physical event in London on 14th July! We have Tim Nguyen @IAmTimNguyen from DeepMind and Max Bartolo @max_nlp from Cohere and Enzo Blindow (VP of Data, Research & Analytics) at @Prolific joining us. Not many seats for the first one.…
Introducing the world's first reasoning model in biology! 🧬 BioReason enables AI to reason about genomics like a biology expert. A thread 🧵:
If you are based in Zurich (or anywhere rly) and write code for ML accelerators (including cuda/rocm) HMU
Cohere also has a CUDA team
Presenting this today, 3–5:30, at poster #208, come say hi 🙋‍♀️
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢 🧵⬇️
Very proud of this work, which is being presented @iclr_conf later today. While I won't be there, catch up with @viraataryabumi and @ahmetustun89, who are both fantastic and can share more about our work at both @Cohere_Labs and @cohere. 🔥✨
In our latest work, we ask “what is the impact of code data used in pre-training on non-code tasks?” Work w @viraataryabumi, @yixuan_su, @rayhascode, @adrien_morisot, @1vnzh, @acyr_l, @mziizm, @ahmetustun89 @sarahookr 📜 arxiv.org/abs/2408.10914
RL is not all you need, nor attention nor Bayesianism nor free energy minimisation, nor an age of first person experience. Such statements are propaganda. You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive…
Excited to announce that @Cohere and @Cohere_Labs models are the first supported inference provider on @huggingface Hub! 🔥 Looking forward to this new avenue for sharing and serving our models, including the Aya family and Command suite of models.
Great interview! (even at 1x)
I really enjoyed my @MLStreetTalk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in some semi-coherent thoughts revolving around AI robustness, it may be worth…
Highly recommend working with Ed! He comes in handy.
Highly recommend working with Sander. Also playing SC2 with Sander.
I added @cohere command A to this chart, I had to extend the axis a bit though….
Introducing Mistral Small 3.1. Multimodal, Apache 2.0, outperforms Gemma 3 and GPT 4o-mini. mistral.ai/news/mistral-s…
🚀 Big news @cohere's latest Command A now climbs to #13 on Arena! Another organization joining the top-15 club - congrats to the Cohere team! Highlights: - open-weight model (111B) - 256K context window - $2.5/$10 input/output MTok More analysis👇
We’re excited to introduce our newest state-of-the-art model: Command A! Command A provides enterprises maximum performance across agentic tasks with minimal compute requirements.
Really proud of the work that went into pre-training this model!
Cohere releases Command A on Hugging Face Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Command A is on par or better than models like GPT-4o and Deepseek…