Kelly Marchisio (St. Denis)
@cheeesio
Multilinguality Lead @cohere. Formerly: PhD @jhuclsp, Alexa Fellow @amazon, dev @Google, MPhil @cambridgenlp, EdM @hgse 🔑🔑¬🧀 (@kelvenmar20)
Code release from our superstar intern, @p_nawrot! • Write sparse attn patterns in 50 lines, not 5k • Compatibility with models supported by vLLM, support for TP • 6 SOTA baselines with optimized implementations + 9 eval tasks • Research-grade extensibility = rapid prototyping
We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while automatically inheriting vLLM’s optimizations and model support. As a PhD student, I've learned that sometimes the bottleneck in research isn't ideas — it's…
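As a rough illustration of the kind of pattern you would express there (this is not the sparse-frontier API; its actual abstraction and vLLM/TP integration are in the released repo), a minimal "local window + sink tokens" sparse attention sketch in plain PyTorch might look like this:

```python
# Toy sketch of a sparse attention pattern (local sliding window + sink tokens).
# Names and shapes are illustrative only, not the sparse-frontier interface.
import torch

def local_plus_sink_mask(seq_len: int, window: int = 128, num_sinks: int = 4) -> torch.Tensor:
    """Boolean mask (True = attend) for causal sliding-window attention
    plus a few always-visible "sink" tokens at the start of the sequence."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (seq, 1)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions, shape (1, seq)
    causal = k <= q                          # no attending to the future
    local = (q - k) < window                 # keys within the sliding window
    sink = k < num_sinks                     # globally visible "sink" keys
    return causal & (local | sink)

def sparse_attention(qry, key, val, window=128, num_sinks=4):
    """Naive masked attention; qry/key/val have shape (batch, heads, seq, dim)."""
    scores = qry @ key.transpose(-2, -1) / qry.size(-1) ** 0.5
    mask = local_plus_sink_mask(qry.size(-2), window, num_sinks).to(qry.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ val

if __name__ == "__main__":
    q = torch.randn(1, 8, 256, 64)
    out = sparse_attention(q, q.clone(), q.clone(), window=64, num_sinks=4)
    print(out.shape)  # torch.Size([1, 8, 256, 64])
```

The point of a framework like sparse-frontier is that a pattern of roughly this size plugs into an optimized inference stack instead of requiring a full custom kernel and serving path.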
@weiyinko_ml was one of the earliest members of our Open Science Community and an early collaborator on our open science research. We’re proud to have been part of Wei-Yin’s journey from community collaborator to colleague, and grateful he took an early bet on working with us 🚀
Excited to announce the call for papers for the Multilingual Representation Learning workshop #EMNLP2025 sigtyp.github.io/ws2025-mrl.html with @_dataman_ @linguist_cat Jiayi Wang @fdschmidt @tylerachang @hila_gonen and amazing speakers: Alice Oh, Kelly Marchisio, & Pontus Stenetorp
The call for papers is out for the 5th edition of the Workshop on Multilingual Representation Learning, which will take place in Suzhou, China, co-located with EMNLP 2025! See details below!
We're looking for a new member for the multilingual team with a focus on data engineering! Please apply at the link below:
The Multilingual Team at @cohere is hiring! If this sounds like you, please apply: - strong coding skills and a keen eye for detail - experience working with the challenges & joys of multilingual data Help us bring AI to the world! 🌏🌍🌎 jobs.ashbyhq.com/cohere/a87be94…
Make Command speak better & in more languages
We release a major improvement upon last year's Dynamic Memory Compression (DMC). DMS is better, easier, and faster to train. The future of long context is 1) KV cache compression + 2) sparse attention, both training-aware to avoid a training-inference mismatch. Imho, DMS is SOTA for 1).
🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. This unlocks *inference-time hyper-scaling* For the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
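Purely as a toy sketch of the idea behind learned KV-cache compression (a hypothetical helper, not the DMS method itself, which is trained end-to-end with the model): a small gate scores cached key/value pairs and evicts the lowest-scoring ones, so the same memory budget stretches over more generated tokens.

```python
# Toy illustration of learned KV-cache eviction; not the actual DMS algorithm.
import torch
import torch.nn as nn

class LearnedKVEvictor(nn.Module):
    """Scores each cached key/value pair with a tiny learned gate and keeps
    only the top fraction, shrinking the cache to a target budget."""
    def __init__(self, head_dim: int):
        super().__init__()
        self.scorer = nn.Linear(head_dim, 1)

    def forward(self, keys, values, keep_ratio: float = 0.25):
        # keys, values: (batch, heads, seq, dim)
        scores = self.scorer(keys).squeeze(-1)                 # (batch, heads, seq)
        k = max(1, int(keys.size(-2) * keep_ratio))
        idx = scores.topk(k, dim=-1).indices.sort(-1).values   # keep original order
        gather = idx.unsqueeze(-1).expand(-1, -1, -1, keys.size(-1))
        return keys.gather(2, gather), values.gather(2, gather)

if __name__ == "__main__":
    B, H, S, D = 1, 8, 1024, 128
    k, v = torch.randn(B, H, S, D), torch.randn(B, H, S, D)
    ck, cv = LearnedKVEvictor(D)(k, v, keep_ratio=0.25)
    print(ck.shape)  # torch.Size([1, 8, 256, 128]) -> 4x smaller cache
```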
Here are key recommendations to make AI safer & more equitable for everyone: 🌐 Incentivize the creation of open-access multilingual datasets 🪟 Encourage transparency in model language coverage 🔬 Prioritise resources towards multilingual research
Over 7000 languages are spoken worldwide 🌐, but AI safety efforts focus on only a fraction of them. Our latest paper draws on our multi-year efforts with the wider research community to explore why this matters and how we can bridge the AI language gap.
Tomorrow at 6pm CET I'm giving a talk about our latest work on Sparse Attention at @Cohere_Labs. I plan to describe the field as it is now, discuss our evaluation results, and share insights about what I believe is the future of Sparse Attention. See you!
Our ML Efficiency group is looking forward to welcoming @p_nawrot next week on May 28th, for a session on "The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs" Learn more: cohere.com/events/Cohere-…
Welcome, Ruochen! ✨
🌟 A little late, but I just started my internship @cohere, cooking up more multilingual things with the amazing @cheeesio and @SCahyawijaya. Will be in NYC for June and July, hmu! 🗽
I’m excited to see what you’ve built! 🚀
🧑‍⚖️ Our Expedition Aya judges are: @cheeesio, Multilinguality Lead, Cohere @max_nlp, Command Modelling Team Lead, Cohere @mziizm, Staff Research Scientist, Cohere Labs Let’s celebrate this collaborative research and look ahead to what’s next! Learn more: cohere.com/events/Cohere-…
Result of @robinson_n8’s internship on the Cohere multilingual team last year! Check it out!
Many LLMs struggle to produce Dialectal Arabic. As practitioners attempt to mitigate this, new evaluation methods are needed. We present AL-QASIDA (Analyzing LLM Quality + Accuracy Systematically In Dialectal Arabic), a comprehensive eval of LLM Dialectal Arabic proficiency (1/7)
This was fun! Excellent work led by @p_nawrot during his internship at @cohere
How does sparse attention reshape LLM scaling? 🔍 We’re excited to share this work by former @Cohere intern @p_nawrot, “The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs.”
The Sparse Frontier: Efficient sparse attention methods are key to scaling LLMs to long contexts. We conduct the largest-scale empirical analysis that answers: 1. 🤏🔍 Are small dense models or large sparse models better? 2. ♾️ What is the maximum permissible sparsity per task? 3.…
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs