Shivalika Singh
@singhshiviii
Research Engineer @Cohere_Labs @cohere | @huggingface fellow 🤗 | “Research means that you don't know, but are willing to find out” ✨
LMArena is widely used for model evaluation, but is it measuring true progress? 🔮 In our work, "The Leaderboard Illusion", we reveal: 🔒 Private testing 📊 Data access asymmetries ⚠️ Overfitting risks 🚫 Silent deprecations Despite best intentions, arena policies favor a few!

Can we update model behavior immediately based on granular feedback from users? I think this is part of an important big-picture direction: it moves user feedback from a cumbersome ask to adaptable, in-place learning. Work led by @RivardLuke and @yuntiandeng 🔥✨
🚀 Introducing Chat Annotator: a free chatbot where users can highlight parts of responses, leave a comment, and have the model incorporate that feedback into its next output. Powered by Cohere Command-A. 👉 Try it here: chatannotator.com
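For the curious, here is a minimal sketch of how span-level feedback could be folded into the next model turn using the Cohere Python SDK. This is my generic reading, not the actual Chat Annotator implementation; the model id and the feedback prompt format are assumptions.

```python
# Minimal sketch, NOT the actual Chat Annotator implementation:
# fold a highlighted span + user comment back into the next model turn.
import cohere

co = cohere.ClientV2()  # reads the CO_API_KEY environment variable

def revise(history, last_reply, highlight, comment):
    """Ask the model to rewrite its last reply, honoring span-level feedback."""
    feedback = (
        f'In your previous reply, I highlighted: "{highlight}"\n'
        f'My comment on that span: "{comment}"\n'
        "Please rewrite the reply, incorporating this feedback."
    )
    messages = history + [
        {"role": "assistant", "content": last_reply},
        {"role": "user", "content": feedback},
    ]
    # "command-a-03-2025" is my assumption for the Command A model id.
    resp = co.chat(model="command-a-03-2025", messages=messages)
    return resp.message.content[0].text
```

Because the feedback rides along as an ordinary conversation turn, no retraining is involved; the adaptation happens purely in context.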
I’m very excited to be co-organizing this @NeurIPSConf workshop on LLM evaluations! Evaluating LLMs is a complex and evolving challenge. With this workshop, we hope to bring together diverse perspectives to make real progress. See the details below:
We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3
@cohere packing only the essential items for @aclmeeting 2025
There’s less than one week to go until @aclmeeting in Vienna, Austria! 🇦🇹 The Cohere Labs and @Cohere research teams are looking forward to showcasing some of our latest research and connecting with the community. Be sure to stop by our booth and say hello!
We have an incredible roster of accepted papers at @aclmeeting 2025. I will be there, as will many of our senior and engineering staff @mziizm @beyzaermis @mrdanieldsouza @singhshiviii 🔥 Looking forward to catching up with everyone.
Sometimes it is important to take a moment and celebrate -- we achieved all of this in 3 years. Pretty incredible impact from @Cohere_Labs 🔥
More about the project here, from first author @singhshiviii: x.com/singhshiviii/s…
I joined this project a year ago with just the hope of getting a glimpse of AI research. I had no idea it would end up becoming such a special collaboration of 3000+ people from all over the world! Looking back, I could not have asked for a better introduction to research :)
This is one of my favorite sections in the Aya dataset paper. It is towards the end of the paper, so probably isn't read often. It speaks to how the eventual breakthrough was completely intertwined with the geo-reality experienced by independent researchers around the world.
🚨 New Recipe just dropped! 🚨 "LLMonade 🍋" ➡️ squeeze max performance from your multilingual LLMs at inference time! 👀🔥 🧑‍🍳 @ammar__khairi shows you how to 1️⃣ Harvest your Lemons 🍋🍋🍋🍋🍋 2️⃣ Pick the Best One 🍋
🚀 Want better LLM performance without extra training or special reward models? Happy to share my work with @Cohere_labs : "When Life Gives You Samples: Benefits of Scaling Inference Compute for Multilingual LLMs" 👀How we squeeze more from less at inference 🍋, details in 🧵
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
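If you want to try the general recipe behind the 🍋 thread yourself, here is a minimal best-of-n sketch. It is my generic reading, not the paper's exact method; the model id is an assumption and the judge is a placeholder.

```python
# Generic best-of-n sampling at inference, NOT the paper's exact recipe:
# draw n candidates at nonzero temperature, then keep the top-scoring one.
import cohere

co = cohere.ClientV2()

def pick_best(candidates):
    # Stand-in judge: swap in an LLM judge, reward model, or voting scheme.
    return max(candidates, key=len)

def best_of_n(prompt, n=5):
    candidates = []
    for _ in range(n):
        resp = co.chat(
            model="command-a-03-2025",  # model id is an assumption
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,  # encourages diversity across samples
        )
        candidates.append(resp.message.content[0].text)
    return pick_best(candidates)
```

The point of the recipe is that extra inference compute substitutes for extra training: the quality of the final answer depends only on how many samples you draw and how well you select among them.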
In just 3 years, we’ve published 95 papers through Cohere Labs — with contributions and collaboration from over 60 institutions. These papers span core ML research topics and reflect what’s possible when researchers come together to explore the unknown.
There are a limited number of spaces left for our first physical event in London on 14th July. Act quickly if you'd like to attend! We have Tim Nguyen (@IAmTimNguyen) from DeepMind, Max Bartolo (@max_nlp) from Cohere, and Enzo Blindow (VP of Data, Research & Analytics) from…
Let's get studious. 🏫 This July, join the Cohere Labs Open Science Community for ML Summer School. You'll be part of a global community exploring the future of ML and hear from speakers across the industry. Register to be first to hear about the line-up & connect with others.
Worth reading this research, which showed LMArena has already been turned into cheat-slop and that Meta was one of the worst culprits for gaming it x.com/singhshiviii/s…
🚨 Wait, adding simple markers 📌during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our latest work at @Cohere_Labs: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers“ that explores this phenomenon! Details in 🧵 ⤵️
Can we train models for better inference-time control instead of over-complex prompt engineering❓ Turns out the key is in the data — adding fine-grained markers boosts performance and enables flexible control at inference🎁 Huge congrats to @mrdanieldsouza for this great work
🤹 How do we move away from complicated and brittle prompt engineering at inference for under-represented tasks?🤔 🧠 Our latest work finds that optimizing training protocols improves controllability and boosts performance on underrepresented use cases at inference time 📈
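For intuition, here is a toy sketch of the marker idea as I read it: tag training examples with metadata so the same tags work as control codes at inference. The tag format and fields below are hypothetical, not the paper's.

```python
# Toy illustration of training-time markers; tag format is hypothetical.
raw_dataset = [
    {"prompt": "Translate: Habari yako?", "completion": "How are you?",
     "language": "sw", "domain": "chat"},
]

def add_markers(example):
    # Prepend metadata tags so they can act as control codes at inference.
    prefix = f"<lang={example['language']}> <domain={example['domain']}> "
    return {"prompt": prefix + example["prompt"],
            "completion": example["completion"]}

train_set = [add_markers(ex) for ex in raw_dataset]

# At inference, set the markers directly instead of prompt engineering:
query = "<lang=sw> <domain=legal> Summarize the contract below: ..."
```

Because the markers are learned during training, steering the model at inference becomes a matter of setting a tag rather than crafting an elaborate prompt.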
Thanks @_akhaliq for the spotlight on our work! I believe strongly in this wider direction: taking the pressure off everyday users to be master prompt engineers and inferring controllability directly from tasks.
Cohere presents Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
🚨New pretraining paper on multilingual tokenizers 🚨 Super excited to share my work with @Cohere_Labs: One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
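The core idea, as a rough sketch with the Hugging Face tokenizers library: train one shared subword vocabulary across many languages at once. The corpus files and vocab size below are placeholders, not the paper's setup.

```python
# Sketch of one shared multilingual tokenizer (placeholder files/params,
# not the paper's exact configuration).
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=250_000,  # assumption, not the paper's number
    special_tokens=["<unk>", "<pad>"],
)
# One tokenizer trained jointly over corpora from many languages.
tokenizer.train(["en.txt", "hi.txt", "sw.txt", "ar.txt"], trainer)
tokenizer.save("one_tokenizer.json")
```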