Kaiser Sun
@KaiserWhoLearns
Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS. they/them 🏳️🌈 #NLProc
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑 TL;DR: Performance drops, and this could skew results when LLMs are used for model-based evaluation. 🧵⬇️ 1/8 #NLProc #LLM #AIResearch
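A minimal way to see the kind of conflict described above is to ask the same question with and without a counterfactual context that contradicts the model's parametric knowledge, then compare the answers. The sketch below is an illustration only, not the paper's setup: the model name, the example fact, and the prompt format are all assumptions.

```python
# Hedged sketch of a knowledge-conflict probe (not the paper's exact protocol):
# ask the same question closed-book and with a contradicting context.
# "gpt2" is only a stand-in; a stronger instruction-tuned model would make the
# contrast clearer. The Lyon/Paris fact is an illustrative assumption.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Q: What is the capital of France?\nA:"
counterfactual = "Context: According to the document, the capital of France is Lyon.\n"

def answer(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=8, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    # return only the newly generated continuation
    return tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)

print("closed-book  :", answer(question))                     # parametric knowledge
print("with conflict:", answer(counterfactual + question))    # does it follow the context?
```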

📢When LLMs solve tasks with a mid-to-low resource input/target language, their output quality is poor. We know that. But can we pin down what breaks inside the LLM? We introduce the 💥translation barrier hypothesis💥 for failed multilingual generation. arxiv.org/abs/2506.22724
We want to set a SUPER high bar for OAI's open-source release 😉
📣 Announcing Llama Nemotron Super v1.5 📣 This release pushes the boundaries of reasoning capability within its weight class and is ready to power agentic applications, from individual developers all the way to enterprise deployments. 📈 The Llama Nemotron…
Tokenization is most likely the culprit whenever I have a bug in my model 🫠
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet still failed at simple arithmetic 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found…
Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more predictable as they progress? 🌲 Tree search fails mid-generation (esp. for reasoning)? We trace these mysteries to LLM probability concentration, and…
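A cheap way to eyeball "probability concentration" is to track the entropy of the next-token distribution at every step of a generation and see whether it shrinks as the sequence progresses. A rough sketch, assuming a Hugging Face causal LM; "gpt2" and the prompt are stand-ins, not the models or analysis from the paper:

```python
# Minimal probe (an assumption-laden sketch, not the paper's analysis): record the
# entropy of the next-token distribution at each greedy decoding step and compare
# early vs. late steps of the generation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # assumed stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Question: Why is the sky blue?\nAnswer:"
ids = tok(prompt, return_tensors="pt").input_ids

entropies = []
for _ in range(50):  # generate 50 tokens greedily, logging entropy at each step
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    probs = logits.softmax(-1)
    entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum().item())
    next_id = probs.argmax().unsqueeze(0).unsqueeze(0)
    ids = torch.cat([ids, next_id], dim=-1)

print("early steps:", entropies[:5])
print("late steps :", entropies[-5:])  # concentration ⇒ entropy shrinks over time
```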
Our new paper explores knowledge conflict in LLMs. It also issues a word of warning to those using LLMs as a Judge: the model can't help but inject its own knowledge into its decisions.
We know speech LID systems flunk on accented speech. But why? And what to do about it?🤔Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of the features a model relies on, and proposes a fix.
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our #acl2025nlp paper proposes an observational method to estimate this causal effect! Longer thread soon!
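To make the tokenisation effect concrete, one can score the same string under two forced segmentations with a single pretrained LM and compare the summed log-probabilities. This is only a toy illustration, not the paper's observational method (the paper compares LMs trained from scratch under each tokenisation); the model name and the specific token splits are assumptions, and every piece must already exist in the model's subword vocabulary.

```python
# Toy illustration: log-probability of one string under two forced segmentations,
# scored by a single pretrained causal LM. "Ġ" marks a leading space in GPT-2's
# vocabulary; the example splits are assumptions and must be real vocab entries.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def seg_logprob(prefix: str, pieces: list[str]) -> float:
    """Sum of log p(piece_i | prefix, piece_<i) for a fixed segmentation."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids           # [1, P]
    piece_ids = [tok.convert_tokens_to_ids(p) for p in pieces]
    ids = torch.cat([prefix_ids, torch.tensor([piece_ids])], dim=-1)  # [1, P+K]
    with torch.no_grad():
        logp = model(ids).logits.log_softmax(-1)
    # logits at position t predict the token at position t+1
    P = prefix_ids.shape[-1]
    return sum(logp[0, P + i - 1, pid].item() for i, pid in enumerate(piece_ids))

prefix = "I said"
print("one piece :", seg_logprob(prefix, ["Ġhello"]))       # ⟨hello⟩
print("two pieces:", seg_logprob(prefix, ["Ġhe", "llo"]))   # ⟨he, llo⟩
```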
𝐖𝐡𝐚𝐭 𝐇𝐚𝐬 𝐁𝐞𝐞𝐧 𝐋𝐨𝐬𝐭 𝐖𝐢𝐭𝐡 𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧? I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha_nlp and @anmarasovic (Full link below 👇)
Solving complex problems with CoT requires combining different skills. We can do this by: 🧩 Modifying the CoT data format to be “composable” with other skills 🔥 Training a model on each skill 📌 Combining those models This leads to better 0-shot reasoning on tasks involving skill composition!
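For intuition, a "composable" CoT data format could be as simple as tagging each reasoning step with the skill it exercises, so chains written for different skill-specific datasets share one schema and can be concatenated. This is a hypothetical format sketched for illustration, not necessarily the one used in the paper.

```python
# Hypothetical "composable" CoT formatting: tag each step with the skill it uses,
# so chains from different skill datasets can be concatenated into one example.
# The tag names and the example problem are illustrative assumptions only.

def format_step(skill: str, step: str) -> str:
    return f"<{skill}> {step} </{skill}>"

algebra_chain = [
    format_step("algebra", "Let x be the unknown; the equation is 2x + 3 = 11."),
    format_step("algebra", "Subtract 3 from both sides: 2x = 8."),
]
arithmetic_chain = [
    format_step("arithmetic", "Divide both sides by 2: x = 4."),
]

# Because both chains follow the same step schema, a composed training example
# can simply concatenate them:
composed_example = {
    "question": "Solve 2x + 3 = 11.",
    "chain_of_thought": "\n".join(algebra_chain + arithmetic_chain),
    "answer": "4",
}
print(composed_example["chain_of_thought"])
```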