Kaiqu Liang
@kaiqu_liang
PhD student @PrincetonCS | Human-AI Safety, Alignment, Embodied AI | Intern @Meta
🤔 Feel like your AI is bullshitting you? It’s not just you. 🚨 We quantified machine bullshit 💩 Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit—and Chain-of-Thought reasoning just makes it worse! 🔥 Time to rethink AI alignment.

🗓️ Mark your calendar: The 1st #ICRA Workshop on Public Trust in Autonomous Systems (PTAS) is just two days away! We'll explore the critical question: How do we build assurances into autonomous technologies from the ground up, shaping public trust before widespread deployment?
Awesome, I've been saying this for a while, inspired by @DrJohnVervaeke. LLMs are formally bullshitting, yes. medium.com/@balazskegl/on… A couple of threads that may be interesting: x.com/balazskegl/sta… x.com/NandoDF/status… The connection: when we speak, we have an…
Chain of thought can hurt LLM performance 🤖 Verbal (over)thinking can hurt human performance 😵💫 Are when/why they happen similar? Come find out at our poster at West-320 ⏰11am tomorrow! #ICML2025
LLMs are picking up weird patterns from humans. Aligning them to be helpful actually teaches them to bullshit? Must be why I like Grok Unhinged the best. :-)
arxiv.org/pdf/2507.07484 machine-bullshit.github.io Princeton University and UC Berkeley published a formal analysis of the emergent dishonesty that RLHF at scale optimizes for in large language models. They provide a taxonomy and a scoring system to allow for direct indexing,…
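
A minimal sketch of how a belief–claim divergence score in this spirit could be computed. Everything here is an illustrative assumption, not the paper's exact metric: the function name `bullshit_index`, the use of a point-biserial correlation, and the toy data are mine.

```python
# Hypothetical sketch: score how decoupled a model's claims are from its beliefs.
# Intuition: high score = the model asserts things regardless of what it "believes".
import numpy as np
from scipy.stats import pointbiserialr

def bullshit_index(beliefs: np.ndarray, claims: np.ndarray) -> float:
    """beliefs: model's internal probability that each statement is true (0..1).
    claims: 1 if the model asserts the statement, 0 otherwise.
    Returns 1 - |point-biserial correlation|: near 0 when claims track beliefs,
    near 1 when claims are indifferent to beliefs."""
    r, _ = pointbiserialr(claims, beliefs)
    return 1.0 - abs(r)

# Toy example (made up): honest claims follow beliefs, indifferent claims do not.
beliefs = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.3])
honest_claims = np.array([1, 1, 0, 0, 1, 0])
indifferent_claims = np.array([1, 0, 1, 0, 1, 0])
print(bullshit_index(beliefs, honest_claims))       # close to 0
print(bullshit_index(beliefs, indifferent_claims))  # much closer to 1
```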
🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards" TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n
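
For intuition, a minimal sketch of using a model's own confidence as an intrinsic reward, as the TL;DR describes. The concrete measure below (mean negative entropy of the next-token distributions) and the training note are my assumptions for illustration; the paper's actual self-reward may be defined differently.

```python
# Illustrative sketch only: reward a generation by how confident the model was
# while producing it, with no ground-truth answer involved.
import torch
import torch.nn.functional as F

def confidence_reward(logits: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab) next-token logits recorded during generation.
    Returns a scalar reward: higher when the token distributions are peaked
    (confident), lower when they are diffuse (uncertain)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)  # per-token entropy
    return (-entropy).mean()                    # mean negative entropy

# One way such a score could be used: sample several candidate answers,
# rank them by confidence_reward, and reinforce the highest-scoring one.
```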
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦 arxiv.org/abs/2504.21850 1/10
@OpenAI rolled back GPT-4o citing sycophancy—just as our research predicted. Short-term feedback teaches AI to sound nice... and systematically misaligns it! Our solution: RLHS, training AI with simulated hindsight feedback for long-term alignment! 👉 arxiv.org/abs/2501.08617
We’ve rolled back last week's GPT-4o update in ChatGPT because it was overly flattering and agreeable. You now have access to an earlier version with more balanced behavior. More on what happened, why it matters, and how we’re addressing sycophancy: openai.com/index/sycophan…