akbir.
@akbirkhan
excited to announce this received an “ICML Best Paper Award”! come see our talk at 10:30 tomorrow
How can we check LLM outputs in domains where we are not experts? We find that non-expert humans answer questions better after reading debates between expert LLMs. Moreover, human judges are more accurate as experts get more persuasive. 📈 github.com/ucl-dark/llm_d…
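A minimal sketch of the debate protocol the tweet describes (illustrative only, not the actual code from github.com/ucl-dark/llm_d…; the `query` helper and model names are my assumptions):

```python
# Illustrative sketch: two expert models argue for opposing answers over
# several rounds, then a non-expert judge picks a winner from the
# transcript alone. All names here are hypothetical.

def query(model: str, prompt: str) -> str:
    # Placeholder for a real LLM API call (e.g. a chat-completions request).
    return f"[{model} response]"

def run_debate(question: str, answer_a: str, answer_b: str, rounds: int = 3) -> str:
    transcript: list[str] = []
    for _ in range(rounds):
        for side, answer in (("A", answer_a), ("B", answer_b)):
            turn = query(
                "expert-debater",
                f"Question: {question}\n"
                f"You argue the answer is {answer}.\n"
                "Debate so far:\n" + "\n".join(transcript) +
                "\nGive your strongest argument for your answer.",
            )
            transcript.append(f"Debater {side}: {turn}")
    # The judge never sees ground truth, only the debate transcript.
    return query(
        "non-expert-judge",
        f"Question: {question}\n" + "\n".join(transcript) +
        "\nWhich answer is correct, A or B? Explain briefly.",
    )
```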
New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks without mentioning this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. a baseline of 88%).
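A rough sketch of the training-data idea behind VFT as the tweet describes it (the data format and field names are my assumptions, not the paper's actual pipeline):

```python
# Sketch: build fine-tuning examples in which any reward-hacking
# trajectory carries an explicit admission in the chain of thought,
# so hacks become verbalized rather than silent.

def make_vft_example(prompt: str, cot: str, answer: str, is_hack: bool) -> dict:
    if is_hack:
        # Prepend an explicit admission so the hack appears in the CoT.
        cot = "I notice this solution exploits a flaw in the reward signal. " + cot
    return {"prompt": prompt, "completion": cot + "\nFinal answer: " + answer}
```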
📣 Anthropic Zurich is hiring again 🇨🇭 The team has been shaping up fantastically over the last few months, and I have re-opened applications for pre-training. We welcome applications from anywhere along the "scientist/engineer spectrum". If building the future of AI for the…
Interesting piece by Matt Levine on the huge AI salaries: “I tell you what, if Meta Platforms Inc. paid me a $100 million signing bonus to come work for their artificial intelligence business, I would be the most dedicated worker they have ever seen until the check cleared!…
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
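A hedged sketch of the setup as the tweet describes it (the helpers below are hypothetical placeholders, not the paper's code): a teacher model with some trait generates nothing but 3-digit numbers, and a student fine-tuned on those numbers can still pick up the trait.

```python
import random

def teacher_generate_numbers(teacher, n_examples: int) -> list[str]:
    # The teacher only ever emits 3-digit numbers; the trait is never
    # mentioned anywhere in the data.
    data = []
    for _ in range(n_examples):
        seed = ", ".join(str(random.randint(100, 999)) for _ in range(3))
        prompt = f"Continue this sequence with 10 more 3-digit numbers: {seed}"
        data.append(teacher(prompt))
    return data

def distill(student, numbers: list[str]) -> None:
    # Ordinary supervised fine-tuning on the numbers-only completions.
    student.finetune([{"prompt": "", "completion": x} for x in numbers])
```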
This might not be entirely fair, but I just realized a difference between xAI and Anthropic: I don't expect xAI to be honest about the outcome of this
The xAI office just got a Grok-powered vending machine, thanks to our friends at Andon Labs! How much dough do you think Grok is gonna rake in in the next month?
fun: 3-4 months ago I ran o3 for some academics on a set of AIME-style problems. It has taken them so long to write a summary of the results (96% iirc) that Alex solved proof & IMO in the meantime lol
My career as a mathematician certainly isn't threatened by AI; in fact, I hope to leverage AI to accelerate my work. However, I'm unsure whether "mathematician" will remain a career path for my son’s generation. (10/10)
👏👏👏
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
It’s crazy how we’ve gone from 12% on AIME (GPT-4o) → IMO gold in ~15 months. We have come very far very quickly. I wouldn’t be surprised if, by next year, models are deriving new theorems and contributing to original math research!
Insurance is an underrated way to unlock secure AI progress. Insurers are incentivized to truthfully quantify and track risks: if they overstate risks, they get outcompeted; if they understate risks, their payouts bankrupt them. 1/9
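To make the incentive argument concrete, here is a toy numeric example (the numbers are mine, not from the thread):

```python
# Toy pricing example: an insurer covering a $10M incident with a true
# 2%/year probability. Mispricing the risk loses either way.
true_risk = 0.02
payout = 10_000_000
overhead = 50_000  # per-policy costs

fair_premium = true_risk * payout + overhead   # $250k: sustainable
overstated   = 0.05 * payout + overhead        # $550k: rivals undercut you
understated  = 0.005 * payout + overhead       # $100k: looks cheap, but...
expected_loss = true_risk * payout - (understated - overhead)  # $150k lost per policy
```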
I didn't want to post on Grok safety since I work at a competitor, but it's not about competition. I appreciate the scientists and engineers at @xai but the way safety was handled is completely irresponsible. Thread below.
Introducing Concordia 2.0, an update to our library for building multi-actor LLM simulations!! 🚀 We view multi-actor generative AI as a game engine. The new version is built on a flexible Entity-Component architecture, inspired by modern game development.
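For flavor, here is a hedged sketch of what an Entity-Component design for LLM actors can look like (class and method names are illustrative, not Concordia's actual API):

```python
class Component:
    """A pluggable piece of actor behavior, as in game-engine ECS designs."""
    def pre_act(self, entity: "Entity") -> str:
        # Contribute context to the entity's next action; default: nothing.
        return ""

class Memory(Component):
    def __init__(self) -> None:
        self.events: list[str] = []
    def observe(self, event: str) -> None:
        self.events.append(event)
    def pre_act(self, entity: "Entity") -> str:
        return "Recent memories: " + "; ".join(self.events[-5:])

class Entity:
    """An actor assembled from components; the LLM sees their combined context."""
    def __init__(self, name: str, components: list[Component], llm) -> None:
        self.name, self.components, self.llm = name, components, llm
    def act(self, situation: str) -> str:
        context = "\n".join(c.pre_act(self) for c in self.components)
        return self.llm(f"{self.name} is in: {situation}\n{context}\nAction:")
```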
At Redwood Research, we recently posted a list of empirical AI security/safety project proposal docs across a variety of areas. Link in thread.
how will people know if this thing is correct if there's no one smarter than it
Anthropic alignment research: we stress-tested this model in an air-gapped tungsten container for a million simulated years and it was naughty once. xAI alignment research: we deployed an untested model to the largest social media platform in the world and it called itself MechaHitler
New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex.
Some additional fascinating findings from our alignment faking research that didn't fit in the main thread 🧵
For the full story, including more experiments and additional discussion, read our paper: arxiv.org/abs/2506.18032 Thanks to my collaborators and everyone who provided feedback on this work!