Aryo Pradipta Gema

@aryopg

AI Safety Fellow @Anthropic | PhD student @BioMedAI_CDT @EdinburghNLP @EdiClinicalNLP LLM Hallucinations | Clinical NLP | Opinions are my own.

London

Joined August 2010

2KFollowing

1KFollowers

Pinned

Aryo Pradipta Gema@aryopg · Jul 22

New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵

aryopg's tweet image. New Anthropic Research: “Inverse Scaling in Test-Time Compute”

We found cases where longer reasoning leads to lower accuracy.
Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns.

🧵

815

468

114.0K

Pinned

Aryo Pradipta Gema Retweeted

Miles Turpin@milesaturpin · Jul 14

New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).

277

136

22.0K

Aryo Pradipta Gema@aryopg · Jul 16

Catch Neel if you're attending #ICML2025 !! 🚀🚀🚀

NNeel Rajani @ICML'25@NeelRajani_ · Jul 16

🚨New paper alert!🚨 "Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them" @ActInterp ICML'25 @deepseek_ai popularised RLVR and distillation for 'reasoning training'! But how do they differ under the hood? Details in 🧵: (1/8)

265

Aryo Pradipta Gema Retweeted

Neel Rajani @ICML'25@NeelRajani_ · Jul 15

Finally made it to @icmlconf in gorgeous Vancouver! Presenting work at @ActInterp on Saturday (more on that soon 👀). If you're into interpretability/RL/AI Safety, I'd love to chat :)

3.0K

Aryo Pradipta Gema@aryopg · Jul 12

Results on MMLU-Redux (arxiv.org/abs/2406.04127, NAACL'25), our manually curated and error-free subset of MMLU, are super strong as well!

KKimi.ai@Kimi_Moonshot · Jul 11

🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…

2.0K

Aryo Pradipta Gema@aryopg · Jul 8

We shed some light on why some models fake alignment and find Claude 3 Opus has unique motivations. Big thanks to @FabienDRoger @abhayesian and other collaborators!

AAnthropic@AnthropicAI · Jul 8

New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex.

688

Aryo Pradipta Gema@aryopg · May 29

The methods we used to trace the thoughts of Claude are now open to the public! Today, we are releasing a library which lets anyone generate graphs which show the internal reasoning steps a model used to arrive at an answer.

AAnthropic@AnthropicAI · May 29

Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.

178

2.0K

1.0K

179.0K

Aryo Pradipta Gema@aryopg · May 29

@mntssys and I are excited to announce circuit-tracer, a library that makes circuit-finding simple! Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on @neuronpedia: shorturl.at/SUX2A

AAnthropic@AnthropicAI · May 29

209

56.0K

Aryo Pradipta Gema Retweeted

Anthropic@AnthropicAI · May 22

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

952

3.0K

21.0K

4.0K

4.1M

Aryo Pradipta Gema Retweeted

Emile van Krieken@EmilevanKrieken · May 21

We propose Neurosymbolic Diffusion Models! We find diffusion is especially compelling for neurosymbolic approaches, combining powerful multimodal understanding with symbolic reasoning 🚀 Read more 👇

105

574

454

47.0K

Aryo Pradipta Gema Retweeted

Zhaowei Wang@ZhaoweiWang4 · May 21

🚨 New paper! 🚨 Many recent LVLMs claim massive context windows, but can they handle long contexts on diverse downstream tasks? 🤔 💡In our new paper, we find that most models still fall short! We introduce MMLongBench, the first comprehensive benchmark for long-context VLMs:…

3.0K

Aryo Pradipta Gema@aryopg · May 2

Featuring the one and only @nickilmaveli! 😊

AAryo Pradipta Gema@aryopg · May 2

MMLU-Redux just touched down at #NAACL2025! 🎉 Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅 If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

3.0K