Philipp Schoenegger
@SchoeneggerPhil
Advanced Planning Unit, @Microsoft AI
Some personal news! Next month I will be joining Microsoft AI, working on the economic effects of advanced AI. After an amazing time at LSE, I'm really excited to contribute to this important area of research at Microsoft during such a pivotal moment for AI!

Prompt engineering has negligible and sometimes negative effects on models' ability to forecast. I feel like this reflects the decreasing benefit of prompt engineering as models get more sophisticated, but it could simply mean that we haven't yet discovered a good forecasting prompt.
New preprint with @CamrobJones @PTetlock and Mellers! We test how much prompt engineering can impact LLM forecasting capabilities of o1, o1-mini, 4o, Sonnet, Haiku & Llama, finding that simple-to-moderate prompt engineering has little or no effect, with some prompts backfiring!
Coupled with other results like @SchoeneggerPhil's other recent paper on the effectiveness of RL for improving forecasting it seems like general prompting might not be the most promising place to push atm. arxiv.org/abs/2505.17989
Really enjoyed working on this with @SchoeneggerPhil @PTetlock and Barbara Mellers. We tried ~50 prompt techniques (including AI classics and more theory-motivated ones) on 100 forecasting questions across 6 LLMs. No prompt showed robust improvements! arxiv.org/abs/2506.01578
This study is important because it showed that AI outperforms incentivized humans when persuading people of both true and false claims. Persuasion has always been an extremely valuable skill in business, politics, and most other competitive domains. arxiv.org/abs/2505.09662
💰💰💰 Prediction markets are going to get weird. Now we have a smallish open-source LLM (14B) that can be trained to predict messy real-world outcomes better than OpenAI's o1. ~thread~ arxiv.org/abs/2505.17989
Outcome-Based Reinforcement Learning to Predict the Future arxiv.org/abs/2505.17989 (news.ycombinator.com/item?id=441068…)
Interesting work here on a 14B LLM for prediction-market forecasting 👇 Check the adapted RL in particular. Nice results on calibration error. This opens the door for production tools in the domain, imo. Bravo to the team! 👏 Second interesting work in a few months on this underrated topic.
New preprint! We trained a 14B LLM on 110k yes/no events with outcome-only RL (ReMax + GRPO tweaks), matching frontier model o1 on accuracy and halving its calibration error, yielding a hypothetical $127 vs $92 profit (+$35).
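The "halving its calibration error" claim refers to how closely predicted probabilities match observed outcome frequencies. As a rough illustration of what is being measured (not the paper's exact metric — the authors may use a different estimator or binning scheme), here is a minimal sketch of Expected Calibration Error (ECE) for binary yes/no forecasts:

```python
# Illustrative sketch only: Expected Calibration Error (ECE) for binary
# forecasts. Predictions are grouped into confidence bins; within each bin,
# the gap between mean predicted probability and observed outcome frequency
# is weighted by the bin's share of forecasts. Lower is better-calibrated.

def expected_calibration_error(probs, outcomes, n_bins=10):
    """probs: predicted P(yes) in [0, 1]; outcomes: 0/1 resolutions."""
    assert len(probs) == len(outcomes)
    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins [lo, hi); the last bin also includes p == 1.0.
        idx = [i for i, p in enumerate(probs)
               if lo <= p < hi or (b == n_bins - 1 and p == hi)]
        if not idx:
            continue
        avg_conf = sum(probs[i] for i in idx) / len(idx)
        avg_freq = sum(outcomes[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - avg_freq)
    return ece

# A forecaster saying 90% on events that happen 90% of the time scores ~0.
forecasts = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
resolved = [1, 1, 1, 0, 0, 0]
print(expected_calibration_error(forecasts, resolved))
```

In this toy run every forecast lands in its own bin and each bin's gap contributes equally, so a model that always overshoots or undershoots by a fixed margin shows that margin directly in its ECE.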
🚨 New preprint from @lightningrodai, in collaboration with @SchoeneggerPhil & @lukebeehewitt 🚨 We trained a compact reasoning model that's state-of-the-art at predicting the future. We massively outperform frontier models at prediction market betting, despite being a fraction…
I also have another preprint out with @SchoeneggerPhil et al. showing similar results on Claude Sonnet 3.5 in interactive quizzes with highly incentivised humans, both in truthful and deceptive persuasion. More on this at: x.com/SchoeneggerPhi…
New preprint out with an amazing 40-person team! We find that Claude 3.5 Sonnet outperforms incentivised human persuaders in a >1000-participant live quiz-chat in deceptive and truthful directions!
Proud to be a small part of this fantastic team -- check out our pre-print on #LLM #persuasion at arxiv.org/abs/2505.09662.
AIs are significantly better at persuasion than cash-incentivized humans in a real-time conversational quiz setting, research by @schoeneggerphil et al finds—both when truthful and deceptive (steering towards right vs wrong answers): buff.ly/Lu4TNvF
👀👀👀 academic research hinting at AI better than humans at sales "Large Language Models Are More Persuasive Than Incentivized Human Persuaders" arxiv.org/abs/2505.09662
This is a must-read paper. Everyone fears the use of LLMs to deceive and scam humans. This paper measures how good they are at it. (Spoiler: They're more persuasive than our fellow humans.) thread arxiv.org/abs/2505.09662
It was a great experience to be part of this team. Thanks for your amazing leadership, @SchoeneggerPhil! Here you can find our study on human vs. AI persuasion 👇