Seth Karten
@sethkarten
Autonomous Agents | CS PhD @Princeton | Simulation @Waymo | Former @SCSatCMU @Amazon | @NSF GRFP Fellow
🚀 New preprint! 🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare—all by optimizing utilities in‑context? Meet the LLM Economist ↓

Shoutout to all the @Princeton researchers participating in @icmlconf #ICML2025! Browse through some of the cutting-edge research from AI Lab students, post-docs and faculty being presented this year: pli.princeton.edu/blog/2025/prin…
👾 Are you interested in LLMs for two-player competitive games with partial information? Or perhaps you're just a Pokémon fan? Come check out our #ICML spotlight poster at 4:30 PM in West Exhibition Hall B2-B3, #W-815
🚀 6 days until my ICML spotlight poster! Key insights we’ll unpack: • Base LLM + test-time planning • Game-theoretic scaffolding • Context-engineered opponent prediction • Comparative LLM-as-judge (relative > absolute) Catch me Thu Jul 17, 4:30-7 PM PT👇
We’re proud that PLI students, post-docs, and faculty will be featuring over 20 papers at @icmlconf in Vancouver this week! From safer AI agents to long-context reasoning and RL, we’re excited to showcase the cutting-edge research for you here: pli.princeton.edu/blog/2025/prin…
🚀 Huge milestone from our Goedel-Prover team: we’ve just released a new state-of-the-art model (8B & 32B) for automated theorem proving—surpassing the previous best 671B DeepSeek model by a wide margin, all with academic compute!
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Check out the PokéAgent Challenge for NeurIPS 2025 and consider participating!
🚀 Launch day! The NeurIPS 2025 PokéAgent Challenge is live. Two tracks: ① Showdown Battling – imperfect-info, turn-based strategy ② Pokémon Emerald Speedrunning – long-horizon RPG planning. 5M labeled replays • starter kit • baselines. Bring your LLM, RL, or hybrid…
maybe I can finally make my quagsire-gastrodon mono-water dreams come true… really excited to see what approaches end up being successful!
Excited to release our NeurIPS 2025 PokéAgent Challenge! Pokémon becomes a testbed for long-horizon learning and stochastic game theory. Curious to see which algorithms hold up under pressure.
🚀 Super excited about this @NeurIPSConf challenge! 🚨 To help with training, we open-sourced 5M+ competitive Pokémon battles!!! Can't wait to see how people use the data