Sophia Simeng Han
@HanSineng
CS PhD Candidate @Yale. intern @AIatMeta, prev intern @GoogleDeepMind @AWS. I enjoy thinking about thinking. On the job market!
MIT Technology Review China covered our work! 🧠 wap.mittrchina.com/news/detail/14…
Zero fluff, maximum insight ✨. Let’s see what LLMs are really made of, with 🧠 Brainteasers. We’re not grading answers 🔢. We’re grading thinking 💭. Brute force? Creative leap? False confession? 🤔 Instead of asking “Did the model get the right answer?”, we ask: “Did it…
Missing @aclmeeting but sending “ATEB: Rethinking Advanced NLP Tasks in an Information Retrieval Setting” in my place! Come check it out at the Knowledgeable Foundation Models Workshop! Excited that our work is already influencing how embedding models are evaluated on…
Join us at the 5th MATH-AI Workshop at NeurIPS'25 ☀️🏖️🌊!
🔹 mathai2025.github.io
🔹 Featuring a stellar lineup of speakers
#MATHAI2025 #NeurIPS2025 #Reasoning #LLMs
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025!
📅 Dec 6 or 7 (TBD), 2025
🌴 San Diego, California
Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts 🧵
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Honored to make the list 😆
So much research is being done about LLMs that it's hard to stay on top of the literature. To help with this, I've made a list of all the most important papers from the past 8 years: rtmccoy.com/pubs/ I hope you enjoy!
Excited to see more investigation into LLM creativity. We have some pioneering work on this topic as well: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models. arxiv.org/pdf/2505.10844.
🚨 New study on LLMs' reasoning boundaries! Can LLMs really think outside the box? We introduce OMEGA, a benchmark probing how they generalize:
🔹 RL boosts accuracy on slightly harder problems with familiar strategies,
🔹 but struggles with creative leaps & strategy composition. 👇
Excited to be joining Meta NYC this summer as a Research Scientist Intern! If you’re also in NYC or at Meta and working on reasoning or related topics, I’d love to connect - DM me!
Besides natural language and formal language, truth tables are also a great medium for logical reasoning, with a synergistic effect among the three. Check out this cool idea from @LichangChen2!
Learn to Reason via Mixture-of-Thought
Interesting paper on improving LLM reasoning by utilizing multiple reasoning modalities:
- code
- natural language
- symbolic (truth-table) representations
Cool idea and nice results. My notes below:
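For readers unfamiliar with the truth-table modality, here is a minimal Python sketch (my own illustration, not code from the paper; the `entails` helper and the modus ponens example are hypothetical) of what symbolic truth-table reasoning means: enumerate every assignment of the propositional variables and check whether the conclusion holds in all models of the premises.

```python
# Minimal sketch of truth-table reasoning: check propositional entailment
# by brute-force enumeration of all variable assignments.
from itertools import product

def entails(premises, conclusion, variables):
    """True iff every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # found a countermodel
    return True

# Example: from (A -> B) and A, conclude B (modus ponens).
premises = [
    lambda e: (not e["A"]) or e["B"],  # A -> B
    lambda e: e["A"],                  # A
]
conclusion = lambda e: e["B"]          # B
print(entails(premises, conclusion, ["A", "B"]))  # True
```

A truth table is an appealing modality precisely because this check is mechanical and exhaustive, which complements the looser, more flexible natural-language chain of thought.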