Zayne Sprague
@ZayneSprague
Ph.D. student at the University of Texas at Austin. My interests are in NLP, RL, and CogSci research, focusing on reasoning in AI models. (he/him)
Have that eerie feeling of déjà vu when reading model-generated text, but can’t pinpoint the specific words or phrases 👀? ✨We introduce QUDsim to quantify discourse similarities beyond lexical, syntactic, and content overlap.
🌟Job ad🌟 We (@gregd_nlp, @mattlease and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!
Seeking candidates for a postdoctoral position with the Explorable Universe research group, developing next-generation generative AI copilots & agents that aid astronomy research. Info here: cosmicai.org/jobs/postdocge…
📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀 OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training a model on each skill 📌Combining those models This leads to better zero-shot reasoning on tasks involving skill composition! (Sketch below.)
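Since the tweet doesn’t spell out the combination step, here is a minimal Python sketch of one common way to “combine those models”: uniform weight averaging of per-skill fine-tuned checkpoints. The function and its interface are illustrative assumptions, not necessarily the paper’s actual method.

```python
def merge_skill_models(state_dicts, weights=None):
    """Average the parameters of several per-skill fine-tuned checkpoints.

    Hypothetical sketch: the tweet doesn't say how the models are
    combined; uniform weight averaging is just one common choice.
    `state_dicts` is a list of PyTorch state dicts sharing keys/shapes.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # weighted sum of the same parameter tensor across checkpoints
        merged[name] = sum(w * sd[name].float()
                           for w, sd in zip(weights, state_dicts))
    return merged

# Usage (assumed): model.load_state_dict(merge_skill_models([sd_math, sd_code]))
```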
1/🧵Excited to share CLEVER — a new benchmark for end-to-end verified code generation in Lean. Can we go from natural language to a formally verified Lean program? CLEVER puts this to the test. 📄 arxiv.org/abs/2505.13938 💻 github.com/trishullab/cle…
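For readers new to this setting, here is a toy Lean 4 example of what “natural language → formally verified program” means: a tiny function plus a machine-checked proof that it meets its spec. This is a hypothetical illustration, far simpler than anything in CLEVER.

```lean
-- Toy (hypothetical) instance of the NL → verified-Lean task:
-- "return the maximum of two naturals", with a proof of the spec.
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

theorem myMax_spec (a b : Nat) : a ≤ myMax a b ∧ b ≤ myMax a b := by
  unfold myMax
  split <;> omega  -- case-split on the `if`, then linear arithmetic
```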
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to verbalize via text CoTs 📉Humans reach 93%, while Gemini-2.5-Pro gets 63% and Qwen2.5-72B only 38%
📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!
Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD…
With the rise of R1, search seems out of fashion? We prove the opposite! 😎 Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize untaken, new reasoning paths that are better 💡, yet shorter in length ⚡️.
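A loose Python sketch of the retrospective-revision idea described above. `expand` (samples a continuation as a list of steps) and `is_correct` are assumed interfaces; the real Retro-Search algorithm is MCTS-inspired and more involved.

```python
def retro_revise(trace, expand, is_correct, alts_per_step=4):
    """Hypothetical sketch: walk back over an existing reasoning trace
    and, at each step, try alternative continuations; keep a revision
    only if it still reaches a correct answer in fewer steps."""
    best = list(trace)
    for i in range(len(trace)):
        prefix = best[:i]
        for _ in range(alts_per_step):
            candidate = prefix + expand(prefix)   # roll out an untaken path
            if is_correct(candidate) and len(candidate) < len(best):
                best = candidate                  # shorter, still correct
    return best
```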
I’ll be presenting our work “To CoT or not to CoT? Chain-of-Thought Helps Mainly on Math and Symbolic Reasoning” at ICLR as a poster today at 3pm, poster #292. Stop by and chat about reasoning, CoT, long CoTs, math, etc.! See you there 🤘
To CoT or not to CoT?🤔 300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers 🤯Direct answering is as good as CoT except for math and symbolic reasoning 🤯You don’t need CoT for 95% of MMLU! CoT mainly helps LLMs track and execute symbolic computation
Excited to be at #ICLR2025 🤩 I'll be giving an oral presentation for Creativity Index on Fri 25th 11:06, Garnet 212&219 🎙️ I'll also be presenting posters: 📍ExploreToM, Sat 26th 10:00, Hall 3 + 2B #49 📍CreativityIndex, Fri 25th 10:30, Hall 3 + 2B #618 Hope to see you there!
Check out my work at @bespokelabsai We release Bespoke-MiniChart-7B, a new SOTA in chart understanding for its size. Chart understanding is fun and challenging, requiring reasoning skills beyond math reasoning. It's a great starting point for open chart model development!
Announcing Bespoke-MiniChart-7B, a new SOTA in chart understanding for models of comparable size on seven benchmarks, on par with Gemini-1.5-Pro and Claude-3.5! 🚀 Beyond its real-world applications, chart understanding is a good challenging problem for VLMs, since it requires…
I will be at #ICLR2025 to present the final project of my PhD (🥲): Chain-of-thought prompting elicits an LLM's knowledge for answering a single question. What about a whole ~set~ of questions? We explore ways to build an LLM's discrete microtheory covering a topic's worth of questions.
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent finds 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
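A hypothetical sketch of the loop the tweet describes: mine expert advice from the web, distill it into evaluation criteria, and judge the response against each one. The callables (`search`, `extract_criteria`, `judge`) are assumed interfaces, not EvalAgent’s actual API.

```python
def eval_agent(prompt, response, search, extract_criteria, judge):
    """Hypothetical sketch of the EvalAgent idea as described above.

    search(prompt)                  -> web pages with expert advice
    extract_criteria(prompt, pages) -> list of evaluation criteria
    judge(response, criterion)      -> score or verdict per criterion
    """
    pages = search(prompt)                      # expert advice on the web
    criteria = extract_criteria(prompt, pages)  # nuanced, diverse criteria
    return {c: judge(response, c) for c in criteria}
```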
Turns out, it’s possible to outperform DeepSeek-R1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
Can we generate long text from a compressed KV cache? We find existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present 𝐑𝐞𝐟𝐫𝐞𝐬𝐡𝐊𝐕, an inference method that ♻️ periodically refreshes the smaller KV cache, better preserving performance.
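Schematically, the refresh idea might look like the Python below. `step_fn` (decodes one token against a cache) and `select_fn` (picks which entries the small cache keeps) are assumed interfaces, not the paper’s implementation.

```python
def generate_with_refresh(step_fn, select_fn, n_tokens, refresh_every=64):
    """Hypothetical sketch: decode against a small, compressed KV cache,
    but every `refresh_every` steps rebuild it from the full cache so
    quality doesn't degrade over long generations."""
    full_cache, small_cache, output = [], [], []
    for t in range(n_tokens):
        token, kv = step_fn(small_cache)  # decode one token
        output.append(token)
        full_cache.append(kv)    # the full cache keeps every entry
        small_cache.append(kv)   # the small cache grows until refreshed
        if (t + 1) % refresh_every == 0:
            small_cache = select_fn(full_cache)  # ♻️ refresh step
    return output
```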
PutnamBench: A math benchmark where no reasoning model can solve even a single problem! We evaluated leading LRMs on the Lean 4 version🧵
Announcing PutnamBench: an evaluation benchmark for formal mathematical reasoning in Lean 4, Isabelle, and Coq! PutnamBench consists of problems from the William Lowell Putnam Mathematical Competition, the premier collegiate mathematics exam in the US & Canada. 🧵
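To illustrate the task format, here is a hypothetical PutnamBench-style Lean 4 entry (not a real benchmark problem, and far easier than actual Putnam questions): a formal statement whose proof is left as `sorry` for the model to fill in.

```lean
-- Hypothetical illustration of the format, not an actual benchmark item:
-- the product of two consecutive naturals is even.
theorem toy_putnam (n : Nat) : n * (n + 1) % 2 = 0 := by
  sorry  -- the model must replace this with a machine-checked proof
```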
Pretty happy that our OpenThinker-32B is in the #4 spot on the General Reasoning Leaderboard. It’s also worth pointing out which models are open-data (open post-training data): OpenThinker, LIMO, OpenHermes, and DeepScaleR.
Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1. Our results show that large, carefully curated datasets with verified R1 annotations produce SoTA reasoning models. Our 32B model outperforms all 32B models including…