Sean Welleck
@wellecks
Assistant Professor at CMU. Marathoner, @thesisreview.
Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
😂 @wellecks , i think this “challenging problem” may have been finally solved after five years. === Understanding and creating mathematics using natural mathematical language … used by humans is a challenging and important problem for driving progress in machine learning. ===
AlphaVerus – today at ICML!
Can LLMs self-improve on code generation? Check out our work AlphaVerus where model generates provably correct code and self-improves without any weight updates! At #ICML2025 today: 📆: 11:00 AM - 1:30 PM 📷: Poster #East-2912 alphaverus.github.io w/ Bryan, @wellecks
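Schematically, "self-improvement without weight updates" can be read as a loop like the minimal sketch below: candidates that pass a formal verifier are kept and fed back as in-context exemplars for later rounds. The `generate` and `verify` interfaces are hypothetical placeholders, not the actual AlphaVerus pipeline.

```python
def self_improve(task_specs, generate, verify, rounds=3):
    """Illustrative inference-time self-improvement loop (no weight updates):
    keep only candidates a verifier accepts, and reuse them as in-context
    exemplars when generating in later rounds.

    generate(spec, exemplars) -> candidate program text (e.g., from an LLM)
    verify(spec, program)     -> True iff the verifier proves it correct
    Both interfaces are hypothetical, not AlphaVerus's actual API.
    """
    exemplars = []
    for _ in range(rounds):
        for spec in task_specs:
            candidate = generate(spec, exemplars)
            if verify(spec, candidate):  # e.g., a Verus-style proof check
                exemplars.append((spec, candidate))
    return exemplars
```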
Huge congratulations to Vaishnavh, Chen and Charles on the outstanding paper award 🎉 We will be presenting our #ICML2025 work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C) 10 - 11 am PT. Or please stop by our poster right after @ East Exhibition…
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
Some updates 🚨 I finished my Ph.D. at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵
I will be at #ICML2025 this week. Reach out if you want to chat about llm reasoning, computer-use agents, code gen or actually anything! (DMs are open) I will also be presenting AlphaVerus (self-improving verified code gen) this Thursday! alphaverus.github.io
L1 is heading to COLM! We've released 5 new open L1 models and the Massive-Math dataset to celebrate:
Super excited to see L1 accepted to #COLM2025! We are further open-sourcing 5 new models & a dataset: 1. L1-7B & L1-8B: Exact and Max variants 2. L1-1.5B-Short: Short reasoning model (SRM), RL-trained on 1.2M data points 3. Massive-Math-455K: A clean, unified math dataset 🧵
🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling & Reasoning Models at COLM '25 @COLM_conf is approaching!🚨 scalr-workshop.github.io 🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24. Topics…
Really nice work based on inference scaling laws that account for memory accesses. Very insightful!
🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws 🤔How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model. But, It only shows half of the picture! 🚨 The O(N²)…
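As a rough, back-of-the-envelope illustration of why memory accesses matter in this regime (the model configs below are assumed for illustration, not taken from the paper): every newly generated token re-reads the whole KV cache, so a small model "thinking" for 64K tokens can end up with more attention memory traffic than a much larger model answering in a few thousand tokens.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache size for one sequence: keys and values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed, illustrative configs (not the paper's numbers):
small = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, seq_len=64_000)
large = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, seq_len=8_000)
print(f"small model, 64K thinking tokens: {small / 1e9:.1f} GB of KV cache")
print(f"large model,  8K thinking tokens: {large / 1e9:.1f} GB of KV cache")
```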
[LG] Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening A He, D Fried, S Welleck [CMU] (2025) arxiv.org/abs/2506.02355
In the test time scaling era, we all would love a higher throughput serving engine! Introducing Tokasaurus, a LLM inference engine for high-throughput workloads with large and small models! Led by @jordanjuravsky, in collaboration with @HazyResearch and an amazing team!
Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with @achakravarthy01, @ryansehrlich, @EyubogluSabri, @brad19brown, @jshetaye,…
There's lots of RL goodies in the tech report behind @FutureHouseSF's new reasoning model for chemistry 👀 Three things stood out to me: 1. Training domain-specific experts in parallel, before distilling into a generalist model. The clever thing here is that you can parallelise…
Check out log-linear attention—our latest approach to overcoming the fundamental limitation of RNNs’ constant state size, while preserving subquadratic time and space complexity
We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with: - Log-linear time training - Log-time inference (in both time and memory) - Hardware-efficient Triton kernels
Simple yet cool idea. I find it interesting how the community now cares more about pass@k than pass@1 eval, which dominated the field over the last 5-6 months
New paper by Andre He: Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening arxiv.org/abs/2506.02355 Tired of sharpening the distribution? Try unlikeliness reward to learn new things from the roads less traveled
I believe the next big test for LLMs is whether they can generate truly novel ideas in open-ended situations. We translate notions of "creativity" from cogsci into simple tasks that reveal how far today’s models fall, and how multi-token training + randomness might help.
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
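A minimal sketch of what "seed-conditioning" could look like at inference time, under assumed interfaces (the `model.generate` call is hypothetical and the paper's training-time setup is omitted): randomness is injected as a random prefix in the input, and decoding is then greedy, instead of relying on output-side temperature sampling.

```python
import random

def seed_conditioned_generate(model, prompt_ids, seed_len=8, vocab_size=32000):
    """Illustrative 'seed-conditioning' sketch: prepend a random seed sequence,
    then decode greedily, so diversity comes from the input rather than
    output sampling. `model.generate` is a hypothetical interface.
    """
    seed_ids = [random.randrange(vocab_size) for _ in range(seed_len)]
    return model.generate(seed_ids + prompt_ids, do_sample=False)
```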
‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all… Massive congrats to @sharma_ashish_2 and @sewon__min - huge win for @uwnlp and the broader NLP community! 🙌
Unlikeliness reward dramatically changes how GRPO uplifts low probability vs. high probability sequences, leading to improved pass@N for high N. It also improves sample diversity, e.g. measured by unique proofs generated.
We found that GRPO suffers from what we call a rank bias: reinforcing high probability correct outputs, but not low probability correct outputs (left plot) However, we argue that increasing low-probability correct outputs is important for improving pass@N (right plot)
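A toy sketch of the general idea (the specific bonus below is an illustrative assumption, not the paper's exact reward): correct outputs that the current policy ranks as unlikely get an extra reward before GRPO's group normalization, so the update also reinforces them instead of only sharpening already-likely ones.

```python
import numpy as np

def grpo_advantages_with_unlikeliness(rewards, seq_logprobs, bonus=0.5):
    """Group-normalized advantages with an illustrative unlikeliness bonus.

    rewards:      0/1 correctness rewards for a group of sampled outputs
    seq_logprobs: total log-probabilities of those outputs under the policy
    bonus:        weight of the assumed rank-based unlikeliness term
    """
    rewards = np.asarray(rewards, dtype=float)
    seq_logprobs = np.asarray(seq_logprobs, dtype=float)

    # Rank outputs from most to least likely within the group (0 = most likely);
    # correct outputs the policy currently finds unlikely get an extra reward.
    ranks = np.argsort(np.argsort(-seq_logprobs))
    unlikeliness = ranks / max(len(rewards) - 1, 1)
    shaped = rewards + bonus * rewards * unlikeliness

    # Standard GRPO-style normalization of shaped rewards within the group.
    return (shaped - shaped.mean()) / (shaped.std() + 1e-6)
```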