Parshin Shojaee
@ParshinShojaee
PhD student @VT_CS | AI for Science, Math, Code, Reasoning | Intern @Apple | prev @Adobe
Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluation of equation discovery with LLMs! 🧠 Key takeaway: Even SOTA discovery models with strong LLM backbones still fail to discover mathematical…
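To make "rigorous evaluation" concrete, here is a minimal sketch of how a candidate equation is typically scored numerically against held-out data in symbolic-regression-style benchmarks; the ground-truth law, candidate expression, and NMSE metric are illustrative assumptions of mine, not LLM-SRBench's actual tasks or API.

```python
# Minimal sketch of scoring a candidate equation against held-out data, in
# the spirit of symbolic-regression benchmarks. The "unknown" law, the
# candidate expression, and the NMSE metric are illustrative assumptions,
# not LLM-SRBench's actual tasks or API.
import numpy as np

def ground_truth(x):
    # Hypothetical law the discovery method is trying to recover.
    return 2.0 * np.sin(x) + 0.5 * x

def candidate(x):
    # Equation proposed by an LLM-based discovery pipeline.
    return 2.0 * np.sin(x) + 0.48 * x

x_test = np.linspace(0.0, 10.0, 200)   # held-out evaluation points
y_true, y_pred = ground_truth(x_test), candidate(x_test)

# Normalized mean-squared error: a low value means good numeric fit, but
# numeric fit alone doesn't establish that the symbolic form is correct --
# which is exactly why memorization-aware benchmarks matter.
nmse = np.mean((y_true - y_pred) ** 2) / np.var(y_true)
print(f"NMSE on held-out data: {nmse:.6f}")
```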

I'll be at #ICML2025 presenting our LLM-SRBench: A New Benchmark for Scientific Equation Discovery with LLMs (on July 17)! Feel free to stop by or ping me for a chat! Oral (Session 5D) at 10:15 AM and Poster (W-115) at 11 AM. openreview.net/forum?id=SyQPi…
Why does RL struggle with long reasoning chains? Because finding the correct solution by chance is exponentially rare. Solution: break down the complexity of the problem somehow, and ease into it adaptively! We propose AdaBack: an adaptive backtracking method that conditions…
Why does RL struggle with tasks requiring long reasoning chains? Because “bumping into” a correct solution becomes exponentially less likely as the number of reasoning steps grows. We propose an adaptive backtracking algorithm: AdaBack. 1/n
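The thread is cut off before the details of what AdaBack conditions on, so the following is only a loose, self-contained toy of the general idea sketched above: start the policy from a long ground-truth solution prefix and adaptively shrink it as the success rate rises. The task, thresholds, and learning rule here are my own assumptions, not the published algorithm.

```python
# Toy illustration of adaptive backtracking for RL with sparse rewards:
# the "problem" is to emit a hidden sequence of 8 binary reasoning steps,
# and reward is 1 only if every step is correct, so random exploration
# almost never succeeds. Conditioning on a ground-truth prefix and
# adaptively revealing less of it acts as a curriculum. Everything here
# (task, update rule, thresholds) is an illustrative assumption.
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]          # hidden correct reasoning chain
probs = [0.5] * len(TARGET)                 # per-step policy: P(emit 1)

def rollout(prefix_len):
    steps = TARGET[:prefix_len]             # condition on a ground-truth prefix
    steps = steps + [1 if random.random() < probs[i] else 0
                     for i in range(prefix_len, len(TARGET))]
    return steps, int(steps == TARGET)      # sparse reward: all-or-nothing

prefix_len = len(TARGET) - 1                # start with almost the full solution
for epoch in range(200):
    wins = 0
    for _ in range(50):
        steps, reward = rollout(prefix_len)
        if reward:                          # reinforce the steps of a success
            for i in range(prefix_len, len(TARGET)):
                probs[i] = 0.9 * probs[i] + 0.1 * steps[i]
            wins += 1
    if wins / 50 > 0.7 and prefix_len > 0:  # adaptive schedule: reveal less
        prefix_len -= 1

print("success rate with no prefix:",
      sum(rollout(0)[1] for _ in range(200)) / 200)
```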
4. Question Selection > Problem Solving "It's harder to come up with a really good conjecture than it is to solve it." Takeaway: Focus on asking the right questions, not just finding answers. The person who identifies the breakthrough question often matters more than who…
I worry that so much discussion of AI risks and alignment overlooks the rather large elephant in the room: creativity and open-endedness. Policy makers and gatekeepers need to understand two competing forces that no one seems to talk about: (1) there is a massive economic…
The more you learn, the more you realize how little you know.
this paper is one of the most insightful reads I've had lately. The idea that we need to go beyond just perfect predictions really resonates, and it's actually what got me into researching scientific equation discovery and symbolic world models
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Sadly, I can't make it to #ICML2025 for our oral presentation - visa issues again :( This is the third conference I've missed for the same reason... But my great advisor @chandankreddy will be there to present our paper LLM-SRBench. If you're interested in learning more about LLM…
Biological intelligence:
- first: evolutionary algorithm
- second: reinforcement learning
- third: predictive learning
Artificial intelligence:
- first: predictive learning
- second: reinforcement learning
- third: no. evolutionary algorithm is just a special case of RL 🥸
Some commenters who have not yet read our paper on Fractured Entangled Representation (FER) have worried in advance that it is either overly pessimistic or overclaiming its implications. Rest assured, if you actually read the paper, you will see that we went for nuance: Rather…
you can be angry that the world is always changing, or you can be curious about the nature of change and anticipate what that means for the future
A bit late, but happy to share that LLM-SRBench, our new benchmark targeting the memorization issue in LLMs for scientific discovery, has been selected for an *Oral* presentation at #ICML2025! Great to see the community recognizing the importance of this direction. Check out the camera-ready…
Can an LLM be programmed? In our new preprint, we show that LLMs can learn to evaluate programs for a range of inputs by being trained on the program source code alone – a phenomenon we call Programming by Backprop (PBB). 🧵⬇️
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
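To make the setup concrete, here is a tiny sketch of the data contrast I take PBB to describe: the fine-tuning documents contain only program source code (no input/output pairs), and the model is later queried for the program's value on concrete inputs. The function name and prompt formatting are my own assumptions for illustration, not the paper's templates.

```python
# Tiny sketch of the Programming by Backprop data contrast (as I read the
# abstract): train on source code only, then query the model for concrete
# evaluations at test time. Names and formatting are illustrative assumptions.

# A training document under PBB: just the program text.
source_only_doc = '''
def f_17(x):
    return (x * 3 + 1) % 7
'''

# What the model is *not* shown during training in this setting:
io_examples_doc = "f_17(4) = 6\nf_17(10) = 3\n"

# Test-time query: the model must evaluate the program it was trained on.
eval_prompt = "What is f_17(5)?"

# Sanity check of the expected answer by actually executing the program
# (ordinary Python execution here, not a model call).
namespace = {}
exec(source_only_doc, namespace)
print(eval_prompt, "->", namespace["f_17"](5))   # -> 2
```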
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet still failed at simple arithmetic? 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found…
This is the first step in a direction that I am very excited about! Using LLMs to solve scientific computing problems and potentially discover faster (or new) algorithms. #AI4Science #ML4PDEs We show that LLMs can write PDE solver code, choose appropriate algorithms, and produce…
Can LLMs solve PDEs? 🤯 We present CodePDE, a framework that uses LLMs to automatically generate PDE solvers and outperforms human implementations! 🚀 CodePDE demonstrates the power of inference-time algorithms and scaling for PDE solving. More in 🧵: #ML4PDE #AI4Science
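For readers who haven't seen what "LLM-generated PDE solver code" looks like, here is a minimal, self-contained sketch of the kind of solver such a framework might produce: an explicit finite-difference scheme for the 1D heat equation. The grid sizes, diffusivity, and initial condition are illustrative choices of mine, not CodePDE's benchmark settings.

```python
# Sketch of the kind of solver code an LLM-driven framework might generate:
# explicit finite differences for the 1D heat equation u_t = alpha * u_xx
# with zero Dirichlet boundaries. All settings are illustrative choices.
import numpy as np

alpha = 0.01                      # diffusivity
nx, nt = 101, 2000                # spatial grid points and time steps
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha          # respects the explicit-scheme stability limit

x = np.linspace(0.0, 1.0, nx)
u = np.sin(np.pi * x)             # initial condition

for _ in range(nt):
    # Central difference in space, forward Euler in time.
    u[1:-1] = u[1:-1] + alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    u[0] = u[-1] = 0.0            # Dirichlet boundary conditions

# For this initial condition the exact solution is
# exp(-pi^2 * alpha * t) * sin(pi * x), giving a quick correctness check.
t_final = nt * dt
exact = np.exp(-np.pi**2 * alpha * t_final) * np.sin(np.pi * x)
print("max abs error vs exact solution:", np.max(np.abs(u - exact)))
```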
So, on the topic of the Apple puzzle reasoning paper: we got pretty similar results in our recent paper on recognizing context-free languages as an LLM eval, a task that also requires the model to follow an algorithm (which I think is what LLM folks mean by "reasoning").
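For context on the eval mentioned here, a context-free-language membership task can be generated with a tiny reference checker like the sketch below (balanced brackets, i.e. a Dyck language); the model is then asked to reproduce the checker's yes/no label. The generation scheme is my own illustrative choice, not the paper's actual setup.

```python
# Sketch of a context-free-language membership eval item: a reference
# checker labels random bracket strings, and the LLM must produce the same
# yes/no answer. Generation details here are illustrative assumptions.
import random

def is_balanced(s):
    # Stack-free depth counter for the single-bracket Dyck language.
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def random_string(n):
    return "".join(random.choice("()") for _ in range(n))

for s in (random_string(12) for _ in range(5)):
    print(f"Is '{s}' a balanced bracket string? -> {is_balanced(s)}")
```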
@BlackHC Hi Andreas, thank you for sharing your constructive comments. Some of the points you made about ToH and context size are valid, but we believe this needs deeper discussion. Please see our detailed response below: > Comment: Because it gets worse: For N >= 12 or 13…
Excited to share our new work, "LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers"! We show how LLMs enhance automated feature engineering for tabular data. And yes, feature engineering still matters! 🧠 w/ @ParshinShojaee & @chandankreddy
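As a rough illustration of the "LLMs as evolutionary optimizers" idea named in the title, here is a small, runnable sketch of an evolutionary feature-engineering loop in which the LLM call is replaced by a random-mutation stub; the dataset, operators, and scoring are toy choices of mine, not the paper's setup.

```python
# Loose sketch of an evolutionary feature-engineering loop: keep a population
# of candidate feature programs, score each by downstream validation accuracy,
# and refill the population with mutations of the best candidates. The LLM
# proposer is replaced by a random stub that ignores its parent; dataset,
# operators, and scoring are toy choices, not LLM-FE's actual setup.
import random
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

def propose_feature(parent=None):
    # Stand-in for an LLM prompt like "mutate this feature program";
    # here it just samples a random pairwise transform.
    i, j = random.sample(range(X.shape[1]), 2)
    return (random.choice(["add", "mul", "ratio"]), i, j)

def apply_feature(feat):
    op, i, j = feat
    col = {"add": X[:, i] + X[:, j],
           "mul": X[:, i] * X[:, j],
           "ratio": X[:, i] / (X[:, j] + 1e-8)}[op]
    return np.column_stack([X, col])

def score(feat):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, apply_feature(feat), y, cv=3).mean()

population = [propose_feature() for _ in range(6)]
for generation in range(5):
    population.sort(key=score, reverse=True)
    best = population[0]
    # Keep the top candidates, mutate to refill the population.
    population = population[:3] + [propose_feature(parent=best) for _ in range(3)]

print("best feature:", population[0], "cv accuracy:", round(score(population[0]), 4))
```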