Parshin Shojaee
@ParshinShojaee
PhD student @VT_CS | AI for Science, Math, Code, Reasoning | Intern @Apple | prev @Adobe
Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluation of equation discovery with LLMs! 🧠 Key takeaway: Even SOTA discovery models with strong LLM backbones still fail to discover mathematical…
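To make "rigorous evaluation" concrete, here is a minimal sketch of how a candidate equation is typically scored numerically against held-out data in symbolic-regression-style benchmarks; the ground-truth law, candidate expression, and NMSE metric are illustrative assumptions of mine, not LLM-SRBench's actual tasks or API.

```python
# Minimal sketch of scoring a candidate equation against held-out data, in
# the spirit of symbolic-regression benchmarks. The "unknown" law, the
# candidate expression, and the NMSE metric are illustrative assumptions,
# not LLM-SRBench's actual tasks or API.
import numpy as np

def ground_truth(x):
    # Hypothetical law the discovery method is trying to recover.
    return 2.0 * np.sin(x) + 0.5 * x

def candidate(x):
    # Equation proposed by an LLM-based discovery pipeline.
    return 2.0 * np.sin(x) + 0.48 * x

x_test = np.linspace(0.0, 10.0, 200)   # held-out evaluation points
y_true, y_pred = ground_truth(x_test), candidate(x_test)

# Normalized mean-squared error: a low value means good numeric fit, but
# numeric fit alone doesn't establish that the symbolic form is correct --
# which is exactly why memorization-aware benchmarks matter.
nmse = np.mean((y_true - y_pred) ** 2) / np.var(y_true)
print(f"NMSE on held-out data: {nmse:.6f}")
```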

I'll be at #ICML2025 presenting our LLM-SRBench: A New Benchmark for Scientific Equation Discovery with LLMs (on July 17)! Feel free to stop by or ping me for a chat! Oral (Session 5D) at 10:15 AM and Poster (W-115) at 11 AM. openreview.net/forum?id=SyQPi…
Why does RL struggle with long reasoning chains? Because finding the correct solution by chance is exponentially rare. Solution: break down the complexity of the problem somehow, and ease into it adaptively! We propose AdaBack: an adaptive backtracking method that conditions…
Why does RL struggle with tasks requiring long reasoning chains? Because “bumping into” a correct solution becomes exponentially less likely as the number of reasoning steps grows. We propose an adaptive backtracking algorithm: AdaBack. 1/n
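The thread is cut off before the details of what AdaBack conditions on, so the following is only a loose, self-contained toy of the general idea sketched above: start the policy from a long ground-truth solution prefix and adaptively shrink it as the success rate rises. The task, thresholds, and learning rule here are my own assumptions, not the published algorithm.

```python
# Toy illustration of adaptive backtracking for RL with sparse rewards:
# the "problem" is to emit a hidden sequence of 8 binary reasoning steps,
# and reward is 1 only if every step is correct, so random exploration
# almost never succeeds. Conditioning on a ground-truth prefix and
# adaptively revealing less of it acts as a curriculum. Everything here
# (task, update rule, thresholds) is an illustrative assumption.
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]          # hidden correct reasoning chain
probs = [0.5] * len(TARGET)                 # per-step policy: P(emit 1)

def rollout(prefix_len):
    steps = TARGET[:prefix_len]             # condition on a ground-truth prefix
    steps = steps + [1 if random.random() < probs[i] else 0
                     for i in range(prefix_len, len(TARGET))]
    return steps, int(steps == TARGET)      # sparse reward: all-or-nothing

prefix_len = len(TARGET) - 1                # start with almost the full solution
for epoch in range(200):
    wins = 0
    for _ in range(50):
        steps, reward = rollout(prefix_len)
        if reward:                          # reinforce the steps of a success
            for i in range(prefix_len, len(TARGET)):
                probs[i] = 0.9 * probs[i] + 0.1 * steps[i]
            wins += 1
    if wins / 50 > 0.7 and prefix_len > 0:  # adaptive schedule: reveal less
        prefix_len -= 1

print("success rate with no prefix:",
      sum(rollout(0)[1] for _ in range(200)) / 200)
```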
4. Question Selection > Problem Solving "It's harder to come up with a really good conjecture than it is to solve it." Takeaway: Focus on asking the right questions, not just finding answers. The person who identifies the breakthrough question often matters more than who…
I worry that so much discussion of AI risks and alignment overlooks the rather large elephant in the room: creativity and open-endedness. Policy makers and gatekeepers need to understand two competing forces that no one seems to talk about: (1) there is a massive economic…
The more you learn, the more you realize how little you know.
this paper is one of the most insightful reads I've had lately. The idea that we need to go beyond just perfect predictions really resonates, and it's actually what got me into researching scientific equation discovery and symbolic world models
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
Sadly, I can't make it to #ICML2025 for our oral presentation - visa issues again :( This is the third conference I've missed for the same reason... But my great advisor @chandankreddy will be there to present our paper LLM-SRBench. If you're interested in learning more about LLM…
Biological intelligence:
- first: evolutionary algorithm
- second: reinforcement learning
- third: predictive learning
Artificial intelligence:
- first: predictive learning
- second: reinforcement learning
- third: no. evolutionary algorithm is just a special case of RL 🥸
Some commenters who have not yet read our paper on Fractured Entangled Representation (FER) have worried in advance that it is either overly pessimistic or overclaiming its implications. Rest assured, if you actually read the paper, you will see that we went for nuance: Rather…
you can be angry that the world is always changing, or you can be curious about the nature of change and anticipate what that means for the future
A bit late, but happy to share that LLM-SRBench, our new benchmark targeting the memorization issue in LLMs for scientific discovery, has been selected for an *Oral* presentation at #ICML2025! Great to see the community recognizing the importance of this direction. Check out the camera-ready…
Can an LLM be programmed? In our new preprint, we show that LLMs can learn to evaluate programs for a range of inputs by being trained on the program source code alone – a phenomenon we call Programming by Backprop (PBB). 🧵⬇️
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
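To make the setup concrete, here is a tiny sketch of the data contrast I take PBB to describe: the fine-tuning documents contain only program source code (no input/output pairs), and the model is later queried for the program's value on concrete inputs. The function name and prompt formatting are my own assumptions for illustration, not the paper's templates.

```python
# Tiny sketch of the Programming by Backprop data contrast (as I read the
# abstract): train on source code only, then query the model for concrete
# evaluations at test time. Names and formatting are illustrative assumptions.

# A training document under PBB: just the program text.
source_only_doc = '''
def f_17(x):
    return (x * 3 + 1) % 7
'''

# What the model is *not* shown during training in this setting:
io_examples_doc = "f_17(4) = 6\nf_17(10) = 3\n"

# Test-time query: the model must evaluate the program it was trained on.
eval_prompt = "What is f_17(5)?"

# Sanity check of the expected answer by actually executing the program
# (ordinary Python execution here, not a model call).
namespace = {}
exec(source_only_doc, namespace)
print(eval_prompt, "->", namespace["f_17"](5))   # -> 2
```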
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet still failed at simple arithmetic? 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found…
This is the first step in a direction that I am very excited about! Using LLMs to solve scientific computing problems and potentially discover faster (or new) algorithms. #AI4Science #ML4PDEs We show that LLMs can write PDE solver code, choose appropriate algorithms, and produce…
Can LLMs solve PDEs? 🤯 We present CodePDE, a framework that uses LLMs to automatically generate PDE solvers and outperforms human implementations! 🚀 CodePDE demonstrates the power of inference-time algorithms and scaling for PDE solving. More in 🧵: #ML4PDE #AI4Science
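For readers who haven't seen what "LLM-generated PDE solver code" looks like, here is a minimal, self-contained sketch of the kind of solver such a framework might produce: an explicit finite-difference scheme for the 1D heat equation. The grid sizes, diffusivity, and initial condition are illustrative choices of mine, not CodePDE's benchmark settings.

```python
# Sketch of the kind of solver code an LLM-driven framework might generate:
# explicit finite differences for the 1D heat equation u_t = alpha * u_xx
# with zero Dirichlet boundaries. All settings are illustrative choices.
import numpy as np

alpha = 0.01                      # diffusivity
nx, nt = 101, 2000                # spatial grid points and time steps
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha          # respects the explicit-scheme stability limit

x = np.linspace(0.0, 1.0, nx)
u = np.sin(np.pi * x)             # initial condition

for _ in range(nt):
    # Central difference in space, forward Euler in time.
    u[1:-1] = u[1:-1] + alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    u[0] = u[-1] = 0.0            # Dirichlet boundary conditions

# For this initial condition the exact solution is
# exp(-pi^2 * alpha * t) * sin(pi * x), giving a quick correctness check.
t_final = nt * dt
exact = np.exp(-np.pi**2 * alpha * t_final) * np.sin(np.pi * x)
print("max abs error vs exact solution:", np.max(np.abs(u - exact)))
```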
So, on the topic of the Apple puzzle reasoning paper: we got pretty similar results in our recent paper on recognizing context-free languages as an LLM eval, a task that also requires the model to follow an algorithm (which I think is what LLM folks mean by "reasoning").
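For context on the eval mentioned here, a context-free-language membership task can be generated with a tiny reference checker like the sketch below (balanced brackets, i.e. a Dyck language); the model is then asked to reproduce the checker's yes/no label. The generation scheme is my own illustrative choice, not the paper's actual setup.

```python
# Sketch of a context-free-language membership eval item: a reference
# checker labels random bracket strings, and the LLM must produce the same
# yes/no answer. Generation details here are illustrative assumptions.
import random

def is_balanced(s):
    # Stack-free depth counter for the single-bracket Dyck language.
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def random_string(n):
    return "".join(random.choice("()") for _ in range(n))

for s in (random_string(12) for _ in range(5)):
    print(f"Is '{s}' a balanced bracket string? -> {is_balanced(s)}")
```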
@BlackHC Hi Andreas, thank you for sharing your constructive comments. Some of the points you made about ToH and context size are valid, but we believe this needs deeper discussion. Please see our detailed response below: > Comment: Because it gets worse: For N >= 12 or 13…
Excited to share our new work, "LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers"! We show how LLMs enhance automated feature engineering for tabular data. And yes, feature engineering still matters! 🧠 w/ @ParshinShojaee & @chandankreddy
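As a rough illustration of the "LLMs as evolutionary optimizers" idea named in the title, here is a small, runnable sketch of an evolutionary feature-engineering loop in which the LLM call is replaced by a random-mutation stub; the dataset, operators, and scoring are toy choices of mine, not the paper's setup.

```python
# Loose sketch of an evolutionary feature-engineering loop: keep a population
# of candidate feature programs, score each by downstream validation accuracy,
# and refill the population with mutations of the best candidates. The LLM
# proposer is replaced by a random stub that ignores its parent; dataset,
# operators, and scoring are toy choices, not LLM-FE's actual setup.
import random
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

def propose_feature(parent=None):
    # Stand-in for an LLM prompt like "mutate this feature program";
    # here it just samples a random pairwise transform.
    i, j = random.sample(range(X.shape[1]), 2)
    return (random.choice(["add", "mul", "ratio"]), i, j)

def apply_feature(feat):
    op, i, j = feat
    col = {"add": X[:, i] + X[:, j],
           "mul": X[:, i] * X[:, j],
           "ratio": X[:, i] / (X[:, j] + 1e-8)}[op]
    return np.column_stack([X, col])

def score(feat):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, apply_feature(feat), y, cv=3).mean()

population = [propose_feature() for _ in range(6)]
for generation in range(5):
    population.sort(key=score, reverse=True)
    best = population[0]
    # Keep the top candidates, mutate to refill the population.
    population = population[:3] + [propose_feature(parent=best) for _ in range(3)]

print("best feature:", population[0], "cv accuracy:", round(score(population[0]), 4))
```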