Shanda Li 黎善达
@Shanda_Li_2000
PhD student @mldcmu
Can LLMs solve PDEs? 🤯 We present CodePDE, a framework that uses LLMs to automatically generate PDE solvers and outperform human implementations! 🚀 CodePDE demonstrates the power of inference-time algorithms and scaling for PDE solving. More in 🧵: #ML4PDE #AI4Science
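The CodePDE idea — have an LLM emit solver code, then score it against a reference — can be sketched as a generate-and-evaluate loop. This is a minimal illustration, not the paper's implementation: `mock_llm_generate` is a hypothetical stand-in for a real LLM call, and the evaluation uses a 1D heat equation with a known analytic solution.

```python
import numpy as np

def mock_llm_generate(prompt):
    # Hypothetical stand-in for an LLM API call: returns solver source code.
    # Here it always emits an explicit-Euler finite-difference heat solver.
    return (
        "def solve(u0, dx, dt, steps, alpha=1.0):\n"
        "    u = u0.copy()\n"
        "    for _ in range(steps):\n"
        "        lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2\n"
        "        u = u + dt * alpha * lap\n"
        "    return u\n"
    )

def evaluate(solver, u0, dx, dt, steps, reference):
    # Score a generated solver by its max error against a reference solution.
    return float(np.abs(solver(u0, dx, dt, steps) - reference).max())

# 1D periodic heat equation u_t = u_xx; sin(x) decays as e^{-t} * sin(x).
n = 128
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
dt = 0.2 * dx**2            # CFL-stable step for explicit Euler
steps = 200
u0 = np.sin(x)
exact = np.exp(-dt * steps) * np.sin(x)

# Execute the LLM-generated code and measure its error.
namespace = {"np": np}
exec(mock_llm_generate("Solve the 1D periodic heat equation."), namespace)
err = evaluate(namespace["solve"], u0, dx, dt, steps, exact)
```

In the real framework the error signal would feed back into the prompt so the LLM can refine its solver across inference-time iterations.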


In our new preprint, we demonstrate, for the first time, the test-time inference scaling behavior (with a faster convergence rate) of neural PDE solvers. The core idea is to derive a new PDE that characterizes the error of the neural PDE solver. 2prime.github.io/files/scasml_t…
How can we do inference-time scaling for scientific machine learning? Our new inference-time scaling framework leads to a new approach to high-dimensional PDE solving and a new eigenvalue solver. Join us @ #SIAM CSE next Monday, 9:45 AM - 11:25 AM, Room 114
🚀 We are happy to organize the BERT²S workshop @NeurIPSConf 2025 on Recent Advances in Time Series Foundation Models. 🌐 berts-workshop.github.io 📜Submit by August 22 🎓Speakers and panelists: @ChenghaoLiu15 Mingsheng Long @zoe_piran @danielle_maddix @atalwalkar @qingsongedu
Stop by the poster sessions today at ICML Workshop on Computer Use Agents to chat about OpenHands-Versa!
Can we design AI Agents that achieve generalizability across diverse task domains? Our new paper introduces OpenHands-Versa, a generalist agent with strong performance on three challenging agent benchmarks, ranking #1 on SWE-Bench Multimodal and The Agent Company leaderboards 🚀
In recent work arxiv.org/abs/2502.12123 w/ E. Botta, @_Yuchen_Li_ , A. Mehta, @jordan_t_ash, @_cyrilzhang, we explore some algorithmic aspects of constrained generation w/ a generator & process verifier. Paper at #ICML2025, poster session (today) details in screenshot, 🧵below.
Check out our recent work on research agents, led by @PlanarG1 and @sunweiwei12!
Most AI agents are tested in a bubble. But real ML breakthroughs happen in communities. We introduce CoMind, a research agent that learns from community knowledge. 📊 CoMind outperforms ~70% of human teams in a CVPR 2025 workshop competition. 🧵👇
🔥Unlocking New Paradigm for Test-Time Scaling of Agents! We introduce Test-Time Interaction (TTI), which scales the number of interaction steps beyond thinking tokens per step. Our agents learn to act longer➡️richer exploration➡️better success Paper: arxiv.org/abs/2506.07976
Multi-turn agents need to really interact with the environment to get new context and information. This is the key difference from single-turn QA settings.
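The intuition behind scaling interaction steps (rather than per-step thinking tokens) can be shown with a toy search task: an agent that is allowed more environment probes succeeds more often, because each action returns new information. This is an illustrative sketch, not the TTI method itself; the environment and "policy" are stand-ins.

```python
import random

def run_episode(env_size, max_steps, seed):
    # Toy task: a goal cell is hidden among env_size cells; the agent
    # probes one unvisited cell per interaction step.
    rng = random.Random(seed)
    goal = rng.randrange(env_size)
    visited = set()
    for _ in range(max_steps):
        candidates = [i for i in range(env_size) if i not in visited]
        probe = rng.choice(candidates)   # exploration policy stub
        visited.add(probe)
        if probe == goal:
            return True                  # new context revealed the goal
    return False

def success_rate(env_size, max_steps, trials=200):
    return sum(run_episode(env_size, max_steps, s) for s in range(trials)) / trials

# Same task, same per-step "compute" -- only the interaction budget changes.
low = success_rate(env_size=50, max_steps=5)
high = success_rate(env_size=50, max_steps=25)
```

With 5 probes of 50 cells the success rate is near 10%; with 25 probes it is near 50% — success scales with interaction steps, which is the axis TTI proposes to scale.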
A lot of work on agents these days uses reasoning RL to train agents. But is that good enough? @jackbai_jkb & @JunhongShen1 show that it's not: we also want RL to learn *how* to explore and *discover* novel behaviors, by scaling "in-context" interaction!…
🌟Move beyond evaluation on synthetic toy problems and advance machine intelligence like #AlphaEvolve! 🚀 Introducing FrontierCO — our new Machine Learning for Combinatorial Optimization benchmark featuring high-quality NP-hard instances from real-world applications and…
I’m excited to share new work from Datadog AI Research! We just released Toto, a new SOTA (by a wide margin!) time series foundation model, and BOOM, the largest benchmark of observability metrics. Both are available under the Apache 2.0 license. 🧵
@khodakmoments, @__tm__157, along with myself, @nmboffi and Jianfeng Lu are organizing a COLT 2025 workshop on the Theory of AI for Scientific Computing, to be held on the first day of the conference (June 30).
Cooool!
Back in March, I wore a head-mounted camera for a week straight and fine-tuned ChatGPT on the resulting data. Here's what happened (1/6) arxiv.org/pdf/2504.03857
blog.ml.cmu.edu/2025/04/09/cop… How do real-world developer preferences compare to existing evaluations? A CMU and UC Berkeley team led by @iamwaynechi and @valeriechen_ created @CopilotArena to collect user preferences on in-the-wild workflows. This blogpost overviews the design and…
We built @CopilotArena this fall as part of @lmarena_ai in order to evaluate coding models in realistic, interactive environments. Check out our recent writeup describing the results, as well as details of the system itself. Work led by @iamwaynechi and @valeriechen_.
What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants? In October, we launched @CopilotArena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint. Here's what we have learned /🧵
We introduce Mixture-of-Mamba, a multi-modal SSM that leverages modality-aware sparsity for efficient multi-modal pretraining! At the core of Mixture-of-Mamba: 🔹Modality-aware sparsity to optimize efficiency 🔹Mixture-of-SSMs to enable cross-modal interactions 🔹Scales…
🚀 Want 2x faster pretraining for your multi-modal LLM? 🧵 Following up on Mixture-of-Transformers (MoT), we're excited to share Mixture-of-Mamba (MoM)! arxiv.org/abs/2501.16295 🔥 Why it matters: MoM applies modality-aware sparsity across image, text, and speech—making…
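The core of modality-aware sparsity is that each token is processed by projection weights specific to its modality, while the sequence stays interleaved. This is a minimal NumPy sketch under stated assumptions — the shapes, the `modality_aware_proj` helper, and restricting the decoupling to a single input projection are illustrative, not the Mixture-of-Mamba architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_inner, n_modalities = 8, 16, 3   # e.g. text / image / speech

# One input-projection matrix per modality ("modality-aware" weights);
# Mixture-of-Mamba applies this kind of decoupling inside the SSM block.
W_in = rng.standard_normal((n_modalities, d_model, d_inner))

def modality_aware_proj(x, modality_ids):
    # x: (seq, d_model); modality_ids: (seq,) with values in [0, n_modalities).
    # Each token is routed to its own modality's weights -- sparse in the
    # sense that only one modality's parameters are active per token.
    out = np.empty((x.shape[0], d_inner))
    for m in range(n_modalities):
        mask = modality_ids == m
        out[mask] = x[mask] @ W_in[m]
    return out

x = rng.standard_normal((6, d_model))
ids = np.array([0, 0, 1, 1, 2, 2])          # interleaved modality tags
y = modality_aware_proj(x, ids)
```

Because each token touches only one modality's weights, the per-token compute matches a dense single-modality projection while the model keeps modality-specialized parameters — the source of the pretraining speedup the thread describes.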