Weiwei Sun
@sunweiwei12
PhD student @LTIatCMU | Ex @VectorInst @Baidu_Inc @ShandongU @ic4ai | Working on NLP, LLM, IR
Most AI agents are tested in a bubble. But real ML breakthroughs happen in communities. We introduce CoMind, a research agent that learns from community knowledge. 📊 CoMind outperforms ~70% of human teams in a CVPR 2025 workshop competition. 🧵👇
This work is done with amazing @__tm__157, @JunhongShen1, @sunweiwei12, @risteski_a, Yiming Yang, and @atalwalkar. We are excited about applying LLM techniques to solve more challenging PDEs in the future! 📝Paper: arxiv.org/abs/2505.08783 ⭐Code: github.com/LithiumDA/Code…
Can LLMs solve PDEs? 🤯 We present CodePDE, a framework that uses LLMs to automatically generate solvers for PDEs and outperforms human implementations! 🚀 CodePDE demonstrates the power of inference-time algorithms and scaling for PDE solving. More in 🧵: #ML4PDE #AI4Science
Thrilled to introduce FrontierCO — a benchmark of real, challenging (some unsolved) combinatorial optimization problems. It’s built to push frontier AI beyond toy tasks and toward advancing the boundaries of human problem-solving! Paper: arxiv.org/abs/2505.16952
🌟Move beyond evaluation on synthetic toy problems and advance human intelligence like #AlphaEvolve! 🚀 Introducing FrontierCO — our new Machine Learning for Combinatorial Optimization benchmark featuring high-quality NP-hard instances from real-world applications and…
We invite you to explore our latest work on RAG. We conceptualize RAG as a multi-agent collaboration task, aiming to align and unify the optimization objectives of its various modules with the ultimate goal of generating high-quality answers.
Improving RAG through Multi-Agent RL This work treats RAG as a multi-agent cooperative task to improve answer generation quality. It models RAG components like query rewriting, document selection, and answer generation as reinforcement learning agents working together toward…
This new approach for RAG modeling and optimization, MMOA-RAG, treats RAG as a multi-agent collaboration task. It uses MARL to simultaneously optimize multiple modules, aligning their objectives with the final goal of generating high-quality responses.
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning Optimizes multiple RAG components jointly through multi-agent RL to align their goals toward generating high-quality answers. 📝 arxiv.org/abs/2501.15228 👨🏽💻 github.com/chenyiqun/MMOA…
I will be presenting our paper “MAIR: A Massive Benchmark for Evaluating Instructed Retrieval” at #EMNLP2024! Date: Tuesday, Nov 12 Time: 14:00-15:30 Session 03: Resources and Evaluation 1 Paper: arxiv.org/abs/2410.10127 See you there!
💡Check MAIR at #EMNLP2024 A large-scale IR benchmark! Highlights: - Task Diversity: 126 realistic tasks, 8x more than BEIR 📈 - Domain Coverage: 6 domains and heterogeneous sources 📚 - Instruction Following: 805 relevance criteria - Lightweight & Fast: optimized data sampling ⚡️
Handling diverse query types was THE most important problem I faced when working on real search engine production. Glad to complete this work with Weiwei and other co-authors, which distills many new insights into this problem. Check out our paper: