Siyu Yuan
@siyu_yuan_
Ph.D. candidate at Fudan University. Ex-Research Intern at @MSFTResearch Asia and @BytedanceTalk AI Lab
🎉 Introducing our latest work — Enigmata: A Full-Stack Recipe for Advancing Logical Reasoning in LLMs! Enigmata offers a complete pipeline from data generation → verification → RLVR training → evaluation, designed to systematically enhance the puzzle reasoning skills of LLMs.
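A minimal sketch of the RLVR shape such a pipeline implies (toy puzzle and invented names, not Enigmata's actual code): each generated puzzle ships with a programmatic verifier, and the verifier's pass/fail becomes the training reward.

```python
# Hedged sketch of the generation -> verification -> reward pipeline shape.
# None of these names come from the Enigmata codebase.
import random

def generate_puzzle(rng: random.Random) -> dict:
    """Emit a toy puzzle together with its ground-truth answer."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return {"prompt": f"What is {a} + {b}?", "answer": str(a + b)}

def verify(puzzle: dict, model_output: str) -> bool:
    """Deterministic programmatic check -- the 'verifiable' in RLVR."""
    return model_output.strip() == puzzle["answer"]

def reward(puzzle: dict, model_output: str) -> float:
    """Binary reward fed to the RL trainer (e.g., a PPO/GRPO-style loop)."""
    return 1.0 if verify(puzzle, model_output) else 0.0

rng = random.Random(0)
puzzle = generate_puzzle(rng)
print(puzzle["prompt"], reward(puzzle, puzzle["answer"]))  # correct answer -> 1.0
```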

🖥️ Half of 2025 is already behind us, so here's a roundup of the Mac tools I've found extremely handy these past six months: Cursor: Needs no introduction; the must-have IDE for programmers in the AI era. I haven't opened VSCode or PyCharm in ages. Better Display: My 27-inch secondary monitor looked blurry at its default settings; turning on Better Display's HiDPI option made it instantly sharp, like a nearsighted person putting on glasses. Orbstack:…
+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window…
I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.
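As a toy illustration of the distinction (all names here are invented), a context engineer assembles task, tools, retrieved documents, and history into one budgeted window rather than writing a single clever sentence:

```python
# Toy context assembler -- a sketch of "providing all the context", not any
# particular framework's API.
def build_context(task: str, tool_specs: list[str],
                  retrieved_docs: list[str], history: list[str],
                  budget_chars: int = 8000) -> str:
    parts = ["# Task", task,
             "# Tools", *tool_specs,
             "# Relevant documents", *retrieved_docs,
             "# Conversation so far", *history]
    # Crude truncation as a budget; real systems rank, dedupe, and trim.
    return "\n".join(parts)[:budget_chars]

print(build_context("Summarize the bug report",
                    ["search(query) -> results"],
                    ["doc: stack trace excerpt"],
                    ["user: it crashes on launch"]))
```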
What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.
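A hedged sketch of the loop as described here (stand-in interfaces, not the authors' code): the model proposes its own training data, the weights are updated on it, and the updated model's downstream score is the reward that trains the self-editing policy.

```python
# Every name below is an illustrative stand-in, not the SEAL API.

def seal_step(model, new_input, eval_set, *, finetune, evaluate, reinforce):
    # 1. The model writes its own training data (the "self-edit").
    self_edit = model(f"Produce finetuning examples for: {new_input}")
    # 2. Update the weights on the self-edit (e.g., a short finetune).
    updated = finetune(model, self_edit)
    # 3. Downstream performance of the *updated* model is the RL reward.
    reward = evaluate(updated, eval_set)
    # 4. Reinforce the self-editing policy (e.g., REINFORCE/PPO on the edit).
    reinforce(model, self_edit, reward)
    return updated, reward

# Toy demo with no-op stand-ins, just to show the call shape.
model = lambda prompt: f"[synthetic examples for] {prompt}"
updated, r = seal_step(
    model, "a new fact", eval_set=None,
    finetune=lambda m, edit: m,
    evaluate=lambda m, s: 1.0,
    reinforce=lambda m, edit, reward: None,
)
```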
📢 Introducing CoSER: Advancing AI Character Role-Playing with High-Quality Data from Best Books. CoSER comprises a high-quality dataset, open models, and a novel evaluation protocol for more authentic AI character role-playing! 📄Paper: arxiv.org/pdf/2502.09082 (1/3)
🚨This week's top AI/ML research papers: - Test-Time RL - PHYBench - Process Reward Models That Think - Tiny Reasoning Models via LoRA - Learning to Reason under Off-Policy Guidance - SplitReason - Learning Adaptive Parallel Reasoning with LMs - Token-Shuffle - Describe Anything…
Introducing Deep Research for arXiv. Ask questions like 'What are the latest breakthroughs in RL fine-tuning?' and get comprehensive literature reviews with trending papers automatically included. Turn hours of literature searching into seconds with AI-powered research context ⚡
Current LLM judges, fine-tuned using Supervised Fine-Tuning (SFT), perform poorly on evaluation tasks requiring deep reasoning. This paper introduces JudgeLRM, a family of models trained using Reinforcement Learning (RL) with judge-specific rewards, enhancing reasoning for evaluation tasks…
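One plausible shape for a "judge-specific reward", sketched here with invented tags and weights rather than the paper's exact formula: reward verdicts that match the human label, penalize malformed outputs, and lightly bonus an explicit reasoning trace.

```python
# Illustrative guess at a judge reward; the tags and weights are assumptions.
import re

def judge_reward(judge_output: str, gold_choice: str) -> float:
    match = re.search(r"<answer>\s*([AB])\s*</answer>", judge_output)
    if match is None:
        return -1.0                      # malformed verdicts are penalized
    correct = 1.0 if match.group(1) == gold_choice else 0.0
    has_reasoning = "<think>" in judge_output
    return correct + (0.1 if has_reasoning else 0.0)

print(judge_reward("<think>B is more factual.</think><answer>B</answer>", "B"))
# -> 1.1
```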
tremendous alpha right now in sending your wife photos of y'all converted to Studio Ghibli anime
Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new @Google_AI paper, we instead focus on scaling the search axis. By just randomly sampling 200x & self-verifying, Gemini 1.5 ➡️ o1 performance. The secret: self-verification is easier at scale!
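A minimal sketch of that search axis (stand-in helpers, not the paper's code): draw many candidate solutions, have the model score each of its own candidates, and keep the best-scoring one.

```python
# Scaling search via sampling + self-verification; helpers are assumptions.

def sample_then_verify(generate, verify_score, prompt: str, n: int = 200):
    candidates = [generate(prompt) for _ in range(n)]
    # Self-verification: the model scores each candidate; return the best.
    return max(candidates, key=lambda c: verify_score(prompt, c))

# Toy demo: a noisy sampler plus a verifier that recognizes the right answer.
import random
rng = random.Random(0)
generate = lambda p: str(rng.randint(0, 9))
verify_score = lambda p, c: 1.0 if c == "7" else 0.0
print(sample_then_verify(generate, verify_score, "pick 7", n=20))
```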
🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"! 🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it! Website: cohenqu.github.io/mrt.github.io/ 🧵[1/9]
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
I tested Grok 3 and DeepSeek v3 with the same critical prompts. The results will blow your mind. Grok 3 vs. DeepSeek v3 (video demos are included)
Introducing #SIRIUS🌟: A self-improving multi-agent LLM framework that learns from successful interactions and refines failed trajectories, enhancing college-level reasoning and competitive negotiations. 📜Preprint: arxiv.org/pdf/2502.04780 💻code: github.com/zou-group/siri… 1/N
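A hedged sketch of that loop (illustrative names, not the SIRIUS codebase): successful multi-agent trajectories are kept for finetuning directly, while failed ones are revised first and kept only if the revision verifies.

```python
# Self-improvement from experience: learn from successes, repair failures.

def build_training_set(tasks, run_agents, is_success, refine):
    kept = []
    for task in tasks:
        traj = run_agents(task)
        if is_success(traj):
            kept.append(traj)              # learn directly from success
            continue
        repaired = refine(traj)            # revise the failed trajectory
        if is_success(repaired):
            kept.append(repaired)          # keep only verified repairs
    return kept

# Toy demo with stand-in components.
tasks = ["negotiate price", "prove lemma"]
run_agents = lambda t: {"task": t, "ok": t.startswith("negotiate")}
is_success = lambda traj: traj["ok"]
refine = lambda traj: {**traj, "ok": True}   # pretend the repair works
print(len(build_training_set(tasks, run_agents, is_success, refine)))  # 2
```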
Thanks AK for sharing! 🤔 How can we endow language agents with self-correction capabilities for interactive environments? Introducing Agent-R🔍, a novel framework designed to enable LLM-based agents to perform on-the-fly reflection and self-improvement 🎉.
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
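A hedged sketch of trajectory revision for self-training (illustrative, not the authors' implementation): keep a failed trajectory's prefix up to its first error, insert a reflection, then continue along a known-good path; the spliced trajectory becomes training data for the next iteration.

```python
# Splicing a failed trajectory into a revision trajectory; all names and
# the example environment are invented for illustration.

def revise_trajectory(bad_traj, good_traj, first_error_idx, reflection):
    # Prefix up to the error + reflection signal + the good continuation.
    return bad_traj[:first_error_idx] + [reflection] + good_traj[first_error_idx:]

print(revise_trajectory(
    ["look", "go north", "take rock"],      # failed path
    ["look", "go north", "take key"],       # successful path
    first_error_idx=2,
    reflection="(reflect) taking the rock was wrong; I should take the key",
))
```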