Zhihong Shao
@zhs05232838
Researcher @deepseek_ai | Ph.D. @TsinghuaCoAI | Ex. @MSFTResearch | Recent: DeepSeek-R1, DeepSeek-Coder-v2, DeepSeekMath, DeepSeek-Prover, Math-Shepherd, ToRA.
Here comes DeepSeek-R1, our latest model with significantly enhanced reasoning abilities. We also share a technical report on how we trained it with large-scale RL. Have fun!
🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today!
🐋 1/n
We just released DeepSeek-Prover V2.
- Solves nearly 90% of miniF2F problems
- Significantly improves the SoTA performance on the PutnamBench
- Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version
Github: github.com/deepseek-ai/De…
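For readers unfamiliar with what "their formal version" means: benchmarks like miniF2F and PutnamBench state competition problems as machine-checkable theorems. Below is a minimal, hypothetical Lean 4 sketch (with Mathlib) of the style of statement a prover model must close; it is a toy problem, not an actual benchmark item.

```lean
-- A toy competition-style statement in Lean 4 with Mathlib.
-- Illustrative only; not an actual miniF2F, PutnamBench, or AIME item.
import Mathlib

theorem toy_linear (x : ℝ) (h : 2 * x + 3 = 7) : x = 2 := by
  linarith
```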

🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k…
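A minimal single-GPU sketch of the computation-communication overlap idea, using two CUDA streams so a transfer proceeds while a matmul runs. This is a toy in plain PyTorch under simplifying assumptions; the production system overlaps cross-node expert-parallel all-to-all with model compute.

```python
# Toy illustration of computation-communication overlap with two CUDA streams.
# Not DeepSeek's inference system; the "communication" here is just an async
# host->device copy standing in for a cross-node all-to-all.
import torch

assert torch.cuda.is_available()
compute_stream = torch.cuda.Stream()
comm_stream = torch.cuda.Stream()

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
host_buf = torch.randn(4096, 4096, pin_memory=True)  # stand-in "communication" payload
dev_buf = torch.empty_like(a)

torch.cuda.synchronize()
with torch.cuda.stream(compute_stream):
    c = a @ b                                         # "computation" for micro-batch i
with torch.cuda.stream(comm_stream):
    dev_buf.copy_(host_buf, non_blocking=True)        # "communication" for micro-batch i+1
torch.cuda.synchronize()                              # both streams done; work overlapped in time
print(c.shape, dev_buf.shape)
```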
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min…
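A quick back-of-the-envelope check of the quoted read throughput, assuming the load is spread evenly across the 180 nodes (real workloads won't be exactly uniform):

```python
# Per-node read bandwidth implied by the quoted aggregate number.
aggregate_tib_s = 6.6
nodes = 180
per_node_gib_s = aggregate_tib_s * 1024 / nodes
print(f"~{per_node_gib_s:.1f} GiB/s of read throughput per node")  # ~37.5 GiB/s
```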
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies
✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 github.com/deepseek-ai/Du…
✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗…
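For intuition on what an expert-parallel load balancer has to do, here is a toy greedy placement sketch: heaviest experts first, each onto the currently least-loaded GPU. This only illustrates the problem, not the actual EPLB algorithm.

```python
# Toy sketch of expert-parallel load balancing: greedy placement of experts
# (heaviest first) onto the least-loaded GPU. Not the EPLB algorithm.
import heapq

def greedy_placement(expert_loads, num_gpus):
    """expert_loads: estimated token count routed to each expert."""
    heap = [(0, gpu) for gpu in range(num_gpus)]          # (current load, gpu id)
    heapq.heapify(heap)
    placement = {}
    for expert in sorted(range(len(expert_loads)), key=lambda e: -expert_loads[e]):
        load, gpu = heapq.heappop(heap)                   # least-loaded GPU so far
        placement[expert] = gpu
        heapq.heappush(heap, (load + expert_loads[expert], gpu))
    return placement

print(greedy_placement([90, 10, 40, 40, 5, 70, 30, 20], num_gpus=4))
```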
🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled…
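To illustrate the numerics an FP8 GEMM library builds on, here is a toy block-scaled quantize-dequantize-matmul in plain PyTorch (assumes a build with float8 dtypes, 2.1+). It is not DeepGEMM's API or kernels, just a sketch of why per-block scales keep FP8 error small.

```python
# Toy block-scaled FP8 quantization followed by a float32 matmul on the
# dequantized values, to inspect the quantization error. Not DeepGEMM.
import torch

def quantize_fp8_blockwise(x, block=128):
    # One scale per (row, column-block) so each block uses the full e4m3 range.
    rows, cols = x.shape
    xb = x.view(rows, cols // block, block)
    scale = xb.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 448.0  # e4m3 max ~448
    q = (xb / scale).to(torch.float8_e4m3fn)
    return q, scale

def dequantize(q, scale):
    return (q.to(torch.float32) * scale).view(q.shape[0], -1)

a = torch.randn(256, 512)
b = torch.randn(512, 256)
qa, sa = quantize_fp8_blockwise(a)
qb, sb = quantize_fp8_blockwise(b.t().contiguous())
approx = dequantize(qa, sa) @ dequantize(qb, sb).t()
exact = a @ b
print("relative error:", ((approx - exact).norm() / exact.norm()).item())
```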
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅…
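The core primitive DeepEP optimizes is the MoE dispatch/combine all-to-all. A minimal sketch with stock torch.distributed (NCCL backend, one GPU per rank on a single node, launched via torchrun) shows the data-movement pattern; DeepEP's value is doing this efficiently over NVLink/RDMA, which this toy does not attempt.

```python
# Toy MoE "dispatch" step: every rank exchanges token slices with every other
# rank through an all-to-all. Plain torch.distributed, not DeepEP's kernels.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_file.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)                       # single-node assumption

    tokens_per_peer, hidden = 2, 8
    # Each rank prepares `tokens_per_peer` tokens destined for every peer rank.
    send = torch.full((world * tokens_per_peer, hidden), float(rank), device="cuda")
    recv = torch.empty_like(send)

    # all_to_all_single splits `send` evenly across ranks and concatenates what
    # it receives: afterwards, slice i of `recv` holds tokens sent by rank i.
    dist.all_to_all_single(recv, send)
    print(f"rank {rank} received from ranks:",
          recv[:, 0].reshape(world, -1)[:, 0].tolist())

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```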
🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS…
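A toy sketch of what "paged KV cache (block size 64)" means: the cache is a pool of fixed-size blocks, and each sequence keeps a block table mapping its logical blocks to physical ones. Illustrative Python only; FlashMLA implements this inside a fused Hopper decoding kernel.

```python
# Toy paged KV cache with block size 64: a physical block pool plus a per-
# sequence block table. Not FlashMLA's implementation.
import torch

BLOCK = 64
NUM_BLOCKS, HEAD_DIM = 16, 128
kv_pool = torch.zeros(NUM_BLOCKS, BLOCK, HEAD_DIM)    # physical block pool
free_blocks = list(range(NUM_BLOCKS))

class Sequence:
    def __init__(self):
        self.block_table = []                         # logical block idx -> physical block idx
        self.length = 0

    def append(self, kv_vec):
        if self.length % BLOCK == 0:                  # need a fresh block
            self.block_table.append(free_blocks.pop())
        phys = self.block_table[self.length // BLOCK]
        kv_pool[phys, self.length % BLOCK] = kv_vec
        self.length += 1

    def gather(self):
        """Materialize this sequence's KV entries in logical order."""
        out = kv_pool[self.block_table].reshape(-1, HEAD_DIM)
        return out[: self.length]

seq = Sequence()
for _ in range(70):                                   # spills into a second block
    seq.append(torch.randn(HEAD_DIM))
print(len(seq.block_table), seq.gather().shape)       # 2 blocks, (70, 128)
```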
🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,…
Congrats to DeepSeek on producing an o1-level reasoning model! Their research paper demonstrates that they’ve independently found some of the core ideas that we did on our way to o1.
DeepSeek-R1 (Preview) Results 🔥 We worked with the @deepseek_ai team to evaluate R1 Preview models on LiveCodeBench. The model performs in the vicinity of o1-Medium, providing SOTA reasoning performance! Huge kudos to the team, and I'm looking forward to the full release!! /1
Here comes the official release of DeepSeek-V3. We also share a lot in the tech report. Check it out!
🚀 Introducing DeepSeek-V3!
Biggest leap forward yet:
⚡ 60 tokens/second (3x faster than V2!)
💪 Enhanced capabilities
🛠 API compatibility intact
🌍 Fully open-source models & papers
🐋 1/n
Our DeepSeek reasoning model is great on code and math. Try it out!
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!
🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!
🌐 Try it now at chat.deepseek.com #DeepSeek
o1 — our first model trained with reinforcement learning to think hard about problems before answering. Extremely proud of the team! This is a new paradigm with vast opportunity. This is evident quantitatively (eg reasoning metrics are already a step function improved) and…
I have always believed that you don't need a GPT-6 quality base model to achieve human-level reasoning performance, and that reinforcement learning was the missing ingredient on the path to AGI. Today, we have the proof -- o1. x.com/OpenAI/status/…
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
Super interesting work from DeepSeek on MiniF2F (so happy to see our benchmark still in use \o/). It's hard to compare this with the recent DeepMind paper, but from my experience building and using MiniF2F, I think a ~60% pass rate is likely comparable to DeepMind's recent result on…
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
> New SOTA: 63.5% on miniF2F (high school) & 25.3% on ProofNet (undergrad)
> Introduces RMaxTS: Novel MCTS for diverse proof generation
> Features RLPAF:…
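For intuition on tree search with an exploration bonus (the flavor of idea behind RMaxTS), here is a generic UCT loop that adds an intrinsic reward for reaching previously unseen states. It runs on an abstract toy environment and is not the paper's algorithm or its proof-assistant integration.

```python
# Toy UCT search with a novelty bonus: states never visited before earn an
# extra intrinsic reward, encouraging diverse exploration. Generic sketch only.
import math, random

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}
        self.visits, self.value = 0, 0.0

def uct_select(node, c=1.4):
    return max(
        node.children.values(),
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def search(root, expand_fn, reward_fn, seen_states, iters=100):
    for _ in range(iters):
        node, path = root, [root]
        while node.children:                          # selection
            node = uct_select(node)
            path.append(node)
        for s in expand_fn(node.state):               # expansion
            node.children.setdefault(s, Node(s))
        leaf = random.choice(list(node.children.values()))
        path.append(leaf)
        bonus = 1.0 if leaf.state not in seen_states else 0.0   # novelty bonus
        seen_states.add(leaf.state)
        r = reward_fn(leaf.state) + bonus
        for n in path:                                # backpropagation
            n.visits += 1
            n.value += r
    return root

# Toy environment: states are integers; children are state+1 and state*2.
root = search(Node(1), lambda s: [s + 1, s * 2],
              lambda s: 1.0 if s == 12 else 0.0, set())
print(root.visits)
```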
LLMs can assist humans in providing feedback to train the next LLM. Our recent work, led by @jiaxinwen22, shows that LLMs can empower non-experts to match expert programmers at fixing LLM-generated code on competitive programming tasks.
LLMs can generate complex programs. But they are often wrong. How should users fix them? We propose to use LLMs to assist humans by decomposing the solutions in a helpful way. We increase non-experts' efficiency by 3.3X, allow them to solve 33.3% more problems, and empower them…
📢 After 3 months, the AI Mathematical Olympiad (AIMO) on Kaggle has announced the winners! 🎉 We're thrilled to see the Top 4 teams all chose DeepSeekMath-7B as their base model, with Numina @JiaLi52524397 achieving 29/50 correct answers! 👏 Even Terence Tao was amazed. 🤯…