Zizheng Pan
@zizhpan
Researcher @deepseek_ai | Previously @nvidia @MonashUni @UniofAdelaide. Words are my own.
Life Update: Thrilled to join @HongKongPolyU as a Tenure-Track Assistant Professor in Data Science & AI! 🔍 Now hiring PhDs in the hottest topics of AI: 🔥 Generative AI 👁️ Computer Vision 🤖 Agentic AI. Apply → adamdad.github.io/opening #AIResearch #AI #GenAI #PhD #PolyU #HongKong
If you’re interested in AI/ML and vision research, don’t miss this great opportunity to work with Chuanxia — an amazing mentor and researcher who’s looking for new students!
After two amazing years with @Oxford_VGG, I will be joining @NTUsg as a Nanyang Assistant Professor in Fall 2025! I’ll be leading the Physical Vision Group (physicalvision.github.io) — and we're hiring for next year!🚀 If you're passionate about vision or AI, get in touch!
🚨Breaking: The new DeepSeek-R1 (0528) just tied for #1 in WebDev Arena, matching Claude Opus 4! More highlights: 💠 #6 Overall on Text Arena 💠 #2 in Coding, #4 in Hard Prompts, #5 in Math category 💠 MIT-licensed, currently the best open model on the leaderboard! Huge congrats…
🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗…
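For readers curious what the newly supported JSON-output mode looks like in practice, here is a minimal sketch of a request body using DeepSeek's OpenAI-compatible chat schema. Field names follow the public API docs, but no request is sent here, so verify the exact schema against the linked docs before use.

```python
# Minimal sketch of a chat request payload for DeepSeek's
# OpenAI-compatible API (no network call is made here).
def build_json_mode_request(user_prompt: str) -> dict:
    """Build a request body asking the model to reply in strict JSON."""
    return {
        "model": "deepseek-chat",
        # JSON mode generally expects the prompt itself to mention JSON.
        "messages": [{"role": "user", "content": user_prompt}],
        "response_format": {"type": "json_object"},
    }

req = build_json_mode_request('List three primes as JSON: {"primes": [...]}')
print(req["response_format"]["type"])  # json_object
```

The same body, sent to the chat completions endpoint with an API key, asks the server to constrain the reply to valid JSON.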
🚀 We're excited to introduce ZPressor, a bottleneck-aware compression module for scalable feed-forward 3DGS. Existing feed-forward 3DGS models struggle with dense views, facing performance drops & massive redundancy. ZPressor leverages Information Bottleneck Theory to compress…
🔥 DeepSeek-R1-0528-Qwen3-8B INT4 models in AutoRound, AWQ, GPTQ, and GGUF formats (quantized with Intel AutoRound & Neural Compressor) are available in the @intel space on HF. Run them with vLLM, SGLang, or Transformers 😍 huggingface.co/Intel/DeepSeek… huggingface.co/Intel/DeepSeek… huggingface.co/Intel/DeepSeek…
R1-0528 is out!🎉
We just released DeepSeek-Prover V2. - Solves nearly 90% of miniF2F problems - Significantly improves the SoTA performance on the PutnamBench - Achieves a non-trivial pass rate on AIME 24 & 25 problems in their formal version Github: github.com/deepseek-ai/De…
Guess you probably already knew that we have an update for V3!
🚀 DeepSeek-V3-0324 is out now! 🔹 Major boost in reasoning performance 🔹 Stronger front-end development skills 🔹 Smarter tool-use capabilities ✅ For non-complex reasoning tasks, we recommend using V3 — just turn off “DeepThink” 🔌 API usage remains unchanged 📜 Models are…
DeepSeek takes the lead: DeepSeek V3-0324 is now the highest scoring non-reasoning model This is the first time an open weights model is the leading non-reasoning model, a milestone for open source. DeepSeek V3-0324 has jumped forward 7 points in Artificial Analysis…
Introducing NAR, our latest breakthrough in visual generation! 🎨 NAR adopts a "next neighbor prediction" mechanism, transforming visual generation into a step-by-step "outpainting" process. 📄 Paper: arxiv.org/abs/2503.10696 🌍 Project Page: yuanyu0.github.io/nar
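The "next neighbor prediction" idea can be illustrated with a toy ordering: tokens on an image grid are generated outward from a start cell, nearest ring first, so each step "outpaints" the frontier of what is already generated. This is an illustrative sketch only, not the actual NAR implementation.

```python
# Toy "next neighbor" generation order on an H x W token grid:
# start cell first, then successive rings of neighbors outward.
def neighbor_order(h: int, w: int, start=(0, 0)):
    sy, sx = start
    cells = [(y, x) for y in range(h) for x in range(w)]
    # Chebyshev distance: every 8-connected neighbor of the already
    # generated region lies on the next ring, mimicking outpainting.
    return sorted(cells, key=lambda c: (max(abs(c[0] - sy), abs(c[1] - sx)), c))

order = neighbor_order(3, 3, start=(1, 1))
print(order[0])    # (1, 1) -- the start token is generated first
print(order[1:])   # its 8 neighbors follow in the next ring
```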
One More Thing! Happy weekend!
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview Optimized throughput and latency via: 🔧 Cross-node EP-powered batch scaling 🔄 Computation-communication overlap ⚖️ Load balancing Statistics of DeepSeek's Online Service: ⚡ 73.7k/14.8k…
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min…
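A quick sanity check on the headline figure: 6.6 TiB/s of aggregate reads across a 180-node cluster works out to roughly 37.5 GiB/s per node.

```python
# Per-node read throughput implied by the 3FS numbers quoted above.
aggregate_tib_s = 6.6
nodes = 180
per_node_gib_s = aggregate_tib_s * 1024 / nodes  # 1 TiB = 1024 GiB
print(round(per_node_gib_s, 1))  # 37.5
```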
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies ✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. 🔗 github.com/deepseek-ai/Du… ✅ EPLB - an expert-parallel load balancer for V3/R1. 🔗…
🚨 Off-Peak Discounts Alert! Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily: 🔹 DeepSeek-V3 at 50% off 🔹 DeepSeek-R1 at a massive 75% off Make your budget go further by saving during these off-peak hours!
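The discount logic is easy to model: the window wraps midnight, so a UTC time qualifies if it is at or after 16:30 or before 00:30. A minimal sketch, with a purely illustrative base price:

```python
from datetime import time

# Announced off-peak discounts: 50% off V3, 75% off R1.
DISCOUNTS = {"deepseek-v3": 0.50, "deepseek-r1": 0.75}

def off_peak(t: time) -> bool:
    # The 16:30-00:30 UTC window wraps midnight.
    return t >= time(16, 30) or t < time(0, 30)

def discounted_price(model: str, base_price: float, t: time) -> float:
    if off_peak(t):
        return base_price * (1 - DISCOUNTS[model])
    return base_price

# Base price here is illustrative, not a quoted rate.
print(discounted_price("deepseek-r1", 2.0, time(17, 0)))  # 0.5
```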
🚀 Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. ⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs ✅ No heavy dependency, as clean as a tutorial ✅ Fully Just-In-Time compiled…
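The core idea behind low-precision GEMM can be sketched in pure Python: scale each row of A and column of B so values fit an 8-bit range, accumulate the integer products, then rescale. This toy uses int8-style rounding for clarity; the real DeepGEMM runs FP8 on Hopper tensor cores.

```python
# Toy scaled low-precision GEMM (illustrative only).
def quantize(vec, levels=127):
    scale = max(abs(v) for v in vec) / levels or 1.0
    return [round(v / scale) for v in vec], scale

def scaled_gemm(A, B):
    n, k, m = len(A), len(A[0]), len(B[0])
    cols = [[B[i][j] for i in range(k)] for j in range(m)]
    qA = [quantize(row) for row in A]
    qB = [quantize(col) for col in cols]
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        qa, sa = qA[i]
        for j in range(m):
            qb, sb = qB[j]
            acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulate
            out[i][j] = acc * sa * sb                 # rescale to float
    return out

C = scaled_gemm([[1.0, 2.0]], [[3.0], [4.0]])  # C[0][0] is close to the exact 11.0
```

Quantization error stays small because each row/column gets its own scale, the same reason FP8 GEMMs use per-tile scaling factors.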
🚀 Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference. ✅ Efficient and optimized all-to-all communication ✅ Both intranode and internode support with NVLink and RDMA ✅…
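For a sense of what an EP all-to-all dispatch does, here is a toy sketch: each rank buckets its tokens by the rank that hosts the chosen expert, then the buckets are exchanged. Illustrative only; DeepEP performs this exchange over NVLink and RDMA.

```python
# Toy MoE dispatch: bucket tokens by the rank hosting their expert.
def dispatch(tokens, expert_of, experts_per_rank, n_ranks):
    buckets = {r: [] for r in range(n_ranks)}
    for tok in tokens:
        e = expert_of[tok]
        dest = e // experts_per_rank  # which rank hosts expert e
        buckets[dest].append((tok, e))
    return buckets

# 4 experts over 2 ranks: experts 0-1 on rank 0, experts 2-3 on rank 1.
b = dispatch(["t0", "t1", "t2"], {"t0": 3, "t1": 0, "t2": 2}, 2, 2)
print(b[1])  # [('t0', 3), ('t2', 2)]
```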
🚀 Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. ✅ BF16 support ✅ Paged KV cache (block size 64) ⚡ 3000 GB/s memory-bound & 580 TFLOPS…
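The paged KV cache mentioned above can be sketched as a block-table lookup: a per-sequence table maps logical token positions to physical cache blocks of 64 slots each. This is a data-structure illustration, not FlashMLA's kernel code.

```python
# Paged-KV-cache addressing sketch with a fixed block size of 64.
BLOCK_SIZE = 64

def kv_slot(block_table, token_pos):
    """Physical location of a token: (physical_block, offset_in_block)."""
    logical_block = token_pos // BLOCK_SIZE
    return block_table[logical_block], token_pos % BLOCK_SIZE

# A sequence whose logical blocks 0, 1, 2 live in physical blocks 7, 3, 9.
table = [7, 3, 9]
print(kv_slot(table, 130))  # (9, 2): token 130 is block 2, offset 2
```

Because blocks are fixed-size, sequences of any length share one physical pool without fragmentation, which is what makes variable-length batching cheap.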
Hope everyone can enjoy the benefits of open science. Join us in celebrating #OpenSourceWeek! 🔥
🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,…
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With…
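The two components named above can be illustrated with a toy: coarse-grained compression summarizes each block of keys with its mean, and fine-grained selection keeps only tokens from the best-scoring blocks. Pure-Python dot-product scoring; illustrative only, not the NSA kernel.

```python
# Toy coarse-then-fine sparse attention selection.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_means(keys, block=4):
    blocks = [keys[i:i + block] for i in range(0, len(keys), block)]
    return [[sum(col) / len(blk) for col in zip(*blk)] for blk in blocks]

def select_tokens(query, keys, block=4, top_k=1):
    # Coarse pass: score one compressed (mean) key per block.
    scores = [dot(query, m) for m in block_means(keys, block)]
    best = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    # Fine pass: attend only to tokens inside the selected blocks.
    return sorted(i for b in best
                  for i in range(b * block, min((b + 1) * block, len(keys))))

keys = [[0.0], [0.1], [0.0], [0.2], [5.0], [4.0], [6.0], [5.5]]
print(select_tokens([1.0], keys, block=4, top_k=1))  # [4, 5, 6, 7]
```

Only the selected tokens would then enter the full attention computation, which is where the long-context speedup comes from.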
🎉 Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience: • No system prompt • Temperature: 0.6 • Official prompts for search & file upload: bit.ly/4hyH8np • Guidelines to mitigate model bypass…
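The recommended settings translate directly into a request body: no system message and temperature 0.6. A minimal sketch assuming DeepSeek's OpenAI-compatible schema and the "deepseek-reasoner" model name from the public docs; nothing is sent here.

```python
# Request body following the recommended R1 deployment settings.
def r1_request(user_prompt: str) -> dict:
    return {
        "model": "deepseek-reasoner",
        # No system message: the user turn carries the whole prompt.
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 0.6,
    }

req = r1_request("Prove that sqrt(2) is irrational.")
print(req["temperature"])    # 0.6
print(len(req["messages"]))  # 1 (no system prompt)
```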