Kexun Zhang
@kexun_zhang
PhD student at @LTIatCMU. Previously at @ucsbNLP, @ZJU_china. language lover.
RLVR is not just about RL, it's more about VR! Particularly for LLM coding, good verifiers (tests) are hard to get! In our latest work, we ask 3 questions: How good are current tests? How do we get better tests? How much does test quality matter? leililab.github.io/HardTests/
This article from @TheEconomist offers an accurate overview of key dynamics shaping the development of AI today: the risks of the rapid race toward AGI and ASI, the challenges posed by open-sourcing frontier models, the deep uncertainty revealed by ongoing scientific debates and…
With the amount of money going into AI, it's shocking to me how much NeurIPS/OpenReview needs to limit rebuttals, e.g. not allowing images, to save their infra. A tiny bit of the AI money could have made everyone's life better. That says something about academic conferences.
🚀 We’re thrilled to share our major advance in formal proving: Tencent-IMO tencent-imo.github.io Although AI systems like DeepMind's & OpenAI's have achieved gold-level performance, these proofs still need human verification. What if AI could generate proofs that are 100% verifiable?
After three intense months of hard work with the team, we made it! We hope this release can help drive the progress of Coding Agents. Looking forward to seeing Qwen3-Coder continue creating new possibilities across the digital world!
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
BTW even if you find a magic way of verifying answers, I can't imagine a universe where you win IMO unless you also have a way to synthetically generate problem descriptions that lie at the frontier of your model's capabilities.
This is great! But will you also consider setting up an official satellite location in China, given that so many great NeurIPS papers come from China and so many Chinese researchers can't attend the conference due to US/Canada visa issues?
Okay. So the proposed safe AGI is like the “Trisolarans” from The Three-Body Problem, and humanity’s last stand is our ability to plot and scheme? 🤣 (Sorry for the spoilers to those who haven’t read the series yet… )
A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:…
Data and code are all released at github.com/LeiLiLab/HardT…
Anyone serious about doing AI for science should read Zhenqiao’s work.
Sad to miss #ICML2025 due to visa issue this year, but it's a great time to share our new paper PPDiff, a diffusion model for protein-protein complex sequence-structure co-design with my great collaborators @lileics @leetx1010 Martin Renqiang Min.
🚀 Heading to #ICML2025! I'll be attending July 14-20 and would love to discuss exciting research in reasoning, RL, agents, and AI safety. I'll also be on the job market next cycle—happy to discuss opportunities! DM me to schedule a meeting in person
committed to doing my part in decreasing reviewer workload by writing fewer papers
Giving a talk at @jetbrains with @anton_iades on inference time scaling for SWE agents! Am I allowed to mention the word “cursor” here?

lmaaooo, need more of this in ML papers. I loved the writing style of YOLOv3, the intro was hilarious. Also the graph literally just goes off the y-axis and they just leave it to flex 😂
Physicists and mathematicians really have the wriest humor.
🚨 We’re hiring on the Open-Endedness team @LilaSciences and I’m beyond excited about our work! We research AI that doesn’t just solve problems, it creatively explores new scientific frontiers. If that excites you or someone you know 📢 Please RT + read on 🧵👇
Too many think the problem with LLMs is that they’re not human enough. But the problem with LLMs is that they’re not computer enough. We’re used to a standard of reliability from computer programs that LLMs so far don’t live up to. But making them human-like doesn’t fix that!
This feels like a great, dspy-pilled bet from Mira's Thinking Machines Lab—to focus on building and optimizing downstream LLM systems. BUT simultaneously also a very un-AGI-pilled viewpoint. If AGI is coming in months, why raise billions to build business-aligned LLMs? The…