Zaid Khan
@codezakh
@uncnlp with @mohitban47 working on grounded reasoning + multimodal agents // currently @allen_ai formerly @neclabsamerica // bs+ms CompE @northeastern
What if we could transform advanced math problems into abstract programs that can generate endless, verifiable problem variants? Presenting EFAGen, which automatically transforms static advanced math problems into their corresponding executable functional abstractions (EFAs).…
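Roughly speaking, an EFA here is a small program that exposes a problem's free parameters, samples valid instantiations, and computes the verifiable ground-truth answer for each variant. A minimal illustrative sketch of that idea on a toy problem (the class and the method names `sample_params`, `render`, and `solve` are hypothetical, not EFAGen's actual interface):

```python
import random

class SumOfConsecutiveIntegersEFA:
    """Illustrative executable functional abstraction (EFA) of a toy problem:
    'What is the sum of the integers from a to b inclusive?'"""

    def sample_params(self, rng: random.Random) -> dict:
        # Sample a valid problem instance (the abstraction's free parameters).
        a = rng.randint(1, 50)
        b = rng.randint(a + 1, a + 100)
        return {"a": a, "b": b}

    def render(self, params: dict) -> str:
        # Turn the parameters back into a natural-language problem statement.
        return f"What is the sum of the integers from {params['a']} to {params['b']} inclusive?"

    def solve(self, params: dict) -> int:
        # Compute the verifiable ground-truth answer for this variant.
        a, b = params["a"], params["b"]
        return (a + b) * (b - a + 1) // 2

rng = random.Random(0)
efa = SumOfConsecutiveIntegersEFA()
for _ in range(3):
    params = efa.sample_params(rng)
    print(efa.render(params), "->", efa.solve(params))
```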

The MUGen workshop at #ICML2025 is happening now! Stop by for talks on adversarial ML, unlearning as rational belief revision, failure modes in unlearning, robust LLM unlearning, and the bright vs. dark side of forgetting in generative AI!
🚨Exciting @icmlconf workshop alert 🚨 We’re thrilled to announce the #ICML2025 Workshop on Machine Unlearning for Generative AI (MUGen)! ⚡Join us in Vancouver this July to dive into cutting-edge research on unlearning in generative AI—featuring an incredible lineup of…
📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀 OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
Overdue job update -- I am now:
- A Visiting Scientist at @schmidtsciences, supporting AI safety and interpretability
- A Visiting Researcher at the Stanford NLP Group, working with @ChrisGPotts
I am so grateful I get to keep working in this fascinating and essential area, and…
I’ll be at #ICML2025 this week to present ScPO:
📌 Wednesday, July 16th, 11:00 AM-1:30 PM
📍 East Exhibition Hall A-B, E-2404
Stop by or reach out to chat about improving reasoning in LLMs, self-training, or just tips about being on the job market next cycle! 😃
🚨 Self-Consistency Preference Optimization (ScPO) 🚨
- New self-training method without human labels -- learn to make the model more consistent!
- Works well for reasoning tasks where RMs fail to evaluate correctness.
- Close to performance of supervised methods *without* labels,…
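As described in the thread, the core recipe is to sample several answers per question, prefer the most self-consistent (majority-vote) answer over a less consistent one, and train on the resulting preference pairs with no human labels. A toy sketch of just the pair-construction step; the function name and the DPO-style downstream loss are assumptions, not the paper's exact procedure:

```python
from collections import Counter

def build_scpo_pair(question: str, answers: list[str]) -> tuple[str, str] | None:
    """Build one (chosen, rejected) preference pair from sampled answers,
    preferring the most self-consistent (majority-vote) answer.
    Returns None when there is no disagreement to learn from."""
    counts = Counter(answers)
    if len(counts) < 2:
        return None  # all samples agree; no preference signal
    (chosen, _), *rest = counts.most_common()
    rejected = rest[-1][0]  # least consistent answer as the dispreferred one
    return chosen, rejected

# Example with pre-sampled answers (in practice these come from the model itself):
answers = ["42", "42", "41", "42", "37"]
pair = build_scpo_pair("What is 6*7?", answers)
print(pair)  # ('42', '37') -> feed into a DPO-style preference loss
```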
🥳 Excited to share that our work -- Retrieval-Augmented Generation with Conflicting Evidence -- on addressing conflict in RAG due to ambiguity, misinformation, and noisy/irrelevant evidence has been accepted to @COLM_conf #COLM2025! Our new benchmark RAMDocs proves challenging for…
🚨Real-world retrieval is messy: queries can be ambiguous, or documents may conflict/have incorrect/irrelevant info. How can we jointly address all these problems? We introduce:
➡️ RAMDocs, a challenging dataset with ambiguity, misinformation, and noise.
➡️ MADAM-RAG, a…
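A rough sketch of how such a multi-agent setup could look: one agent answers from each retrieved document in isolation, and an aggregator reconciles the per-document answers over a few rounds, surfacing ambiguity and discounting unreliable sources. The `llm` call, prompts, and round structure below are hypothetical placeholders, not the MADAM-RAG implementation:

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your chat API of choice."""
    raise NotImplementedError

def multi_agent_rag_answer(query: str, documents: list[str], rounds: int = 2) -> str:
    """Sketch of multi-agent debate over (possibly conflicting) retrieved documents:
    one agent per document, plus an aggregator that reconciles their answers."""
    summary = ""
    for _ in range(rounds):
        # Each agent answers the query using only its own document,
        # optionally seeing the aggregator's previous summary.
        agent_answers = [
            llm(f"Document:\n{doc}\n\nPrevious summary: {summary}\n"
                f"Answer the question using only this document, or say it is "
                f"irrelevant/unreliable.\nQuestion: {query}")
            for doc in documents
        ]
        # The aggregator reconciles conflicting answers and filters noise.
        summary = llm("Per-document answers:\n" +
                      "\n".join(f"- {a}" for a in agent_answers) +
                      f"\n\nReconcile these into an answer to: {query}. "
                      "If the question is ambiguous, list each valid answer with its support.")
    return summary
```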
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles…
🎉 Excited to share that TaCQ (Task-Circuit Quantization), our work on knowledge-informed mixed-precision quantization, has been accepted to #COLM2025 @COLM_conf! Happy to see that TaCQ was recognized with high scores and a nice shoutout from the AC – big thanks to @EliasEskin…
🚨Announcing TaCQ 🚨 a new mixed-precision quantization method that identifies critical weights to preserve. We integrate key ideas from circuit discovery, model editing, and input attribution to improve low-bit quant., w/ 96% 16-bit acc. at 3.1 avg bits (~6x compression)…
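In spirit, the recipe is: score how much each weight matters for the target task, keep the highest-scoring weights at 16-bit, and quantize everything else to very low precision. A toy per-tensor PyTorch sketch of that idea follows; the saliency score here is a simple stand-in, whereas TaCQ derives it from circuit-discovery and attribution-style signals:

```python
import torch

def mixed_precision_quantize(weight: torch.Tensor,
                             saliency: torch.Tensor,
                             keep_frac: float = 0.02,
                             bits: int = 3) -> torch.Tensor:
    """Toy mixed-precision quantization: preserve the top `keep_frac` most
    salient weights at full precision and round the rest to `bits` bits
    (symmetric uniform quantization). `saliency` is a per-weight importance
    score; this sketch simply takes it as given."""
    k = max(1, int(keep_frac * weight.numel()))
    keep_idx = saliency.flatten().topk(k).indices
    keep_mask = torch.zeros(weight.numel(), dtype=torch.bool, device=weight.device)
    keep_mask[keep_idx] = True
    keep_mask = keep_mask.view_as(weight)

    # Symmetric uniform quantization of the non-salient weights.
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    quantized = torch.round(weight / scale).clamp(-qmax - 1, qmax) * scale

    return torch.where(keep_mask, weight, quantized)

w = torch.randn(256, 256)
s = w.abs()  # stand-in saliency; the real method uses task-conditioned scores
w_q = mixed_precision_quantize(w, s)
print((w_q - w).abs().mean())
```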
I've officially joined Meta Superintelligence Labs (MSL) org in the Bay Area. I'll be working on critical aspects of pre-training, synthetic data and RL for the next generation of models. Humbled and eager to contribute to the quest for superintelligence. @AIatMeta
🎉 Very excited to see TaCQ — our work on task-conditioned mixed-precision quantization that draws on interpretability methods — accepted to @COLM_conf #COLM2025 with strong scores and a nice shoutout from the AC! Kudos to Hanqi on leading this effort!
🥳Our work UTGen & UTDebug on teaching LLMs to generate effective unit tests & improve code debugging/generation has been accepted to @COLM_conf #COLM2025! Stay tuned for more exciting results -- e.g., using 32B-scale UTGen models to improve debugging with frontier models like…
🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨 which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and debugging code from generated tests. UTGen+UTDebug improve LLM-based code debugging by addressing 3 key…
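The loop, as described: a model generates unit tests (inputs plus expected outputs) for the task, the candidate code is executed against them, and failing tests with their error messages are fed back to the model to repair the code over a few rounds. A schematic sketch under those assumptions (`generate_tests` and `repair_code` are hypothetical LLM-backed helpers, not UTGen/UTDebug's API):

```python
def generate_tests(task: str) -> list[tuple[tuple, object]]:
    """Hypothetical: ask an LLM (e.g., a UTGen-style model) for (args, expected) pairs."""
    raise NotImplementedError

def repair_code(task: str, code: str, failures: list[str]) -> str:
    """Hypothetical: ask an LLM to fix `code` given failing-test feedback."""
    raise NotImplementedError

def debug_loop(task: str, code: str, fn_name: str, max_rounds: int = 3) -> str:
    """Sketch of test-driven debugging: run generated unit tests and
    iteratively repair the code until they pass (or we give up)."""
    tests = generate_tests(task)
    for _ in range(max_rounds):
        namespace: dict = {}
        exec(code, namespace)                      # define the candidate function
        fn = namespace[fn_name]
        failures = []
        for args, expected in tests:
            try:
                got = fn(*args)
                if got != expected:
                    failures.append(f"{fn_name}{args} returned {got!r}, expected {expected!r}")
            except Exception as e:                 # crashing tests are feedback too
                failures.append(f"{fn_name}{args} raised {e!r}")
        if not failures:
            return code                            # all generated tests pass
        code = repair_code(task, code, failures)
    return code
```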
🎉 Yay, welcome @hyunji_amy_lee -- super excited to have you join us as a postdoc! 🤗 Welcome to our MURGe-Lab + @unc_ai_group + @unccs family & the beautiful Research Triangle area -- looking forward to the many fun+impactful collaborations together 🔥
🥳Excited to share that I’ll be joining @unccs as a postdoc this fall. Looking forward to working with @mohitban47 & amazing students at @unc_ai_group. I'll continue working on retrieval, aligning knowledge modules with LLMs' parametric knowledge, and expanding to various modalities.
🎉 Excited to share that CAPTURe has been accepted to #ICCV2025! CAPTURe is a new benchmark for VLM reasoning that requires completing patterns to count objects that are occluded from view. We find that SOTA VLMs struggle with both counting and reasoning about partial patterns!…
Check out 🚨CAPTURe🚨 -- a new benchmark and task testing spatial reasoning by making VLMs count objects under occlusion. Key Takeaways: ➡️ SOTA VLMs (GPT-4o, Qwen2-VL, Intern-VL2) have high error rates on CAPTURe (but humans get very low error ✅) and models struggle to reason…
🥳 Excited to share that I’ll be joining the CS Department at UNC-Chapel Hill (@unccs @unc_ai_group) as an Assistant Professor starting Fall 2026! Before that, I’ll be working at Ai2 Prior (@allen_ai @Ai2Prior) and UW (@uwcse) on multimodal understanding and generation.
🎉 Yay, welcome to the @unc @unccs @unc_ai_group family and beautiful Research Triangle area, Jason! Looking forward to the many exciting collaborations on these topics! 🔥 PS. If you are applying for fall 2026 PhD admissions, make sure to apply to new faculty member Jason 👇
🚀 Excited to introduce a new member of the LRM (Large Reconstruction Models) family — 4D-LRM!
1. What is 4D-LRM? It’s a large-scale space-time model that reconstructs a dynamic object from any few views at any time to any view at any other time.
2. What does it do? 🔁 Learn…
Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any time to any view at any other time? Introducing 4D-LRM: a Large Space-Time Reconstruction Model that ... 🔹 Predicts 4D Gaussian primitives directly from…
🎉Excited to announce VEGGIE has been accepted to #ICCV2025! VEGGIE is a unified MLLM + Diffusion framework for instructional video editing. It presents a systematic approach spanning data, model, benchmark, and evaluation design, and shows strong multi-skill editing +…
🚨 Introducing VEGGIE 🥦—a unified, end-to-end, and versatile instructional video generative model. Current video editing methods struggle with:
1. Understanding direct user instructions
2. Handling diverse editing skills in one model
3. Balancing multiple training…
🚨 Excited to announce MF2, a new+challenging long-video understanding dataset! MF2 covers open-license movies and focuses on key events/arcs/causal chains in the film. While people can answer MF2 questions easily, even the strongest models like Gemini 2.5 Pro struggle with it!…
🚨Meet MF²: Movie Facts & Fibs: a new benchmark for long-movie understanding! 🤔Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like…
New paper Alert 🚨 Introducing MEXA: A general and training-free multimodal reasoning framework via dynamic multi-expert skill selection, aggregation and deep reasoning! MEXA: 1. Selects task- and modality-relevant experts based on the query and various required multimodal…
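A rough reading of that pipeline: pick the expert models relevant to the query's modalities and skills, collect their textual outputs, then let a strong reasoning LLM aggregate them into the final answer, with no extra training. The expert registry, prompts, and `llm` call in the sketch below are hypothetical placeholders, not MEXA's actual components:

```python
from typing import Callable

def llm(prompt: str) -> str:
    """Hypothetical reasoning-LLM call."""
    raise NotImplementedError

# Hypothetical expert registry: name -> (description, callable returning text evidence)
EXPERTS: dict[str, tuple[str, Callable[[dict], str]]] = {
    "ocr": ("reads text in images", lambda x: "..."),
    "audio_captioner": ("describes audio", lambda x: "..."),
    "object_detector": ("lists objects and positions in images", lambda x: "..."),
}

def multi_expert_answer(query: str, inputs: dict) -> str:
    """Training-free multi-expert pipeline: select experts, aggregate outputs, reason."""
    # 1) Select task- and modality-relevant experts from their descriptions.
    menu = "\n".join(f"- {name}: {desc}" for name, (desc, _) in EXPERTS.items())
    selected = llm(f"Question: {query}\nAvailable experts:\n{menu}\n"
                   "List the expert names needed, comma-separated.").split(",")
    selected = [s.strip() for s in selected if s.strip() in EXPERTS]

    # 2) Run the selected experts to collect textual evidence.
    evidence = {name: EXPERTS[name][1](inputs) for name in selected}

    # 3) Deep reasoning over the aggregated expert outputs.
    report = "\n".join(f"[{name}] {out}" for name, out in evidence.items())
    return llm(f"Expert observations:\n{report}\n\nAnswer step by step: {query}")
```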
We evaluated more than 1000 reasoning LLMs on 12 reasoning-focused benchmarks and made fascinating observations about cross-benchmark comparisons. You can explore all that data yourself on our HuggingFace spaces page. (1/4)