Caiqi Zhang
@caiqizh
PhD student at the University of Cambridge
🔥 We teach LLMs to say how confident they are on the fly during long-form generation. 🤩No sampling. No slow post-hoc methods. Not limited to short-form QA! ‼️Just output confidence in a single decoding pass. ✅Better calibration! 🚀 20× faster runtime. arXiv:2505.23912 👇
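A minimal sketch of the idea, assuming the model interleaves machine-readable confidence tags with its claims; the `<conf>` tag format below is an illustrative assumption, not the paper's actual output scheme (see arXiv:2505.23912 for the real method):

```python
import re

# Illustrative only: the <conf> tag scheme is an assumption, not the
# paper's actual format. The point is that ONE decoding pass yields
# both the text and per-claim confidence -- no sampling, no second call.
generation = (
    "Marie Curie won two Nobel Prizes <conf>0.95</conf>, "
    "and she was born in 1867 <conf>0.88</conf>."
)

# Recover (claim, confidence) pairs by parsing what the model emitted.
for claim, conf in re.findall(r"(.*?)\s*<conf>([0-9.]+)</conf>", generation):
    print(f"{float(conf):.2f}  {claim.strip(' ,')}")
```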


I just met Caiqi Zhang – one of the users of our Python framework for uncertainty quantification, LM-Polygraph. It’s incredibly rewarding to see our work helping other researchers achieve outstanding results! LM-Polygraph: github.com/IINemo/lm-poly… #EMNLP2024 #Uncertainty #NLP
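For readers who haven't tried it, a usage sketch modeled on the LM-Polygraph README; import paths, class names, and the chosen model/estimator are assumptions that may differ across versions, so check the repo:

```python
# Sketch after the LM-Polygraph README (github.com/IINemo/lm-polygraph);
# module paths and names may vary by version -- consult the repo docs.
from lm_polygraph.utils.model import WhiteboxModel
from lm_polygraph.estimators import MeanTokenEntropy
from lm_polygraph.utils.manager import estimate_uncertainty

model = WhiteboxModel.from_pretrained("bigscience/bloomz-560m")
estimator = MeanTokenEntropy()  # one of many built-in uncertainty estimators
print(estimate_uncertainty(model, estimator, input_text="Who is George Bush?"))
```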
🚨 [Call for Papers] SEA @ NeurIPS 2025 🚨 Scaling Environments for Agents (SEA) Workshop 📅 December 6, 2025 | 📍 San Diego, USA We're excited to invite submissions to the SEA Workshop at NeurIPS 2025! 🧵1/n
Thrilled to announce that our paper, “Conformity in Large Language Models,” has been accepted to the ACL 2025 Main Conference! 🎉 Looking forward to presenting our findings at ACL 2025 in Vienna this July! #AI #NLP #LLMs #Conformity #MachineLearning #AIResearch #Psychology #ACL
Inception Labs and Gemini Diffusion are hot these days. Just published a blog post on Diffusion Language Models! 🚀 Exploring how diffusion (yes, the image model kind) can be used for text generation. Check it out👇 spacehunterinf.github.io/blog/2025/diff… #NLP #LLMs #DiffusionModels
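As a taste of the idea (a toy sketch, not from the post): masked-diffusion text models start from all-masked tokens and unmask a few positions per step, rather than decoding left-to-right. A trained model would predict the revealed tokens; this stub substitutes a known target purely to visualize the schedule:

```python
import random

# Toy reverse diffusion over text: begin fully masked, then reveal a
# fraction of positions at each step instead of generating left-to-right.
target = "diffusion models generate text by iterative denoising".split()
tokens = ["[MASK]"] * len(target)

steps = 4
for step in range(steps):
    masked = [i for i, t in enumerate(tokens) if t == "[MASK]"]
    reveal = random.sample(masked, max(1, len(masked) // (steps - step)))
    for i in reveal:
        tokens[i] = target[i]  # a real model would *predict* this token
    print(f"step {step + 1}: {' '.join(tokens)}")
```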
🚀Let’s Think Only with Images. No language and No verbal thought.🤔 Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
🚨 New paper: "Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs" Insight: LLMs treat equivalent facts differently due to bias from pre-training data. 🔗 Arxiv: arxiv.org/abs/2503.22362 #NLP #LLMs #AI
🔥Are we ranking LLMs correctly?🔥 Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄
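Concretely, the failure mode can be checked in a few lines; the verdicts below are made up to stand in for real judge calls:

```python
# Toy pairwise verdicts; in practice each entry would come from an LLM judge.
prefers = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}

def winner(x, y):
    """Preferred model between x and y, whichever order was judged."""
    return prefers.get((x, y)) or prefers.get((y, x))

# A ranking is only well-defined if preferences are transitive:
# A > B and B > C should imply A > C.
if winner("A", "B") == "A" and winner("B", "C") == "B" and winner("A", "C") == "C":
    print("non-transitive: A > B, B > C, but C > A -- no consistent ranking exists")
```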
Forget just thinking in words. 🚀 New Era of Multimodal Reasoning🚨 🔍 Imagine While Reasoning in Space with MVoT Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.
🚨 New Paper Alert! 🚨 When using LLMs for judgements, ever wondered about the consistency of those judgments? 🤔 Check out our latest work, where we quantify, evaluate, and enhance the logical/preference consistency of LLMs. 📚 🔗 Read more: arxiv.org/abs/2410.02205
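One simple probe in this spirit (a hypothetical stand-in, not the paper's code): swap the order in which the two answers are shown to the judge and check whether the verdict survives the swap:

```python
def judge(first: str, second: str) -> str:
    """Stand-in for an LLM judge call; this stub has a deliberate
    position bias (it always prefers whichever answer is shown first)."""
    return first

ans_a, ans_b = "answer A", "answer B"

# A consistent judge returns the same preferred answer regardless of
# presentation order; a position-biased one does not.
v1, v2 = judge(ans_a, ans_b), judge(ans_b, ans_a)
print("order-consistent" if v1 == v2 else "inconsistent: verdict flips with order")
```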
Life update: 🎉 I'm excited to share that I will be joining @HKUSTGuangzhou as an Assistant Professor in Spring 2025! I'm looking for multiple PhD students and interns who are passionate about exploring research questions related to knowledge and reasoning in the context of LLMs. 🤖
I will present this work at EMNLP on Wednesday, Nov 13! See you there!
Happy to share that TopViewRS has been selected for an oral presentation at @emnlpmeeting #EMNLP2024! Though I'm attending virtually, @caiqizh will present our work on Nov 13, 11:45–12:00, at the Ashe Auditorium. If you are interested in multimodality/spatial reasoning, feel free to reach out!
Excited to introduce TopViewRS: VLMs as Top-View Spatial Reasoners🤖 TopViewRS assesses VLMs' spatial reasoning in top-view scenarios🏠, just like how you read maps🗺️ Spoiler🫢 GPT-4V and Gemini are neck and neck, each excelling in different setups, but neither comes close to human performance.