Andrew Zhanke Zhou
@zhankezhou
PhD student at TMLR group, HKBU. Visiting student at STAIR lab, Stanford. Focus on trustworthy machine reasoning research.
Tired of debugging LLMs by reading extremely long chains of thought? We built Landscape of Thoughts (LoT) to transform complex reasoning traces into intuitive visual maps that help you understand model behavior. Paper and findings in 🧵 1/10 youtu.be/Zb8CfYxSvik?si… via @YouTube
🚨 Can your LLM really do math—or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced-mathematics, contamination-resilient benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14
🔄 We were nominated for Oral + top-1 at the MATH-AI workshop at #ICML! 🚨Why? ≈46% of GitHub commits are AI-generated—but can we verify they are correct? 📢 VeriBench challenges agents to turn Python into Lean code! 🧵1/14 📃 Paper: openreview.net/forum?id=rWkGF…
We have a new position paper on "inference-time compute" and what we have been working on over the last few months! We present some theory on why it is necessary, how it works, and what it means for "super" intelligence.
Hearty congratulations to @chelseabfinn, @DorsaSadigh and @sanmikoyejo for all winning a Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor of the U.S. government for outstanding early career scientists and engineers. whitehouse.gov/ostp/news-upda…
🧵1/8 Thrilled to share our research work: Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales? #NeurIPS2024
📍NeurIPS 2024 @ Vancouver. We will present our work at 16:30-19:30 in East Exhibit Hall A-C #3602 today; come by and chat with us! #NeurIPS2024
Answering by approximate retrieval and answering by understanding+reasoning are two ends of a spectrum. Humans sit at various points on this spectrum, depending on the task, their experience, and their depth of understanding. We see this in physics or math students: some will study very hard, do lots…
Unfortunately, too few people understand the distinction between memorization and understanding. It's not some lofty question like "does the system have an internal world model?"; it's a very pragmatic behavioral distinction: "is the system capable of broad generalization, or is…
My weekend exercise: porting Neural Bellman-Ford nets to MLX and MLX-graphs so they run natively on Apple GPUs: github.com/migalkin/NBFNe… The MLX ecosystem for Apple Silicon is growing very quickly - thanks @tristanbilot and @awnihannun for all the help :)
🔥#NeurIPS2023 Tutorial🔥 Language Models meet World Models. @tianminshu and I are excited to give a tutorial on machine reasoning by connecting LLMs🗣️, world models🌎, and agent models🤖, w/ amazing panelists @jiajunwu_cs, @du_yilun, Ishita Dasgupta, and Noah Goodman sites.google.com/view/neurips20…