Chenghao Yang
@chrome1996
Ph.D. student @UChicago Ex-SR @google Ex-Scientist @AWS. Ex-RA @jhuCLSP @columbianlp @TsinghuaNLP. Ex-Intern @IBM @AWS. Opinions are my own.
Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more predictable as they progress? 🌲 Tree search fails mid-generation (esp. for reasoning)? We trace these mysteries to LLM probability concentration, and…
Nice work! I have been thinking about automatic error analysis for LLMs since 2022, when I started my PhD, but I never figured out how to do it. I also discussed with @ZhongRuiqi applying his work to help debug LLM performance, but got distracted by logistics. Great to see it finally work!
Is a single accuracy number all we can get from model evals?🤔 🚨Does NOT tell where the model fails 🚨Does NOT tell how to improve it Introducing EvalTree🌳 🔍identifying LM weaknesses in natural language 🚀weaknesses serve as actionable guidance (paper&demo 🔗in🧵) [1/n]
Mitigating racial bias in LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference, @NeurIPSConf. We have ethical reviews for authors, but missed them for invited speakers? 😡
🧐Can we create a navigational agent that can handle thousands of new objects across a wide range of scenes? 🚀 We introduce DivScene bench and NatVLM. DivScene contains houses of 81 types and thousands of target objects. NatVLM is an end-to-end agent based on a Large Vision…