vincent
@vvhuang_
understanding models @TransluceAI, writing http://mindslice.substack.com previously: hotel manager @MIT, math @0xPARC
only took me 4 years (+ daily prodding from @mengk20 😅) to realize:
- website layout should reflect the info you want to communicate, rather than just copying patterns you saw on other sites
- half the elements on my homepage weren’t doing anything


Pro tip: you can basically travel the world by using google maps.
pro tip: you can basically read >100 books per day by asking chatgpt to summarize them for you.
i think it's really cute that Iowa State University writes language model reasoning papers about agriculture

sometimes you have to apply exponential backoff when texting new people
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
OpenAI o3 and o4-mini openai.com/live/
🤮📉🚫 building yet another LLM benchmark
🥰📈🌈 building a tool that can make every existing benchmark more useful
very excited to share Docent: a system that can look through eval results and identify unusual model behaviors, cheating, env setup issues, etc. in just minutes!
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
back when I was young, I thought it was unrealistic for the Volunteer Fire Department to schism into a branch that fought fires and a branch that started them
everybody deserves to see interstellar in imax. i would get a lifetime amc a-list subscription if it meant interstellar would always be in theatres. i'd go every wednesday night and have my own chair and everything
learned more physics working on this for 1 week than the entire preceding year 🫡 check out our experiments on functional ultrasound and the acoustoelectric effect!!!
Can we invent new brain-computer interface modalities? @raffi_hotter and I got 9 friends together and built a lab at home to test two totally new imaging methods: acoustoelectric imaging & functional ultrasound through the skull 🧵 story that involves nV measurements, pretty…
the way zuck added this apple complaint in the middle of the llama3.1 announcement 😆
