Wai Keen Vong
@wkvong
building neural interfaces @Meta @RealityLabs, prev: cognitive science @nyuniversity @Rutgers_Newark @UniOfAdelaide
1/ Today in Science, we train a neural net from scratch through the eyes and ears of one child. The model learns to map words to visual referents, showing how grounded language learning from just one child's perspective is possible with today's AI tools. science.org/doi/10.1126/sc…
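The paper trains a contrastive vision-language model on one child's headcam video paired with transcribed speech. As a rough illustration only (not the paper's actual CVCL code, and all names and shapes below are assumptions), word-referent mapping via a contrastive objective could look like this:

```python
# Minimal sketch of contrastive word-referent learning: pull embeddings of
# co-occurring (frame, utterance) pairs together and push mismatched pairs apart.
# This is an illustrative toy model, not the released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveWordLearner(nn.Module):
    def __init__(self, vocab_size, embed_dim=512):
        super().__init__()
        # Vision encoder: any image backbone projected into a shared space.
        self.vision = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim))
        # Utterance encoder: mean of learned word embeddings (a simplification).
        self.word_emb = nn.Embedding(vocab_size, embed_dim)
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, frames, utterances):
        v = F.normalize(self.vision(frames), dim=-1)                 # (B, D)
        u = F.normalize(self.word_emb(utterances).mean(1), dim=-1)   # (B, D)
        logits = v @ u.t() / self.temperature                        # (B, B)
        targets = torch.arange(len(frames))
        # Symmetric InfoNCE loss: matched frame/utterance pairs sit on the diagonal.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

model = ContrastiveWordLearner(vocab_size=1000)
frames = torch.randn(8, 3 * 224 * 224)        # toy stand-in for headcam frames
utterances = torch.randint(0, 1000, (8, 6))   # toy stand-in for transcribed words
loss = model(frames, utterances)
loss.backward()
```

After training, the same shared embedding space can be used to score which visual referent a word points to, which is how the paper evaluates word-referent mappings.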

New ARC-AGI paper @arcprize w/ fantastic collaborators @xu3kev @HuLillian39250 @ZennaTavares @evanthebouncy @BasisOrg For few-shot learning: better to construct a symbolic hypothesis/program, or have a neural net do it all, à la in-context learning? cs.cornell.edu/~ellisk/docume…
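For intuition, here is a toy contrast of the two strategies on an ARC-style few-shot task (this is illustrative only, not the paper's code; the tiny hypothesis space below is an assumption, and the neural/in-context alternative would amortize this search with a trained model instead):

```python
# Strategy 1: induction of an explicit symbolic hypothesis/program that
# explains the few training pairs, then application to the test input.
# Real ARC DSLs are vastly larger than this five-primitive toy space.
import numpy as np

def candidate_programs():
    return {
        "identity": lambda g: g,
        "flip_horizontal": lambda g: np.fliplr(g),
        "flip_vertical": lambda g: np.flipud(g),
        "rotate_90": lambda g: np.rot90(g),
        "transpose": lambda g: g.T,
    }

def induce_program(train_pairs):
    # Return the first hypothesis consistent with every training example.
    for name, fn in candidate_programs().items():
        if all(np.array_equal(fn(x), y) for x, y in train_pairs):
            return name, fn
    return None, None

train_pairs = [
    (np.array([[1, 0], [2, 3]]), np.array([[0, 1], [3, 2]])),
    (np.array([[4, 5], [6, 0]]), np.array([[5, 4], [0, 6]])),
]
name, program = induce_program(train_pairs)
test_input = np.array([[7, 8], [9, 1]])
print(name, program(test_input))  # flip_horizontal [[8 7] [1 9]]
```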
Exciting episode! 🎙️ We chat with Dr. Wai Keen Vong (@wkvong), lead author of the Science paper where an artificial neural network mapped words to referents using a child's egocentric recordings. Plus, we dive into Wai's journey in science! share.fireside.fm/episode/R8yhKe…
We have a new preprint led by @solimlegris and @wkvong (w/ @LakeBrenden ) looking at how people solve the Abstraction and Reasoning Corpus (ARC) benchmark. arxiv.org/abs/2409.01374 ARC is one of the most challenging & long-standing AGI benchmarks and the focus of a large $$ prize @arcprize 🧵
One inspiration for ARC-AGI solutions is the psychology of how humans solve novel tasks. A new study by @todd_gureckis @LakeBrenden @solimlegris @wkvong at NYU explores human performance on ARC, finding that 98.7% of the public tasks are solvable by at least one MTurker.
Researchers at NYU did a study on whether MTurkers could solve ARC-AGI tasks. They found that 790/800 (98.7%) of the public tasks are solvable by at least one MTurker (each task was seen by about 10 people): arxiv.org/abs/2409.01374 As a reminder, the private test set was…
How do we measure similarities and differences between how kids and models learn language? We introduce DevBench, a suite of multimodal developmental evaluations with corresponding human data from both children and adults. 🧵 1/ 📝: arxiv.org/abs/2406.10215
Fun (and science) with the family highlighted in today's NYT. Comments from some of my favorite scientists @cocosci_lab, Linda Smith, and @mcxfrank whose study Luna is participating in. Thanks @oliverwhang21 for the great writing. nytimes.com/2024/04/30/sci…
Linda Smith, one of the scientists I most admire, writes in Nature "News and Views" about @wkvong 's work. Linda's conjecture: "problems of data-greedy AI could be mitigated by determining and then exploiting the natural statistics of infant experience" nature.com/articles/d4158…
In just a minute, @wkvong is being interviewed on NPR Science Friday about training models through the eyes and ears of a child. Listen in here: kunc.org Also, see the NPR link here: sciencefriday.com/segments/langu…
My thoughts on the fascinating new study in Science science.org/doi/10.1126/sc… by Vong et al. in the Financial Times. 1/2 ft.com/content/f67196…
Excellent article on @wkvong 's new work in the Washington Post by @Carolynyjohnson, with discussion from Josh Tenenbaum and Michael Tomasello. Indeed, the next question is how verbs, pronouns, and abstract words can be learned. How far will data-driven learning go? washingtonpost.com/science/2024/0…
Love the science.org home page right now (this isn't the baby who participated in the current study, but in the next one :))
What can an AI model learn when given the data from just one child? A surprising answer comes from a groundbreaking study published in Science by CDS Research Scientist Wai Keen Vong (@wkvong) and CDS Assistant Professor Brenden Lake (@LakeBrenden). nyudatascience.medium.com/what-can-ai-sy…
Published in Science today, @wkvong reports a dream experiment: he trained a multi-modal AI model from scratch on a subset of one child's experiences, as captured by headcam video. Shows how grounded word learning is possible in natural settings, as discussed in his thread above.