Kevin Ellis
@ellisk_kellis
Cornell Computer Science, Assistant Professor. Program synthesis, AI
New paper: World models + Program synthesis by @topwasu
1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code
2. Learns new environments from minutes of experience
3. Positive score on Montezuma's Revenge
4. Compositional generalization to new environments…
Today we’re launching AutumnBench, our benchmark built on @BasisOrg’s Autumn platform. It’s designed to measure world modeling and reasoning by placing humans and AI in unfamiliar worlds—with no rewards or guidance—to test who can figure out how these worlds actually work.
We’re proud to announce the launch of AutumnBench, an open-source benchmark developed on our Autumn platform. This benchmark, led by our MARA team, provides a novel platform for evaluating world modeling and causal reasoning in both human and artificial intelligence.
Are code agents good at software design, i.e., building general and reusable code? We present Librarian, a new refactoring method, and MiniCode, a verifiable refactoring benchmark that requires agents to design libraries that jointly minimize code from multiple repos 🧵
If you're interested in a PhD at the intersection of machine learning and programming languages, consider Yale CS! We're exploring new ways to build software that draws inferences & makes predictions. See alexlew.net & apply at gsas.yale.edu/admissions/ by Dec. 15 😃
Thank you, François, Mike, & team, for the ARC challenge. It has been a durable source of inspiration, and brings fresh ideas to AI. The paper award first authors are Keya Hu (applying to PhDs @HuLillian39250) and Wen-Ding Li (at NeurIPS hunting for industry gigs @xu3kev).…
Today we're announcing the winners of ARC Prize 2024. We're also publishing an extensive technical report on what we learned from the competition (link in the next tweet). The state-of-the-art went from 33% to 55.5%, the largest single-year increase we've seen since 2020. The…