Kevin Ellis

@ellisk_kellis

Cornell Computer Science, Assistant Professor. Program synthesis, AI

Ithaca, New York

Joined September 2021

171 Following

2K Followers

Pinned

Kevin Ellis@ellisk_kellis · Jun 12

New paper: World models + Program synthesis by @topwasu 1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code 2. Learns new environments from minutes of experience 3. Positive score on Montezuma's Revenge 4. Compositional generalization to new environments…

Kevin Ellis@ellisk_kellis · Jul 17

Today we’re launching AutumnBench, our benchmark built on @BasisOrg’s Autumn platform. It’s designed to measure world modeling and reasoning by placing humans and AI in unfamiliar worlds—with no rewards or guidance—to test who can figure out how these worlds actually work.

Basis@BasisOrg · Jul 17

We’re proud to announce the launch of AutumnBench, an open-source benchmark developed on our Autumn platform. This benchmark, led by our MARA team, provides a novel platform for evaluating world modeling and causal reasoning in both human and artificial intelligence.

Kevin Ellis Retweeted

Justin T Chiu@justintchiu · Jun 23

Are code agents good at software design, i.e. building general and reusable code? We present Librarian, a new refactoring method, and MiniCode, a verifiable refactoring benchmark that requires agents to design libraries that jointly minimize code from multiple repos 🧵

Kevin Ellis Retweeted

Alex Lew@alexanderklew · Dec 8

If you're interested in a PhD at the intersection of machine learning and programming languages, consider Yale CS! We're exploring new ways to build software that draws inferences & makes predictions. See alexlew.net & apply at gsas.yale.edu/admissions/ by Dec. 15 😃

Kevin Ellis@ellisk_kellis · Dec 7

Thank you, François, Mike, & team, for the ARC challenge. It has been a durable source of inspiration, and brings fresh ideas to AI. The paper award first authors are Keya Hu (applying to PhDs @HuLillian39250) and Wen-Ding Li (at NeurIPS hunting for industry gigs @xu3kev).…

François Chollet@fchollet · Dec 6

Today we're announcing the winners of ARC Prize 2024. We're also publishing an extensive technical report on what we learned from the competition (link in the next tweet). The state-of-the-art went from 33% to 55.5%, the largest single-year increase we've seen since 2020. The…
