Uzay @ sf

@uzpg_

CS / Math / philosophy | ML research @ MIT | France-US

Boston/Paris

Joined June 2020

1K Following

1K Followers

Pinned

Uzay @ sf @uzpg_ · Dec 16, 2021

Hi! Here's some information about who I am, what I'm interested in, and some of my "values" — building on my personal wiki, website and general digital presence.

Uzay @ sf @uzpg_ · Dec 12, 2021

I plan on doing a big "who am i? what do i like? what am I interested in?" pinned tweet tomorrow, sort of as a tree of tweets instead of just a linked list. Very excited because I can't really express much about myself in my bio, nor have I really worked on it.

Uzay @ sf @uzpg_ · Jul 23

reading through a bunch of swebench logs a few months ago made me realize it wouldn't be a good measure of the kind of long horizon coding ability I needed for a research project, which is why we made breakpoint

Epoch AI @EpochAIResearch · Jun 13

SWE-bench Verified is one of the main benchmarks to assess AI coding skills. But what does it actually measure? We found that it's one of the best tests of AI coding, but limited by its focus on simple bug fixes in familiar repositories. Here’s a summary of our article 🧵

Uzay @ sf Retweeted

yudhister @yudhister_ · Jul 18

Crowdsourcing solutions to the world's highest-leverage problems this Saturday! Join us hacking on dynamic protein prediction, labor automation modeling, and acoustic window cranioplasties (altho I'm not sure how the last one will work). Links below

Uzay @ sf @uzpg_ · Jul 15

what are the current best systems or ideas for the human input continual learning described here?

Andrej Karpathy @karpathy · Jul 13

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…

Uzay @ sf @uzpg_ · Jul 14

we are going to start having self evolving software - user interaction patterns and inputs automatically dictate changes, at least in the frontend, quite soon and it's going to be pretty fun :)

Uzay @ sf @uzpg_ · Jul 14

doing frontend with claude code is so fun, favorite use of a model

Uzay @ sf @uzpg_ · Jul 12

having this weird issue with claude where it repeatedly refuses to use the code execution tool I give it, preferring to embed its tool inside of its response in this strange way. probably because of some internal RL on tool calling, kinda interesting

Uzay @ sf @uzpg_ · Jul 9

any cool poetry stuff in the bay?

Uzay @ sf @uzpg_ · Jul 8

"I think nature's imagination is so much greater than man's, she's never going to let us relax" - richard feynman. good thought for an ML researcher in this day and age

Uzay @ sf @uzpg_ · Jul 8

Breakpoint got into COLM! Will be in Montreal in October :) Also have been improving the library for ease of use, and getting it in production with different research orgs - getting it inspect compatible

Uzay @ sf @uzpg_ · Jun 3

@kaivu, @atticuswzf , and I were researching long horizon reasoning (with @jacobandreas). We found existing benchmarks’ hard problems often featured tricky puzzles, not tests of system understanding. So we made Breakpoint: a SWE benchmark designed to disambiguate this capability.

Uzay @ sf @uzpg_ · Jul 6

what are the best AI for science papers?

Uzay @ sf @uzpg_ · Jul 3

what are the best multi agent AI systems right now?

Uzay @ sf @uzpg_ · Jul 2

any actually useful books for advice and frameworks on getting shit done in the world or inspiring biographies

Uzay @ sf @uzpg_ · Jun 27

how good is claude code at rust?
