Edward Z. Yang
@ezyang
I work on PyTorch at Meta. Chatty alt at @difficultyang. Currently on parental leave and doing a lot of AI coding, including authoring codemcp.
I finally sat down and wrote a post-mortem for vibe coding ScubaDuck. It's aimed at those of you who have never tried vibe coding (in its original sense: AI coding without reviewing the code the AI generated).

It's worth making sure that collecting and joining/slicing all your runs' metrics and hparams post hoc is very low friction.
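A minimal sketch of what "low friction" can look like, assuming a hypothetical layout where each run writes a flat runs/<run_id>/summary.json containing both hparams and final metrics:

```python
# Sketch: collect every run's summary into one table, then slice freely.
# The runs/<run_id>/summary.json layout is an assumption, not a standard.
import json
from pathlib import Path

import pandas as pd


def collect_runs(root: str = "runs") -> pd.DataFrame:
    """Load every runs/<run_id>/summary.json into one flat DataFrame."""
    records = []
    for summary in Path(root).glob("*/summary.json"):
        rec = json.loads(summary.read_text())
        rec["run_id"] = summary.parent.name
        records.append(rec)
    return pd.DataFrame(records)


if __name__ == "__main__":
    df = collect_runs()
    # Slice post hoc: e.g. best validation loss per learning rate.
    print(df.groupby("lr")["val_loss"].min())
```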
Saving failed batches so you can see what the hell went wrong
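One way to do this in PyTorch (the HF-style `model(**batch).loss` call is an assumption for illustration): dump the exact batch to disk before re-raising, so it can be inspected or replayed later.

```python
# Sketch: if a training step blows up, save the offending batch to disk.
import os

import torch


def train_step(model, batch, step, dump_dir="failed_batches"):
    try:
        loss = model(**batch).loss  # assumes an HF-style output object
        loss.backward()
        return loss
    except Exception:
        os.makedirs(dump_dir, exist_ok=True)
        torch.save(batch, os.path.join(dump_dir, f"step_{step:08d}.pt"))
        raise
```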
Especially for fine-tuning, start with a config file of hyperparameters like "Llama-3.2-1B_v00.01.yaml" and note in your docs what your thought process was and what you found. Then copy this to v00.02 and again note the changes, motivations, and results. Keep a table of versions vs. metrics.
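A sketch of that loop, using the filename from the tweet; the field names and metric values are placeholders:

```python
# Sketch: load a versioned config, run, then append a row to a running
# table of versions vs. metrics.
import csv
from pathlib import Path

import yaml  # pip install pyyaml


def load_config(path: str) -> dict:
    return yaml.safe_load(Path(path).read_text())


def record_result(version: str, metrics: dict,
                  table: str = "versions_vs_metrics.csv") -> None:
    exists = Path(table).exists()
    with open(table, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["version", *metrics.keys()])
        if not exists:
            writer.writeheader()
        writer.writerow({"version": version, **metrics})


if __name__ == "__main__":
    cfg = load_config("Llama-3.2-1B_v00.01.yaml")
    # ... run fine-tuning with cfg ...
    record_result("v00.01", {"val_loss": 1.23, "notes": "baseline LR"})  # placeholders
```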
Baselining against simple models exposes whether complexity adds value – wish I’d done this before wasting months on overengineered clinical predictors.
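A minimal baselining sketch with scikit-learn on synthetic data; swap in your own dataset and your complex model as a third entry:

```python
# Sketch: check whether anything fancier actually beats a majority-class
# dummy and a plain logistic regression before investing in a complex model.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for name, model in [
    ("majority class", DummyClassifier(strategy="most_frequent")),
    ("logistic regression", LogisticRegression(max_iter=1000)),
]:
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC {score:.3f}")
```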
On top of the great suggestions by others, what I personally do is keep a "linked" journal. So if I was doing A and then, in the middle of it, observed I could do B because of x1, x2, ..., I just create a note linking the two. Using @obsdmd, you get a nice 'story'!
1. Train with a fixed seed in the beginning (see the seeding sketch after this list).
2. Keep a small README where you maintain semi-detailed notes about each run.
3. On Monday you won't remember anything from Friday. Start with this premise and document.
4. Start with multi-node-capable code.
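A sketch of item 1 for PyTorch; the determinism flag at the end is optional and can slow things down:

```python
# Sketch: fix every seed you rely on at the top of the run.
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade speed for determinism where supported.
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(1234)
```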
Consistent naming saves way more time than you'd think. Pretty sure I’ll never remember what final_final_actually_this_one was 🤣.
Save full logs and config files alongside each experiment's results. Nothing more frustrating than a good set of experiments you can't reproduce because of one missing detail.
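One possible shape for this (directory layout and filenames are assumptions): snapshot the config, the git commit, and the installed packages next to where the results will be written.

```python
# Sketch: capture "that one missing detail" at the start of every run.
import shutil
import subprocess
from pathlib import Path


def snapshot_run(run_dir: str, config_path: str) -> None:
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    shutil.copy(config_path, out / "config.yaml")
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()
    (out / "git_commit.txt").write_text(commit + "\n")
    freeze = subprocess.run(
        ["pip", "freeze"], capture_output=True, text=True
    ).stdout
    (out / "requirements.lock").write_text(freeze)


if __name__ == "__main__":
    snapshot_run("runs/exp_0042", "config.yaml")  # illustrative paths
```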
Hydra is such a godsend; I wish it had been available when I started out (minimal example below).
• Every run is logged with a full config snapshot.
• Supports multirun for ablation studies across hyperparameters.
• Parallel coordinate plots give a strong visual feel for key parameters.
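A minimal Hydra script along those lines (the config path and field names are assumptions); `python train.py -m lr=1e-3,1e-4` runs a multirun sweep, and each run's resolved config is snapshotted in its output directory:

```python
# train.py — minimal Hydra sketch; expects conf/config.yaml to exist.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # full config snapshot for this run
    # ... train with cfg.lr, cfg.batch_size, ...


if __name__ == "__main__":
    main()
```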
- Always use a config, do not use args.
- Spend time learning wandb or similar; it automatically saves code/configs/models (sketch below).
- Name the config with <#>_<active_change>_<#+1> (example: 22_LR_001_23.yaml, 22_LR_0001_24.yml).
- Storage is cheap; copy your data rather than modifying it in place.
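A small sketch of the wandb point; the project name uses one of the config filenames above, and the logged loss is a placeholder:

```python
# Sketch: initialize wandb from a config dict so code, config, and metrics
# are captured with each run.
import wandb
import yaml  # pip install pyyaml

with open("22_LR_001_23.yaml") as f:
    cfg = yaml.safe_load(f)

run = wandb.init(project="small-training-jobs", config=cfg, save_code=True)
for step in range(100):
    # ... training step ...
    wandb.log({"loss": 1.0 / (step + 1)}, step=step)  # placeholder metric
run.finish()
```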
Always build the abstraction/interface/tool/database. Do not do it ad hoc, do not make one-off scripts or bespoke storage, do not do things only you will remember.
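A sketch of what "build the tool, not the one-off script" can mean in practice, here as a tiny SQLite registry with an assumed schema that every run writes to:

```python
# Sketch: one shared, queryable registry instead of results scattered
# across ad-hoc files only you can decode.
import json
import sqlite3
import time


def get_db(path: str = "experiments.db") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               run_id TEXT PRIMARY KEY,
               started_at REAL,
               config_json TEXT,
               metrics_json TEXT
           )"""
    )
    return db


def register_run(db, run_id: str, config: dict, metrics: dict) -> None:
    db.execute(
        "INSERT OR REPLACE INTO runs VALUES (?, ?, ?, ?)",
        (run_id, time.time(), json.dumps(config), json.dumps(metrics)),
    )
    db.commit()


if __name__ == "__main__":
    db = get_db()
    register_run(db, "exp_0042", {"lr": 3e-4}, {"val_loss": 0.91})  # placeholders
```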
Save your inputs and outputs exactly as they were, along with any config. Reproducibility is hard, and especially hard in a pipeline with multiple non-deterministic AIs.
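For the multi-AI pipeline case, one hedged sketch (field names are assumptions): append every call's exact inputs, raw outputs, and sampling parameters to a JSONL file, since the calls themselves are not reproducible.

```python
# Sketch: append-only log of exact model inputs/outputs per call.
import json
import time


def log_call(path, *, model, params, prompt, response):
    record = {
        "ts": time.time(),
        "model": model,
        "params": params,       # temperature, top_p, seed, ...
        "prompt": prompt,       # exact input, not a template
        "response": response,   # raw output, before any post-processing
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


log_call(
    "calls.jsonl",
    model="some-llm",                   # illustrative values
    params={"temperature": 0.7},
    prompt="Summarize: ...",
    response="...",
)
```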
model reproducibility requires data reproducibility 🥲
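One small way to pin the data (paths are illustrative): hash the training files and record the digest with the run, so "same code, same config" also means "same data".

```python
# Sketch: fingerprint a dataset directory so a run records exactly
# which data it saw.
import hashlib
from pathlib import Path


def dataset_fingerprint(root: str) -> str:
    h = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            h.update(path.name.encode())
            h.update(path.read_bytes())
    return h.hexdigest()


if __name__ == "__main__":
    print(dataset_fingerprint("data/train"))  # illustrative path
```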
standardizing and automating evaluations (across the team) can be tiring but is always worth it; make a separate config seed for every bit of randomness you introduce
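A sketch of the separate-seed idea; the field names are assumptions, and each RNG is passed only to the component it belongs to, so you can hold (say) the data split fixed while re-rolling initialization.

```python
# Sketch: one named seed per source of randomness.
from dataclasses import dataclass

import numpy as np


@dataclass
class Seeds:
    data_split: int = 0
    model_init: int = 1
    augmentation: int = 2
    eval_sampling: int = 3


seeds = Seeds()
split_rng = np.random.default_rng(seeds.data_split)
init_rng = np.random.default_rng(seeds.model_init)
# ... hand each RNG only to the component that owns that randomness ...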
People who run lots of small training jobs for your day job, what is one thing about experiment management / hygiene that you wish you knew when you started out?
Been spending some time with the GSPMD paper recently. It's funny seeing all the work on making convolution work; 2021 truly was a different era