sachit gaudi
@gaudi_sachit
Research: Generalisation, Diffusion models. Grad student @michiganstateu. IIT Guwahati Alum.
Wow! This is so beautiful and well written. Addicted to this homework!
Assignment 1 (get basic pipeline working): implement BPE tokenizer, Transformer architecture, Adam optimizer, train models on TinyStories and OpenWebText. Only PyTorch primitives are allowed (can’t just call torch.nn.Transformer or even torch.nn.Linear). github.com/stanford-cs336…
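To give a feel for the first item on that list, here is a minimal sketch of byte-level BPE training in plain Python. It is a toy illustration, not the assignment's reference solution: the function name `train_bpe`, the tie-breaking rule, and the lack of any pre-tokenization or special-token handling are all assumptions made for brevity.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int):
    """Learn up to `num_merges` byte-pair merges from `text` (toy version)."""
    # Start from raw UTF-8 bytes; every token is a `bytes` object.
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent token pair occurs.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Pick the most frequent pair. Ties go to the lexicographically
        # greater pair (an assumption; real implementations may differ).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Replace non-overlapping occurrences of the pair, left to right.
        merged = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens


merges, tokens = train_bpe("aaabdaaabac", num_merges=3)
print(merges)
print(tokens)
```

Recounting every pair after each merge is quadratic in the worst case, which is fine for a toy string but would likely need incremental pair counts to scale to corpora the size of TinyStories or OpenWebText.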
Trustworthy representations must be robust against distribution shifts. @gautamsree_ shows that you must explicitly use your interventional causal knowledge to learn robust representations, instead of simply augmenting your dataset with interventional data.
Representations learned using standard ERM often fail to generalize under interventional distribution shifts because they ignore the causal structure revealed by interventions. Here's how to learn robust representations. 🧵👇(1/9)
Asymmetry of NeurIPS reviews: Authors can submit half-baked work with no penalization, but reviewers are expected to evaluate to a very high standard or face significant penalties.