driss guessous
@drisspg
bytes and nuggets @pytorch
Why is it so hard for Claude to use the venv I have activated?
pierce.dev/notes/how-spec… just found @piercefreeman's blog, I love it
What is the optimal way to review a PR?
Tests -> Impl
Impl -> Tests
Files w/ fewest changes -> files w/ most?
If your repo aint using pre-commit hooks, I aint gunna contribute.... except for PT I guess it gets a pass
muon-clip's η is based off the global max score from the previous iter -> I imagine they also store Mi and reduce among rows every training step. This is why it's hard to have 1 god attn impl, everyone wants slightly different things 🙃
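A rough sketch of the bookkeeping guessed at above, in plain Python: keep a per-row max score (the Mi a flash-attention style kernel already tracks), reduce it to one global max, and derive η from that. The threshold `tau`, the formula `min(1, tau / global_max)`, and both function names are assumptions for illustration, not the actual muon-clip implementation.

```python
def row_maxes(scores):
    """Per-row max attention score: the M_i values, one per query row."""
    return [max(row) for row in scores]


def clip_factor(scores, tau):
    """Assumed rule: eta = min(1, tau / global max score).

    `tau` is a hypothetical threshold; scores at or below it leave eta at 1.
    """
    global_max = max(row_maxes(scores))  # reduce M_i across rows
    if global_max <= 0:
        return 1.0  # nothing to clip
    return min(1.0, tau / global_max)


# Toy scores from the "previous iteration": 2 query rows, 3 keys each.
scores = [[1.0, 50.0, 3.0], [200.0, 4.0, 2.0]]
eta = clip_factor(scores, tau=100.0)
```

The extra reduction over rows is the part a generic attention kernel would not normally expose, which is the point of the post.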
The moat has finally revealed itself
> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms github.com/triton-lang/tr…
I'm using more agents than any of you. No matter how much you're using agents, I have a few more going at any given time. You're about to get left behind btw.
I spent legit 4 hours of my life tracking down 1ULP of difference in github.com/pytorch/ao/pul… MASSIVE TIL: triton-lang.org/main/python-ap…
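For anyone chasing the same kind of bug, a minimal sketch of measuring how many ULPs apart two float32 values are, using only the standard library. It assumes float32 is the dtype in question and the helper names are made up here; it says nothing about what the linked Triton doc covers.

```python
import struct


def _ordered_f32(x):
    """Map a float32 bit pattern to an integer that sorts like the float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        return bits ^ 0xFFFFFFFF  # negative: flip all bits
    return bits | 0x80000000  # non-negative: set the top bit


def ulp_distance_f32(a, b):
    """Number of representable float32 steps between a and b.

    Note: -0.0 and +0.0 count as 1 step apart under this mapping.
    """
    return abs(_ordered_f32(a) - _ordered_f32(b))


one = 1.0
next_after_one = 1.0 + 2.0 ** -23  # the float32 right after 1.0
diff = ulp_distance_f32(one, next_after_one)
```

Comparing results by ULP distance instead of absolute tolerance makes a 1 ULP mismatch show up as exactly `1`, whatever the magnitude of the values.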
""" Second, the linear algebra framework of linear layouts enables compilers to generate efficient code for layout conversion and code lowering for many common operators, which is absent in CUTE """ If cute restricted itself to powers of 2 maps would it also be able to do this?
Anyone have a CC hook for ensuring that the interpreter I have selected in VScode is activated for any bash commands? Also I use zsh and
```
# Shell Preference
- Use zsh instead of bash for shell commands
```
doesn't appear to work in my global Claude.md
Andor is a perfect show. The Star Wars universe does not deserve it
TorchAO recently enabled their FP8 shenanigans for SM89. This means if you have an RTX 4090, you should upgrade `torchao` & start taking advantage of this! @drisspg 👨🍳 On Flux:
```
fp8dqrow: 17.322 seconds
int8wo: 26.554 seconds
```
Code: gist.github.com/sayakpaul/155a…
The world if the Claude app let me have different conversations in tabs

Tomorrow (5/17), The CuTe creator, Cris Cecka, will teach CuTe himself on GPU Mode. youtube.com/watch?v=ufa4pm…
Would you rather have better documentation for PyTorch prototype features but more BC breakages or secret APIs until we have more confidence?