driss guessous

@drisspg

bytes and nuggets @pytorch

Joined December 2023

177 Following

674 Followers

driss guessous @drisspg · Jul 22

Why is it so hard for Claude to use the venv I have activated?

driss guessous @drisspg · Jul 18

pierce.dev/notes/how-spec… just found @piercefreeman's blog, I love it

driss guessous @drisspg · Jul 17

What is the optimal way to review a PR? Tests -> Impl? Impl -> Tests? Files w/ fewest changes -> files w/ most?

driss guessous @drisspg · Jul 16

If your repo ain't using pre-commit hooks, I ain't gunna contribute... except for PT, I guess it gets a pass

driss guessous @drisspg · Jul 16

muon-clip's η is based off the global max score from the previous iter -> I imagine they also store Mi and reduce among rows every training step. This is why it's hard to have 1 god attn impl, everyone wants slightly different things 🙃
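
Here is a rough sketch of the QK-Clip step being described. It assumes the form in Moonshot's Kimi K2 MuonClip write-up:

- Take the global maximum pre-softmax attention score S_max.
- Set η = min(τ / S_max, 1).
- Scale both W_q and W_k by √η, so every q·k logit shrinks by a factor of η.

The helper names, the toy matrices, and τ = 5.0 are made up for illustration. The per-row maxima m_i stand in for the running row max that an online-softmax attention kernel already tracks. This is not any library's actual implementation. Because q = xW_q and k = xW_k are linear in the weights, scaling q and k directly shows the same effect.

```python
import math


def row_max_scores(q, k):
    # m_i = max_j (q_i . k_j) / sqrt(d): one scaled-logit max per query row,
    # the same quantity an online-softmax attention kernel keeps while streaming keys.
    scale = 1.0 / math.sqrt(len(q[0]))
    return [max(scale * sum(a * b for a, b in zip(qi, kj)) for kj in k) for qi in q]


def qk_clip_eta(row_maxes, tau):
    # Reduce the per-row maxima to a single global max score, then
    # eta = min(tau / s_max, 1). Checking s_max <= tau first also covers s_max <= 0.
    s_max = max(row_maxes)
    return 1.0 if s_max <= tau else tau / s_max


def scale_rows(m, factor):
    # Multiply every entry of a list-of-rows matrix by a scalar.
    return [[factor * x for x in row] for row in m]


# Toy example: 2 queries, 2 keys, head dim 4 (values are arbitrary).
q = [[4.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
k = [[5.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
tau = 5.0

m = row_max_scores(q, k)   # [10.0, 2.5]
eta = qk_clip_eta(m, tau)  # 0.5

# Scaling both sides by sqrt(eta) (alpha = 0.5) scales every logit by eta,
# so the new global max lands on tau, up to float rounding.
s = math.sqrt(eta)
clipped = row_max_scores(scale_rows(q, s), scale_rows(k, s))
print(max(clipped))
```

In training, the rescale would be applied to W_q and W_k themselves, using maxima recorded during a forward pass, rather than to q and k.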

driss guessous @drisspg · Jul 9

The moat has finally revealed itself

ssophia @cis_female · Jul 9

> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms github.com/triton-lang/tr…

driss guessous Retweeted

Charlie Marsh @charliermarsh · Jul 7

I'm using more agents than any of you. No matter how much you're using agents, I have a few more going at any given time. You're about to get left behind btw.

driss guessous @drisspg · Jul 3

I spent legit 4 hours of my life tracking down 1 ULP of difference in github.com/pytorch/ao/pul… MASSIVE TIL: triton-lang.org/main/python-ap…

Link preview: Stacked PRs: -> Add kernel #2439 Add Kernel w/ this we get 3.7 real world speed up for llama 70b MLP 1024 tokens -> not user per-tensor scaling path With kernel https://fburl.com/0g0...
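
The Triton page behind the truncated link can't be recovered here. What follows is only a generic aid for the kind of hunt described above: a small helper (hypothetical names, Python standard library only) that reports how many float32 ULPs apart two values are. It works in three steps:

- Round each input to float32.
- Reinterpret the bits as a 32-bit integer.
- Remap negative values so that neighboring floats sit exactly 1 apart.

```python
import struct


def f32_bits(x: float) -> int:
    # Round x to float32 and reinterpret its 4 bytes as a signed 32-bit int.
    return struct.unpack("<i", struct.pack("<f", x))[0]


def ordered(bits: int) -> int:
    # IEEE-754 floats are sign-magnitude; map them onto a monotonic integer line
    # so adjacent float32 values differ by exactly 1 and +0.0 / -0.0 both map to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance_f32(a: float, b: float) -> int:
    # Number of representable float32 steps between a and b (NaN is not handled).
    return abs(ordered(f32_bits(a)) - ordered(f32_bits(b)))


print(ulp_distance_f32(1.0, 1.0 + 2**-23))  # 1: the next float32 above 1.0
print(ulp_distance_f32(1.0 - 2**-24, 1.0))  # 1: the float32 just below 1.0
print(ulp_distance_f32(0.0, -0.0))          # 0
```

Comparing a kernel's output against a reference would mean calling this elementwise on the two results converted to Python floats.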

driss guessous @drisspg · Jul 2

""" Second, the linear algebra framework of linear layouts enables compilers to generate efficient code for layout conversion and code lowering for many common operators, which is absent in CUTE """ If cute restricted itself to powers of 2 maps would it also be able to do this?

driss guessous @drisspg · Jul 1

Anyone have a CC hook for ensuring that the interpreter I have selected in VS Code is activated for any bash commands? Also, I use zsh and

```
# Shell Preference
- Use zsh instead of bash for shell commands
```

doesn't appear to work in my global CLAUDE.md

Link preview: Learn about Claude Code, Anthropic's agentic coding tool that lives in your terminal and helps you turn ideas into code faster than ever before.

driss guessous @drisspg · Jun 26

Andor is a perfect show. The Star Wars universe does not deserve it

driss guessous @drisspg · Jun 24

driss guessous Retweeted

Sayak Paul @RisingSayak · Jun 12

TorchAO recently enabled their FP8 shenanigans for SM89. This means if you have an RTX 4090, you should upgrade `torchao` & start taking advantage of this! @drisspg 👨‍🍳 On Flux:

```
fp8dqrow: 17.322 seconds
int8wo: 26.554 seconds
```

Code: gist.github.com/sayakpaul/155a…

driss guessous @drisspg · May 30

The world if the Claude app let me have different conversations in tabs

driss guessous Retweeted

Haicheng Wu @asdf1234_0 · May 16

Tomorrow (5/17), the CuTe creator, Cris Cecka, will teach CuTe himself on GPU Mode. youtube.com/watch?v=ufa4pm…

driss guessous @drisspg · May 13

Would you rather have better documentation for PyTorch prototype features but more BC breakages or secret APIs until we have more confidence?