Pramod Goyal
@goyal__pramod
Trying to change the world one line at a time AI Dev zetta global prev @joindimension founder @hacktogetherdev
Most influential LLM papers and the ideas they introduced (post 2017) A long thread 🧵

I learned something fascinating. I used to think KV caching was the name of a method where you cache everything except the newest token's output. That is not the case, and I still can't wrap my head around the matrix multiplication involved. Once I do, I will write about it.
I was working on KV caching and found an interesting short HF write-up on it. If you want to grasp the concept quickly, I recommend checking it out.
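This is a minimal pure-Python sketch of what a KV cache does in the standard decoding setup. It is not taken from the HF write-up, and it assumes a single attention head with random toy weights. The class `KVCacheAttention` and its methods are hypothetical names for illustration. At each step only the newest token's query, key and value are projected, and the keys and values of earlier tokens are reused from the cache. The script then compares the cached outputs against recomputing K and V for the whole prefix at every step.

```python
import math
import random

def matvec(W, x):
    # W is a list of rows (d_out x d_in); returns the vector W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    # Causality comes from only ever passing the keys/values of the prefix.
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

class KVCacheAttention:
    """Toy single-head attention with a KV cache (hypothetical, for illustration)."""

    def __init__(self, d_model, seed=0):
        rng = random.Random(seed)
        self.Wq = random_matrix(d_model, d_model, rng)
        self.Wk = random_matrix(d_model, d_model, rng)
        self.Wv = random_matrix(d_model, d_model, rng)
        self.k_cache = []  # keys of every token seen so far
        self.v_cache = []  # values of every token seen so far

    def step(self, x):
        # Project only the newest token; older keys/values come from the cache.
        q = matvec(self.Wq, x)
        self.k_cache.append(matvec(self.Wk, x))
        self.v_cache.append(matvec(self.Wv, x))
        return attend(q, self.k_cache, self.v_cache)

    def full_recompute(self, xs):
        # Reference without a cache: re-project K and V for the whole prefix at every step.
        outputs = []
        for t in range(len(xs)):
            q = matvec(self.Wq, xs[t])
            keys = [matvec(self.Wk, x) for x in xs[: t + 1]]
            values = [matvec(self.Wv, x) for x in xs[: t + 1]]
            outputs.append(attend(q, keys, values))
        return outputs

rng = random.Random(42)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
attn = KVCacheAttention(d_model=4)
cached = [attn.step(x) for x in tokens]
reference = attn.full_recompute(tokens)
print(max(abs(a - b) for c, r in zip(cached, reference) for a, b in zip(c, r)))
```

The trade-off is memory for compute: the cache grows by one key vector and one value vector per generated token. In exchange, each step only needs projections for the newest token.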
If you have any recommendations, do share. I would love to check em out!!
Working on a small reading section where I will add my favourite niche blogs, repos, tutorials, and books that aren't very popular but are extremely helpful.

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Wait, so Alibaba Qwen has just released ANOTHER model?? Qwen3-Coder is simply one of the best coding models we've ever seen.
→ Still 100% open source
→ Up to 1M context window 🔥
→ 35B active parameters
→ Same performance as Sonnet 4
They're releasing a CLI tool as well ↓
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock next AI breakthrough with…
Generative video is incredible, but ask it to explain a simple idea, and it often fails! Today we’re excited to introduce Programmatic Storytelling – a whole new way to craft videos and tell stories, built on vectors and code, not just pixels. @genime_labs
Kimi K2 tech report just dropped! Quick hits:
- MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale
- 20K+ tools, real & simulated: unlocking scalable agentic data
- Joint RL with verifiable + self-critique rubric rewards: alignment that adapts
-…
How to train a model that actually understands both audio and text like Voxtral from @MistralAI? Here is a quick video walkthrough of the paper.
Kimi K2 paper dropped! It describes:
- the MuonClip optimizer
- a large-scale agentic data synthesis pipeline that systematically generates tool-use demonstrations via simulated and real-world environments
- an RL framework that combines RLVR with a self-critique rubric reward mechanism…
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
And it's done. Scratch implementations for:
> Tensor
> Parameter and Module
> Linear layer
> ReLU
> Sequential
> SGD optimizer
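The post doesn't include the code, so what follows is only a minimal pure-Python sketch of how those pieces commonly fit together. It is not the author's implementation. It assumes flat lists of floats in place of a real Tensor class and hand-written `backward` methods in place of autograd. The class names loosely follow PyTorch's `nn.Module` style, but every signature here (for example `Linear(in_features, out_features, rng)` and the helper `train`) is hypothetical.

```python
import random

class Parameter:
    # A trainable tensor stored as a flat list of floats, plus its gradient.
    def __init__(self, values):
        self.data = list(values)
        self.grad = [0.0] * len(self.data)

class Module:
    # Base class: subclasses implement forward/backward and list their parameters.
    def __call__(self, x):
        return self.forward(x)

    def parameters(self):
        return []

class Linear(Module):
    def __init__(self, in_features, out_features, rng):
        self.in_f, self.out_f = in_features, out_features
        bound = 1.0 / in_features ** 0.5
        # Weight is out_features x in_features, stored row-major.
        self.weight = Parameter(rng.uniform(-bound, bound) for _ in range(in_features * out_features))
        self.bias = Parameter([0.0] * out_features)

    def forward(self, x):
        self.x = x  # saved for the backward pass
        W, n = self.weight.data, self.in_f
        return [sum(W[o * n + i] * x[i] for i in range(n)) + self.bias.data[o] for o in range(self.out_f)]

    def backward(self, grad_out):
        W, n = self.weight.data, self.in_f
        for o in range(self.out_f):
            self.bias.grad[o] += grad_out[o]
            for i in range(n):
                self.weight.grad[o * n + i] += grad_out[o] * self.x[i]
        # Gradient w.r.t. the input: W^T @ grad_out
        return [sum(W[o * n + i] * grad_out[o] for o in range(self.out_f)) for i in range(n)]

    def parameters(self):
        return [self.weight, self.bias]

class ReLU(Module):
    def forward(self, x):
        self.mask = [v > 0 for v in x]
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, grad_out):
        return [g if m else 0.0 for g, m in zip(grad_out, self.mask)]

class Sequential(Module):
    def __init__(self, *layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def backward(self, grad_out):
        for layer in reversed(self.layers):
            grad_out = layer.backward(grad_out)
        return grad_out

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

class SGD:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def zero_grad(self):
        for p in self.params:
            p.grad = [0.0] * len(p.data)

    def step(self):
        # Plain gradient descent: data <- data - lr * grad
        for p in self.params:
            p.data = [d - self.lr * g for d, g in zip(p.data, p.grad)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def train(model, data, lr, epochs):
    # Per-sample SGD on mean squared error; returns the final average loss.
    opt = SGD(model.parameters(), lr)
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            pred = model(x)
            model.backward([2 * (p - t) / len(pred) for p, t in zip(pred, y)])
            opt.step()
    return sum(mse(model(x), y) for x, y in data) / len(data)
```

As a usage example, `train(Sequential(Linear(1, 8, rng), ReLU(), Linear(8, 1, rng)), data, lr=0.01, epochs=200)` fits a tiny MLP, where `data` is a list of `([x], [y])` pairs. Each layer saves its input in `forward` and accumulates into `.grad` in `backward`. This mirrors how autograd frameworks split the work, except that here every layer derives its own gradient by hand.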
A visual prompt ablation, that's awesome!!!
They have a short but equally amazing Stable Diffusion visualizer. My life would have been so much simpler if I had found this earlier.
The same guys have an insanely amazing GAN visualizer