Yuvraj Singh
@YuvrajS9886
| Ex - AI Engg @turboml, @puch_ai | @iitmadras (left), @iiserkol, @UofMaryland, AIISC | YESIST '24 Finalist | Multimodal LLMs Research | Building SmolHub ☺️
So, I re-implemented DeepSeekV3 from scratch in PyTorch. This is how it went - A thread 🧵 Paper - arxiv.org/abs/2412.19437 Github - github.com/YuvrajSingh-mi…

Been working on MAPPO for the past few days. The bugs, or rather the silent killers, are too many. The Python debugger helped a lot, as did subtle tricks to stabilise the norms, and Gemini Sense. But today I found a core reason why it is not working (everyone, look at your env rewards and…
Good read on forms of MARL vinaylanka.medium.com/multi-agent-re…
Over the last few months, I started this challenge that honestly changed my life in the best way possible. It pushed me to apply to way more things, ended up landing interviews at Apple, Microsoft, Hugging Face, W&B… even got a 6-figure grant. Met and worked with some insanely…
Implemented IPPO from scratch in PyTorch. IPPO, or Independent PPO, is a MARL concept where each agent is independent and has its own critic. Like PPO, but repeated n times. It is considered the baseline that many MARL algorithms, like MAPPO (my next post), are compared against. This…
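A minimal sketch of the "PPO repeated n times" idea, written in plain Python so it runs without dependencies (this is not the PyTorch code from the post). It assumes each agent already has its own batch of log-probs, value estimates and returns; the names `ppo_clip_loss`, `value_loss`, `ippo_losses` and the batch keys are hypothetical, and the advantage is simplified to `return - value` instead of GAE.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to minimize), averaged over samples."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def value_loss(values, returns):
    """Mean squared error between a critic's values and the observed returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(returns)


def ippo_losses(batches, clip_eps=0.2, vf_coef=0.5):
    """Compute one PPO loss per agent; no agent sees another agent's data.

    batches: dict mapping agent id -> dict with keys
             "new_logp", "old_logp", "values", "returns" (assumed layout).
    """
    losses = {}
    for agent_id, b in batches.items():
        # Simplified advantage (assumption): return minus this agent's own critic value.
        adv = [r - v for r, v in zip(b["returns"], b["values"])]
        policy = ppo_clip_loss(b["new_logp"], b["old_logp"], adv, clip_eps)
        critic = value_loss(b["values"], b["returns"])
        losses[agent_id] = policy + vf_coef * critic
    return losses
```

In a real implementation each entry in `losses` would be backpropagated through that agent's own actor and critic only. MAPPO differs mainly in replacing the per-agent critics with a centralized one.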

Is anyone implementing basic stuff like neural nets, CNNs, Adam, etc. in JAX? Why NumPy?
🚨 JUST IN 🚨 Bring in your first 100 users by midnight and get a FREE Swiggy Voucher worth ₹500 The clock is ticking, go go go!
Guys, I'm actively looking for professors/labs to work under. My speciality is diffusion modeling and LLMs. If you know someone who is looking for interns, please do let me know. Appreciate any kind of leads 🙏 Thank you!!
My Reinforcement Learning (RL) & Agents 3 hour workshop is out! I talk about: 1. RL fundamentals & hacks 2. "Luck is all you need" 3. Building smart agents with RL 4. Closed vs Open-source 5. Dynamic 1bit GGUFs & RL in @UnslothAI 6. The Future of Training youtube.com/watch?v=OkEGJ5…
Implemented Self Play using PPO from scratch! Code - github.com/YuvrajSingh-mi… Env - pong_v3 Trained the above agent in the env from PettingZoo. The model weights have been uploaded to the above link, and you can download and play against the agent (play_pong.py)!