Yuvraj Singh (@YuvrajS9886)

Pinned

Yuvraj Singh@YuvrajS9886 · Jun 11

So, I re-implemented DeepSeekV3 from scratch in PyTorch. This is how it went - A thread 🧵 Paper - arxiv.org/abs/2412.19437 Github - github.com/YuvrajSingh-mi…

Yuvraj Singh@YuvrajS9886 · Jul 26

Been working on MAPPO for the past few days. The bugs, or rather the silent killers, are too many. The Python debugger helped a lot, as did subtle tricks to stabilise the norms and Gemini Sense. But today I found a core reason why it is not working (everyone, look at your env rewards and…

Yuvraj Singh@YuvrajS9886 · Jul 26

Good read on forms of MARL vinaylanka.medium.com/multi-agent-re…

Link preview: Multi-Agent Reinforcement Learning or MARL is a subfield of Reinforcement Learning that extends the Reinforcement Learning concept of…

Yuvraj Singh@YuvrajS9886 · Jul 26

Good thread on MARL vinaylanka.medium.com/multi-agent-re…

Yuvraj Singh Retweeted

Adithya S K@adithya_s_k · Jul 24

Over the last few months, I started this challenge that honestly changed my life in the best way possible. It pushed me to apply to way more things, ended up landing interviews at Apple, Microsoft, Hugging Face, W&B… even got a 6-figure grant. Met and worked with some insanely…

Yuvraj Singh Retweeted

Siddharth Bhatia@siddharthb_ · Jul 23

Yuvraj Singh@YuvrajS9886 · Jul 23

Implemented IPPO from scratch in PyTorch. IPPO, or Independent PPO, is a MARL approach where each agent is independent and has its own critic. Like PPO but repeated n times. It is considered the baseline that many MARL algorithms, like MAPPO (my next post), are compared with. This…
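
A rough sketch of the "PPO repeated n times" idea, not the code from the thread: each agent gets advantages from its own critic's value estimates and its own clipped PPO loss, with nothing shared between agents. It uses plain Python rather than PyTorch so it runs anywhere, and every name in it (`AgentBatch`, `ppo_clip_loss`, `advantages_from_critic`, `ippo_losses`) is hypothetical. One-step TD advantages are an assumption made for brevity; a real implementation would more likely use GAE.

```python
import math
from dataclasses import dataclass


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative mean of the PPO clipped surrogate for one agent's batch."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def advantages_from_critic(rewards, values, gamma=0.99):
    """One-step TD advantages: r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one more entry than `rewards` (the bootstrap value).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


@dataclass
class AgentBatch:
    """Rollout data for a single agent (hypothetical container)."""
    rewards: list
    values: list    # estimates from this agent's OWN critic
    old_logp: list  # log-probs under the policy that collected the data
    new_logp: list  # log-probs under the policy being updated


def ippo_losses(batches, gamma=0.99, clip_eps=0.2):
    """IPPO: compute the PPO loss separately per agent, sharing nothing."""
    losses = {}
    for name, b in batches.items():
        adv = advantages_from_critic(b.rewards, b.values, gamma)
        losses[name] = ppo_clip_loss(b.new_logp, b.old_logp, adv, clip_eps)
    return losses
```

In a full implementation each agent would also own an optimizer and take a gradient step on its own loss. MAPPO differs mainly in that the critic is centralised and sees more than one agent's observations.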

Yuvraj Singh@YuvrajS9886 · Jul 23

When Gemini says:

Yuvraj Singh Retweeted

mark zuckerbum@sadernoheart · Jul 23

repo: github.com/julienokumu/So…

Yuvraj Singh@YuvrajS9886 · Jul 23

Is anyone implementing the basics (neural nets, CNNs, Adam, etc.) in JAX? Why NumPy?

Yuvraj Singh@YuvrajS9886 · Jul 23

🚨 JUST IN 🚨 Bring in your first 100 users by midnight and get a FREE Swiggy Voucher worth ₹500 The clock is ticking, go go go!

Siddharth Bhatia@siddharthb_ · Jul 23

Yuvraj Singh Retweeted

yashwanth@yashwanth__e · Jul 23

Guys, I'm actively looking for professors/labs to work under. My speciality is diffusion modeling and llms. If you know someone who is looking for interns, please do let me know. Appreciate any kind of leads 🙏 Thank you!!

Yuvraj Singh Retweeted

Daniel Han@danielhanchen · Jul 21

My Reinforcement Learning (RL) & Agents 3 hour workshop is out! I talk about: 1. RL fundamentals & hacks 2. "Luck is all you need" 3. Building smart agents with RL 4. Closed vs Open-source 5. Dynamic 1bit GGUFs & RL in @UnslothAI 6. The Future of Training youtube.com/watch?v=OkEGJ5…

Yuvraj Singh@YuvrajS9886 · Jul 22

Implemented self-play using PPO from scratch! Code - github.com/YuvrajSingh-mi… Env - pong_v3. Trained the agent in the pong_v3 env from PettingZoo. The model weights have been uploaded to the above link, and you can download them and play against the agent (play_pong.py)!
