Samuel Sokota
@ssokota
PhD Student at @CarnegieMellon
Joined December 2015
230 Following
864 Followers
Samuel Sokota (@ssokota) · Feb 14
Tic-Tac-Toe... but the opponent's moves are hidden. Can you outsmart our top RL agents? Play here: nathanlichtle.com/research/2p0s
We ran 5,600 hyperparameter sweeps to compare RL algorithms on hidden-information games with billions of states. In our benchmark, we found that properly tuned policy gradient methods, such as PPO, performed the best. Paper: arxiv.org/abs/2502.08938
Samuel Sokota Retweeted
Eugene Vinitsky 🍒🦋 (@EugeneVinitsky) · Feb 6
We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data. It is SOTA on every planning benchmark we tried. In self-play, it goes 20 years between collisions.