Samuel Sokota

@ssokota

PhD Student at @CarnegieMellon

Joined December 2015

230 Following

864 Followers

Tic-Tac-Toe... but the opponent's moves are hidden. Can you outsmart our top RL agents? Play here: nathanlichtle.com/research/2p0s

Nathan Lichtlé @nathanlichtle · Feb 14

We ran 5,600 hyperparameter sweeps to compare RL algorithms on hidden-information games with billions of states. In our benchmark, we found that properly tuned policy gradient methods, such as PPO, performed the best. Paper: arxiv.org/abs/2502.08938


Samuel Sokota Retweeted

Eugene Vinitsky 🍒🦋 @EugeneVinitsky · Feb 6

We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data. It is SOTA on every planning benchmark we tried. In self-play, it goes 20 years between collisions.
