Karl Pertsch
@KarlPertsch
Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.
We’re releasing RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
It was time to improve our evaluations in robot learning! We introduce a methodology based on anonymous A/B testing: fairer, stronger, community-driven. Awesome work by @KarlPertsch @pranav_atreya @tonyh_lee and an incredible crowdsourcing team. Upload and test your model! 🚀
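For context on how anonymous A/B comparisons can become a leaderboard: the sketch below fits a simple Bradley-Terry model to pairwise win counts, the generic aggregation idea behind arena-style rankings. The policy names and numbers are made up, and this is an illustrative assumption, not necessarily RoboArena's exact aggregation method.

```python
# Minimal sketch: turn pairwise A/B outcomes into a ranking via a
# Bradley-Terry fit (MM / Zermelo iterations). Names and counts are
# hypothetical; RoboArena's actual aggregation may differ.
import numpy as np

policies = ["policy_a", "policy_b", "policy_c"]  # hypothetical entries
# wins[i, j] = number of head-to-head comparisons policy i won against policy j
wins = np.array([
    [0, 14, 18],
    [9,  0, 12],
    [7, 10,  0],
], dtype=float)

n = wins + wins.T            # total comparisons between each pair
total_wins = wins.sum(axis=1)
p = np.ones(len(policies))   # Bradley-Terry "skill" parameters

for _ in range(200):
    # p_i <- W_i / sum_j n_ij / (p_i + p_j), the standard MM update
    denom = (n / (p[:, None] + p[None, :] + 1e-12)).sum(axis=1)
    p = total_wins / np.maximum(denom, 1e-12)
    p /= p.sum()             # normalize; only relative scale is identifiable

for name, score in sorted(zip(policies, p), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```

Pairwise preferences like these are the same signal Chatbot Arena-style leaderboards aggregate; the appeal for robotics is that evaluators only need to judge which of two anonymous policies did better on a task, rather than agree on absolute scores.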
Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hopefully folks can contribute and help us take a step towards systematic and standardized empiricism in robot learning! :) Also check out some of the fun sim eval…
🚀 We just launched RoboArena — a real-world evaluation platform for robot policies! Think Chatbot Arena, but for robotics. 📝 Paper: robo-arena.github.io/assets/roboare… 🌐 Website: robo-arena.github.io Joint work with @pranav_atreya and @KarlPertsch, advised by @percyliang,…
Final note: It is easier to work on robotics now than at any point in the past.
I'll give a talk about benchmarking generalist policies today at RSS (4:30p, RTH 526, in the benchmarking workshop)! I will discuss sim eval, auto eval, and distributed real-world eval (i.e., RoboArena) -- swing by :)

In robotics, benchmarks are rarely shared. New eval setups are created for each new project, a stark difference from evals in broader ML. But generalist policies share a problem statement: do any task in any environment. Can generalist capabilities make robot evaluation easier?