Carl Doersch
@CarlDoersch
Researcher at DeepMind
We present a new SOTA on point tracking via self-supervised training on real, unlabeled videos! BootsTAPIR achieves 67.4% AJ on TAP-Vid DAVIS with minimal architecture changes, and tracks 10K points on a 50-frame video in 6 seconds. PyTorch & JAX implementations on GitHub. bootstap.github.io
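For context, a Tracking-Any-Point model takes a video plus (frame, y, x) queries and returns per-frame (x, y) positions with visibility flags. Below is a minimal, shape-level sketch with a stub in place of the real model; the actual PyTorch/JAX entry points live in the GitHub repo, and the names here are hypothetical.

```python
import numpy as np

def track_points_stub(video, query_points):
    """Stand-in for a TAP model such as BootsTAPIR: returns constant tracks
    and full visibility so the shapes below are concrete and the script runs."""
    num_queries = query_points.shape[0]
    num_frames = video.shape[0]
    # (num_queries, num_frames, 2): predicted (x, y) position in every frame.
    tracks = np.tile(query_points[:, None, 2:0:-1], (1, num_frames, 1)).astype(np.float32)
    # (num_queries, num_frames): whether the point is visible (not occluded).
    visibles = np.ones((num_queries, num_frames), dtype=bool)
    return tracks, visibles

# A 50-frame RGB clip, resized to 256x256 as in the TAP-Vid benchmarks.
video = np.zeros((50, 256, 256, 3), dtype=np.uint8)
# Each query is (frame index, y, x): "track this pixel starting from this frame".
query_points = np.array([[0, 128.0, 64.0], [10, 200.0, 200.0]], dtype=np.float32)

tracks, visibles = track_points_stub(video, query_points)
print(tracks.shape, visibles.shape)  # (2, 50, 2) (2, 50)
```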
Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share TRAJAN: the world’s first point TRAJectory AutoeNcoder for evaluating motion realism in generated and corrupted videos. 🌐 trajan-paper.github.io 🧵
What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here are a few examples – check out this thread 🧵 for more results!
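A loose sketch of what a motion prompt could look like as an input to a motion-conditioned generator (a hypothetical encoding for illustration; the paper's actual conditioning scheme may differ): the prompt is a set of point trajectories, here rasterized into a per-frame spatial map.

```python
import numpy as np

num_frames, height, width = 16, 64, 64
# One motion prompt: drag a point from (10, 10) to (50, 50) over the clip.
trajectory = np.stack([
    np.linspace(10, 50, num_frames),  # x over time
    np.linspace(10, 50, num_frames),  # y over time
], axis=-1)

# Rasterize the trajectory into a (frames, H, W) conditioning channel.
cond = np.zeros((num_frames, height, width), dtype=np.float32)
for t, (x, y) in enumerate(trajectory):
    cond[t, int(round(y)), int(round(x))] = 1.0

# video = generator(first_frame, text_prompt, cond)  # hypothetical call
print(cond.shape, cond.sum())  # (16, 64, 64) 16.0
```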
Want a robot to solve a task, specified in language? Generate a video of a person doing it, and then retarget the action to the robot with the help of point tracking! Cool collab with @mangahomanga during his student researcher stint at Google.
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! homangab.github.io/gen2act/ 1/n
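An illustrative sketch of the two-stage recipe described above (all function names are hypothetical stand-ins, not the released interface): first generate a human video from the instruction, then run a closed-loop policy conditioned on that video.

```python
import numpy as np

def generate_human_video(scene_image, instruction):
    """Stage 1 (stub): a video generation model renders a human performing
    the task described by `instruction` in the current scene."""
    return np.zeros((16, 128, 128, 3), dtype=np.uint8)  # dummy 16-frame clip

def policy(observation, generated_video):
    """Stage 2 (stub): a closed-loop policy conditioned on the generated
    human video (e.g. via point tracks extracted from it) outputs an action."""
    return np.zeros(7, dtype=np.float32)  # e.g. a 7-DoF end-effector action

scene = np.zeros((128, 128, 3), dtype=np.uint8)
human_video = generate_human_video(scene, "put the cup in the sink")
for step in range(100):
    obs = scene  # in practice: the robot's current camera image
    action = policy(obs, human_video)
    # robot.apply(action)  # execute and re-observe, closing the loop
```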
Want to make a difference with point tracking? The medical community needs help tracking tissue deformation during surgery! Participate in the STIR challenge (stir-challenge.github.io) at MICCAI, deadline in September.
We're excited to release TAPVid-3D: an evaluation benchmark of 4,000+ real-world videos and 2.1 million metric 3D point trajectories, for the task of Tracking Any Point in 3D!
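A small sketch of what a metric 3D point trajectory looks like for the Tracking-Any-Point-in-3D task (illustrative shapes only, not the exact TAPVid-3D file format): each point gets a per-frame (x, y, z) position in metres plus a visibility flag.

```python
import numpy as np

num_frames = 120
# One trajectory: per-frame (x, y, z) in metres, in the camera coordinate frame.
trajectory_xyz = np.zeros((num_frames, 3), dtype=np.float32)
# Per-frame visibility: whether the point is visible (not occluded) at frame t.
visible = np.ones(num_frames, dtype=bool)

# A benchmark example pairs a video with many such trajectories:
example = {
    "video": np.zeros((num_frames, 256, 256, 3), dtype=np.uint8),
    "tracks_xyz": np.stack([trajectory_xyz] * 512),  # (512, 120, 3)
    "visibility": np.stack([visible] * 512),         # (512, 120)
}
print(example["tracks_xyz"].shape)
```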
Can you win the 2nd Perception Test Challenge? @eccvconf workshop: ptchallenge-workshop.github.io Diagnose audio-visual MLMs on their ability to model memory, physics, abstraction & semantics through 6 tasks: VQA, Point Tracking, Box Tracking, action/sound localisation - jointly! @GoogleDeepMind + win 💰
Just in time for CVPR, we've released code to generate "rainbow visualizations" from a set of point tracks: it semi-automatically segments foreground objects and corrects for camera motion. Try our colab demo at colab.sandbox.google.com/github/deepmin… (vid source youtube.com/watch?v=yuQFQ8…)
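A rough sketch of the idea behind the "rainbow visualization" (an assumption about the approach, not the released colab's exact pipeline): fit the dominant camera motion from the point tracks with a RANSAC homography, treat tracks that don't follow it as foreground, and color those with a rainbow colormap.

```python
import numpy as np
import cv2
from matplotlib import cm

num_points, num_frames = 100, 50
rng = np.random.default_rng(0)
start = rng.uniform(20.0, 230.0, size=(num_points, 2)).astype(np.float32)
# Background tracks: every point shifts by (2, 1) px per frame (a camera pan).
tracks = start[:, None, :] + np.arange(num_frames)[None, :, None] * np.array([2.0, 1.0])
# The first 10 tracks belong to a moving object: give them extra motion of their own.
tracks[:10] += np.arange(num_frames)[None, :, None] * np.array([0.0, 4.0])

# Fit the camera motion between frame 0 and frame 25; RANSAC inliers ~ background.
H, inlier_mask = cv2.findHomography(tracks[:, 0], tracks[:, 25], cv2.RANSAC, 3.0)
foreground = ~inlier_mask.ravel().astype(bool)

# Assign each foreground track a rainbow color (here simply spread over the hue wheel).
colors = cm.hsv(np.linspace(0.0, 1.0, foreground.sum()))
print(f"{foreground.sum()} of {num_points} tracks flagged as foreground")
```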
📢 Perception Test @ICCVConference now w/ test set. We invite submissions to the 1st Perception Test; winners announced at #ICCV2023 in Paris. 6 leaderboards to test multimodal models' ultimate perception capabilities. Workshop: ptchallenge-workshop.github.io GitHub: github.com/deepmind/perce… 1/5