joao carreira
@joaocarreira
Research Scientist at Google DeepMind
Scaling 4D Representations – new preprint arxiv.org/abs/2412.15212 and models now available github.com/google-deepmin…
3rd edition of the challenge, with exciting new tasks and guest tracks. Back during COVID, when we held the first workshop about the Perception Test (computerperception.github.io), some of us were afraid the benchmark was too difficult; now we just made it harder.
The 3rd Perception Test challenge is now accepting submissions: perception-test-challenge.github.io! Prizes of up to 50k EUR are available across the Perception Test tracks. The winners will be announced at the Perception Test workshop at #ICCV2025. Submission deadline: October 6, 2025.
Thrilled to share our latest work on SciVid, to appear at #ICCV2025! 🎉 SciVid offers cross-domain evaluation of video models in scientific applications, including medical CV, animal behavior, & weather forecasting 🧪🌍📽️🪰🐭🫀🌦️ #AI4Science #FoundationModel #CV4Science [1/5]🧵
Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: @danfei_xu, @joaocarreira, @jiajunwu_cs, Kristen Grauman, @sainingxie, @vincesitzmann 🔗 sp4v.github.io
Individual frames from generative video models tend to look reasonable; realistically capturing actions unfolding over time is much harder. TRAJAN is a new evaluation procedure to better guide progress in this (hot) area.
Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share TRAJAN: the world’s first point TRAJectory AutoeNcoder for evaluating motion realism in generated and corrupted videos. 🌐 trajan-paper.github.io 🧵
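The gist, as I read it (a minimal sketch, not the official TRAJAN code; the autoencoder object and the point-track extraction are placeholder assumptions): extract point trajectories from a video, reconstruct them with an autoencoder trained on real-video motion, and use the reconstruction error as a motion-realism score.

```python
# Minimal sketch of an autoencoder-based motion-realism score (placeholder names,
# not the released TRAJAN implementation).
import numpy as np

def motion_realism_score(tracks: np.ndarray, autoencoder) -> float:
    """tracks: [num_points, num_frames, 2] (x, y) point trajectories from a video.
    autoencoder: any model with encode()/decode() trained on real-video tracks.
    Returns a score in (0, 1]; higher means the motion looks more realistic."""
    latents = autoencoder.encode(tracks)                      # compress each trajectory
    recon = autoencoder.decode(latents)                       # reconstruct it
    err = np.mean(np.linalg.norm(recon - tracks, axis=-1))    # mean per-point pixel error
    return float(1.0 / (1.0 + err))                           # map error to a bounded score
```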
Check out our CVPR 2025 paper: arxiv.org/abs/2504.01961. Work with Dilara Gokay, Joseph Heyward, @ChuhanZhang5 , @DanielZoran_ , Viorica Pătrăucean, @joaocarreira , @dimadamen and Andrew Zisserman, @GoogleDeepMind
We are looking for a student researcher to work on video understanding plus 3D at Google DeepMind London. DM/email me, or pass it along to someone if you feel it may be a good fit!
Apply here: eeml.eu/application Confirmed speakers: @AaronCourville @AldenHung @dianaborsa @09Emmar @joaocarreira @MihaelaCRosca @senka_snow @fedzbar @bose_joey @LiliMomeni @Miruna_Pislar Razvan Pascanu Samy Bengio
Excited to announce MooG for learning video representations. MooG allows tokens to move “off-the-grid”, enabling better representation of scene elements even as they move across the image plane through time. 📜arxiv.org/abs/2411.05927 🌐moog-paper.github.io
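In rough terms (a hedged sketch of my reading of the abstract, not the released MooG architecture; every module name below is an assumption): rather than re-tokenizing each frame on a fixed patch grid, a set of recurrent tokens cross-attends to each incoming frame, so a token can keep binding to the same scene element wherever it moves.

```python
# Hedged sketch of the "off-the-grid" idea: recurrent tokens that read from each
# new frame via cross-attention instead of being tied to fixed grid positions.
import torch
import torch.nn as nn

class OffGridTokens(nn.Module):
    def __init__(self, num_tokens=64, dim=128):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim))  # initial token states
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, frame_features):
        # frame_features: [T, N_patches, dim] features for each video frame
        state = self.tokens.unsqueeze(0)  # [1, num_tokens, dim]
        outputs = []
        for feats in frame_features:
            # each token reads from the current frame, wherever its content moved to
            read, _ = self.cross_attn(state, feats.unsqueeze(0), feats.unsqueeze(0))
            state = self.update(read.squeeze(0), state.squeeze(0)).unsqueeze(0)
            outputs.append(state.squeeze(0))
        return torch.stack(outputs)  # [T, num_tokens, dim] off-grid video tokens
```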
Time to challenge VLMs? Fed up with benchmarks that claim long-video reasoning but only need a few seconds? Try out the Hour-Long VQA PerceptionTest Challenge @eccvconf by @GoogleDeepMind. Q: How many dogs did the person encounter in a 1-hour-long walking video? youtu.be/kefMfeuBRsk
We're excited to release TAPVid-3D: an evaluation benchmark of 4,000+ real world videos and 2.1 million metric 3D point trajectories, for the task of Tracking Any Point in 3D!
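For context on what a "metric 3D point trajectory" means here (a hedged illustration, not the benchmark's own tooling): a 2D pixel track plus per-frame metric depth and pinhole intrinsics can be unprojected into camera-space metres.

```python
# Standard pinhole unprojection of a 2D track into a metric 3D trajectory.
import numpy as np

def lift_track_to_3d(track_uv, depth_m, fx, fy, cx, cy):
    """track_uv: [T, 2] pixel (u, v) per frame; depth_m: [T] metric depth in metres.
    fx, fy, cx, cy: camera intrinsics. Returns [T, 3] camera-frame (X, Y, Z) in metres."""
    u, v = track_uv[:, 0], track_uv[:, 1]
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```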
Join us next week at our second (high-level) intelligence workshop @SimonsInstitute! Schedule: simons.berkeley.edu/workshops/unde… Register online for both in-person and streaming. Yet another FANTASTIC lineup of speakers:
The 2nd Perception Test Challenge is now on, with a workshop happening at ECCV Milano later in the year. See all about it at ptchallenge-workshop.github.io and try out your top general perception models on it. Besides the original 6 tasks, we'll have a new hour-long video QA track.
We present a new SOTA on point tracking via self-supervised training on real, unlabeled videos! BootsTAPIR achieves 67.4% AJ on TAP-Vid DAVIS with minimal architecture changes, and tracks 10K points on a 50-frame video in 6 secs. PyTorch & JAX implementations on GitHub. bootstap.github.io
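For readers unfamiliar with the AJ number above, here is a hedged sketch of Average Jaccard as defined in the TAP-Vid benchmark (my paraphrase, not the official evaluation code): position and visibility must both be correct, and the Jaccard index is averaged over pixel thresholds {1, 2, 4, 8, 16}.

```python
# Sketch of Average Jaccard (AJ) for point tracking, following the TAP-Vid definition.
import numpy as np

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis, thresholds=(1, 2, 4, 8, 16)):
    """pred_xy, gt_xy: [N, T, 2] tracks; pred_vis, gt_vis: [N, T] boolean visibility."""
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)   # [N, T] pixel errors
    jaccards = []
    for thr in thresholds:
        within = dist <= thr
        tp = np.sum(pred_vis & gt_vis & within)        # visible, predicted visible, close enough
        fp = np.sum(pred_vis & ~(gt_vis & within))     # predicted visible but occluded or too far
        fn = np.sum(gt_vis & ~(pred_vis & within))     # missed visible ground-truth points
        jaccards.append(tp / max(tp + fp + fn, 1))
    return float(np.mean(jaccards))
```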
Delighted to host the 1st edition of our tutorial "Time is precious: Self-Supervised Learning Beyond Images" at @eccvconf with @MrzSalehi and @y_m_asano. We have an exciting lineup of speakers too: @joaocarreira, @imisra_ and Emin Orhan. More details coming soon... #ECCV2024
Our research project SIMA is creating a general, natural-language-instructable AI agent that plays many 3D games. The agent can carry out a wide range of tasks in virtual worlds, making AI more adaptable, helpful & fun! dpmd.ai/sima-1
Videos have a wealth of learning signal that is still underappreciated -- in fact, it looks like a single long video can be as valuable as a large curated internet image dataset. Cool work from @shawshank_v et al. with a new self-supervised formulation in which multi-object tracking emerges.
Really happy to share that DoRA has been accepted as an Oral at @iclr_conf #ICLR2024. Using just "1 video" from our new egocentric dataset - Walking Tours - we develop a new method that outperforms DINO pretrained on ImageNet on image and video downstream tasks. More details in 🧵👇