joao carreira
@joaocarreira
Research Scientist at Google DeepMind
Scaling 4D Representations – new preprint arxiv.org/abs/2412.15212 and models now available github.com/google-deepmin…
3rd edition of the challenge, with exciting new tasks and guest tracks. Back during COVID, when we held the first workshop about the Perception Test (computerperception.github.io), some of us were afraid the benchmark was too difficult; now we just made it harder.
The 3rd Perception Test challenge is now accepting submissions: perception-test-challenge.github.io! Prizes of up to 50k EUR are available across the Perception Test tracks. The winners will be announced at the Perception Test workshop at #ICCV2025. Submission deadline: October 6, 2025.
Thrilled to share our latest work on SciVid, to appear at #ICCV2025! 🎉 SciVid offers cross-domain evaluation of video models in scientific applications, including medical CV, animal behavior, & weather forecasting 🧪🌍📽️🪰🐭🫀🌦️ #AI4Science #FoundationModel #CV4Science [1/5]🧵
Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: @danfei_xu, @joaocarreira, @jiajunwu_cs, Kristen Grauman, @sainingxie, @vincesitzmann 🔗 sp4v.github.io
Individual frames from generative video models tend to look reasonable; realistically capturing actions unfolding over time is much harder. TRAJAN is a new evaluation procedure to better guide progress in this (hot) area.
Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share TRAJAN: the world’s first point TRAJectory AutoeNcoder for evaluating motion realism in generated and corrupted videos. 🌐 trajan-paper.github.io 🧵
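The gist, as I read it (a minimal sketch, not the official TRAJAN code; the autoencoder object and the point-track extraction are placeholder assumptions): extract point trajectories from a video, reconstruct them with an autoencoder trained on real-video motion, and use the reconstruction error as a motion-realism score.

```python
# Minimal sketch of an autoencoder-based motion-realism score (placeholder names,
# not the released TRAJAN implementation).
import numpy as np

def motion_realism_score(tracks: np.ndarray, autoencoder) -> float:
    """tracks: [num_points, num_frames, 2] (x, y) point trajectories from a video.
    autoencoder: any model with encode()/decode() trained on real-video tracks.
    Returns a score in (0, 1]; higher means the motion looks more realistic."""
    latents = autoencoder.encode(tracks)                      # compress each trajectory
    recon = autoencoder.decode(latents)                       # reconstruct it
    err = np.mean(np.linalg.norm(recon - tracks, axis=-1))    # mean per-point pixel error
    return float(1.0 / (1.0 + err))                           # map error to a bounded score
```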
Check out our CVPR 2025 paper: arxiv.org/abs/2504.01961. Work with Dilara Gokay, Joseph Heyward, @ChuhanZhang5 , @DanielZoran_ , Viorica Pătrăucean, @joaocarreira , @dimadamen and Andrew Zisserman, @GoogleDeepMind
We are looking for a student researcher to work on video understanding plus 3D at Google DeepMind London. DM/email me, or pass it along to someone if you feel it may be a good fit!
Apply here: eeml.eu/application Confirmed speakers: @AaronCourville @AldenHung @dianaborsa @09Emmar @joaocarreira @MihaelaCRosca @senka_snow @fedzbar @bose_joey @LiliMomeni @Miruna_Pislar Razvan Pascanu Samy Bengio
Excited to announce MooG for learning video representations. MooG allows tokens to move “off-the-grid”, enabling better representation of scene elements even as they move across the image plane through time. 📜arxiv.org/abs/2411.05927 🌐moog-paper.github.io
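In rough terms (a hedged sketch of my reading of the abstract, not the released MooG architecture; every module name below is an assumption): rather than re-tokenizing each frame on a fixed patch grid, a set of recurrent tokens cross-attends to each incoming frame, so a token can keep binding to the same scene element wherever it moves.

```python
# Hedged sketch of the "off-the-grid" idea: recurrent tokens that read from each
# new frame via cross-attention instead of being tied to fixed grid positions.
import torch
import torch.nn as nn

class OffGridTokens(nn.Module):
    def __init__(self, num_tokens=64, dim=128):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim))  # initial token states
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, frame_features):
        # frame_features: [T, N_patches, dim] features for each video frame
        state = self.tokens.unsqueeze(0)  # [1, num_tokens, dim]
        outputs = []
        for feats in frame_features:
            # each token reads from the current frame, wherever its content moved to
            read, _ = self.cross_attn(state, feats.unsqueeze(0), feats.unsqueeze(0))
            state = self.update(read.squeeze(0), state.squeeze(0)).unsqueeze(0)
            outputs.append(state.squeeze(0))
        return torch.stack(outputs)  # [T, num_tokens, dim] off-grid video tokens
```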
Time to challenge VLMs? Fed up with benchmarks that claim long-video reasoning but only need a few seconds? Try out the Hour-Long VQA PerceptionTest Challenge @eccvconf by @GoogleDeepMind. Q: How many dogs did the person encounter in a 1-hour-long walking video? youtu.be/kefMfeuBRsk
We're excited to release TAPVid-3D: an evaluation benchmark of 4,000+ real world videos and 2.1 million metric 3D point trajectories, for the task of Tracking Any Point in 3D!
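For context on what a "metric 3D point trajectory" means here (a hedged illustration, not the benchmark's own tooling): a 2D pixel track plus per-frame metric depth and pinhole intrinsics can be unprojected into camera-space metres.

```python
# Standard pinhole unprojection of a 2D track into a metric 3D trajectory.
import numpy as np

def lift_track_to_3d(track_uv, depth_m, fx, fy, cx, cy):
    """track_uv: [T, 2] pixel (u, v) per frame; depth_m: [T] metric depth in metres.
    fx, fy, cx, cy: camera intrinsics. Returns [T, 3] camera-frame (X, Y, Z) in metres."""
    u, v = track_uv[:, 0], track_uv[:, 1]
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```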
Join us next week at our second (high-level) intelligence workshop @SimonsInstitute! Schedule: simons.berkeley.edu/workshops/unde… Register online for both in-person and streaming. Yet another FANTASTIC lineup of speakers:
The 2nd Perception Test Challenge is now on, with a workshop happening at ECCV Milano later in the year. See all about it at ptchallenge-workshop.github.io and try out your top general perception models on it. Besides the original 6 tasks, we'll have a new hour-long video QA track.
We present a new SOTA on point tracking via self-supervised training on real, unlabeled videos! BootsTAPIR achieves 67.4% AJ on TAP-Vid DAVIS with minimal architecture changes, and tracks 10K points on a 50-frame video in 6 secs. PyTorch & JAX implementations on GitHub. bootstap.github.io
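For readers unfamiliar with the AJ number above, here is a hedged sketch of Average Jaccard as defined in the TAP-Vid benchmark (my paraphrase, not the official evaluation code): position and visibility must both be correct, and the Jaccard index is averaged over pixel thresholds {1, 2, 4, 8, 16}.

```python
# Sketch of Average Jaccard (AJ) for point tracking, following the TAP-Vid definition.
import numpy as np

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis, thresholds=(1, 2, 4, 8, 16)):
    """pred_xy, gt_xy: [N, T, 2] tracks; pred_vis, gt_vis: [N, T] boolean visibility."""
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)   # [N, T] pixel errors
    jaccards = []
    for thr in thresholds:
        within = dist <= thr
        tp = np.sum(pred_vis & gt_vis & within)        # visible, predicted visible, close enough
        fp = np.sum(pred_vis & ~(gt_vis & within))     # predicted visible but occluded or too far
        fn = np.sum(gt_vis & ~(pred_vis & within))     # missed visible ground-truth points
        jaccards.append(tp / max(tp + fp + fn, 1))
    return float(np.mean(jaccards))
```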
Delighted to host the 1st edition of our tutorial "Time is precious: Self-Supervised Learning Beyond Images" at @eccvconf with @MrzSalehi and @y_m_asano. We have an exciting lineup of speakers too: @joaocarreira, @imisra_ and Emin Orhan. More details coming soon... #ECCV2024
Our research project SIMA is creating a general, natural-language-instructable AI agent that plays many 3D games. The agent can carry out a wide range of tasks in virtual worlds, making AI more adaptable, helpful & fun! dpmd.ai/sima-1
Videos have a wealth of learning signal that is still underappreciated -- in fact, it looks like a single long video can be as valuable as a large curated internet image dataset. Cool work from @shawshank_v et al. with a new self-supervised formulation in which multi-object tracking emerges.
Really happy to share that DoRA has been accepted as an Oral at @iclr_conf #ICLR2024. Using just "1 video" from our new egocentric dataset - Walking Tours - we develop a new method that outperforms DINO pretrained on ImageNet on image and video downstream tasks. More details in 🧵👇