David Fan
@DavidJFan
Facebook AI Research (FAIR) | Video Representations, Self-Supervised Learning | @Princeton Computer Science '19
Can visual SSL match CLIP on VQA? Yes! With controlled experiments, we show that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params), which is trained purely on web images – without any language supervision.

Come by the poster if you want recommendations on cool restaurants to try in Vancouver 😃!
[#ICML2025] Have you ever wanted to train LLMs on distributed private data but were blocked by model size or privacy constraints 😔? Here’s a solution: Introducing 🌸POPri (Policy Optimization for Private Data)! Poster 🗓️ today at 4:30pm PT, 📍East Exhibition Hall A-B E-1006
Congrats @jianyuan_wang and co!!!
Many Congratulations to @jianyuan_wang, @MinghaoChen23, @n_karaev, Andrea Vedaldi, Christian Rupprecht and @davnov134 for winning the Best Paper Award @CVPR for "VGGT: Visual Geometry Grounded Transformer" 🥇🎉 🙌🙌 #CVPR2025!!!!!!
It was a pleasure to work with the team on this! Looking forward to further improving the ability to learn from and predict in the visual world.
Welcome Rob! So blessed to have you steer the ship! See you around the office :)
1/ Excited to share that I’m taking on the role of leading Fundamental AI Research (FAIR) at Meta. Huge thanks to Joelle for everything. Look forward to working closely again with Yann & team.
Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦 arxiv.org/abs/2504.21850 1/10
Excited to release the training code for MetaMorph! MetaMorph offers a simple yet effective way to convert an LLM into a multimodal LLM that not only takes multimodal inputs, but also generates multimodal outputs via autoregressive (AR) prediction. This confers the ability to “think visually”, and…
We're open-sourcing the training code for MetaMorph! MetaMorph offers a lightweight framework for turning LLMs into unified multimodal models: (multimodal) tokens -> transformers -> diffusion -> pixel! This is our best take on unified modeling as of November 2024, and…
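For a concrete picture of that "tokens -> transformers -> diffusion -> pixel" pipeline, here is a minimal, hypothetical PyTorch sketch (not the released MetaMorph code): a transformer backbone consumes interleaved text and visual tokens and predicts both next-token logits and continuous visual embeddings, which a diffusion decoder (omitted here) would render to pixels. All module names, sizes, and the bidirectional attention are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a unified multimodal model: text + visual tokens in,
# text logits + predicted visual embeddings out. Illustrative only.
import torch
import torch.nn as nn

class UnifiedMultimodalLM(nn.Module):
    def __init__(self, vocab_size=32000, dim=512, visual_dim=768):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.visual_proj_in = nn.Linear(visual_dim, dim)      # map image tokens into the LLM space
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)  # stand-in for the LLM
        self.text_head = nn.Linear(dim, vocab_size)            # next-token logits
        self.visual_head = nn.Linear(dim, visual_dim)           # predicted visual embeddings

    def forward(self, text_ids, image_tokens):
        # Interleave text and visual tokens into one sequence.
        seq = torch.cat([self.text_embed(text_ids),
                         self.visual_proj_in(image_tokens)], dim=1)
        # Bidirectional attention here for simplicity; a real AR setup would use a causal mask.
        h = self.backbone(seq)
        return self.text_head(h), self.visual_head(h)

model = UnifiedMultimodalLM()
text_ids = torch.randint(0, 32000, (1, 16))
image_tokens = torch.randn(1, 64, 768)   # e.g. embeddings from a frozen vision encoder
text_logits, visual_preds = model(text_ids, image_tokens)
# visual_preds would then condition a diffusion decoder (not shown) to produce pixels.
```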
Excited to share that our paper on Navigation World Models was selected for an Oral presentation at CVPR! Code & models: github.com/facebookresear… huggingface.co/facebook/nwm
Happy to share our new work on Navigation World Models! 🔥🔥 Navigation is a fundamental skill of agents with visual-motor capabilities. We train a single World Model across multiple environments and diverse agent data. w/ @GaoyueZhou, Danny Tran, @trevordarrell and @ylecun.
New paper - Transformers, but without normalization layers (1/n)