Fenil Doshi
@fenildoshi009
PhD student @Harvard and @KempnerInst studying biological and machine vision | object perception | mid-level vision | cortical organization
🧵 What if two images have the same local parts but represent different global shapes purely through part arrangement? Humans can spot the difference instantly! The question is: can vision models do the same? 1/15
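A minimal sketch of the stimulus idea (my own illustration, not the paper's generation code): the same set of local patches tiled into two different global arrangements, so pixel content matches while configuration differs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four distinct "local parts": 16x16 patches with different random textures.
parts = [rng.random((16, 16)) for _ in range(4)]

def assemble(order):
    """Tile the same four parts into a 2x2 global arrangement."""
    top = np.concatenate([parts[order[0]], parts[order[1]]], axis=1)
    bottom = np.concatenate([parts[order[2]], parts[order[3]]], axis=1)
    return np.concatenate([top, bottom], axis=0)

# Identical local content, different global configuration.
img_a = assemble([0, 1, 2, 3])
img_b = assemble([2, 0, 3, 1])

# Same multiset of pixels (same parts), different spatial layout.
assert np.allclose(np.sort(img_a.ravel()), np.sort(img_b.ravel()))
print(img_a.shape, img_b.shape)  # (32, 32) each
```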
Quick thread on the recent IMO results and the relationship between symbol manipulation, reasoning, and intelligence in machines and humans:
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
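For contrast, a toy sketch of the two objectives being compared (stand-in linear models and made-up shapes, not the paper's setup): autoregressive next-token cross-entropy versus the diffusion denoising loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim, T = 100, 32, 16

# Toy stand-ins for the two model families (illustrative only).
ar_model = torch.nn.Linear(dim, vocab)   # predicts the next token from a hidden state
denoiser = torch.nn.Linear(dim, dim)     # predicts the noise added to a latent

# Autoregressive objective: next-token cross-entropy.
hidden = torch.randn(T, dim)             # hidden states for positions 0..T-1
tokens = torch.randint(0, vocab, (T,))   # ground-truth next tokens
ar_loss = F.cross_entropy(ar_model(hidden), tokens)

# Diffusion objective: predict the injected noise at a random noise level.
x0 = torch.randn(T, dim)                 # clean latents
noise = torch.randn_like(x0)
alpha = torch.rand(T, 1)                 # per-sample noise level in (0, 1)
x_noisy = alpha.sqrt() * x0 + (1 - alpha).sqrt() * noise
diff_loss = F.mse_loss(denoiser(x_noisy), noise)

print(float(ar_loss), float(diff_loss))
```

One intuition consistent with the data-constrained claim (the paper's own argument may be more precise): the diffusion loss re-noises each example afresh at a new noise level every time it is revisited, so repeated epochs over a fixed dataset keep supplying new training signal.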
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: arxiv.org/abs/2507.14793 Blog: kempnerinstitute.harvard.edu/research/deepe… 1/🧵
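A toy illustration of the equivariance property in question (my sketch, not the FERNN architecture): a per-frame circular convolution commutes trivially with a constant-velocity flow; the paper's contribution is getting the analogous commutation for recurrent models that carry temporal state.

```python
import torch

torch.manual_seed(0)

def roll_flow(video, velocity):
    """Apply a constant-velocity horizontal flow: frame t is shifted by velocity * t pixels."""
    return torch.stack([torch.roll(frame, shifts=velocity * t, dims=-1)
                        for t, frame in enumerate(video)])

# Toy "sequence model": a per-frame circular convolution (shift-equivariant per frame).
conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

video = torch.randn(8, 1, 16, 16)  # (time, channels, H, W)
out_then_flow = roll_flow(conv(video), velocity=1)
flow_then_out = conv(roll_flow(video, velocity=1))

# Flow equivariance: processing then flowing == flowing then processing.
print(torch.allclose(out_then_flow, flow_then_out, atol=1e-5))
```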
New in the #DeeperLearningBlog: #KempnerInstitute research fellow @t_andy_keller introduces the first flow equivariant neural networks, which reflect motion symmetries, greatly enhancing generalization and sequence modeling. bit.ly/451fQ48 #AI #NeuroAI
Great excuse to share something I really love: 1-Lipschitz nets. They give clean theory, certs for robustness, the right loss for W-GANs, even nicer grads for explainability!! Yet they’re still niche. Here’s a speed-run through some of my favorite papers in the field. 🧵👇
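A minimal sketch of one standard recipe, not taken from any single paper in the thread: spectral-normalized layers give an (approximately) 1-Lipschitz network, and the logit margin then yields a certified l2 radius of margin / sqrt(2).

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

torch.manual_seed(0)

# Each linear layer is constrained to spectral norm <= 1 (approximately, via
# power iteration); with 1-Lipschitz activations such as ReLU, the whole
# network is then 1-Lipschitz in l2.
net = nn.Sequential(
    spectral_norm(nn.Linear(32, 64)), nn.ReLU(),
    spectral_norm(nn.Linear(64, 10)),
)

x = torch.randn(1, 32)
logits = net(x)[0]
top2 = logits.topk(2).values
margin = (top2[0] - top2[1]).item()

# A standard margin-based certificate for an L-Lipschitz net: no l2
# perturbation smaller than margin / (sqrt(2) * L) can flip the top-1
# prediction (here L = 1).
radius = margin / (2 ** 0.5)
print(f"certified l2 radius ≈ {radius:.3f}")
```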
optimization theorem: "assume a lipschitz constant L..." the lipschitz constant:
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, DINOv2 on various benchmarks, setting a new standard for open-source research🧵
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_…
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵
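A toy sketch of the kind of intervention at stake (additive steering along a probe direction; all names here are hypothetical): the edit only isolates the target feature if that feature really is encoded as a linear direction, which is exactly the implicit assumption being pointed out.

```python
import torch

torch.manual_seed(0)
d = 64

# Toy activations and a candidate "feature direction" found by, e.g., a linear probe.
acts = torch.randn(8, d)          # activations for 8 inputs
direction = torch.randn(d)
direction = direction / direction.norm()

def intervene(acts, direction, alpha):
    """Additive steering: shift activations along a single direction.

    This only manipulates the hypothesized feature cleanly if the feature
    really is a linear direction -- the assumption the paper highlights.
    """
    return acts + alpha * direction

steered = intervene(acts, direction, alpha=3.0)
# The projection onto the direction moves by exactly alpha; everything
# orthogonal is untouched -- by construction, not by discovery.
print(steered @ direction - acts @ direction)  # ≈ 3.0 everywhere
```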
Submit to our workshop on contextualizing CogSci approaches for understanding neural networks: "Cognitive Interpretability"!
We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2025/ 1/
🧠 Submit to CogInterp @ NeurIPS 2025! Bridging AI & cognitive science to understand how models think, reason & represent. CFP + details 👉 coginterp.github.io/neurips2025/
📢 New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet... We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n] arxiv.org/abs/2412.00947 github.com/psunlpgroup/Vi…
Impressed by DINOv2 perf. but don't want to spend too much $$$ on compute and wait for days to pretrain on your own data? Say no more! Data augmentation curriculum speeds up SSL pretraining (as it did for generative and supervised learning) -> FastDINOv2! arxiv.org/abs/2507.03779
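A generic sketch of what an augmentation curriculum can look like, with made-up schedule constants; the actual FastDINOv2 recipe is in the paper.

```python
import torchvision.transforms as T

def curriculum_augmentation(step, total_steps):
    """Ramp augmentation strength from weak to strong over training.

    A generic curriculum sketch: early steps see mild crops and no color
    jitter, later steps see the full augmentation stack. The real
    FastDINOv2 schedule may differ -- see the paper for the actual recipe.
    """
    p = step / total_steps
    min_scale = 0.9 - 0.6 * p      # crops get more aggressive over time
    jitter_strength = 0.4 * p      # color jitter fades in
    return T.Compose([
        T.RandomResizedCrop(224, scale=(min_scale, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(jitter_strength, jitter_strength, jitter_strength, jitter_strength / 4),
        T.ToTensor(),
    ])

early = curriculum_augmentation(step=0, total_steps=100_000)
late = curriculum_augmentation(step=90_000, total_steps=100_000)
```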
Exciting new preprint from the lab: “Adopting a human developmental visual diet yields robust, shape-based AI vision”. A most wonderful case where brain inspiration massively improved AI solutions. Work with @lu_zejin @martisamuser and Radoslaw Cichy arxiv.org/abs/2507.03168
Coming up at ICML: 🤯Distribution shifts are still a huge challenge in ML. There's already a ton of algorithms to address specific conditions. So what if the challenge was just selecting the right algorithm for the right conditions?🤔🧵
We benchmarked leading multimodal foundation models (GPT-4o, Claude 3.5 Sonnet, Gemini, Llama, etc.) on standard computer vision tasks—from segmentation to surface normal estimation—using standard datasets like COCO and ImageNet. These models have made remarkable progress;…
Updated paper! Our main new finding: by creating attention biases at test time—without extra tokens—we remove high-norm outliers and attention sinks in ViTs, while preserving zero-shot ImageNet performance. Maybe ViTs don’t need registers after all? x.com/nickhjiang/sta…
Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
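A generic sketch of the attention-sink idea behind test-time biases: an extra logit per query that can absorb attention mass without contributing any value, so queries need not dump mass onto arbitrary patch tokens. The paper's actual procedure differs in detail.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_tokens, dim = 10, 16

q = torch.randn(n_tokens, dim)
k = torch.randn(n_tokens, dim)
v = torch.randn(n_tokens, dim)

def attention_with_sink(q, k, v, sink_bias=0.0):
    """Self-attention with one extra 'sink' logit per query.

    The sink contributes no value; it only absorbs attention mass. A
    generic sketch of a test-time attention bias, not the paper's exact
    method.
    """
    logits = q @ k.T / dim ** 0.5                  # (n_tokens, n_tokens)
    sink = torch.full((q.shape[0], 1), sink_bias)  # extra column of bias logits
    weights = F.softmax(torch.cat([logits, sink], dim=1), dim=1)
    return weights[:, :-1] @ v                     # drop the sink column (zero value)

out_plain = attention_with_sink(q, k, v, sink_bias=float("-inf"))  # ordinary softmax attention
out_sink = attention_with_sink(q, k, v, sink_bias=2.0)             # some mass diverted to the sink
print(out_plain.shape, out_sink.shape)
```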
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/
Cool work uses "visual anagrams": two images of different objects made out of the same image patches. A model must classify both correctly to score. Hence, higher-scoring models use global geometry; lower-scoring ones use textures. SigLIP is the GOAT of course, or I wouldn't repost this (jk)
We find self-supervised & language-aligned ViTs scored highest on the configural shape score (CSS) — even matching humans. Supervised models are not even close. Surprisingly, high ImageNet accuracy does not guarantee a high configural shape score! 5/15
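A toy sketch of the pair-level scoring rule described above (both anagram images must be classified correctly for credit); the paper's CSS definition is the authoritative one.

```python
import torch

def configural_pair_score(logits_a, logits_b, label_a, label_b):
    """Pair-level scoring sketch: credit only if BOTH anagram images are right.

    Illustrative only -- this just encodes the 'must classify both correctly'
    rule from the thread, not the paper's full CSS definition.
    """
    correct_a = logits_a.argmax(dim=-1) == label_a
    correct_b = logits_b.argmax(dim=-1) == label_b
    return (correct_a & correct_b).float().mean().item()

# Toy example: 4 anagram pairs, 10 classes.
torch.manual_seed(0)
logits_a, logits_b = torch.randn(4, 10), torch.randn(4, 10)
label_a, label_b = torch.randint(0, 10, (4,)), torch.randint(0, 10, (4,))
print(configural_pair_score(logits_a, logits_b, label_a, label_b))
```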
Beautiful work (and thread)! They revisit the well-known shape-vs-texture bias, this time with objects made of the same subparts. With ablations they confirm the intuition that long-range attention mechanisms are essential for transformers to "see" the global shape in the picture.