Efstathios Karypidis
@K_Sta8is
PhD Candidate, Archimedes Unit | National Technical University of Athens
1/n 🚀 Excited to share our latest work: DINO-Foresight, a new framework for predicting the future states of scenes using Vision Foundation Model features! Links to the arXiv and GitHub 👇
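For the curious, a minimal sketch of the idea as described above: extract patch features for past frames with a frozen backbone (DINOv2 here) and train a transformer to forecast the next frame's features, so frozen task heads can run on predicted features instead of predicted pixels. All names and shapes below are illustrative, not the actual DINO-Foresight code.

```python
# Sketch of feature forecasting with a frozen Vision Foundation Model.
# Names are illustrative; positional/temporal embeddings omitted for brevity.
import torch
import torch.nn as nn

class FeatureForecaster(nn.Module):
    """Predicts next-frame VFM patch features from a window of past frames."""
    def __init__(self, dim=768, depth=6, heads=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, depth)

    def forward(self, feats):                # feats: (B, T, N, D) patch features
        B, T, N, D = feats.shape
        x = feats.reshape(B, T * N, D)       # flatten time into the token axis
        x = self.temporal(x)
        return x[:, -N:, :]                  # last N tokens = predicted next frame

# Extract features for a toy clip with a frozen DINOv2, then forecast.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()
frames = torch.randn(1, 4, 3, 224, 224)      # (B, T, C, H, W) toy clip
with torch.no_grad():
    feats = torch.stack([backbone.forward_features(frames[:, t])["x_norm_patchtokens"]
                         for t in range(frames.shape[1])], dim=1)
pred_next = FeatureForecaster()(feats)       # (1, 256, 768) predicted features
# Frozen heads (segmentation, depth) would then consume pred_next directly.
```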

Just back from CVPR@Paris 🇫🇷. What a fantastic event! Great talks, great posters, and great to connect with the French & European vision community. Kudos to the organizers; hoping it returns next year! 🤞 #CVPR2025 @CVPR
📢 R u in Athens on July 22? 📢 Check out the #ComputerVision Day @ ArchimedesAI! Talks: 👉@VickyKalogeiton: 'Efficient Brains that Imagine' 👉Dimitris Samaras: 'From Saliency to Scanpaths: 20 years of Wandering Eyes' 👉@dimtzionas: 'Towards In-the-Wild Understanding of 3D…
Interesting alternative to multi-token prediction, though the figure is a bit unintuitive. Instead of attaching a head for each +d-th prediction, it passes a dummy input token for each extra prediction through the model. This is A LOT more expensive, e.g. doing 2-step prediction…
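Rough arithmetic behind the cost claim, assuming each extra predicted step inserts one dummy token per position: the sequence grows (k+1)x, so quadratic self-attention compute grows roughly (k+1)² x.

```python
# Rough attention-cost comparison (illustrative arithmetic, not a benchmark):
# interleaving k dummy tokens per position grows the sequence (k+1)x,
# so quadratic self-attention cost grows ~(k+1)^2 x.
def attn_cost(seq_len):            # proportional to seq_len^2
    return seq_len ** 2

n, k = 1024, 1                     # 2-step prediction -> 1 dummy token per position
print(attn_cost(n * (k + 1)) / attn_cost(n))   # 4.0: ~4x attention compute
```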
1/n Multi-token prediction boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token setup: • Short-term focus • Struggles with long-range decisions • Weaker supervision Prior methods add complexity (extra layers) 🔑 Our fix? Register tokens—elegant and powerful
Nice trick for fine-tuning with multi-token prediction without architecture changes: interleave learnable register tokens into the input sequence & discard them at inference. It works for supervised fine-tuning, PEFT, pretraining, on both language and vision domains 👇
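A minimal PyTorch sketch of the trick as described: interleave a learnable register after every token during training, read the +2-step prediction off the register slots, and skip the registers at inference. The interleaving pattern and prediction horizon here are assumptions, not the paper's exact recipe.

```python
# Sketch of multi-token prediction via learnable register tokens (assumptions:
# one register per position, trained to predict the token 2 steps ahead).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.register = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, 4, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, 2)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens, use_registers=True):
        x = self.embed(tokens)                       # (B, T, D)
        if use_registers:                            # training: t0 r0 t1 r1 ...
            B, T, D = x.shape
            reg = self.register.expand(B, T, D)
            x = torch.stack([x, reg], 2).reshape(B, 2 * T, D)
        mask = nn.Transformer.generate_square_subsequent_mask(x.shape[1])
        return self.head(self.blocks(x, mask=mask, is_causal=True))

def mtp_loss(model, tokens):
    logits = model(tokens)                           # (B, 2T, V)
    next_logits = logits[:, 0::2]                    # token slots -> predict t+1
    plus2_logits = logits[:, 1::2]                   # register slots -> predict t+2
    loss = F.cross_entropy(next_logits[:, :-1].transpose(1, 2), tokens[:, 1:])
    loss += F.cross_entropy(plus2_logits[:, :-2].transpose(1, 2), tokens[:, 2:])
    return loss

# At inference, call model(tokens, use_registers=False): the architecture and
# decoding cost are identical to a plain next-token model.
```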
New paper out - accepted at @ICCVConference We introduce MoSiC, a self-supervised learning framework that learns temporally consistent representations from video using motion cues. Key idea: leverage long-range point tracks to enforce dense feature coherence across time.🧵
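A hedged sketch of the core objective as stated: sample dense features along precomputed point tracks and pull them together across frames. The loss form and names are my guesses, not the released code.

```python
# Sketch: enforce feature coherence along long-range point tracks
# (names and loss form are illustrative; see the MoSiC repo for the actual one).
import torch
import torch.nn.functional as F

def track_consistency_loss(feat_maps, tracks, visible):
    """feat_maps: (T, D, H, W) dense features per frame
       tracks:    (T, P, 2) track coordinates in [-1, 1] (grid_sample convention)
       visible:   (T, P) bool visibility per point"""
    # Sample one feature vector per track point in every frame.
    sampled = F.grid_sample(feat_maps, tracks.unsqueeze(1),   # (T, D, 1, P)
                            align_corners=False).squeeze(2)    # (T, D, P)
    sampled = F.normalize(sampled, dim=1)
    anchor = sampled[0]                                        # first frame
    # Cosine loss: each later frame's point feature should match its anchor.
    sims = (sampled[1:] * anchor.unsqueeze(0)).sum(dim=1)      # (T-1, P)
    mask = (visible[1:] & visible[:1]).float()
    return ((1 - sims) * mask).sum() / mask.sum().clamp(min=1)
```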
1/n 🚀New paper out - accepted at @ICCVConference! Introducing DIP: unsupervised post-training that enhances dense features in pretrained ViTs for dense in-context scene understanding Below: Low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!
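The in-context protocol itself is easy to sketch: label each query patch by its nearest support patch in feature space. This is a simplified version of the evaluation setup, not DIP's training.

```python
# Sketch of dense in-context segmentation by nearest-neighbor patch matching
# (simplified evaluation protocol, not DIP's post-training code).
import torch
import torch.nn.functional as F

def in_context_segment(support_feats, support_labels, query_feats):
    """support_feats:  (N, D) patch features of the support image
       support_labels: (N,)   per-patch class ids (mask downsampled to patches)
       query_feats:    (M, D) patch features of the query image"""
    s = F.normalize(support_feats, dim=1)
    q = F.normalize(query_feats, dim=1)
    nn_idx = (q @ s.t()).argmax(dim=1)   # nearest support patch per query patch
    return support_labels[nn_idx]        # (M,) predicted per-patch labels
```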
Achievement unlocked: having Alyosha at our FUNGI poster, the one person I had in mind while working on this paper on cheap and better representations for k-NN classification and beyond. #cvprinparis #cvpr2025
Self-supervised learning is fantastic for pretraining, but can we use it for other tasks (kNN classification, in-context learning) & modalities, w/o training & by simply using its gradients as features? Enter 🍄FUNGI - Features from UNsupervised GradIents #NeurIPS2024 🧵
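A rough reading of the recipe in PyTorch: take the per-sample gradient of a self-supervised objective at a small head, flatten it, and concatenate it with the embedding as the kNN feature. The specific loss and layer below are placeholders, not the paper's full recipe.

```python
# Sketch of gradients-as-features (FUNGI-style). The SSL objective and the
# layer whose gradient is taken are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_features(backbone, head, x):
    """Feature = normalized embedding ++ normalized flattened gradient of a
       self-supervised loss w.r.t. a small projection head."""
    with torch.no_grad():
        emb = backbone(x)                      # (1, D), frozen backbone
    head.zero_grad()
    z = head(emb)
    # Placeholder objective: KL between the projection and a uniform target
    # (FUNGI combines gradients of several SSL losses; this is one stand-in).
    loss = F.kl_div(F.log_softmax(z, dim=1),
                    torch.full_like(z, 1.0 / z.shape[1]), reduction="batchmean")
    loss.backward()
    grad = head.weight.grad.flatten().unsqueeze(0)
    return torch.cat([F.normalize(emb, dim=1), F.normalize(grad, dim=1)], dim=1)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 384)).eval()
head = nn.Linear(384, 64)
feats = gradient_features(backbone, head, torch.randn(1, 3, 32, 32))
# Stack these over the dataset and run plain kNN on them; no training needed.
```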
🚀UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms BAGEL on image editing and excels in understanding & generation. 🌟The data, model, and training & evaluation scripts are now open-source! github.com/PKU-YuanGroup/…
Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-token prediction training (thanks @FabianGloeckle). We use register tokens: minimal architecture changes & scalable prediction horizons. x.com/NasosGer/statu…
EQ-VAE is accepted at #ICML2025 😁. Grateful to my co-authors for their guidance and collaboration! @IoannisKakogeo1, @SpyrosGidaris, Nikos Komodakis.
1/n🚀If you’re working on generative image modeling, check out our latest work! We introduce EQ-VAE, a simple yet powerful regularization approach that makes latent representations equivariant to spatial transformations, leading to smoother latents and better generative models.👇
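The regularizer is simple enough to sketch: decoding a spatially transformed latent should match the transformed image. A toy version with 90-degree rotations only; the paper uses a richer family of spatial transforms (e.g. scaling).

```python
# Sketch of equivariance regularization on a VAE latent (rotations only here).
import torch
import torch.nn.functional as F

def eq_vae_reg(encoder, decoder, x):
    z = encoder(x)                       # (B, C, h, w) spatial latent
    k = int(torch.randint(1, 4, (1,)))   # random 90-degree rotation
    x_rot = torch.rot90(x, k, dims=(2, 3))
    x_from_z_rot = decoder(torch.rot90(z, k, dims=(2, 3)))
    return F.mse_loss(x_from_z_rot, x_rot)

enc = torch.nn.Conv2d(3, 8, 4, stride=4)          # toy encoder/decoder pair
dec = torch.nn.ConvTranspose2d(8, 3, 4, stride=4)
loss = eq_vae_reg(enc, dec, torch.randn(2, 3, 64, 64))
# Added to the usual reconstruction/KL objective, this pushes the latent to
# transform the same way the image does, i.e. to be equivariant.
```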
🌌🛰️Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"! arxiv.org/abs/2502.03714 (1/9)
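A sketch of the basic construction as I read the abstract: per-model encoders map activations into one shared sparse concept space, and per-model decoders map back, with cross-reconstruction doing the aligning. The TopK sparsity and all names are my assumptions, not the paper's exact architecture.

```python
# Sketch of a universal sparse autoencoder: per-model encoders/decoders around
# one shared sparse concept space.
import torch
import torch.nn as nn

class UniversalSAE(nn.Module):
    def __init__(self, dims, n_concepts=4096, k=32):
        super().__init__()
        self.enc = nn.ModuleList([nn.Linear(d, n_concepts) for d in dims])
        self.dec = nn.ModuleList([nn.Linear(n_concepts, d) for d in dims])
        self.k = k

    def encode(self, a, i):              # model i's activations -> shared code
        c = self.enc[i](a).relu()
        top = torch.topk(c, self.k, dim=1)
        return torch.zeros_like(c).scatter_(1, top.indices, top.values)

    def forward(self, acts):             # acts: list of (B, d_i), one per model
        codes = [self.encode(a, i) for i, a in enumerate(acts)]
        # Cross-reconstruction: every model's code should reconstruct every
        # model's activations, which is what aligns concepts across models.
        return [[dec(c) for dec in self.dec] for c in codes]

usae = UniversalSAE(dims=[768, 1024])
recons = usae([torch.randn(8, 768), torch.randn(8, 1024)])
```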
1/n Introducing ReDi (Representation Diffusion): a new generative approach that leverages a diffusion model to jointly capture – Low-level image details (via VAE latents) – High-level semantic features (via DINOv2)🧵
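A hedged sketch of the joint forward process, assuming the two representations are simply channel-concatenated on a shared spatial grid (my assumption, not necessarily how ReDi combines them):

```python
# Sketch of joint diffusion over two representations (channel-concat is an
# assumption for how the VAE latent and DINOv2 features are combined).
import torch

def joint_forward_noising(vae_latent, dino_feats, t, noise_sched):
    """vae_latent: (B, 4, h, w); dino_feats: (B, C, h, w) DINOv2 features
       projected/reshaped to the same spatial grid."""
    x0 = torch.cat([vae_latent, dino_feats], dim=1)   # one joint 'image'
    eps = torch.randn_like(x0)
    a = noise_sched[t].view(-1, 1, 1, 1)              # cumulative alpha at t
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps         # standard DDPM noising
    return xt, eps      # one denoiser is trained to predict eps for both parts

sched = torch.linspace(0.9999, 0.01, 1000)            # toy cumulative-alpha schedule
xt, eps = joint_forward_noising(torch.randn(2, 4, 32, 32),
                                torch.randn(2, 16, 32, 32),
                                torch.randint(0, 1000, (2,)), sched)
```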
The sdxl-VAE models a substantial amount of noise. Things we can't even see. It meticulously encodes the noise, uses precious bottleneck capacity to store it, then faithfully reconstructs it in the decoder. I grabbed what I thought was a simple black vector circle on a white…
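Easy to reproduce with diffusers, assuming the stabilityai/sdxl-vae checkpoint: encode a clean synthetic circle, decode, and inspect the residual.

```python
# Quick reproduction of the observation with diffusers (the circle here is
# synthetic, not the original image from the tweet).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae").eval()

# Clean black circle on white, values in [-1, 1] as the VAE expects.
yy, xx = torch.meshgrid(torch.arange(256), torch.arange(256), indexing="ij")
circle = ((yy - 128) ** 2 + (xx - 128) ** 2 < 64 ** 2).float()
img = (1 - 2 * circle).expand(1, 3, 256, 256)    # white bg, black circle

with torch.no_grad():
    z = vae.encode(img).latent_dist.mode()
    recon = vae.decode(z).sample
print((recon - img).abs().mean())   # nonzero residual: 'noise' the VAE adds back
```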
Made with Sora Input: KITTI image Prompt 1: “Make this into a semantic segmentation map” Prompt 2: “Make this into a depth map”