Cheng-Yu Hsieh
@cydhsieh
PhD student @UWcse
Excited to introduce FocalLens: an instruction tuning framework that turns existing VLMs/MLLMs into text-conditioned vision encoders, producing visual embeddings that focus on the visual information relevant to a given natural language instruction! 📢: @HPouransari will be presenting…

I'm excited to announce that our work (AURORA) got accepted into #CVPR2025🎉! Special thanks to my coauthors: @ch1m1m0ry0, @cydhsieh, @ethnlshn, @Dongping0612, Linda Shapiro, and @RanjayKrishna. This work wouldn’t have been possible without them! See you all in Nashville 🎸!
Introducing AURORA 🌟: our new training framework to enhance multimodal language models with Perception Tokens: a game-changer for tasks requiring deep visual reasoning like relative depth estimation and object counting. Let’s take a closer look at how it works. 🧵[1/8]
🔥We are excited to present our work Synthetic Visual Genome (SVG) at #CVPR25 tomorrow! 🕸️ Dense scene graph with diverse relationship types. 🎯 Generate scene graphs with SAM segmentation masks! 🔗Project link: bit.ly/4e1uMDm 📍 Poster: #32689, Fri 2-4 PM 👇🧵
Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D! ---…
1/8🧵 Thrilled to announce RealEdit (to appear in CVPR 2025)! We introduce a real-world image-editing dataset sourced from Reddit. Along with the training and evaluation datasets, we release our model, which achieves SOTA performance on a variety of real-world editing tasks.
Stop by poster #596 at 10AM-12:30PM tomorrow (Fri 25 April) at #ICLR2025 to hear more about Sigmoid Attention! We just pushed 8 trajectory checkpoints each for two 7B LLMs: one with Sigmoid Attention and a 1:1 Softmax Attention baseline (trained with a deterministic dataloader for 1T tokens): -…
Small update on SigmoidAttn (arXiv update incoming). - 1B and 7B LLM results added and stabilized. - Hybrid Norm [on the embed dim, not the seq dim], `x + norm(sigmoid(QK^T / sqrt(d_{qk}))V)`, stabilizes longer sequences (n=4096) and larger models (7B). H-norm is used in Grok-1, for example.
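A minimal single-head PyTorch sketch of the hybrid-norm residual above (the shapes, names, and LayerNorm choice are assumptions; the real SigmoidAttn implementation has more details):

```python
import math
import torch

def sigmoid_attn_hybrid_norm(x, q, k, v, norm):
    """x + norm(sigmoid(QK^T / sqrt(d_qk)) V), with norm on the embed dim."""
    d_qk = q.shape[-1]
    # Elementwise sigmoid replaces the softmax over the sequence dimension.
    scores = torch.sigmoid(q @ k.transpose(-2, -1) / math.sqrt(d_qk))
    out = scores @ v
    # Hybrid norm: normalize over the embedding dim, then the residual add.
    return x + norm(out)

# Toy usage with single-head tensors of shape (batch, seq, dim).
B, T, D = 2, 128, 64
x = torch.randn(B, T, D)
q, k, v = torch.randn(3, B, T, D)
ln = torch.nn.LayerNorm(D)  # LayerNorm acts on the last (embedding) dim
y = sigmoid_attn_hybrid_norm(x, q, k, v, ln)
```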
The 2nd Synthetic Data for Computer Vision workshop at @CVPR! We had a wonderful time last year, and we want to build on that success by fostering fresh insights into synthetic data for CV. Join us, and please consider submitting your work! (deadline: March…
(1/5)🚨LLMs can now self-improve to generate better citations✅ 📝We design automatic rewards to assess citation quality 🤖Enable BoN/SimPO w/o external supervision 📈Perform close to the “Claude Citations” API w/ only an 8B model 📄arxiv.org/abs/2502.09604 🧑‍💻github.com/voidism/SelfCi…
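A toy sketch of the Best-of-N step (`generate` and `citation_reward` are hypothetical stand-ins; the paper's actual reward and pipeline are in the linked repo):

```python
def best_of_n(prompt, generate, citation_reward, n=8):
    """Sample n candidate answers and keep the highest-reward one."""
    candidates = [generate(prompt) for _ in range(n)]
    # Keep the answer whose citations best support its claims, e.g. as
    # judged by an automatic entailment check against the cited passages.
    return max(candidates, key=lambda ans: citation_reward(prompt, ans))
```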
I will be presenting our Lookback Lens paper at #EMNLP2024 in Miami! 📆 Nov 13 (Wed) 4:00-5:30 at Tuttle (Oral session: ML for NLP 1) 🔗 arxiv.org/abs/2407.07071 Happy to chat about LLMs and hallucinations! See you soon in Miami! ✈️ @linluqiu @cydhsieh @RanjayKrishna @yoonrkim
🚨Can we "internally" detect if LLMs are hallucinating facts not present in the input documents? 🤔 Our findings: - 👀Lookback ratio—the extent to which LLMs put attention weights on context versus their own generated tokens—plays a key role - 🔍We propose a hallucination…
Hard negative finetuning can actually HURT compositionality, because it teaches VLMs THAT caption perturbations change meaning, not WHEN they change meaning! 📢 A new benchmark+VLM at #ECCV2024 in The Hard Positive Truth arxiv.org/abs/2409.17958 @cydhsieh @RanjayKrishna @uclanlp
🤔 In training vision models, what value do AI-generated synthetic images provide compared to the upstream (real) data used in training the generative models in the first place? 💡 We find using "relevant" upstream real data still leads to much stronger results compared to using…
Will training on AI-generated synthetic data lead to the next frontier of vision models?🤔 Our new paper suggests NO—for now. Synthetic data doesn't magically enable generalization beyond the generator's original training set. 📜: arxiv.org/abs/2406.05184 Details below🧵(1/n)
‼️ LLMs hallucinate facts even if provided with correct/relevant contexts 💡 We find models' attention weight distribution on input context versus their own generated tokens serves as a strong detector for such hallucinations 🚀 The detector transfers across models/tasks, and can…
🚨Can we "internally" detect if LLMs are hallucinating facts not present in the input documents? 🤔 Our findings: - 👀Lookback ratio—the extent to which LLMs put attention weights on context versus their own generated tokens—plays a key role - 🔍We propose a hallucination…