Hoàng Anh Just
@reds_tiger
http://justhoanganh.com
Thrilled to receive ICLR'25 Outstanding Paper Honorable Mention! Heartfelt thanks to my incredible mentors @ruoxijia @dawnsongtweets @prateekmittal_ @james_y_zou. I'll be giving two oral presentations on our recent work in training data attribution & co-organizing a workshop on…
Submission deadline AoE today for our Workshop on Data Problems for Foundation Models! Look forward to your contributions!
Announcing the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM)! We welcome submissions exploring ALL ASPECTS OF DATA in foundation model research. Submission deadline: Feb 7th (this Friday!) 11:59pm AoE datafm.github.io

Excited to announce the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM)! We welcome submissions exploring all aspects of data in foundation model research, including but not limited to data curation, attribution, copyright, synthetic data, benchmark, societal…
Excited to attend #NeurIPS2024 in Vancouver 🇨🇦! I will be presenting our work: "Boosting Alignment for Post-Unlearning Text-to-Image Generative Models." If you are interested in unlearning, stop by our poster: 🕐 Wed, Dec 11 | 11 am - 2 pm PST 📍 West Ballroom #7006
Join us Tuesday at 1:30 PM PT at #NeurIPS2024 for our tutorial on data selection for foundation models! With @lschmidt3 & @JiachenWang97, we'll cover principled experimentation, selection algorithms, a unified theoretical framework, and open challenges. Hope to see you there!
Excited to share our latest work that provides a general recipe for advancing SOTA for preference alignment (e.g., DPO) with a simple yet effective data-centric approach—augmenting the preference dataset with rationales! Definitely check it out before aligning your next model:…
🌟Data-Centric Human Preference Optimization with Rationales🌟 📜arxiv.org/abs/2407.14477 In the standard preference learning framework, the model is given ranked/paired responses to align with human preferences. How can we better help models grasp these preferences? 1/N🧵
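A minimal Python sketch (not the authors' released code) of the data-centric idea above: each (prompt, chosen, rejected) preference pair is augmented with a free-text rationale for why the chosen response is preferred before DPO-style training. The field names and the generate_rationale helper are hypothetical placeholders.

```python
# Sketch of rationale augmentation for a preference dataset.
# Assumptions: pairs are (prompt, chosen, rejected) triples, and rationales
# would in practice come from annotators or an auxiliary LLM.
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def generate_rationale(pair: PreferencePair) -> str:
    # Hypothetical helper: stands in for a human- or model-written explanation
    # of why the chosen response is preferred.
    return f"The preferred answer addresses '{pair.prompt}' more directly and accurately."


def augment_with_rationales(pairs: List[PreferencePair]) -> List[dict]:
    """Turn plain preference pairs into rationale-augmented training records."""
    augmented = []
    for pair in pairs:
        rationale = generate_rationale(pair)
        augmented.append({
            "prompt": pair.prompt,
            # Append the rationale to the preferred response so the model also
            # learns *why* it is preferred, not just *that* it is preferred.
            "chosen": f"{pair.chosen}\n\nRationale: {rationale}",
            "rejected": pair.rejected,
        })
    return augmented


if __name__ == "__main__":
    data = [PreferencePair("What is 2+2?", "4", "5")]
    print(augment_with_rationales(data))
```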
4/@feiyang_ml @reds_tiger @ruoxijia et al. presented a method to "warm up" a pre-trained model before fine-tuning. They solve an optimal transport problem to nudge the pre-training distribution closer to the target by training on relevant data subsets. arxiv.org/abs/2405.02774
Greetings from Vienna🎻! #ICLR2024 Fine-tuning LLMs: *distributional matching* mostly selects redundant💸 samples already well-trained during pre-training. We select the most🚀 helpful samples with OT gradients; million-scale selection in minutes⚡! Poster Halle B #140, Tue 10:45 a.m. (1/3)
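A rough numpy sketch of the OT-gradient selection idea described in this thread (assumed details: embedding features, an entropic Sinkhorn solver, uniform marginals; this is not the paper's implementation). Up to a constant, the dual potential of the OT problem is the gradient of the OT cost with respect to each candidate's mass, so candidates with the most negative values move the fine-tuning pool closest to the target distribution.

```python
# Sketch: rank candidate fine-tuning samples by how much they would reduce the
# OT distance to a target-task distribution, then keep the top ones.
import numpy as np


def sinkhorn_dual(a, b, M, reg=0.05, n_iters=500):
    """Entropic OT via Sinkhorn; returns the dual potential f for the source
    marginal a. Up to an additive constant, f is the gradient of the OT cost
    with respect to the mass placed on each source (candidate) sample."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return reg * np.log(u + 1e-12)


rng = np.random.default_rng(0)
candidates = rng.normal(size=(1000, 32))       # embeddings of candidate samples (illustrative)
target = rng.normal(loc=0.5, size=(200, 32))   # embeddings of target-task data (illustrative)

# Squared Euclidean cost between every candidate and every target point.
M = ((candidates[:, None, :] - target[None, :, :]) ** 2).sum(-1)
a = np.full(len(candidates), 1.0 / len(candidates))
b = np.full(len(target), 1.0 / len(target))

grad = sinkhorn_dual(a, b, M)
# Candidates with the lowest gradient reduce the distance to the target the most.
selected = np.argsort(grad)[:100]
print("selected candidate indices:", selected[:10])
```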
An example of how DUSt3R can do "impossible matching": given two images without any shared visual content (my office, obviously never seen at training), it can output an accurate reconstruction (no intrinsics, no poses!) in seconds