DSE Lab @ MSU
@dse_msu
Data Science and Engineering Lab Director: Dr. Jiliang Tang (@tangjiliang)
✨ Excited to share our new preprint "Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis"! arxiv.org/abs/2406.10794 🔍 We delve into why some jailbreak attacks succeed by exploring harmful and harmless prompts in the LLM's representation space.
📢Call for Papers: LLM for E-Commerce Workshop @ WWW'25 📅April 28-29, 2025 | Sydney, Australia 🌍 Explore how LLMs are transforming e-commerce: foundations, applications & systems. 📝Submit: openreview.net/group?id=ACM.o… (by Jan 26, 2025 AoE) 👉Details: llm4ecommerce.github.io
🎯 Detect & Filter RAG Contexts with LLM Representations Excited to share our work on Representation-based knowledge checking in #RAG! arxiv.org/abs/2411.14572 We show how LLM representations detect & filter misleading/unhelpful knowledge and improve performance.
📢 I'm on the faculty job market this year! My past research has focused on Graph ML, with an emphasis on link prediction (LP) and knowledge graph reasoning (KGR). I do this from 3 directions (see 🧵): (1/3)
🚀 Excited to share our latest research on enhancing privacy in RAG systems! arxiv.org/pdf/2406.14773 Our paper introduces SAGE, a novel approach using synthetic data to protect sensitive information while maintaining high utility. #AI #Privacy #MachineLearning #RAG #DataSecurity
2) "Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion" a joint project with Jay Revolinsky and @tangjiliang. arxiv.org/abs/2406.11898
1) "Understanding the Generalizability of Link Predictors Under Distribution Shifts on Graphs" a joint project led by co-first author Jay Revolinsky and @tangjiliang arxiv.org/abs/2406.08788…
🙋♂️What is the wildest dream for graph foundation models? 🎯Graphs across domains → a single model → all downstream tasks 🙋♀️Can we achieve that? ✅Yes! UniAug: Cross-Domain Graph Data Scaling with Diffusion Models 📃arxiv.org/pdf/2406.01899
arxiv.org/pdf/2402.02212… "A Data Generation Perspective to the Mechanism of In-Context Learning". We investigate the mechanism of in-context learning, which grounds the debate over whether LLMs can achieve intelligence in the concrete question of whether LLMs can learn a new data generation function in context.
Exciting News! Our new paper on memorization in text-to-image diffusion is now available. We analyze memorization through the lens of attention and shed light on the model's internal behavior when memorization happens. Please find our paper at arxiv.org/abs/2403.11052
Our paper on LLM watermarking is accepted to NAACL Findings! We propose a new method that uses semantics to strengthen watermark robustness against paraphrasing. This is a very meaningful factor for practical applications! Please find the paper at openreview.net/forum?id=hbMR7…
Exciting News! Our DANCE version 1, "DANCE: a deep learning library and benchmark platform for single-cell analysis" is now finally published in Genome Biology (@GenomeBiology ) 🎉 !!! DANCE has impacted the field, and got 290+ GitHub stars 🌟 before its official publication!
With imaging-based spatial transcriptomics such as MERFISH, seqFISH, CosMx SMI, Xenium, and others, have you ever wondered how we can leverage their subcellular spatial information? Check out our latest preprint on Focus by Qiaolin and Jiayuan @JiayuanDing , two talent…
🔒💡 Excited to share our latest #RAG #Privacy research! We've uncovered two pivotal aspects: 1️⃣ Privacy challenges within RAG's own data 2️⃣ RAG's potential to safeguard training data 🔍 Discover the dual-edged sword of RAG technology in our paper arxiv.org/pdf/2402.16893
See our new repo github.com/CurryTang/Towa… including (1) theoretical guidance, (2) existing benchmark datasets, and (3) a summary of existing GFMs. A new seminar focusing on GFMs will be launching soon!!!