Shruti Joshi
@_shruti_joshi_
phd student in identifiable representation learning @Mila_Quebec. prev. research programmer @MPI_IS Tübingen, undergrad @IITKanpur '19.
1\ Hi, can I get an unsupervised sparse autoencoder for steering, please? I only have unlabeled data varying across multiple unknown concepts. Oh, and make sure it learns the same features each time! Yes! A freshly brewed Sparse Shift Autoencoder (SSAE) coming right up. 🧶
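A rough, illustrative sketch of the idea described in this thread, not the paper's implementation: fit a sparse autoencoder to differences (shifts) between paired representations so that each latent dimension captures one concept-level change that can later be used for steering. The class name, dimensions, pairing of inputs, and training loop below are all assumptions for illustration.

```python
# Hypothetical sketch of a "sparse shift autoencoder": reconstruct representation
# DIFFERENCES with a sparsity penalty on the code. Dimensions and names are made up.
import torch
import torch.nn as nn

class SparseShiftAutoencoder(nn.Module):
    def __init__(self, dim: int, n_latents: int):
        super().__init__()
        self.encoder = nn.Linear(dim, n_latents)   # shift -> sparse code
        self.decoder = nn.Linear(n_latents, dim)   # sparse code -> shift

    def forward(self, shift: torch.Tensor):
        code = self.encoder(shift)
        return self.decoder(code), code

def training_step(model, x_a, x_b, l1_weight=1e-3):
    """x_a, x_b: paired embeddings that differ in a few unknown concepts (assumed given)."""
    shift = x_b - x_a                              # representation difference
    recon, code = model(shift)
    recon_loss = (recon - shift).pow(2).mean()     # reconstruct the shift
    sparsity = code.abs().mean()                   # encourage few active concept latents
    return recon_loss + l1_weight * sparsity

# Toy usage on random data, purely for illustration.
model = SparseShiftAutoencoder(dim=768, n_latents=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_a, x_b = torch.randn(64, 768), torch.randn(64, 768)
opt.zero_grad()
loss = training_step(model, x_a, x_b)
loss.backward()
opt.step()
```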

I am thrilled to announce that I will be joining the Gatsby Computational Neuroscience Unit at UCL as a Lecturer (Assistant Professor) in Feb 2025! Looking forward to working with the exceptional talent at @GatsbyUCL on cutting-edge problems in deep learning and causality.
We are delighted to announce that Dr Leena Chennuru Vankadara will join the Unit as Lecturer in Feb 2025, developing a theoretical understanding of scaling and generalization in deep learning and causality. Welcome aboard @leenaCvankadara! Learn more at ucl.ac.uk/gatsby/news-an…
🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard: a single approach that detects harmful prompts across multiple languages & modalities, with SOTA performance in all 3 modalities while being 120X faster 🚀 arxiv.org/abs/2505.23856
⚡⚡ Llama-Nemotron-Ultra-253B just dropped: our most advanced open reasoning model 🧵👇
𝐓𝐡𝐨𝐮𝐠𝐡𝐭𝐨𝐥𝐨𝐠𝐲 paper is out! 🔥🐋 We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and find several surprising and interesting phenomena! Incredible effort by the entire team! 🌐: mcgill-nlp.github.io/thoughtology/
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour. 🔗: mcgill-nlp.github.io/thoughtology/
Presenting ✨ 𝐂𝐇𝐀𝐒𝐄: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐢𝐧𝐠 𝐬𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 ✨ Work w/ fantastic advisors @DBahdanau and @sivareddyg Thread 🧵:
📣 📣 📣 Our new paper investigates the question of how many images 🖼️ of a concept are required by a diffusion model 🤖 to imitate it. This question is critical for understanding and mitigating the copyright and privacy infringements of these models! arxiv.org/abs/2410.15002
🚨NEW PAPER OUT 🚨 Excited to share our latest research initiative on in-context learning and meta-learning through the lens of information theory! 🧠 🔗 arxiv.org/abs/2410.14086 Check out our insights and empirical experiments! 🔍
Introducing our new paper explaining in-context learning through the lens of Occam’s razor, giving a normative account of next-token prediction objectives. This was with @Tom__Marty @tejaskasetty @le0gagn0n @sarthmit @MahanFathi @dhanya_sridhar @g_lajoie_ arxiv.org/abs/2410.14086
Presenting tomorrow at #NAACL2024: 𝐶𝑎𝑛 𝐿𝐿𝑀𝑠 𝑖𝑛-𝑐𝑜𝑛𝑡𝑒𝑥𝑡 𝑙𝑒𝑎𝑟𝑛 𝑡𝑜 𝑢𝑠𝑒 𝑛𝑒𝑤 𝑝𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑖𝑛𝑔 𝑙𝑖𝑏𝑟𝑎𝑟𝑖𝑒𝑠 𝑎𝑛𝑑 𝑙𝑎𝑛𝑔𝑢𝑎𝑔𝑒𝑠? 𝑌𝑒𝑠. 𝐾𝑖𝑛𝑑 𝑜𝑓. Internship @allen_ai work with @pdasigi and my advisors @DBahdanau and @sivareddyg.
Adversarial Triggers For LLMs Are 𝗡𝗢𝗧 𝗨𝗻𝗶𝘃𝗲𝗿𝘀𝗮𝗹!😲 It is believed that adversarial triggers that jailbreak a model transfer universally to other models. But we show triggers don't reliably transfer, especially to RLHF/DPO models. Paper: arxiv.org/abs/2404.16020
📢 Exciting new work on AI safety! Do adversarial triggers transfer universally across models (as has been claimed)? 𝗡𝗼. Are models aligned by supervised fine-tuning safe against adversarial triggers? 𝗡𝗼. RLHF and DPO are far better!
Presenting tomorrow at #EMNLP2023: MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations w/ amazing advisors and collaborators @DBahdanau, @sivareddyg, and @satwik1729
1/ Excited for our oral presentation at #NeurIPS2023 on "Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation"! A theoretical paper about object-centric representation learning (OCRL), disentanglement & extrapolation arxiv.org/abs/2307.02598
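A minimal illustrative sketch of what an "additive decoder" refers to, under the assumption that the latent vector is partitioned into per-object blocks and the output is the sum of each block's decoding. Block sizes, network widths, and the toy usage are assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical additive decoder: output = sum of per-block decodings of the latent.
import torch
import torch.nn as nn

class AdditiveDecoder(nn.Module):
    def __init__(self, n_blocks: int, block_dim: int, out_dim: int):
        super().__init__()
        self.block_dim = block_dim
        # One small decoder per latent block (e.g., per object/slot).
        self.block_decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(block_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))
            for _ in range(n_blocks)
        ])

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, n_blocks * block_dim). The output is the SUM of per-block
        # decodings, which is what makes the decoder "additive".
        blocks = z.split(self.block_dim, dim=-1)
        return sum(dec(zb) for dec, zb in zip(self.block_decoders, blocks))

# Toy usage: 3 latent blocks of size 4 decoded to a flattened 8x8 output.
decoder = AdditiveDecoder(n_blocks=3, block_dim=4, out_dim=64)
x_hat = decoder(torch.randn(16, 12))
print(x_hat.shape)  # torch.Size([16, 64])
```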