Chulin Xie
@ChulinXie
CS PhD student at UIUC; IBM PhD fellow; prev. intern @GoogleAI @MSFTResearch @NvidiaAI
Super excited about our ICLR workshop on synthetic data: "Will Synthetic Data Finally Solve the Data Access Problem?" Submit your work by Feb 5 AoE and join us in Singapore 🎉: synthetic-data-iclr.github.io #ICLR2025 #SyntheticData
🚨 We’ve trained on the entire web—now what? 🚨 Synthetic data holds promise as the next frontier but with caveats. Join us at the ICLR'25 workshop, "Will Synthetic Data Finally Solve the Data Access Problem?" this April in Singapore to discuss challenges and opportunities!
Europe can lead AI research, and our plan with OpenEuroLLM is to build something amazing - not for profit, and while sharing insights at scale. We have openings for **ML Research Engineers and Scientists** to work on OpenEuroLLM at the ELLIS Institute Tübingen.…
Join our mission to strengthen AI research in Europe 🇪🇺 We are looking for several ML Research Engineers and Scientists to work on OpenEuroLLM at the ELLIS Institute Tübingen. If you're passionate about large-scale model training, multilingual evaluation and want to contribute to…
I've struggled to announce this amidst so much dark & awful going on in the world, but with 1mo to go, I wanted to share that: (i) I finally graduated; (ii) In August, I'll begin as an assistant professor in the CS dept. of the National University of Singapore.
Last week, I shared two #ICLR2025 papers that were recognized by their Award committee. Reflecting on the outcome, I thought it might be interesting to share that both papers were previously rejected by #NeurIPS2024. I found the dramatic difference in reviewer perception of…
Delighted to share that two papers from our group @EPrinceton were recognized by the @iclr_conf award committee. Our paper, "Safety Alignment Should be Made More Than Just a Few Tokens Deep", received the ICLR 2025 Outstanding Paper Award. This paper showcases that many AI…
We’ve raised $30M in Seed + Series A funding led by @lightspeedvp and Walden Catalyst Ventures, with participation from Prosperity7 Ventures, Factory, Osage University Partners (OUP), Lip-Bu Tan, Chris Re, and more. Virtue AI is the first unified platform for securing AI across…
We build MedHELM ✨: a comprehensive benchmark evaluating AI on realistic clinical tasks that healthcare professionals perform daily, instead of just medical exams. 👩‍⚕️ • Stanford HAI Blog: hai.stanford.edu/news/holistic-… • Leaderboard: crfm.stanford.edu/helm/medhelm/l…
1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!
Demystifying Long CoT Reasoning in LLMs arxiv.org/pdf/2502.03373 Reasoning models like R1 / O1 / O3 have gained massive attention, but their training dynamics remain a mystery. We're taking a first deep dive into understanding long CoT reasoning in LLMs! 11 Major…
🚀 Image AR models (VAR & LlamaGen) can be distilled to ONE step (up to 218× faster) for the first time! See Distilled Decoding ↓ Website: imagination-research.github.io/distilled-deco… Paper: arxiv.org/abs/2412.17153 huggingface.co/papers/2412.17… (1/n)
💻 Are Code Agents Safe? #NeurIPS2024 In RedCode, we evaluate the risks of code execution and generation in 19 code agents within real system environments. 🗓️ Thu 12 Dec | 4:30 PM – 7:30 PM PST 📍: West Ballroom A-D #5300 🔗: redcode-agent.github.io Stop by the RedCode poster…
Code agents are great, but not risk-free in code execution and generation! 🎯 We propose RedCode, an evaluation platform to comprehensively evaluate code agents in terms of risky code execution and generation. 📅 Catch our #NeurIPS2024 poster session tomorrow (12/12) afternoon…
🎉 Deeply honored that our paper "DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models", which was awarded Outstanding Paper at NeurIPS 2023, has just been awarded Best Scientific Cybersecurity Paper of 2024, in collaboration with @uiuc_aisecure @sanmikoyejo…
(1/4) Excited to share RaVL, which is appearing this week at #NeurIPS2024! RaVL discovers and mitigates spurious correlations in fine-tuned vision-language models (VLMs). 📄 Paper: arxiv.org/abs/2411.04097 💻 GitHub: github.com/Stanford-AIMI/…
Exciting internship opportunity on privacy & foundation models with the amazing @lin_zinan at MSR! Zinan is an incredibly insightful and supportive mentor!
[Intern Hiring] We are hiring a [Spring 2025] [full-time] intern working on Private Evolution. If you are interested, please apply here jobs.careers.microsoft.com/global/en/job/… and send me an email: zinanlin at microsoft dot com
I am taking new Ph.D. students from @UChicagoCS and @DSI_UChicago in the 2024-2025 cycle! If you are interested in distributed optimization, data sharing, and trustworthy ML, please feel free to apply! More info on our research: litian96.github.io
Our Algorithms group at Microsoft Research is hiring interns in differential privacy, reasoning abilities of LLMs, and theory: jobs.careers.microsoft.com/global/en/job/… jobs.careers.microsoft.com/global/en/job/… jobs.careers.microsoft.com/global/en/job/…
Probing results are my fav in our paper (Sec 4.2)!! 1. LLMs clearly develop reasoning skills through direct DT (i.e., w/o CoT). 2. Harder tasks demand more internal computation to solve. 3. Probing accuracy peaks in the middle layers, not the final layer.
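Layer-wise probing like this is usually done by fitting a light linear classifier on frozen hidden states from each layer and comparing accuracies across depth. A minimal NumPy sketch with synthetic activations (the layer count, dimensions, and signal profile are illustrative assumptions, not the paper's actual setup):

```python
import numpy as np

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def probe_accuracy(X, y, w, b):
    return float(np.mean(((X @ w + b) > 0) == y))

rng = np.random.default_rng(0)
n, d = 400, 16
y = rng.integers(0, 2, size=n)

# Synthetic "hidden states": the label signal peaks at the middle layer,
# mimicking the finding that probing accuracy is highest mid-network.
signal_per_layer = [0.2, 0.8, 2.0, 0.8, 0.2]
accs = []
for s in signal_per_layer:
    X = rng.normal(size=(n, d))
    X[:, 0] += s * (2 * y - 1)  # inject label signal into one direction
    w, b = fit_linear_probe(X, y)
    accs.append(probe_accuracy(X, y, w, b))

print("peak layer:", int(np.argmax(accs)))  # middle layer (index 2)
```

With real models one would instead extract per-layer activations (e.g. a transformer's hidden states) and fit the same probe at each depth; the synthetic signal profile here just makes the "middle layer peaks" pattern reproducible.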
Through OOD transferability tests, perturbation tests, and probing of model internals, we show that LLMs learn some reasoning while memorizing training examples. We also fine-tuned models on wrong answers/CoTs, and LLMs still demonstrated improved generalization despite the noise. (5/n)
Is an LLM’s reasoning ability solely based on its powerful memorization skills? We conducted an in-depth empirical study to explore this question and uncovered some fascinating findings. Check out @ChulinXie’s threads for more details!
*Do LLMs learn to reason, or are they just memorizing?*🤔 We investigate LLM memorization in logical reasoning with a local inconsistency-based memorization score and a dynamically generated Knights & Knaves (K&K) puzzle benchmark. 🌐: memkklogic.github.io (1/n)
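The intuition behind a local inconsistency-based memorization score can be sketched schematically: a model that answers a training puzzle correctly but flips on a slightly perturbed (still solvable) variant is likely recalling rather than reasoning. A toy Python sketch with a hypothetical solver interface (the function names and exact scoring rule are my assumptions, not the paper's definition):

```python
def memorization_score(model, puzzles, perturb, solve):
    """Fraction of correctly solved training puzzles on which the model
    fails after a local perturbation (high score = likely memorization)."""
    solved = [p for p in puzzles if model(p) == solve(p)]
    if not solved:
        return 0.0
    inconsistent = [p for p in solved if model(perturb(p)) != solve(perturb(p))]
    return len(inconsistent) / len(solved)

# Toy stand-in for K&K puzzles: a puzzle is an (a, b) pair, answer is a + b.
train = [(1, 2), (3, 4), (5, 6)]
lookup = {p: sum(p) for p in train}
memorizer = lambda p: lookup.get(p, 0)  # looks answers up, guesses otherwise
reasoner = lambda p: p[0] + p[1]        # actually computes the answer
perturb = lambda p: (p[0] + 1, p[1])    # small local edit to the puzzle
solve = lambda p: p[0] + p[1]           # ground-truth solver

print(memorization_score(memorizer, train, perturb, solve))  # 1.0
print(memorization_score(reasoner, train, perturb, solve))   # 0.0
```

The pure memorizer scores 1.0 (perfect on training puzzles, wrong on every perturbed variant), while the genuine reasoner scores 0.0; real LLMs land somewhere in between.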