Jacqueline He

@jcqln_h

cs phd @uwnlp, prev. bse cs @princeton

Joined May 2018

66Following

185Followers

Pinned

Jacqueline He Retweeted

Akari Asai@AkariAsai · Dec 4

🚨 I’m on the job market this year! 🚨 I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵

118

823

198

126.0K

Jacqueline He Retweeted

Ilia Shumailov🦔@iliaishacked · Jun 10

Are modern large language models (LLMs) vulnerable to privacy attacks that can determine if given data was used for training? Models and dataset are quite large, what should we even expect? Our new paper looks into this exact question. 🧵 (1/10)

115

20.0K

Jacqueline He@jcqln_h · Jun 9

Check out our work on LLMs and scientific knowledge updates!

YYike Wang@yikewang_ · Jun 9

LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).

5.0K

Jacqueline He@jcqln_h · Jun 9

congrats @kjha02 !! cool work 🎊🎉🎇

KKunal Jha@kjha02 · Jun 9

Oral @icmlconf !!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my collaborators enough: @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques

135

Jacqueline He Retweeted

Zhiyuan Zeng@ZhiyuanZeng_ · Mar 14

Is a single accuracy number all we can get from model evals?🤔 🚨Does NOT tell where the model fails 🚨Does NOT tell how to improve it Introducing EvalTree🌳 🔍identifying LM weaknesses in natural language 🚀weaknesses serve as actionable guidance (paper&demo 🔗in🧵) [1/n]

262

147

60.0K

Jacqueline He Retweeted

Hamish Ivison@hamishivi · Mar 4

How well do data-selection methods work for instruction-tuning at scale? Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best! More below ⬇️ (1/8)

332

282

85.0K

Jacqueline He@jcqln_h · Feb 20

We trained a diffusion LM! 🔁 Adapted from Mistral v0.1/v0.3. 📊 Beats AR models in GSM8k when we finetune on math data. 📈 Performance improves by using more test-time compute (reward guidance or more diffusion steps). Check out @jaesungtae's thread for more details!

JJake Tae@jaesungtae · Feb 20

1/ Excited to share our new work which we’ve been working on since past year: TESS 2! TESS 2 is a 7B instruction-tuned diffusion LM that can perform close to AR counterparts for general QA tasks, trained by adapting from an existing pretrained AR model. 🧵

4.0K

Jacqueline He Retweeted

Stella Li ➡️ CogSci2025@StellaLisy · Feb 21

Asking the right questions can make or break decisions in high-stake fields like medicine, law, and beyond✴️ Our new framework ALFA—ALignment with Fine-grained Attributes—teaches LLMs to PROACTIVELY seek information through better questions🏥❓ (co-led with @jiminmun_) 👉🏻🧵

197

112

24.0K

Jacqueline He Retweeted

Ai2@allen_ai · Jan 21

Can AI really help with literature reviews? 🧐 Meet Ai2 ScholarQA, an experimental solution that allows you to ask questions that require multiple scientific papers to answer. It gives more in-depth, detailed, and contextual answers with table comparisons, expandable sections…

223

134

41.0K

Jacqueline He Retweeted

Hila Gonen@hila_gonen · Dec 5

Extremely excited to share that I will be joining @UBC_CS as an Assistant Professor this summer! I will be recruiting students this coming cycle!

147

11.0K

Jacqueline He@jcqln_h · Nov 19

Check out our OpenScholar project!! Huge congrats to @AkariAsai for leading the project — working with her has been a wonderful experience!! 🌟

AAkari Asai@AkariAsai · Nov 19

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚 @uwnlp @allen_ai With open models & 45M-paper datastores, it outperforms proprietary systems & match human experts. Try out our demo! We also introduce ꜱᴄʜᴏʟᴀʀQᴀʙᴇɴᴄʜ,…

219

Jacqueline He Retweeted

Howard Yen@HowardYen1 · Oct 4

Introducing HELMET, a long-context benchmark that supports >=128K length, covering 7 diverse applications. We evaluated 51 long-context models and found HELMET provide more reliable signals for model development github.com/princeton-nlp/… A 🧵 on why you should use HELMET⛑️

16.0K