Yutong (Kelly) He
@electronickale
PhD student @mldcmu, I’m so delusional that doing generative modeling is my job
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
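For a feel of what "automated black-box prompt engineering" can mean in practice, here is a minimal sketch of an iterate-and-score loop. This is a toy version, not PRISM's exact algorithm; `propose`, `generate`, and `score` are hypothetical callables standing in for a VLM, the black-box text-to-image model, and an image-similarity metric.

```python
from typing import Any, Callable

def prism_loop(
    inspo_image: Any,
    propose: Callable,   # VLM: (image, prev_prompt, prev_score) -> new prompt
    generate: Callable,  # black-box T2I model: prompt -> image
    score: Callable,     # similarity metric: (image_a, image_b) -> float
    n_iters: int = 5,
) -> str:
    """Refine a human-readable prompt toward the inspo image, keeping
    the best-scoring prompt seen so far. No training, no embeddings:
    everything goes through the models' public interfaces."""
    best_prompt, best_score = "", float("-inf")
    prompt = propose(inspo_image, None, None)   # initial caption attempt
    for _ in range(n_iters):
        candidate = generate(prompt)            # query the black box
        s = score(inspo_image, candidate)       # how close did we get?
        if s > best_score:
            best_prompt, best_score = prompt, s
        prompt = propose(inspo_image, prompt, s)  # revise with feedback
    return best_prompt
```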
🔥🔥🔥
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
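As a toy illustration of dynamic chunking (not H-Net's routing module, which is learned end-to-end), one could place chunk boundaries wherever adjacent byte-level hidden states disagree and mean-pool within each chunk:

```python
import torch

def dynamic_chunk(x: torch.Tensor, threshold: float = 0.5):
    """Toy dynamic chunking over byte-level hidden states x: (L, d).
    A chunk boundary is placed where adjacent states have low cosine
    similarity; each chunk is then mean-pooled into one vector."""
    sim = torch.cosine_similarity(x[:-1], x[1:], dim=-1)           # (L-1,)
    starts = torch.cat([torch.ones(1), (sim < threshold).float()]) # new-chunk flags
    chunk_id = (starts.cumsum(0) - 1).long()                       # (L,)
    n = int(chunk_id[-1]) + 1
    sums = torch.zeros(n, x.shape[1]).index_add_(0, chunk_id, x)
    counts = torch.zeros(n).index_add_(0, chunk_id, torch.ones(len(x)))
    return sums / counts.unsqueeze(1), chunk_id
```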
Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
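A toy illustration of the underlying edit operations, assuming one refinement step arrives as a set of deletions plus a position-to-token insertion map; the actual Edit Flows sampler is a learned continuous-time process, not this:

```python
def apply_edits(seq: list, deletes: set, inserts: dict) -> list:
    """Apply one batch of position-relative edits: drop positions in
    `deletes`, and insert inserts[i] right after position i. Note the
    output length can shrink or grow, so no padding or mask tokens
    are ever needed."""
    out = []
    for i, tok in enumerate(seq):
        if i not in deletes:
            out.append(tok)
        if i in inserts:
            out.append(inserts[i])
    return out

# e.g. refine "the cat sat" -> "the black cat sat down"
print(apply_edits(["the", "cat", "sat"], deletes=set(),
                  inserts={0: "black", 2: "down"}))
```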
Congrats Avi! 🎉🎉🎉
Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time @OpenAI working on LLM privacy. @unccs @uncnlp
RL with verifiable rewards has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground-truth answers? Introducing Self-Rewarding Training (SRT), where language models provide their own reward for RL training! 🧵 1/n
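A minimal sketch of how a model can "provide its own reward": treat the majority answer across its own samples as a pseudo-label and reward agreement with it. This is the self-consistency idea; the RL update itself is omitted, and SRT's exact recipe may differ.

```python
from collections import Counter

def self_rewards(sampled_answers: list[str]) -> list[float]:
    """Reward each sampled answer 1.0 if it matches the majority
    answer across the model's own samples, else 0.0. No ground-truth
    labels are used anywhere."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [float(ans == majority) for ans in sampled_answers]

print(self_rewards(["42", "42", "41", "42"]))  # -> [1.0, 1.0, 0.0, 1.0]
```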
LLOKI (a variant of Loki): x.com/jmuiuc/status/…
Integrating spatial transcriptomics across platforms is hard - different gene panels, sparse data. We introduce LLOKI, using optimal transport + single-cell FMs for unified ST integration. Work led by @ellie_haber (@mldcmu) & Lane Fellow @SpencerKrieger biorxiv.org/content/10.110…
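For intuition on the optimal-transport ingredient, here is a toy entropic-OT projection of one platform's cell features onto another's using the POT library. This stands in for the alignment step only; LLOKI's actual pipeline also relies on single-cell foundation-model embeddings.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def ot_project(X_src: np.ndarray, X_tgt: np.ndarray, reg: float = 0.05):
    """Align two datasets' cell features: solve for an entropic
    transport plan, then barycentrically project each source cell
    onto the target feature space."""
    a = np.full(len(X_src), 1.0 / len(X_src))   # uniform source masses
    b = np.full(len(X_tgt), 1.0 / len(X_tgt))   # uniform target masses
    M = ot.dist(X_src, X_tgt)                   # squared-Euclidean cost matrix
    P = ot.sinkhorn(a, b, M, reg)               # entropic transport plan
    return (P / P.sum(axis=1, keepdims=True)) @ X_tgt
```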
When the ddl is approaching and you are violently editing something you wrote a while ago

Dear program chairs of all conferences, please don't put a 5000-character limit on our rebuttal responses, especially when the reviewers have more than ten 7500-character text boxes to write their reviews, thank you so much
Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs' ability to do so is limited. Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot. 🧵 1/n
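The agentic setting in a nutshell, as a sketch: a hypothetical `llm` callable picks actions from the interaction history, and a hypothetical `env` object with `reset()`/`step()` returns outcomes. This is just the interaction loop, not Paprika's training recipe.

```python
def rollout(llm, env, max_steps: int = 20) -> list[str]:
    """Generic decision-making loop: show the model the full history,
    let it pick an action, feed back the observed outcome, repeat."""
    history = [f"observation: {env.reset()}"]
    for _ in range(max_steps):
        action = llm("\n".join(history))       # model decides
        obs, done = env.step(action)           # world reacts
        history += [f"action: {action}", f"observation: {obs}"]
        if done:
            break
    return history
```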
Excited to share new work from my internship @GoogleAI! Curious how we should measure the similarity between examples in pretraining datasets? We study the role of similarity in pretraining 1.7B-parameter language models on the Pile. arxiv: arxiv.org/abs/2502.02494 1/🧵
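To make "similarity between examples" concrete, here is one simple instantiation: Jaccard overlap of character n-grams. Illustrative only; the paper studies similarity at pretraining scale and may define its measures differently.

```python
def ngram_jaccard(a: str, b: str, n: int = 8) -> float:
    """Similarity of two text examples as the Jaccard overlap of
    their character n-gram sets (1.0 = identical sets, 0.0 = disjoint)."""
    def grams(s: str) -> set:
        return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}
    A, B = grams(a), grams(b)
    return len(A & B) / len(A | B) if A | B else 0.0

print(ngram_jaccard("the quick brown fox", "the quick brown cat"))
```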
Model-free deep RL algorithms like NFSP, PSRO, ESCHER, & R-NaD are tailor-made for games with hidden information (e.g. poker). We performed the largest-ever comparison of these algorithms. We find that they do not outperform generic policy gradient methods, such as PPO. 1/N
To trust LLMs in deployment (e.g., agentic frameworks or for generating synthetic data), we should predict how well they will perform. Our paper shows that we can do this by simply asking black-box models multiple follow-up questions! w/ @m_finzi and @zicokolter 1/ 🧵
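A sketch of the flavor of this idea: ask the black-box model follow-up questions and use answer consistency as a cheap performance predictor. Here `ask` is a hypothetical (prompt -> answer) callable, and consistency is just one plausible aggregate; the paper's actual probes and predictor may differ.

```python
def predict_success(ask, question: str, followups: list[str]) -> float:
    """Score how often the model sticks with its original answer when
    probed with follow-up questions; return the consistency fraction."""
    base = ask(question)                                    # original answer
    votes = [ask(f"{question}\n{f}") == base for f in followups]
    return sum(votes) / len(votes)

# e.g. followups = ["Are you sure?", "Explain your reasoning, then answer again."]
```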
A troubling incident unfolded at #NeurIPS2024, where a keynote speaker used a slide that perpetuated harmful stereotypes and racial biases against Chinese students and researchers. I wasn't attending the conference, but I watched the talk recording and followed this closely. 1/7