Kyle Lo
@kylelostat
#nlproc #hci research scientist @allen_ai, co-lead of data for OLMo w/ @soldni, he/him, find me on 👉🏻 http://kylelo.bsky.social 🧋
issues w preference LM benchmarks
🐡 data contains cases where the "bad" response is just as good as the chosen one
🐟 model rankings can feel off (claude ranks lower than expected)
led by @cmalaviya11 (TACL 2025), we study underspecified queries & their detrimental effect on model evals
In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards and surface hidden biases. 🧵👇
presenting olmOCR at the poster session (2:15pm, 211 West) for the #codeml workshop at #icml2025!
🐟 fully open-source OCR, comparable to or better than frontier VLMs
🐠 all weights, data, code free & public
🐡 new benchmark of OCR "unit tests" on diverse PDFs & challenging OCR cases
New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:
1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment
Presenting two posters at ICML over the next two days:
- Both at 11am–1:30pm
- Both about how to improve pre-training with domains
- Both at stall #E-2600 in East Exhibition Hall A-B (!)
Tomorrow: WebOrganizer w/ @soldni & @kylelostat
Thursday: MeCo by @gaotianyu1350
will be at #icml2025, lemme know if you wanna chat about OLMo pretraining data curation, evaluation, data mixing, etc! 👋 find us at the poster session on 📅 Wed 7/16 @ 11am ⏲️ to learn about WebOrganizer, distilling web data taxonomies into small models & using them for LM data mixing!
🤔 Ever wondered how prevalent a given type of web content is in LM pre-training? In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
excited to win 🏆 this award for our work on molmo & pixmo, showing the value of high-quality data curation for VLMs! recalling when we released at the same time as Llama 3.2 😆 huge kudos to @mattdeitke, chris clark & @anikembhavi for their leadership on this project!
Molmo won the Best Paper Honorable Mention award @CVPR! This work was a long journey over 1.5 years, from failing to get strong performance with massive-scale, low-quality data, to focusing on modest-scale, extremely high-quality data! Proud to see what it became. #CVPR2025
Thrilled to announce I've joined the incredible team at @allen_ai! I'll be working on language modeling!
excited to announce that I’ve joined the Allen Institute, where I’ll be working on RL for LLMs.
great work from philippe as always ☺️ agree w the view that reliability is absolutely key
🆕 paper: LLMs Get Lost in Multi-Turn Conversation
In real life, people don’t speak in perfect prompts. So we simulate multi-turn conversations: less lab-like, more like real use.
We find that LLMs get lost in conversation.
👀 What does that mean? 🧵 1/N
📄 arxiv.org/abs/2505.06120
lookin for strong data ppl to make tokens, eat snacks & drink boba w us ⌨️🍿🧋
Who would make you really excited if they joined Ai2? We're always looking to hire people who seem like obvious strong fits.
we released OLMo 2 1B, showing again how well our OLMo 2 pretrain & post-train recipe works! Our small 1B model is comparable to or better than other top open-weights-only alternatives while maintaining fully open data, code & intermediate checkpoints!
We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.
outstanding paper award for our AI in Education work! 🤩
🐟 dataset of natural images of student solutions to K-12 math problems from an online teaching platform
🐠 annotations (dense captions, VQA pairs) by teachers to eval VLMs
chat w leads @samibaral144 & @lucy3_li at #NAACL2025
🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session 2025.naacl.org/blog/best-pape…
starting now in Hall 3, poster 306! come chat w me, @Muennighoff & @soldni about our MoE w fully open data, weights, code, and more! complete w an iOS app to run it completely on device 😆 poster is so pretty 😭 and don’t miss the oral session later too! #ICLR2025
excited to be at #ICLR2025 w/ the @allen_ai OLMo team presenting some of our work (below). come chat w/ us to learn more! we'd love to collaborate & we're also hiring :D