Kyle Lo
@kylelostat
#nlproc #hci research scientist @allen_ai, co-lead of data for OLMo w/ @soldni, he/him, find me on 👉🏻 http://kylelo.bsky.social 🧋
issues w preference LM benchmarks
🐡 data contains cases where the "bad" response is just as good as the chosen one
🐟 model rankings can feel off (claude ranks lower than expected)
led by @cmalaviya11 (TACL 2025), we study underspecified queries & their detrimental effect on model evals
In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards and surface hidden biases. 🧵👇
presenting olmOCR at the poster session (2:15pm, 211 West) for the #codeml workshop at #icml2025!
🐟 fully open-source OCR, comparable to or better than frontier VLMs
🐠 all weights, data, code free & public
🐡 new benchmark of OCR "unit tests" on diverse PDFs & challenging OCR cases
New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:
1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment
Presenting two posters at ICML over the next two days:
- Both at 11am–1:30pm
- Both about how to improve pre-training with domains
- Both at stall #E-2600 in East Exhibition Hall A-B (!)
Tomorrow: WebOrganizer w/ @soldni & @kylelostat
Thursday: MeCo by @gaotianyu1350
will be at #icml2025, lemme know if you wanna chat about OLMo pretraining data curation, evaluation, data mixing, etc! 👋 find us at the poster session on 📅 Wed 7/16 @ 11am ⏲️ to learn about WebOrganizer, distilling web data taxonomies into small models & using them for LM data mixing!
🤔 Ever wondered how prevalent a given type of web content is in LM pre-training? In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
excited to win 🏆 this award for our work on molmo & pixmo, showing the value of high-quality data curation for VLMs! recalling when we released at the same time as Llama 3.2 😆 huge kudos to @mattdeitke, chris clark & @anikembhavi for their leadership on this project!
Molmo won the Best Paper Honorable Mention award @CVPR! This work was a long journey over 1.5 years, from failing to get strong performance with massive-scale, low-quality data, to focusing on modest-scale, extremely high-quality data! Proud to see what it became. #CVPR2025
Thrilled to announce I've joined the incredible team at @allen_ai! I'll be working on language modeling!
excited to announce that I’ve joined the Allen Institute, where I’ll be working on RL for LLMs.
great work from philippe as always ☺️ agree w the view that reliability is absolutely key
🆕 paper: LLMs Get Lost in Multi-Turn Conversation
In real life, people don’t speak in perfect prompts. So we simulate multi-turn conversations: less lab-like, more like real use.
We find that LLMs get lost in conversation.
👀 What does that mean? 🧵 1/N
📄 arxiv.org/abs/2505.06120
lookin for strong data ppl to make tokens, eat snacks & drink boba w us ⌨️🍿🧋
Who would make you really excited if they joined Ai2? We're always looking to hire people who seem like obvious strong fits.
we released OLMo 2 1B, showing again how well our OLMo 2 pretrain & post-train recipe works! Our small 1B model is comparable to or better than other top open-weights-only alternatives while maintaining fully open data, code & intermediate checkpoints!
We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.
outstanding paper award for our AI in Education work! 🤩
🐟 dataset of natural images of student solutions to K-12 math problems from an online teaching platform
🐠 annotations (dense captions, VQA pairs) by teachers to eval VLMs
chat w leads @samibaral144 & @lucy3_li at #NAACL2025
🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session 2025.naacl.org/blog/best-pape…
starting now in Hall 3, poster 306! come chat w me, @Muennighoff & @soldni about our MoE w fully open data, weights, code, and more! complete w an iOS app to run it completely on device 😆 poster is so pretty 😭 and don’t miss the oral session later too! #ICLR2025
excited to be at #ICLR2025 w/ the @allen_ai OLMo team presenting some of our work (below). come chat w/ us to learn more! we'd love to collaborate & we're also hiring :D