Yifei Hu
@hu_yifei
Machine Learning Researcher @reductoai | Prev: PhD @LifeAtPurdue | Opinions my own
We have a small gift for the open-source community: RolmOCR, a new OCR model for complex document processing! We at @reductoai trained a Qwen2.5-VL-7B model (by @Alibaba_Qwen ) using the amazing olmOCR dataset by @allen_ai earlier this year. As a result, we have RolmOCR - a…

Our Edit pipeline will create editable widgets (input areas) on your documents and then fill those input areas with the information you want. @AlvinRyanputra did a great job building and testing this feature!
The PDF on the left is scanned and has no metadata. Filling that out used to mean either manually entering the info, or annotating the layout with bounding boxes and hoping the structure never changes. This launch makes it possible for AI teams to do real human work at scale.
50%+ of the Gemini 2.5 paper (20 out of 39 pages) was about Safety, Security, and Responsibility.

I didn't ask for this, but Cursor still decided to act. I guess my code is really garbage.

If I remove my CTO’s code:
POV: After you remove CTO's shitty code.
I was trying to search "big beautiful bill" but the recommended keywords were kinda concerning... I swear I never used this account to search anything close to those keywords. And my Google search recommendations are very normal.


Fun fact: CAO (草) in Chinese literally means “F**k”. Llama 5 is gonna be f**king great!
I’m excited to be the Chief AI Officer of @Meta, working alongside @natfriedman, and thrilled to be accompanied by an incredible group of people joining on the same day. Towards superintelligence 🚀
I recently noticed something interesting: I ran a training job on 8×H100 two days ago. It took 15.5 hours to finish. Today I requested another 8×H100 machine from the same cloud provider, and the training ETA is ~6.5 hours. Same dataset, model, config, environment.
It was quite a journey. Everyone at @reductoai turned into mean product managers in the past few weeks.
Incredibly excited to share more about Reducto Studio today. We started by building the most accurate APIs, and are now building a platform to help everyone use them. Studio is a toolkit for building, evaluating, and deploying production-ready document processing pipelines.
Career update: I have decided to accept the offer from LinkedIn. Starting next Monday, I will become a premium member on LinkedIn with the shiny golden badge. It’s only a two-month free trial so I hope I get a return offer in the future.

Spoiler alert: we are gonna move to a new office (codename "studio"). This new office has better chairs and monitors, which will help us build, test, and deploy document processing pipelines faster. Come and visit us in SF.
We’ve been quietly working on something big at Reducto: a new, faster way to build, test, and deploy any document processing pipeline. So excited to share more next week 👀
When your office is next to Chinatown and you are craving sweet treats:

Flash Attention v2.8 can finally be installed smoothly via pip. Before, it might take hours to compile.
Pro tip: ask AI to use a thread pool to speed up your data preprocessing script.
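For anyone wondering what that looks like, here is a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `preprocess` and `preprocess_all` functions and the `max_workers=8` default are made up for illustration; swap in whatever your script does per record.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(record: str) -> str:
    # Hypothetical per-record step; a stand-in for real work such as
    # reading a file, decoding an image, or calling a remote service.
    return record.strip().lower()


def preprocess_all(records, max_workers=8):
    # Fan the records out across a pool of worker threads.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(preprocess, records))


if __name__ == "__main__":
    print(preprocess_all(["  Hello ", "WORLD\n"]))
```

Threads mostly help when each record spends its time waiting on I/O (disk, network). If the per-record work is CPU-bound pure Python, `ProcessPoolExecutor` from the same module is usually the better fit.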
