Enrico Shippole
@EnricoShippole
Cleaning data hurts. We make it painless @TeraflopAI. Odin gave his eye to acquire knowledge, but I would give far more.
We open-sourced 99% of US caselaw on @huggingface. Both AI and legal tech companies are selling this data for a high premium. You can simply just build a wrapper around it and freely compete with them now. That is why we love open-source. huggingface.co/datasets/commo…
Why are legal services such a lucrative opportunity for AI? ・ ⚖️ Legal work is programming with words, a natural fit for AI. ・ 🚀 The market pull is unprecedented, with sales cycles cut from months to weeks. ・ 💻 AI engineers have an unfair advantage, able to build the…
I'm calling it now, this will be the next big hype in LLM RL algos after GRPO. It makes so much more sense intuitively to work on a sequence rather than on a token level, at least when our rewards are on a sequence level.
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
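To make the sequence-versus-token point concrete, here is a minimal Python sketch. It assumes the length-normalized sequence ratio as described in the GSPO paper, i.e. the exponential of the mean per-token log-probability difference, and contrasts it with the per-token ratios a token-level objective like GRPO works with. The function names and the clipping range `eps` are illustrative assumptions, not code from the Qwen team.

```python
import math


def token_ratios(new_logps, old_logps):
    """Per-token importance ratios pi_new(y_t) / pi_old(y_t) (token-level view)."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level ratio: exp(mean(new_logp - old_logp)).

    This equals the geometric mean of the per-token ratios.
    """
    if len(new_logps) != len(old_logps) or not new_logps:
        raise ValueError("log-prob lists must be non-empty and of equal length")
    mean_diff = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_diff)


def clipped_term(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one ratio; eps is a hypothetical value."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

With a sequence-level reward, one advantage value is paired with one `sequence_ratio` per response, so a single outlier token shifts the update less than it would through its own entry in `token_ratios`.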
Cocomelon for adults
Some news: We're building the next big thing — the first-ever AI-only social video app, built on a highly expressive human video model. Over the past few weeks, we’ve been testing it in private beta. Now, we’re opening early access: download the iOS app to join the waitlist, or…
Yeah, their timing code is just broken. It registers CUDA timing events on the default stream but runs the custom code on a separate stream. Fixing this drops the performance boost from 1000x down to ~2x or so. tl;dr nothing ever happens
Just hit #2 now. Plenty more open-source data to come soon.
It's incredibly easy to say "I am open source lover" when 1. You never meaningfully contributed to open source 2. You never financially supported any open source contributors Hear me out... You don't love open source community. You just love taking other people's work without…
this is incredible!
Happy to release the Common Pile, an 8TB, 1 Trillion Token Dataset of Public Domain and Openly Licensed Text in collaboration with @AiEleuther, @VectorInst, @allen_ai, @huggingface, and DPI by @ShayneRedford. We provisioned a subset of the Common Pile, consisting only of public…
Really awesome video search developed by @0xdrej and @Wyndlabs_ai. Definitely look out for some exciting future open-source releases.
The infrastructure for working with video data at scale barely exists. Grass Video Search is our first step in changing that. You can now find anything in a video based on the contents of its frames, not just transcripts or tags. Over the last several months, the Grass…