mok
@mokshith_v
ai x video @sievedata
Multi-modal models like GPT-4o are starting to outperform traditional models on core video/audio understanding tasks. One example is diarization in AI dubbing. Most tools break on multi-speaker videos. We're shipping a fix for this + major upgrades next week. Here's a preview.
ahi's background is unique - from building the world's smallest batteries to training high quality TTS and diffusion models in the Computational Image Group at Rice. excited to be working with him. if you want to work on internet-scale computer vision problems as well, DM me :)
just completed week 1 at @sievedata & the energy is unreal. i’m shipping next-gen video understanding apis from whiteboard to prod, on the applied ml team. huge thanks to @mokshith_v & @thpicy for bringing me onboard, & to @jcbmrshll for the mentorship. excited for what’s next
now who's gonna build the sonnet 3.5 equivalent in video models? something that doesn't top the leaderboards but wins on taste, vibes, and true utility.
There's a new AI video model topping the leaderboard: Seedance. It's #1 in the @ArtificialAnlys ranking for text-to-video and image-to-video. And after voting on 3k+ clips across dozens of models, I found the hype holds up. 5 things it's particularly good at (with examples) 👇
your ability to convince great people to work with you = your ability to clearly communicate what the fuck you actually do
so tl;dr - internet video is still useful but too many researchers are treating it as a complete replacement for the real thing and will soon learn a very bitter lesson because of it 🫡
I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. sergeylevine.substack.com/p/sporks-of-agi
If you're a cracked infra engineer who:
- loves Go, ETL, OLAP DBs, k8s
- wants to change the future of video AI
- loves seeing their work translate to $$$
we're aggressively hiring right now @sievedata. DM me. 💜🚀
Sieve is growing fast and needs engineers!
The last two months have been insane. I randomly came into the office this morning and decided to record this video on why there has literally never been a more exciting time to join @sievedata. We're working with leading research teams pushing the frontier of creative, robotics,…
best part of the first 20 minutes of my day is selecting my desired caffeine intake mechanism. alternating each day is good for the mind
Just shipped Orbit.
- Overlay running route & stats on photos
- Track your PRs across any timeframe
- Browse your run history like a commit log
The app you use before you inevitably upload to Strava & Instagram. Details in 🧵
did you know ByteDance (parent company of TikTok) has employed a Head of Robotics Research for the last 6 years? wonder what he was up to.

more evals for hard to evaluate things 💜
We benchmarked 14 of the top AI dubbing tools, and the results shocked us. Even some of the popular names failed at preserving speaker identity or handling multi-speaker videos. For context, the evals were conducted by third-party native speakers across 8 languages.
i'm excited to finally share this with the world. foundation model improvements continue to allow our research team to make improvements up the stack and see material differences in output dubs. later this week we'll be sharing some updated, de-anonymized evals as well 🌎💜
Introducing Dubbing 3.0 - the highest quality AI video translator.
- Handles multi-speaker video better than any provider
- Expresses emotions better (e.g. calm vs. frustrated)
- More natural, context-aware translations
- Supports 30+ languages and accents
i am so excited for the role datasets like these will play in the future of robotics and gaming. unfortunate that this dataset in particular is only 720p, and not strictly curated towards more "interesting" scenarios, but still great work by the authors!
Sekai: A Video Dataset towards World Exploration. A high-quality dataset of 5k hrs of egocentric worldwide video + audio for world exploration, created from YouTube with high-quality annotations
video-native MJ aesthetics 😍 congrats to the entire team!
Introducing our V1 Video Model. It's fun, easy, and beautiful. Available at $10/month, it's the first video model for *everyone* and it's available now.