Viv
@Vtrivedy10
building AI that can see the world 👀 prev @awscloud, phd @ temple cs, http://storyforest.io 🌳
Sneak Peek into Amber - a world-building platform to bring your visual creativity to life. Been having a blast building this; you can create Characters, Assets, Videos, and Stories and remix everything in one place. Bring yourself to life with reference images and make unlimited…
had fun taking the new @hedra_labs realtime avatar for a spin! really easy to customize the avatar with a single image. some takeaways: - latency is really good; they're running on @livekit and it actually feels natural, no weird pauses - super easy dropdown to pick from OAI,…
ok I was sleeping really hard on using Playwright to auto-test with Claude Code. it does wonders for fixing UI issues, especially coming from a non-frontend guy - ty @iannuttall for the shout 🙏🏽
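(a minimal sketch of the kind of Playwright check an agent can loop on; the URL and file paths are made-up placeholders for your own app, not anything from the tweet)

```ts
// toy Playwright spec: catch a horizontal-overflow regression and save a
// screenshot the agent can look at. URL and paths are hypothetical.
import { test, expect } from "@playwright/test";

test("page renders without horizontal overflow", async ({ page }) => {
  await page.goto("http://localhost:3000");

  // true if the page has a horizontal scrollbar (a common CSS regression)
  const overflows = await page.evaluate(
    () => document.documentElement.scrollWidth > document.documentElement.clientWidth
  );
  expect(overflows).toBe(false);

  // a failing run leaves a screenshot behind for the agent to inspect
  await page.screenshot({ path: "page.png", fullPage: true });
});
```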
wan-2.2 image-to-video is pretty solid: cool physics and it follows style well. latency is tough though, 130 seconds per generation. veo3-fast is still the one for quality and speed

love coming back to this thread. interesting how it mirrors what happened in software interfaces over the last 20 years: we traded wacky but unique-to-your-brand interfaces for standardized, minimal building blocks. from a maintainability standpoint it makes sense, you can reuse…
Why Beauty Matters (and how it has been destroyed by "usability") A short thread...
the progress in frontier computer vision models in the last year has been amazing. mind-boggling to have a model like this open-source, looking back at what we had when I started my PhD in CV 4 years ago. huge shoutout to this team ❤️
We're thrilled to release & open-source Hunyuan3D World Model 1.0! This model enables you to generate immersive, explorable, and interactive 3D worlds from just a sentence or an image. It's the industry's first open-source 3D world generation model, compatible with CG pipelines…
i love the local model ethos, but it feels like the future for having a "good" threshold of intelligence is gonna end up being "open model hosted somewhere that's not your computer." the MoE architecture is great for inference IF you have the GPU memory, but the memory footprint prices out…
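(rough back-of-envelope for why the memory footprint bites; the parameter counts below are illustrative, not any specific model's)

```ts
// an MoE only computes with its active experts per token, but every expert
// must be resident in memory. numbers are made up for illustration.
const totalParams = 1.0e12;  // ~1T-param sparse MoE
const activeParams = 32e9;   // ~32B active per token
const bytesPerParam = 1;     // 8-bit quantized weights

const residentGB = (totalParams * bytesPerParam) / 1e9; // ~1000 GB held in memory
const activeGB = (activeParams * bytesPerParam) / 1e9;  // ~32 GB touched per token

console.log(`hold ~${residentGB} GB of weights for ~${activeGB} GB worth of per-token compute`);
// per-token cost looks like a 32B dense model; the memory bill is the full 1T
```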
the unfaltering cycle of gen media: you wanna build something → no single model can do it → stitch together a bunch of models → a new model comes out that one-shots your workflow → repeat
for anyone with the Gemini Ultra subscription - how's "Ingredients to Video" in Veo3? seems like a really killer feature for composing assets, but I haven't seen anyone post about it. also @joshwoodward, eventually opening it to Pro, even if super limited, would be awesome 🥹
you can build so many beautiful gen media startups on @FAL…so we will 😤 clean APIs, great performance, nice demos - they really be cookin over there
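(what I mean by clean APIs - a minimal sketch assuming fal's @fal-ai/client and its subscribe call; the model id and input keys are from memory, check their docs)

```ts
import { fal } from "@fal-ai/client";

// expects FAL_KEY in the environment; model id + input shape are assumptions
const result = await fal.subscribe("fal-ai/flux/dev", {
  input: { prompt: "a treehouse studio at golden hour, 35mm film look" },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") console.log("generating…");
  },
});

console.log(result.data); // generated image url(s)
```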
excited to play around with the updated (?) @GoogleDeepMind Gemini conversational image segmentation - I ran a bunch of tests of this capability before. it was cool but definitely had rough edges with semantic understanding (e.g. "the 2nd tallest man"), but their blog shows…
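(the kind of harness I'd rerun those tests with, assuming the @google/genai SDK; the model id, prompt shape, and JSON response format are all assumptions, not from their blog)

```ts
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// probe a *semantic* referring expression, the kind that used to trip it up
const image = fs.readFileSync("crowd.jpg").toString("base64");
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash", // assumed model id
  contents: [
    { inlineData: { mimeType: "image/jpeg", data: image } },
    { text: 'Segment "the 2nd tallest man" and return JSON with a box_2d and mask.' },
  ],
});

console.log(response.text); // eyeball whether the mask lands on the right person
```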

o3 is so smart, but it can never fully give a spec in markdown. at some point in the webapp it starts deviating away from markdown. such strange behavior past a certain planning-doc length
JSON prompting is having a moment again, this time for video gen. hope the model makers train for this directly in future releases and give guidance on good key names to use. it'll make agentic video creation way better and more usable
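(the kind of structure I mean; every key name below is made up, which is exactly the problem - nobody documents which keys the models actually respect)

```ts
// hypothetical JSON video prompt; no official schema behind these keys
const videoPrompt = {
  scene: "a lighthouse on a cliff during a storm",
  camera: { movement: "slow dolly-in", angle: "low" },
  subject_motion: "waves crashing, beam sweeping through rain",
  style: { look: "35mm film", palette: "cold blues" },
  duration_seconds: 8,
};

// an agent can mutate one field at a time instead of rewriting freeform prose
videoPrompt.camera.movement = "orbit left";
console.log(JSON.stringify(videoPrompt, null, 2));
```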
a frustrating part of studying math is stumbling upon new results and concepts and thinking "where tf did this even come from? what even prompted someone to come up with this?" o3 is a great time-traveling teacher that contextualizes what problems mathematicians were working…

regular @YouTube appreciation post for being the home of humans doing random cool things and sharing them with the world. and shoutout to their algorithm: by far the experience that leaves me happiest after I use it, no other platform comes close for capturing user wellbeing/curiosity…

the idea of "knowing when to think" is still so interesting and totally unsolved. the whole research field of adaptive compute is eyeing this rn, and it seems like if you force the model to always think, the success rate of not thinking when it's not needed isn't great. maybe…
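(a toy two-pass router just to make the idea concrete; the confidence heuristic and both models are invented)

```ts
// toy adaptive compute: answer cheap first, escalate to a thinking model
// only when the fast pass looks unsure. all of this is hypothetical.
type LLM = (prompt: string) => Promise<{ answer: string; confidence: number }>;

async function route(prompt: string, fast: LLM, thinking: LLM): Promise<string> {
  const quick = await fast(prompt);
  if (quick.confidence >= 0.8) return quick.answer; // skip thinking when not needed
  return (await thinking(prompt)).answer;           // spend reasoning tokens only here
}
```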
ok initial vibes: it's a non-thinking model, but it likes to start responses with "Thought Process:". I've found that it outputs this exact string for questions that require it to perform non-trivial computation. it doesn't do this for questions where it just needs to explain something
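(quick way to quantify that, assuming an OpenAI-compatible endpoint; the base URL and model name are placeholders for whatever you're poking at)

```ts
import OpenAI from "openai";

const client = new OpenAI({ baseURL: process.env.BASE_URL, apiKey: process.env.API_KEY });

const prompts = [
  "What is 37 * 43 minus 12?",   // non-trivial computation
  "Explain what a hash map is.", // pure explanation
];

// check whether replies open with the literal "Thought Process:" marker
for (const p of prompts) {
  const res = await client.chat.completions.create({
    model: "test-model", // placeholder
    messages: [{ role: "user", content: p }],
  });
  const text = res.choices[0].message.content ?? "";
  console.log(p, "->", text.startsWith("Thought Process:"));
}
```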
i think we're all gonna be amazed at how good the UX for agentic tasks like coding can be if a model as smart as Kimi K2 (going off their demo vibes) can run at some crazy tok/s on @CerebrasSystems or @GroqInc. I'm totally not resigned to a world of intelligence where everything is…