Runsheng (Anson) Huang
@ansonhuang99
Building the best filmmaking agent @flik_ai | Prev @Penn @GeorgiaTech
The future is here @flik_ai
The future isn’t prompt engineering. The future is: “What story do you want to tell?” Don’t wait for permission. Start here: flik.la
💭 How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance correlates strongly with “visual” representation quality in the LLM. 🤔 So, why not optimize these representations directly? 🚀 You guessed it—hola OLA-VLM!
Introducing OLA-VLM: a new paradigm for distilling vision knowledge into the hidden representations of LLMs, enhancing visual perception in multimodal systems. Learn more: github.com/SHI-Labs/OLA-V… GT x Microsoft collab by @praeclarumjj @zhengyuan_yang @JianfengGao0217 @jw2yang4ai
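To make "optimize these representations directly" concrete, here is a minimal sketch of the general idea: add an auxiliary loss that pulls the LLM's hidden states toward features from a frozen vision encoder (for example a depth or segmentation model), alongside the usual next-token loss. This is not the OLA-VLM implementation. The names `linear_probe`, `total_loss`, and the weight `lam` are hypothetical, and a plain MSE regression stands in for whatever objective OLA-VLM actually uses.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear_probe(hidden: Vector, weight: Matrix) -> Vector:
    """Hypothetical probe: map an LLM hidden state (dim d) into the target-feature space (dim k)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between a probed hidden state and a target vision feature."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def total_loss(
    lm_loss: float,
    hidden_states: List[Vector],
    targets: List[Vector],
    weight: Matrix,
    lam: float = 0.5,
) -> float:
    """Next-token loss plus an assumed auxiliary embedding loss.

    `targets` are features from a frozen vision encoder (e.g. depth or segmentation);
    `lam` (hypothetical) trades off language modeling against representation alignment.
    """
    emb_loss = sum(
        mse(linear_probe(h, weight), t) for h, t in zip(hidden_states, targets)
    ) / len(targets)
    return lm_loss + lam * emb_loss


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Probe output [1.0, 2.0] vs target [1.0, 0.0]: MSE = 2.0, so 1.5 + 0.5 * 2.0 = 2.5
    print(total_loss(1.5, [[1.0, 2.0]], [[1.0, 0.0]], identity))
```

In a real model the hidden states would come from intermediate LLM layers and the probe would be trained jointly; the sketch only shows how the auxiliary term combines with the language-modeling loss.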