Yana Wei
@yanawei_
PhD student @ Johns Hopkins University; Multimodal Understanding, Embodied Agents, Image Editing
🔥 Thrilled to release our new multimodal RL work: Open Vision Reasoner! A powerful 7B model with SOTA performance on language & vision reasoning benchmarks, trained with nearly 1K steps of multimodal RL. Our journey begins with a central question: Can the cognitive behaviors…
🪞 We'll present Perception in Reflection at ICML this week! We introduce RePer, a dual-model framework that improves visual understanding through reflection. Better captions, fewer hallucinations, stronger alignment. 📄 arxiv.org/pdf/2504.07165 #ICML2025 @yanawei_ @JHUCompSci
🚀 Open Vision Reasoner (OVR) Transferring linguistic cognitive behaviors to visual reasoning via large-scale multimodal RL. SOTA on MATH500 (95.3%), MathVision, and MathVerse. 💻 Code: github.com/Open-Reasoner-… 🌐 Project: weiyana.github.io/Open-Vision-Re… #LLM @yanawei @HopkinsEngineer
Kimi K2 + Groq in Anycoder: vibe-coding a 500+ LOC Three.js mobile game in seconds
Just three words and one link: beautiful project pages in surprising styles! ✨ Try the magic Anycoder by @_akhaliq here 👉 huggingface.co/spaces/akhaliq… The original pages have totally different vibes! (weiyana.github.io/Perception-in-…)
🥳 Thanks @_akhaliq for featuring our OVR! We’re continuously iterating on both models and data to release even more powerful versions, so stay tuned! 🚀 Check out the nearly 1K-step multimodal RL and in-depth cognitive behavior analysis here: ✍️ ArXiv: arxiv.org/abs/2507.05255 🐼…
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
We are excited to introduce Open Vision Reasoner (OVR) 🚀, transferring linguistic cognitive behavior to unlock advanced visual reasoning! 💡 Two-stage recipe • Massive linguistic cold-start on Qwen2.5-VL-7B sparks “mental imagery” • ~1K-step multimodal RL refines & scales…
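For readers who want the shape of that two-stage recipe in code, here is a minimal, hypothetical Python sketch. Every name in it (ToyVLM, sft_step, policy_gradient_step, correctness_reward, the data lists) is an illustrative placeholder assumed for this sketch, not the authors' released training code; only the overall structure (a linguistic SFT cold-start followed by roughly 1K steps of verifiable-reward multimodal RL) comes from the announcement above.

```python
"""Illustrative sketch of the two-stage OVR recipe (cold-start SFT, then multimodal RL).
All classes/functions below are stand-ins, not the actual OVR implementation."""

import random
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str   # text-only prompt (stage 1) or image+text prompt (stage 2)
    answer: str   # verifiable ground-truth answer


class ToyVLM:
    """Stand-in for a Qwen2.5-VL-7B-style model with SFT and RL update hooks."""

    def sft_step(self, sample: Sample) -> None:
        pass  # supervised loss on a long linguistic reasoning trace

    def generate(self, prompt: str) -> str:
        return random.choice(["42", "7"])  # stubbed chain-of-thought rollout

    def policy_gradient_step(self, rollout: str, reward: float) -> None:
        pass  # reward-weighted policy update (e.g. a GRPO/PPO-style step)


def correctness_reward(rollout: str, answer: str) -> float:
    # Rule-based verifiable reward: 1.0 if the final answer matches, else 0.0.
    return 1.0 if rollout.strip() == answer.strip() else 0.0


def train(model: ToyVLM, text_data: list[Sample], mm_data: list[Sample], rl_steps: int = 1000) -> ToyVLM:
    # Stage 1: massive linguistic cold-start to instill cognitive behaviors.
    for sample in text_data:
        model.sft_step(sample)

    # Stage 2: ~1K steps of multimodal RL to transfer those behaviors to visual reasoning.
    for _ in range(rl_steps):
        sample = random.choice(mm_data)
        rollout = model.generate(sample.prompt)
        model.policy_gradient_step(rollout, correctness_reward(rollout, sample.answer))
    return model
```

The point of the sketch is only the ordering: the cold-start stage teaches the behaviors on text, and the RL stage rewards verifiable answers on multimodal tasks so those behaviors carry over.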