Juan A. Rodríguez 💫
@joanrod_ai
PhD Candidate at @Mila_Quebec and @etsmtl, researching at @ServiceNowRSRCH in Montreal. Multimodal Generative Models, 💫StarVector
I’m excited to announce that 💫StarVector has been accepted at CVPR 2025! Over a year in the making, StarVector opens a new paradigm for Scalable Vector Graphics (SVG) generation by harnessing multimodal LLMs to generate SVG code that aesthetically mirrors input images and text.…
Excited to be at ICLR 2025 in Singapore this week! 🇸🇬 Want to connect? Ping me! 📝 Main Conference Papers 📄 BigDocs 📅 Thu, Apr 24 | ⏰ 10:00–12:30 SGT 📍 Hall 3 + 2B | Poster #280 Open dataset for training multimodal models on document + code tasks. 🔗…
🎉 Excited to introduce BigDocs! An open, transparent multimodal dataset designed for: 📄 Documents 🌐 Web content 🖥️ GUI understanding 👨‍💻 Code generation from images We’re also launching BigDocs-Bench, featuring 10 tasks to test models on: ➡️ Document, Web, GUI Visual…
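If you want to poke at the data, a minimal sketch with the Hugging Face `datasets` library is below. The dataset ID is a hypothetical placeholder, since the link above is truncated; check the Hub page for the real name and splits.

```python
# Minimal sketch, assuming a standard Hugging Face datasets layout.
# "ServiceNow/BigDocs-Bench" is a hypothetical ID; the tweet's link is
# truncated, so check the official Hub page for the exact name and splits.
from datasets import load_dataset

bench = load_dataset("ServiceNow/BigDocs-Bench", split="test")  # hypothetical ID

# Each task pairs a visual input (document, web page, or GUI screenshot)
# with a target output such as answer text or generated code.
print(bench[0].keys())
```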
StarVector poster happening now at CVPR! Come by poster #31 if you want to chat about vector graphics, image-to-code generation, or just say hi!

I’m at #CVPR2025 in Nashville! I’ll be presenting 💫StarVector next Saturday. Feel free to reach out if you want to chat or meet up!
StarVector is out on Hugging Face StarVector is a foundation model for generating Scalable Vector Graphics (SVG) code from images and text. It utilizes a Vision-Language Modeling architecture to understand both visual and textual inputs, enabling high-quality vectorization…
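A sketch of image-to-SVG inference, adapted from the model card's usage pattern; the model ID and attribute names are assumptions that may differ across releases, so treat this as a sketch rather than the official snippet.

```python
# Image-to-SVG inference sketch, following the StarVector model card's
# pattern; the model ID and attribute names are assumptions to verify.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "starvector/starvector-1b-im2svg",  # assumed ID; an 8B variant also exists
    torch_dtype=torch.float16,
    trust_remote_code=True,             # custom vision-language architecture
).cuda().eval()

processor = model.model.processor       # image preprocessor bundled with the model
image = Image.open("icon.png").convert("RGB")
pixels = processor(image, return_tensors="pt")["pixel_values"].cuda()

# The LLM decodes SVG source code conditioned on the image embedding.
svg_code = model.generate_im2svg({"image": pixels}, max_length=4000)[0]
print(svg_code)
```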
Thanks to @_akhaliq for sharing our work. Technical paper and demos available on the project webpage: anthonygosselin.github.io/Ctrl-Crash-Pro…
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
More on inverse-rendering code generation, this time with visual workflows. 💫 StarFlow turns sketches into code that compiles into workflow visuals.
🚀 New paper from our team at @ServiceNowRSRCH! 💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐖𝐨𝐫𝐤𝐟𝐥𝐨𝐰 𝐎𝐮𝐭𝐩𝐮𝐭𝐬 𝐅𝐫𝐨𝐦 𝐒𝐤𝐞𝐭𝐜𝐡 𝐈𝐦𝐚𝐠𝐞𝐬 We use VLMs to turn 𝘩𝘢𝘯𝘥-𝘥𝘳𝘢𝘸𝘯 𝘴𝘬𝘦𝘵𝘤𝘩𝘦𝘴 and diagrams into executable workflows.…
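To make the "structured outputs" part concrete, here is a toy sketch of the step after the VLM: parsing its text into a workflow graph and sanity-checking it before execution. Field names here are illustrative, not the paper's actual schema.

```python
# Output side of a sketch-to-workflow pipeline: the VLM (not shown)
# returns workflow JSON as text, which we parse and validate before
# execution. The schema below is illustrative, not the paper's.
import json

def parse_workflow(vlm_output: str) -> dict:
    """Parse VLM text into a workflow and check structural validity."""
    workflow = json.loads(vlm_output)
    steps = {step["id"] for step in workflow["steps"]}
    # Every edge must connect two declared steps, or the flow won't compile.
    for edge in workflow["edges"]:
        assert edge["from"] in steps and edge["to"] in steps, "dangling edge"
    return workflow

example = '''{"steps": [{"id": "trigger", "type": "record_created"},
                        {"id": "notify", "type": "send_email"}],
              "edges": [{"from": "trigger", "to": "notify"}]}'''
print(parse_workflow(example)["steps"])
```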
👏10/10 one of those papers where I've been beating a drum like "someone needs to do this," then started implementing it, and lo and behold they've gone and done it better than I could
Thanks @_akhaliq for sharing our work! Excited to present our next generation of SVG models, now using Reinforcement Learning from Rendering Feedback (RLRF). 🧠 We think we cracked SVG generalization with this one. Go read the paper! arxiv.org/abs/2505.20793 More details on…
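The core idea, in sketch form: rasterize the generated SVG and score it against the target image, so the policy gets its reward from the renderer itself. The paper's actual reward is richer (see the arXiv link above); below is a stand-in using plain pixel MSE, with cairosvg as my rendering choice, not necessarily the paper's.

```python
# Rendering-feedback reward sketch: rasterize the candidate SVG, compare
# to the target image. Pixel MSE is a simplified stand-in for the paper's
# reward; cairosvg is an assumed renderer choice.
import io
import cairosvg
import numpy as np
from PIL import Image

def render_reward(svg_code: str, target: Image.Image, size: int = 224) -> float:
    """Reward = negative pixel error between rendered SVG and target image."""
    try:
        png = cairosvg.svg2png(bytestring=svg_code.encode(),
                               output_width=size, output_height=size)
    except Exception:
        return -1.0  # invalid SVG gets the worst reward: teaches syntactic validity
    pred = np.asarray(Image.open(io.BytesIO(png)).convert("RGB"), dtype=np.float32)
    ref = np.asarray(target.convert("RGB").resize((size, size)), dtype=np.float32)
    return -float(np.mean(((pred - ref) / 255.0) ** 2))

# Usage: reward = render_reward(candidate_svg, Image.open("target.png"))
```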
Congrats @TianbaoX and team on this exciting work and release! 🎉 We’re happy to share that Jedi-7B performs on par with the UI-Tars-72B agent on our challenging UI-Vision benchmark, with 10x fewer parameters! 👏 Incredible 🤗Dataset: huggingface.co/datasets/Servi… 🌐uivision.github.io
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
The UI-Vision Benchmark is out on HuggingFace: huggingface.co/datasets/Servi… ✅Now accepted at ICML 2025. 🔥 Go test your UI Agents on the benchmark!
🚀 Excited to share that UI-Vision has been accepted at ICML 2025! 🎉 We have also released the UI-Vision grounding datasets. Test your agents on it now! 🚀 🤗 Dataset: huggingface.co/datasets/Servi… #ICML2025 #AI #DatasetRelease #Agents
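If you're wiring up an eval, the grounding protocol boils down to something like the toy loop below: the agent predicts a click point per instruction, and a hit means landing inside the target element's box. Field names are illustrative; check the dataset card for the real schema and metrics.

```python
# Toy grounding evaluation loop for UI agents. The example fields
# (screenshot, instruction, bbox) are illustrative assumptions.
def inside(point, box):
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def grounding_accuracy(examples, agent) -> float:
    """Fraction of instructions where the predicted click hits the target box."""
    hits = sum(inside(agent(ex["screenshot"], ex["instruction"]), ex["bbox"])
               for ex in examples)
    return hits / len(examples)

# Sanity check with a dummy agent that always clicks the screen center:
examples = [{"screenshot": None, "instruction": "open settings",
             "bbox": (0.4, 0.4, 0.6, 0.6)}]
print(grounding_accuracy(examples, lambda s, i: (0.5, 0.5)))  # 1.0
```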
Can we give LLM agents very high-level data science goals, have them write code, and analyze results that yield useful insights? This ICLR 2025 paper introduces InsightBench to evaluate this scenario, along with an agent that does quite well at the task: arxiv.org/abs/2407.06423
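A stripped-down version of the loop such an agent runs, with the LLM call stubbed out; the agent in the paper is considerably more elaborate (multi-step planning and distilling findings into insights).

```python
# Toy agent loop: goal -> LLM writes analysis code -> execute -> collect
# findings. `ask_llm` is a stub standing in for any chat-completion call.
import io
import contextlib
import pandas as pd

def ask_llm(goal: str, schema: str) -> str:
    # Stub: a real agent would send the goal plus table schema to an LLM
    # and get back executable analysis code.
    return "print(df.groupby('region')['revenue'].mean())"

def run_analysis(df: pd.DataFrame, goal: str) -> str:
    code = ask_llm(goal, str(df.dtypes))
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        exec(code, {"df": df})  # sandbox this in any real deployment
    return out.getvalue()       # raw findings, to be distilled into insights

df = pd.DataFrame({"region": ["NA", "EU", "NA"], "revenue": [10, 20, 30]})
print(run_analysis(df, "Which regions drive revenue?"))
```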
This is a great step toward efficient online RL! Congrats to Dima (@DBahdanau) and the team on this release. I'm excited to start using PipelineRL for multimodal use cases! 💫
I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: github.com/ServiceNow/Pip… Blog: huggingface.co/blog/ServiceNo…
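The in-flight idea in miniature: the generation loop polls for fresh trainer weights between decode steps instead of draining every in-progress sequence first. PipelineRL's real implementation is distributed and far more involved; this toy just shows where the swap happens.

```python
# Toy illustration of in-flight weight updates: inference keeps decoding
# and hot-swaps to the newest weights between steps, mid-sequence.
import queue

weight_updates: "queue.Queue[int]" = queue.Queue()  # trainer pushes versions

def decode_step(tokens: list, weight_version: int) -> list:
    return tokens + [f"tok@v{weight_version}"]  # stand-in for model.forward

def generate(prompt: list, steps: int = 5) -> list:
    version = 0
    tokens = list(prompt)
    for _ in range(steps):
        try:
            version = weight_updates.get_nowait()  # hot-swap mid-sequence
        except queue.Empty:
            pass  # no new weights yet, keep decoding with the current ones
        tokens = decode_step(tokens, version)
    return tokens

weight_updates.put(1)  # trainer finished an optimizer step
print(generate(["<s>"]))
```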
This week, Mila researchers will present more than 90 papers at @iclr_conf in Singapore. Every day, we will share a schedule featuring Mila-affiliated presentations. Day 1 #ICLR2025 👇 mila.quebec/en/news/follow…
10 years on, and now recognized by @iclr_conf for standing the test of time. Please join us in congratulating @DBahdanau, @kchonyc & @Yoshua_Bengio for their seminal work titled “Neural Machine Translation by Jointly Learning to Align and Translate”. arxiv.org/abs/1409.0473
Runner Up: Neural Machine Translation by Jointly Learning to Align and Translate, by Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. By introducing a form of attention, the architecture became a cornerstone of modern deep learning, providing a foundation for transformers and LLMs.
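For anyone who hasn't read it: the attention in that paper is a tiny mechanism. A few lines of numpy capture it: score each encoder state against the previous decoder state, softmax the scores, and take the weighted sum as the context vector. Shapes and parametrization below are a minimal sketch of the paper's additive scoring function, not its exact training setup.

```python
# Additive ("Bahdanau") attention, minimal sketch in numpy.
import numpy as np

def additive_attention(s_prev, H, W_s, W_h, v):
    """s_prev: previous decoder state (d,); H: encoder states (T, d)."""
    scores = np.tanh(s_prev @ W_s + H @ W_h) @ v  # e_j for each source position
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                          # attention weights over T
    return alpha @ H, alpha                       # context vector, weights

d, T = 4, 3
rng = np.random.default_rng(0)
context, weights = additive_attention(
    rng.normal(size=d), rng.normal(size=(T, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d))
print(weights)  # sums to 1 over the T source positions
```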
Announcing the Test of Time awards for ICLR 2025! This award recognizes papers published ten years ago at ICLR 2015 that have had a lasting impact on the field. Congratulations to the authors! blog.iclr.cc/2025/04/14/ann…
🚨 SLAM Labs presents Apriel-5B! And it lands right in the green zone 🚨 Speed ⚡ + Accuracy 📈 + Efficiency 💸 This model punches above its weight, beating bigger LLMs while training on a fraction of the compute. Built with Fast-LLM, our in-house training stack. 🧵👇
We anticipated this a long time ago in the NoPE paper, and now NoPE is being adopted by new LLM architectures: the Llama 4 family and Cohere Command A.
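Operationally, NoPE just means this: causal self-attention with no positional encoding added anywhere. The causal mask alone breaks permutation symmetry, which is why decoder-only models can get away with it. A minimal single-head numpy sketch:

```python
# Causal self-attention with NO positional encoding (the NoPE setup):
# position information enters only through the causal mask.
import numpy as np

def causal_attention_nope(X, Wq, Wk, Wv):
    """X: (T, d) token embeddings with no position information added."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)     # row-wise softmax
    return weights @ V

T, d = 5, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))
out = causal_attention_nope(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```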
Google needs to tighten up the ship. Every time they plan a release, openai somehow drops theirs right on cue. This time gemini 2.5’s moment got completely hijacked, at least on my timeline 😅
> be google > create gemini 2.5 sota ai model way ahead of everyone else’s model > launch it the same day openai launches their new image model > everyone starts sharing openai images > absolutely no one is talking about you
StarVector does a very decent job on this example! Try here: huggingface.co/spaces/starvec…