Pranav Guruprasad
@pranavguru13
building @MetarchAI, @ManifoldRG. cs @GeorgiaTech, @BitsPilaniGoa. prev ml research @BerkeleyLab, @IITMadras
The next phase of AI benchmarks requires a shift towards evaluation of robust action generation across diverse modalities, environments, domains, and tasks. MultiNet v0.2 is a significant step towards this paradigm shift. Lots of interesting findings from this effort. More below!
Incredibly excited to announce the release of MultiNet v0.2 - a major update to our comprehensive open-source benchmark suite for evaluating Multimodal Models on Action tasks. Read on for several paper announcements, details on the evaluation harness and platform, and more!…
We’re recruiting OS Research Fellows for our Software Control Agents project 💻🤖 You'll be: → Designing evals for real-world tasks → Training agents for IDEs, creative tools, and APIs → Exploring new control architectures → Advancing open human–AI research Apply below 👇
Liverpool FC will retire the number 20 jersey across all levels of the club in honour and memory of Diogo Jota.
🚨Announcing Community Research Call #5 on Saturday, 7/19 at 9 AM PST! We're excited to share updates across our Multimodal AI, Self-Assembling Robotics, and Metacognition research, plus new ways to get involved, like our new Community Projects system. Register below!
check out our growing open-source contribution MultiNet v0.2 - a comprehensive open-source benchmark for training and evaluating multimodal vision-language-action models on agentic and embodied tasks. think multimodal robotics and AI agent platforms - but with all data…
Why does this matter? We're rapidly moving into an era where models must seamlessly process vision, understand language, AND generate appropriate actions. This capability will underlie all important computing applications, but there's a challenge - evaluating these systems will become far too…
We're working on some of the most interesting Open-Source Research and Engineering problems, specifically focused on defining and building the next generation of AI Benchmarks. Come join us if this excites you!
We're recruiting 2-3 OS Research Fellows for MultiNet. Build benchmarks for multimodal AI that can see, understand, and act. Work on cutting-edge VLMs/VLAs and create the evaluation frameworks for next-gen AI.
Cool demo of a GUI for LLMs! Obviously it has a bit of a silly "horseless carriage" feel, in that it exactly replicates conventional UI in the new paradigm, but the high-level idea is to generate a completely ephemeral UI on demand depending on the specific task at hand.
Here's how Gemini 2.5 Flash-Lite writes the code for a UI and its contents based solely on the context of what appears in the previous screen - all in the time it takes to click a button. 💻 ↓
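For intuition, here is a minimal sketch of that ephemeral-UI loop: hand the model the previous screen plus the element the user clicked, and have it write the next screen from scratch. The `call_llm` helper is a hypothetical placeholder for whatever fast model client you use (the demo uses Gemini 2.5 Flash-Lite); this is not the demo's actual code.

```python
# Minimal sketch of on-demand "ephemeral UI" generation. call_llm is a
# hypothetical stand-in for a real model client; wire up your own.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM API call."""
    raise NotImplementedError("wire up your model client here")

def generate_next_screen(previous_screen_html: str, clicked_element: str) -> str:
    """Generate the next screen's UI from scratch, conditioned only on
    what the user just saw and what they clicked."""
    prompt = (
        "You are generating a self-contained HTML page for a desktop-style UI.\n"
        f"The previous screen was:\n{previous_screen_html}\n\n"
        f"The user clicked: {clicked_element}\n"
        "Return only the full HTML for the next screen, nothing else."
    )
    return call_llm(prompt)
```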
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
Many of you asked for code & weights for π₀, we are happy to announce that we are releasing π₀ and pre-trained checkpoints in our new openpi repository! We tested the model on a few public robots, and we include code for you to fine-tune it yourself.
Excited to release FAST, our new robot action tokenizer! 🤖 Some highlights: - Simple autoregressive VLAs match diffusion VLA performance - Trains up to 5x faster - Works on all robot datasets we tested - First VLAs that work out-of-the-box in new environments! 🧵/
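For a sense of what an action tokenizer does, here is a minimal sketch of compressing a continuous action chunk into discrete tokens with a DCT plus quantization. This is an illustrative assumption in the spirit of FAST's frequency-space approach, not the released implementation (which also compresses the quantized coefficients with BPE); the `scale` parameter and the example trajectory are made up.

```python
# Sketch: continuous robot action chunk -> discrete integer tokens via a
# DCT and quantization. Illustrative only, NOT the released FAST tokenizer.
import numpy as np
from scipy.fft import dct, idct

def tokenize_actions(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """chunk: (horizon, action_dim) continuous actions -> integer tokens.
    The DCT concentrates smooth trajectories into a few low-frequency
    coefficients, so most quantized values land near zero and compress well."""
    coeffs = dct(chunk, axis=0, norm="ortho")
    return np.round(coeffs * scale).astype(np.int32)

def detokenize_actions(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert the quantization and the DCT to recover approximate actions."""
    return idct(tokens.astype(np.float64) / scale, axis=0, norm="ortho")

# Round-trip a smooth 50-step, 7-DoF trajectory; reconstruction error is
# bounded by the quantization step spread across the orthonormal transform.
chunk = np.cumsum(np.random.randn(50, 7) * 0.01, axis=0)
recon = detokenize_actions(tokenize_actions(chunk))
print("max reconstruction error:", np.abs(chunk - recon).max())
```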
India's Jindal Steel is on to something...
We hear a lot about LLMs revolutionizing agents, but few examples of their effectiveness are as profound as the work @krntneja has been doing in Agents for Education. Join @ManifoldRG's final Frontiers Series talk of the year on 12/7. Link in the thread below!
📣Our Third Frontiers Talk is 12/7 at 12 PM PST! Join our next Frontiers Talk: Building Multimodal, Doc-Grounded LLM Agents for Education with @krntneja! Learn about Jill Watson, an AI teaching assistant, and methods to enhance LLMs with minimal feedback. Details below!
PROTECTED OUR TRAP ALL SEASON 🔒 #StingEm 🐝
We ask new @southpkcommons members not “What is your company idea?” but “What are you curious about?” The difference matters. One leaves room for fractal possibility, the other pushes for unearned certainty. We wanted to share the questions SPC is curious about right now.
Presenting Research Log #047! manifoldrg.com/research-log-0… This week, we released MultiNet v0.1! Congratulations to @pranavguru13, @HarshSikka, @devjwsong, and the rest of the MultiNet team! Check out the log for more details on that, as well as updates from our other projects.