Silvio Savarese
@silviocinguetta
Executive Vice President, Chief Scientist @salesforce. Adjunct Professor of Computer Science @Stanford University. Faculty co-director @StanfordSVL. #AI
We need more nuanced conversations about capability vs. consistency. Our MCPEval research tackles this head-on—moving beyond static benchmarks to automated, end-to-end evaluation across real-world domains. How agents actually perform in context matters. #TrustedAI
⚡ Introducing MCPEval: the first automated evaluation framework for AI agents built on Model Context Protocol: 🔗 Paper: bit.ly/3TKXpLR 🔗 Code: bit.ly/44ZnUSN ✅ End-to-end task generation & verification ✅ Deep evaluation across 5 real-world domains ✅…
Enterprise AI can't thrive in the "Agentic Wild West" we're living in today. In order to collaborate across companies, AI agents will need universal protocols. My thoughts on the interoperable #FutureofAI: sforce.co/3Iv7Ss6
As @Benioff says, responsible AI deployment requires robust guardrails that protect our most vulnerable users. Thank you @SFResearch for working every day to build models and systems that recognize human fragility and respond with care, not exploitation. #TrustedAI
Great article on the very dark side of LLMs — and why they aren’t magic oracles. The prompt matters. The data matters. The framework matters. LLMs hallucinate by design, flatter users, reinforce delusions, and can spiral vulnerable people into danger. ❤️ nytimes.com/2025/06/13/tec…
Great to see our AI Research Lab series continue with @jcniebles breaking down multimodal AI. Cross-modal reasoning is more than impressive - it's how #EnterpriseAI can truly understand the world. Definitely worth a watch:
🚨New Episode Drop!🚨 🧠 AI Research Lab - Explained: The Future is Multimodal You text, share photos, record videos—seamlessly switching between data types. Why can't AI? Our Salesforce AI team builds multimodal systems that understand text, images, audio, and video…
Synthesized data for #EnterpriseAI evaluation is an ethical imperative. CRMArena-Pro lets us rigorously test agents in a real-life business environment—a messy, multi-step, complex world—without putting sensitive data at risk. Proud of the team's work toward safer, more…
🚨 Introducing CRMArena-Pro: The first multi-turn, enterprise-grade benchmark for LLM agents ✍️Blog: sforce.co/4dKBRIq 🖇️Paper: bit.ly/3T0AY4E 🤗Dataset: bit.ly/4kiRlG3 🖥️Code: bit.ly/4fkrZVM Most AI benchmarks test isolated, single-turn tasks.…
Exciting new 3D renderer from WorldLabs!
Check out this shiny new, fast and dynamic web renderer for 3D Gaussian Splats! The things one could do are just mind boggling! So proud of the @theworldlabs team that made this happen, and we are making this open source for everyone!
Open Sourcing Forge: 3D Gaussian splat rendering for web developers! 3DGS has become a dominant paradigm for differentiable rendering, combining high visual quality and real-time rendering. However, support for splatting on the web still lags behind its adoption in AI.
Businesses need consistent AI, not just brilliant AI. Capability + reliability beats unpredictable genius every time. Great @FortuneMagazine piece on Enterprise General Intelligence (#EGI). Thanks @SharonGoldman! bit.ly/4dvMG0O
⚡🧠"Jagged intelligence" perfectly captures why LLMs can pass the bar exam but fail simple riddles. @Silviocinguetta's #EGI framework (Enterprise General Intelligence) focuses on what enterprises actually need: capability + consistency, not just raw intelligence. New piece…
Enterprise General Intelligence (EGI) won't require bigger models—it will demand better data! Our recent research demonstrates that smaller models (like our own xLAM-2) trained on high-quality multi-turn interaction data outperform frontier models like GPT-4o and Claude 3.5 in…

Simplicity reveals deep truths. Our research shows simple rejection sampling matches complex RL methods for LLM reasoning. The key insight? Not reward normalization, but knowing which negative examples to discard—opening paths to more efficient training. #MinimalistAI 👇
📝 NEW RESEARCH! 📝 Meet "A Minimalist Approach to LLM Reasoning," which shows that simpler is better when it comes to AI reasoning. 🖇️ Full paper: bit.ly/3RDcHRN 🧠 Analysis from @weights_biases: bit.ly/4lO0XcZ Key insight: The simple rejection sampling…
Excited to share that our in-house Salesforce xLAM models (from 1B models for edge devices to 70B for enterprise systems) are excelling at the function calling tasks that are the foundation for accessible, deployable agentic #EnterpriseAI at scale. An important step toward EGI!…
Our xLAM (#LargeActionModels) family just got an upgrade! 1️⃣ Multi-turn, natural conversation support 2️⃣ Smarter multi-step reasoning 3️⃣ Models from 1B to 70B for ultimate flexibility 🤗 HuggingFace: bit.ly/4jyj2tu 👑 BFCL Leaderboard: bit.ly/3WIZdY3 Our…
Learn how the team’s SFR-Guard goes beyond mere detection of toxicity and malicious attacks to explain why and how severe they are, giving businesses unprecedented control and transparency. Essential for #TrustedAI and #EnterpriseAI adoption. Great work, @SFResearch!
🦺 The science behind safer AI agents: Our new SFR-Guard models detect both toxic content (from hate speech to profanity) and prompt injection attacks (like role-play, DAN jailbreaks, and malicious code generation). 🧠 Details: sforce.co/43XLfWf Besides detection,…
I've long believed we need a different lens through which to view the future of artificial intelligence in business. Very proud to see my concept of Enterprise General Intelligence (EGI) has been extensively covered by Sabrina Ortiz @ZDNet in this recent article:…

Silvio Savarese, from Stanford to Salesforce to lead the era of digital agents. Inspired by Sinner dlvr.it/TJyKqw
After years studying AI applications, I'm proposing a new framework: Enterprise General Intelligence (EGI). It redefines AI excellence through high capability + unwavering consistency—hallmarks of true "champion" systems. More on the concept here: sforce.co/4jlcjTP
Grazie mille to @01net for our #TDX25 discussion. Key takeaway: Business transformation won't come from one monolithic AI system, but small, specialized models working with humans. Enjoyed the piece (in Italian!); it reflects the #FutureofAI we're building at Salesforce:…
Singapore represents the ideal intersection of talent, infrastructure and policy for AI innovation. Our $1B investment strengthens our global research network tackling urgent #EnterpriseAI challenges—from data shortage to security protocols for #AgenticAI. The #FutureofAI demands…
🇸🇬 Just announced: 🇸🇬 A $1B investment in Singapore, home to our first international AI Research hub since 2019. Read why: 👉sforce.co/3DDiROf We are proud of our Lion City team's contributions—100+ research papers, breakthrough models like BLIP and time-series Moirai,…