Salesforce AI Research

@SFResearch

We advance state-of-the-art #AI techniques that pave the path for innovative products at Salesforce. Focus areas include #AgenticAI, #NLP, #TrustedAI.

Palo Alto, CA

Joined September 2014

302 Following

17K Followers

Pinned

Salesforce AI Research @SFResearch · Jul 16

🔬 Our research team is pioneering AI agents from the ground up. We're developing autonomous systems that could revolutionize data-driven sustainability decisions across industries—all while maintaining ethical standards & sustainability in our research process. Learn more:…

Pinned

Salesforce AI Research @SFResearch · Jul 14

While everyone talks about AI protocols, the real competitive advantage goes to organizations mapping their work ontology NOW. @silviocinguetta explains why companies with well-defined business taxonomies will rapidly deploy interoperable agents tomorrow. This insight on…

Silvio Savarese @silviocinguetta · Jul 8

Enterprise AI can't thrive in the "Agentic Wild West" we're living in today. In order to collaborate across companies, AI agents will need universal protocols. My thoughts on the interoperable #FutureofAI: sforce.co/3Iv7Ss6

Salesforce AI Research @SFResearch · 5 m

📰 @VentureBeat deep dive on MCPEval is live! 📄 Paper: bit.ly/3TKXpLR 📰 Article: bit.ly/40tFg96 "We now need to figure out how to evaluate [agents] properly" - the challenge MCPEval solves by bringing testing to the same environment where agents actually…

Researchers from Salesforce unveiled MCPEval, a new method to evaluate AI agent performance and tool use within MCP servers.

Salesforce AI Research @SFResearch · 36 m

💡 Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models 💡 📄 Paper: bit.ly/44IAvuO 💻 Code: bit.ly/4lLjQgd 😵‍💫 Have a task but experiencing prompt engineering existential dread? Few-shot or zero-shot? Chain-of-thought or ReAct?…

Salesforce AI Research @SFResearch · Jul 22

🏆 #ICML2025 Best Paper Award: AI Safety Should Prioritize the Future of Work 📄 Paper: arxiv.org/abs/2504.13959 🎉 Congratulations to Sanchaita Hazra @hsanchaita, Bodhisattwa Prasad Majumder @mbodhisattwa, and Tuhin Chakrabarty @TuhinChakr for winning the Outstanding Award —…

Salesforce AI Research @SFResearch · Jul 21

What an amazing #ICML2025 in Vancouver! Our AI Research team had incredible conversations about the future of AI and machine learning. Thanks to everyone who stopped by to discuss our accepted papers. If you missed them, here are all the links - bookmark these for your research:…

Salesforce AI Research @SFResearch · Jul 19

🔧 Explore @Salesforce's open source projects on @github! We're building the future through collaboration & innovation. Featured repositories: ➡️ CodeGen - AI code generation ➡️ LAVIS - Language-vision AI ➡️ Lightning Web Components ➡️ Salesforce CLI ➡️ Merlion - Time series…

Salesforce AI Research @SFResearch · Jul 18

⚡ Introducing MCPEval: the first automated evaluation framework for AI agents built on Model Context Protocol: 🔗 Paper: bit.ly/3TKXpLR 🔗 Code: bit.ly/44ZnUSN ✅ End-to-end task generation & verification ✅ Deep evaluation across 5 real-world domains ✅…

Salesforce AI Research @SFResearch · Jul 17

🧠 Human-amplifying AI starts in the research lab. Our work on reasoning & alignment shapes whether agents replace or enhance human potential. 85% query resolution reflects years of research building AI that handles complexity while preserving human empathy, creativity & judgment…

Marc Benioff @Benioff · Jul 10

AI: Replace us or amplify us? Agentforce proves the power of Human + AI: 85% of queries resolved, 17% lower service costs. Read my FT piece on designing AI to elevate humanity. 🤝🤖 #Agentforce #AI #FutureOfWork 🔗 ft.com/content/3db52a…

Salesforce AI Research @SFResearch · Jul 15

🇨🇦 Excited to present our work at @COLM_conf in Montreal! Oct 7-10 at Palais des Congrès! 📄 Our accepted papers: CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval 👥 Authors: Ye Liu, Rui Meng, Shafiq Joty @JotyShafiq, Silvio Savarese…

Salesforce AI Research @SFResearch · Jul 12

The counterintuitive finding: explicit step-by-step planning actually hurts performance on complex problems 🤯 SPARKLE reveals RL’s real power—fundamentally changing how models integrate knowledge (+4.3% gain). This mechanistic understanding is exactly what AI needs.

Caiming Xiong @CaimingXiong · Jul 8

🤔 Ever wonder where reinforcement learning actually boosts (or hurts) LLM’s reasoning capabilities? Meet SPARKLE—a new analysis framework that dissects gains from RL in planning, knowledge integration, and subproblem solving. 📄 Paper: arxiv.org/abs/2506.04723 🌐 Project:…

Salesforce AI Research @SFResearch · Jul 10

🚨 GTA1, our GUI Test-time Scaling Agent 🚨 📄 Paper: arxiv.org/abs/2507.05791 🔗 Project: os-world.github.io 💻 Code: github.com/Yan98/GTA1 🧠 7B/32B/72B models: huggingface.co/HelloKKMe 🏆 Top-1 on OSWorld benchmark (45.2% success rate), outperforming OpenAI’s CUA. GTA1…

Salesforce AI Research @SFResearch · Jul 9

💡 Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning💡 📝 Paper: arxiv.org/abs/2506.04723 📎 Project: sparkle-reasoning.github.io 🔗 Code: github.com/sparkle-reason… We introduce SPARKLE - a fine-grained framework to understand HOW reinforcement…

Salesforce AI Research @SFResearch · Jul 8

🚨 Introducing VLM2Vec-V2 & MMEB-V2 🚨 At @Salesforce, we're advancing multimodal embeddings beyond natural images to unify videos, visual documents, and images in a single 2B parameter model. 📄 Paper: arxiv.org/abs/2507.04590 💻 Code: github.com/TIGER-AI-Lab/V… 🤗 Model:…

Salesforce AI Research @SFResearch · Jul 4

💡LZ Penalty: version 2 dropped this week with major enhancements! Repetitive text generation remains one of the hardest unsolved problems in LLM deployment. Current penalties are Band-Aids that don’t work reliably. We built something different: a penalty that understands…

Salesforce AI Research @SFResearch · Jul 3

🏜️ Taming the 'Agentic Wild West': How AI Protocols Will Expand Enterprise Boundaries 🏜️ 📝 Blog: salesforce.com/blog/how-ai-pr… We're at the TCP/IP moment for AI agents. The future belongs to orgs that shape emerging standards early. Great insights from our Chief Scientist…
