Akshara Prabhakar
@aksh_555
applied scientist @SFResearch | prev @princeton_nlp, @surathkal_nitk
🤖 NEW PAPER 🤖 Chain-of-thought (CoT) reasoning can dramatically improve LLM performance Q: But what *type* of reasoning do LLMs use when performing CoT? Is it genuine reasoning, or is it driven by shallow heuristics like memorization? A: Both! 🔗 arxiv.org/abs/2407.01687 1/n
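(For context, here is a minimal sketch of a direct prompt next to a chain-of-thought prompt on a shift-cipher decoding task of the kind the paper studies; the exact prompts and evaluation harness below are illustrative, not the paper's actual setup.)

```python
# Minimal sketch of direct vs. chain-of-thought prompting on a shift-cipher
# decoding task. The prompts are illustrative, not the paper's actual setup.

direct_prompt = (
    "A shift cipher encodes a word by moving each letter forward 3 positions.\n"
    "Decode 'GRJ'.\nAnswer:"
)

cot_prompt = (
    "A shift cipher encodes a word by moving each letter forward 3 positions.\n"
    "Decode 'FDW'.\n"
    "Answer: Let's think step by step. F shifted back 3 is C, D becomes A, "
    "W becomes T, so the answer is CAT.\n\n"
    "A shift cipher encodes a word by moving each letter forward 3 positions.\n"
    "Decode 'GRJ'.\n"
    "Answer: Let's think step by step."
)

# Fed to the same model, the CoT prompt elicits intermediate steps before the
# final answer, while the direct prompt asks for the answer immediately.
```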

New preprint! In “Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior” we synthesize work from AI and cognitive science into a perspective on pursuing a generalizable understanding of cognition. Thread:
Salesforce AI Introduces CRMArena-Pro: The First Multi-Turn and Enterprise-Grade Benchmark for LLM Agents #AI #MachineLearning #IoT #LLM marktechpost.com/2025/06/05/sal…...
🚨 Introducing CRMArena-Pro: The first multi-turn, enterprise-grade benchmark for LLM agents ✍️Blog: sforce.co/4dKBRIq 🖇️Paper: bit.ly/3T0AY4E 🤗Dataset: bit.ly/4kiRlG3 🖥️Code: bit.ly/4fkrZVM Most AI benchmarks test isolated, single-turn tasks.…
.@SFResearch’s new series “AI Research Lab - Explained” just dropped! First up? See how we fine-tune specialized models to predict actions, not just language—enabling faster, more precise execution of real-world tasks. ⏯️ Watch and subscribe on YouTube: youtube.com/watch?v=vlvv4Z…
🎬 NOW LIVE: "The AI Research Lab - Explained" debuts with our groundbreaking work on Large Action Models! Watch now: bit.ly/4kfipp4 Watch as Shelby Heinecke @shelbyh_ai reveals how we're training these specialized models to generate precise, executable actions…
Enterprise General Intelligence (EGI) won't require bigger models—it will demand better data! Our recent research demonstrates that smaller models (like our own xLAM-2) trained on high-quality multi-turn interaction data outperform frontier models like GPT-4o and Claude 3.5 in…
🚀 Just dropped APIGen-MT-5k — 5K high-quality multi-turn agent interactions, generated with our APIGen-MT framework! Built for training & evaluating AI agents.
Introducing APIGen-MT: Our agentic pipeline for multi-turn synthetic data generation that produces high-quality training data for tuning AI agents! Try our open-sourced dataset today! 📊 Paper: bit.ly/44tORzx 🤗 Dataset: bit.ly/3GHuQM5 We used APIGen-MT to…
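(If you want to poke at the released data, a minimal sketch with the Hugging Face datasets library might look like the snippet below; the dataset id "Salesforce/APIGen-MT-5k" and the single "train" split are assumptions based on this announcement, so check the dataset card for the actual identifier and schema.)

```python
# Minimal sketch of loading the released multi-turn agent trajectories.
# The dataset id "Salesforce/APIGen-MT-5k" and the "train" split are
# assumptions based on the announcement; see the dataset card for the
# actual identifier, splits, and schema.
from datasets import load_dataset

ds = load_dataset("Salesforce/APIGen-MT-5k", split="train")
print(ds)            # number of trajectories and column names
print(ds[0].keys())  # fields of one multi-turn interaction
```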
It was a great experience interacting with the @buZZrobot community 😃, thanks for the invite @sopharicks! Talk link: youtu.be/cj6hrF_RUSw
Join us on Feb 20 for a talk on Generalization vs. Memorization in LLMs. @aksh_555 from @SFResearch will dive into how Chain-of-Thought prompting impacts reasoning—does it truly enhance logic, or is it just smart memorization? lu.ma/40zw4308
🤖 Fresh from #NeurIPS2024: Our AI research scientist Akshara Prabhakar @aksh_555 discusses our demo of xLAM's specialized agents (customer, search, cleanup) collaborating in Slack! 🧠Refresher course: xLAM is #Salesforce’s family of Large Action models custom built for function…
🚀 Introducing our #NeurIPS'24 (D&B track) paper, APIGen - an Automated PIpeline for Generating high-quality agentic data. While I can't attend due to visa issues, my brilliant colleagues @JianguoZhang3 @TeeH912 @HaolinChen11 @aksh_555 will be there. Swing by our booth or the…
🇨🇦🇨🇦🇨🇦 Welcome to Vancouver! 🇨🇦🇨🇦🇨🇦 13 Paper links below! 👇 The @Salesforce AI Research team brought a baker's dozen AI Research advancements to #NeurIPS2024 this year -- from revolutionizing multimodal agents and time series forecasting to tackling responsible AI evaluation…
Text-to-SQL has been my passion since Yale Spider 1.0! But as LLMs master it, real-world complexity demands more. 🚀After a year of work, Spider 2.0 shows the gap: o1 achieves just 17%! The path to production deployment is still long but exciting! more👉spider2-sql.github.io
🎉Announcing Spider 2.0 Text-to-SQL challenge in the LLM era! 6 years after our Yale Spider 1.0, we're pushing it forward with: 🍊Real complex cloud DBs (3000+ cols) 🍋Multi-dialect SQL complexity 🍎Agentic coding workflows 🧐Best o1 only solves 17%! 👉spider2-sql.github.io
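(For readers new to the task, a minimal sketch of the text-to-SQL setup Spider-style benchmarks evaluate: a natural-language question plus a schema in, SQL out. The schema, question, and prompt below are illustrative only, not the Spider 2.0 harness, which additionally involves real cloud warehouses, dialect differences, and agentic workflows.)

```python
# Minimal sketch of the text-to-SQL task: given a schema and a question,
# produce SQL. Schema, question, and prompt are illustrative only; the
# actual Spider 2.0 setup uses real cloud DBs and agentic workflows.

schema = """
CREATE TABLE customers (customer_id INT, region TEXT);
CREATE TABLE orders (order_id INT, customer_id INT, amount DECIMAL, created_at DATE);
"""

question = "Total order amount per region in 2024, highest first."

prompt = (
    f"Given this schema:\n{schema}\n"
    f"Write a SQL query that answers: {question}\n"
    "Return only the SQL."
)

# One reference query (dialects may differ, which Spider 2.0 stresses):
reference_sql = """
SELECT c.region, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY c.region
ORDER BY total DESC;
"""
```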
It actually reminds me of the multi-source domain adaptation work, where knowing the domain index during training lets the style concepts be represented well (theoretically, component-wise disentangled given enough source domains with sufficient variation). Then one can just…
Have a task that can be decomposed into two tasks requiring different skills? BUT - it is difficult to generate expert-curated training data? - do not want to use RAG? 🚀 Introducing LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks 🔗 arxiv.org/abs/2410.13025 1/n
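(Conceptually, the idea is to combine the low-rank updates of independently trained skill LoRAs instead of retraining on mixed data. Below is a toy sketch of a weighted merge of two LoRA deltas for a single linear layer; the shapes and fixed coefficients are illustrative, and the paper describes how the merge weights are actually learned.)

```python
# Toy sketch: merging two skill LoRAs for one linear layer.
# Shapes and fixed coefficients are illustrative; see the paper for how
# the merge weights are actually learned.
import torch

d, r = 1024, 16                      # hidden size, LoRA rank
W0 = torch.randn(d, d)               # frozen base weight

# Two LoRAs trained independently on different skills (e.g. math, coding)
A1, B1 = torch.randn(r, d), torch.randn(d, r)
A2, B2 = torch.randn(r, d), torch.randn(d, r)

alpha1, alpha2 = 0.6, 0.4            # merge coefficients (learnable in the paper)

# Merged effective weight: base + weighted sum of the low-rank updates
W_merged = W0 + alpha1 * (B1 @ A1) + alpha2 * (B2 @ A2)

x = torch.randn(4, d)                # a batch of hidden states
y = x @ W_merged.T                   # forward pass through the merged layer
```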
Super enjoyable read: promising results showing that model mixing via a small, learnable router on top of independently trained "skills" (parametrized as PEFT experts) can actually generalize better than data mixing (e.g., multi-task learning)
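(A rough sketch of what "a small, learnable router on top of PEFT experts" could look like for a single layer: a softmax gate mixing the outputs of frozen experts. The gating granularity, dimensions, and plain linear stand-ins for the experts are assumptions for illustration, not the paper's exact architecture.)

```python
# Rough sketch of a learnable router over frozen PEFT "skill" experts for a
# single layer. The softmax gate, per-example gating, and dimensions are
# illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class RoutedExperts(nn.Module):
    def __init__(self, d, experts):
        super().__init__()
        self.experts = nn.ModuleList(experts)       # frozen, independently trained
        self.gate = nn.Linear(d, len(experts))      # the only trainable part

    def forward(self, x):                           # x: (batch, d)
        weights = torch.softmax(self.gate(x), dim=-1)          # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (batch, d, n_experts)
        return (outs * weights.unsqueeze(1)).sum(-1)           # weighted mix

d = 64
experts = [nn.Linear(d, d) for _ in range(2)]       # stand-ins for PEFT experts
for e in experts:
    e.requires_grad_(False)                         # only the router is trained

layer = RoutedExperts(d, experts)
y = layer(torch.randn(8, d))
print(y.shape)                                      # torch.Size([8, 64])
```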
Super interesting work & definitely check it out if you are attending NeurIPS! It reminds me of the paper we published at ICLR this year — TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models — addressing the lifelong learning problem for robotic agents.…