Akshara Prabhakar
@aksh_555
applied scientist @SFResearch | prev @princeton_nlp, @surathkal_nitk
🤖 NEW PAPER 🤖 Chain-of-thought (CoT) reasoning can dramatically improve LLM performance Q: But what *type* of reasoning do LLMs use when performing CoT? Is it genuine reasoning, or is it driven by shallow heuristics like memorization? A: Both! 🔗 arxiv.org/abs/2407.01687 1/n
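(For context, here is a minimal sketch of a direct prompt next to a chain-of-thought prompt on a shift-cipher decoding task of the kind the paper studies; the exact prompts and evaluation harness below are illustrative, not the paper's actual setup.)

```python
# Minimal sketch of direct vs. chain-of-thought prompting on a shift-cipher
# decoding task. The prompts are illustrative, not the paper's actual setup.

direct_prompt = (
    "A shift cipher encodes a word by moving each letter forward 3 positions.\n"
    "Decode 'GRJ'.\nAnswer:"
)

cot_prompt = (
    "A shift cipher encodes a word by moving each letter forward 3 positions.\n"
    "Decode 'FDW'.\n"
    "Answer: Let's think step by step. F shifted back 3 is C, D becomes A, "
    "W becomes T, so the answer is CAT.\n\n"
    "A shift cipher encodes a word by moving each letter forward 3 positions.\n"
    "Decode 'GRJ'.\n"
    "Answer: Let's think step by step."
)

# Fed to the same model, the CoT prompt elicits intermediate steps before the
# final answer, while the direct prompt asks for the answer immediately.
```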

New preprint! In “Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior” we synthesize work from AI and cognitive science into a perspective on pursuing a generalizable understanding of cognition. Thread:
Salesforce AI Introduces CRMArena-Pro: The First Multi-Turn and Enterprise-Grade Benchmark for LLM Agents #AI #MachineLearning #IoT #LLM marktechpost.com/2025/06/05/sal…...
🚨 Introducing CRMArena-Pro: The first multi-turn, enterprise-grade benchmark for LLM agents ✍️Blog: sforce.co/4dKBRIq 🖇️Paper: bit.ly/3T0AY4E 🤗Dataset: bit.ly/4kiRlG3 🖥️Code: bit.ly/4fkrZVM Most AI benchmarks test isolated, single-turn tasks.…
.@SFResearch’s new series “AI Research Lab - Explained” just dropped! First up? See how we fine-tune specialized models to predict actions, not just language—enabling faster, more precise execution of real-world tasks. ⏯️ Watch and subscribe on YouTube: youtube.com/watch?v=vlvv4Z…
🎬 NOW LIVE: "The AI Research Lab - Explained" debuts with our groundbreaking work on Large Action Models! Watch now: bit.ly/4kfipp4 Watch as Shelby Heinecke @shelbyh_ai reveals how we're training these specialized models to generate precise, executable actions…
Enterprise General Intelligence (EGI) won't require bigger models—it will demand better data! Our recent research demonstrates that smaller models (like our own xLAM-2) trained on high-quality multi-turn interaction data outperform frontier models like GPT-4o and Claude 3.5 in…
🚀 Just dropped APIGen-MT-5k — 5K high-quality multi-turn agent interactions, generated with our APIGen-MT framework! Built for training & evaluating AI agents.
Introducing APIGen-MT: Our agentic pipeline for multi-turn synthetic data generation that produces high-quality training data for tuning AI agents! Try our open-sourced dataset today! 📊 Paper: bit.ly/44tORzx 🤗 Dataset: bit.ly/3GHuQM5 We used APIGen-MT to…
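(If you want to poke at the released data, a minimal sketch with the Hugging Face datasets library might look like the snippet below; the dataset id "Salesforce/APIGen-MT-5k" and the single "train" split are assumptions based on this announcement, so check the dataset card for the actual identifier and schema.)

```python
# Minimal sketch of loading the released multi-turn agent trajectories.
# The dataset id "Salesforce/APIGen-MT-5k" and the "train" split are
# assumptions based on the announcement; see the dataset card for the
# actual identifier, splits, and schema.
from datasets import load_dataset

ds = load_dataset("Salesforce/APIGen-MT-5k", split="train")
print(ds)            # number of trajectories and column names
print(ds[0].keys())  # fields of one multi-turn interaction
```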
It was a great experience interacting with the @buZZrobot community 😃, thanks for the invite @sopharicks! Talk link: youtu.be/cj6hrF_RUSw
Join us on Feb 20 for a talk on Generalization vs. Memorization in LLMs. @aksh_555 from @SFResearch will dive into how Chain-of-Thought prompting impacts reasoning—does it truly enhance logic, or is it just smart memorization? lu.ma/40zw4308
🤖 Fresh from #NeurIPS2024: Our AI research scientist Akshara Prabhakar @aksh_555 discusses our demo of xLAM's specialized agents (customer, search, cleanup) collaborating in Slack! 🧠Refresher course: xLAM is #Salesforce’s family of Large Action models custom built for function…
🚀 Introducing our #NeurIPS'24 (D&B track) paper, APIGen - an Automated PIpeline for Generating high-quality agentic data. While I can't attend due to visa issues, my brilliant colleagues @JianguoZhang3 @TeeH912 @HaolinChen11 @aksh_555 will be there. Swing by our booth or the…
🇨🇦🇨🇦🇨🇦 Welcome to Vancouver! 🇨🇦🇨🇦🇨🇦 13 Paper links below! 👇 The @Salesforce AI Research team brought a baker's dozen AI Research advancements to #NeurIPS2024 this year -- from revolutionizing multimodal agents and time series forecasting to tackling responsible AI evaluation…
Text-to-SQL has been my passion since Yale Spider 1.0! But as LLMs master it, real-world complexity demands more. 🚀After a year of work, Spider 2.0 shows the gap: o1 achieves just 17%! The path to production deployment is still long but exciting! more👉spider2-sql.github.io
🎉Announcing Spider 2.0 Text-to-SQL challenge in the LLM era! 6 years after our Yale Spider 1.0, we're pushing it forward with: 🍊Real complex cloud DBs (3000+ cols) 🍋Multi-dialect SQL complexity 🍎Agentic coding workflows 🧐Best o1 only solves 17%! 👉spider2-sql.github.io
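(For readers new to the task, a minimal sketch of the text-to-SQL setup Spider-style benchmarks evaluate: a natural-language question plus a schema in, SQL out. The schema, question, and prompt below are illustrative only, not the Spider 2.0 harness, which additionally involves real cloud warehouses, dialect differences, and agentic workflows.)

```python
# Minimal sketch of the text-to-SQL task: given a schema and a question,
# produce SQL. Schema, question, and prompt are illustrative only; the
# actual Spider 2.0 setup uses real cloud DBs and agentic workflows.

schema = """
CREATE TABLE customers (customer_id INT, region TEXT);
CREATE TABLE orders (order_id INT, customer_id INT, amount DECIMAL, created_at DATE);
"""

question = "Total order amount per region in 2024, highest first."

prompt = (
    f"Given this schema:\n{schema}\n"
    f"Write a SQL query that answers: {question}\n"
    "Return only the SQL."
)

# One reference query (dialects may differ, which Spider 2.0 stresses):
reference_sql = """
SELECT c.region, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY c.region
ORDER BY total DESC;
"""
```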
It actually reminds me of the multi-source domain adaptation work, where knowing the domain index during training lets the style concepts be represented well (theoretically, component-wise disentangled given enough source domains with sufficient variation). Then one can just…
Have a task that can be decomposed into two tasks requiring different skills? BUT - it is difficult to generate expert-curated training data? - do not want to use RAG? 🚀 Introducing LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks 🔗 arxiv.org/abs/2410.13025 1/n
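(Conceptually, the idea is to combine the low-rank updates of independently trained skill LoRAs instead of retraining on mixed data. Below is a toy sketch of a weighted merge of two LoRA deltas for a single linear layer; the shapes and fixed coefficients are illustrative, and the paper describes how the merge weights are actually learned.)

```python
# Toy sketch: merging two skill LoRAs for one linear layer.
# Shapes and fixed coefficients are illustrative; see the paper for how
# the merge weights are actually learned.
import torch

d, r = 1024, 16                      # hidden size, LoRA rank
W0 = torch.randn(d, d)               # frozen base weight

# Two LoRAs trained independently on different skills (e.g. math, coding)
A1, B1 = torch.randn(r, d), torch.randn(d, r)
A2, B2 = torch.randn(r, d), torch.randn(d, r)

alpha1, alpha2 = 0.6, 0.4            # merge coefficients (learnable in the paper)

# Merged effective weight: base + weighted sum of the low-rank updates
W_merged = W0 + alpha1 * (B1 @ A1) + alpha2 * (B2 @ A2)

x = torch.randn(4, d)                # a batch of hidden states
y = x @ W_merged.T                   # forward pass through the merged layer
```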
Super enjoyable read: promising results showing that model mixing via a small, learnable router on top of independently trained "skills" (parametrized as PEFT experts) can actually generalize better than data mixing (e.g., multi-task learning)
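(A rough sketch of what "a small, learnable router on top of PEFT experts" could look like for a single layer: a softmax gate mixing the outputs of frozen experts. The gating granularity, dimensions, and plain linear stand-ins for the experts are assumptions for illustration, not the paper's exact architecture.)

```python
# Rough sketch of a learnable router over frozen PEFT "skill" experts for a
# single layer. The softmax gate, per-example gating, and dimensions are
# illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class RoutedExperts(nn.Module):
    def __init__(self, d, experts):
        super().__init__()
        self.experts = nn.ModuleList(experts)       # frozen, independently trained
        self.gate = nn.Linear(d, len(experts))      # the only trainable part

    def forward(self, x):                           # x: (batch, d)
        weights = torch.softmax(self.gate(x), dim=-1)          # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (batch, d, n_experts)
        return (outs * weights.unsqueeze(1)).sum(-1)           # weighted mix

d = 64
experts = [nn.Linear(d, d) for _ in range(2)]       # stand-ins for PEFT experts
for e in experts:
    e.requires_grad_(False)                         # only the router is trained

layer = RoutedExperts(d, experts)
y = layer(torch.randn(8, d))
print(y.shape)                                      # torch.Size([8, 64])
```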
Super interesting work & definitely check it out if you are attending NeurIPS! It reminds me of the paper we published at ICLR this year — TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models — addressing the lifelong learning problem for robotic agents.…