What Salesforce Learned from 20,000 Agent Deployments

Section 1 of 6

The Four-Layer Enterprise Agent Architecture

Salesforce's framework for enterprise agents isn't a single model — it's a layered stack where each layer has a distinct job. Getting this separation right is the foundation everything else rests on.

Engagement Layer: where users talk to the agent (Slack, web chat, SMS, messaging)

Agent Layer: the AI reasoning and decision-making core (LLM reasoning, tool calls, sub-agents)

System of Work: where the actual work gets done (CRM and ERP systems such as Salesforce and SAP, plus internal APIs)

Context Layer: the data and metadata that ground agent actions (vector DB, knowledge base, metadata)

A Trust Layer spans the entire stack, supporting multiple LLM providers with integrated guardrails. It is not a bolt-on; it is woven into every layer.
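To make the separation of concerns concrete, here is a minimal TypeScript sketch of one request passing through the four layers. The interfaces (`ContextLayer`, `SystemOfWork`, `TrustLayer`, `Stack`) and the keyword "reasoning" are hypothetical stand-ins, not a Salesforce API. The point is the division of labor: the agent layer decides, the context layer supplies grounding, the system of work performs the write, and trust checks wrap the input and the output.

```typescript
// Hypothetical layer interfaces; names are illustrative, not a Salesforce API.
interface ContextLayer {
  retrieve(query: string): string[]; // grounding snippets for the request
}
interface SystemOfWork {
  execute(action: string, args: Record<string, string>): string; // performs the real write
}
interface TrustLayer {
  allowInput(text: string): boolean;
  allowOutput(text: string): boolean;
}
interface Stack {
  context: ContextLayer;
  work: SystemOfWork;
  trust: TrustLayer;
}

// Agent layer: decides what to do, but never touches data stores directly.
function agentHandle(stack: Stack, message: string): string {
  if (!stack.trust.allowInput(message)) return "Blocked by input guardrail.";
  const grounding = stack.context.retrieve(message);
  // Stand-in for LLM reasoning: a trivial keyword decision.
  const reply = /close my case/i.test(message)
    ? stack.work.execute("updateCase", { status: "Closed" })
    : grounding[0] ?? "I don't have a grounded answer for that.";
  return stack.trust.allowOutput(reply) ? reply : "Withheld by output guardrail.";
}

// Engagement layer: normalizes a channel message (Slack, web chat, SMS) and tags the reply.
function onChannelMessage(stack: Stack, channel: string, raw: string): string {
  return `[${channel}] ${agentHandle(stack, raw.trim())}`;
}
```

In this sketch the database write can only happen through `SystemOfWork.execute`, which is the property the layered design is meant to guarantee.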

Quick Check

Which layer is responsible for actually executing a database write in the Salesforce architecture?

Section 2 of 6

The 90/10 Rule — Launch Is the Beginning, Not the End

Traditional software development spends 90% of effort before launch. Enterprise agents invert this completely. John Kucera from Salesforce: "90% of the work is after you go live to manage and improve the agent."

Teams that treat launch as the finish line consistently fail to scale past the pilot stage. Teams that plan for post-launch operations from day one are the ones that move beyond pilot into broad production use.

Real users immediately expose edge cases that never appear in demos — ambiguous phrasing, unexpected intents, requests the agent was never designed to handle. This is where trust is earned or lost.

Effort split, pre-launch vs. post-launch:

Traditional software: 90% before launch, 10% after

Agent software: 10% before launch, 90% after (managing and improving the agent)

Quick Check

According to Salesforce's 20,000-deployment analysis, when do most agent failures actually occur?

Section 3 of 6

Pre-Launch: Guardrails, KPIs, and Starting Small

Three disciplines separate successful pre-launch preparation from the teams that scramble after going live.

Resist the temptation to build an agent that handles everything. A focused, narrow agent lets your team learn the feedback cycle before stakes are high. Overcommitting to broad scope early leads to agents that are mediocre everywhere.

Salesforce's pattern: start with one high-value, well-defined use case. Measure it. Expand only after mastering that loop.

Salesforce introduced Agentic Work Units (AWUs) — discrete units that measure meaningful work completion, not just activity (messages sent, API calls made).

For a support agent, the KPI is containment rate: percentage of cases fully resolved without human follow-up. This is a business outcome, not a technical metric.

Bad KPI: Messages handled per hour

Good KPI: Cases resolved without escalation
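A minimal sketch of the outcome KPI next to the activity metric it replaces, assuming a hypothetical `CaseRecord` shape with `resolvedByAgent` and `humanFollowUp` flags (these are illustrative fields, not a Salesforce object, and this does not reconstruct how AWUs are defined):

```typescript
// Hypothetical case record; field names are illustrative assumptions.
interface CaseRecord {
  id: string;
  resolvedByAgent: boolean; // the agent marked the case resolved
  humanFollowUp: boolean;   // a human touched it afterwards (escalation, reopen, callback)
}

// Containment rate: share of cases fully resolved by the agent with no human follow-up.
function containmentRate(cases: CaseRecord[]): number {
  if (cases.length === 0) return 0;
  const contained = cases.filter((c) => c.resolvedByAgent && !c.humanFollowUp).length;
  return contained / cases.length;
}

// Activity metric for contrast: it rises with volume whether or not anything gets resolved.
function messagesPerHour(messageCount: number, hours: number): number {
  return hours > 0 ? messageCount / hours : 0;
}
```

An agent that chats more without resolving anything raises `messagesPerHour` but leaves `containmentRate` flat, which is why the latter tracks the business outcome.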

Input guardrails protect data entering the LLM:

Secure data retrieval through controlled access layers
Zero data retention agreements preventing model training use
Trust-boundary hosting keeping sensitive data internal

Output guardrails validate responses before delivery:

Tool and sub-agent validation — prevent hallucinated actions
Grounding checks — answers must derive from specified sources
Content filtering for harmful material
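The three output guardrails can be sketched as checks that run on a drafted response before delivery. This assumes a hypothetical `DraftResponse` shape listing tool calls and cited source IDs; the grounding check and content filter here are deliberately naive string checks standing in for real grounding verification and harmful-content classifiers.

```typescript
// Hypothetical drafted response; not a Salesforce type.
interface DraftResponse {
  text: string;
  toolCalls: { name: string }[];
  citedSourceIds: string[];
}

interface GuardrailResult {
  ok: boolean;
  reasons: string[];
}

function validateOutput(
  draft: DraftResponse,
  registeredTools: Set<string>,
  allowedSourceIds: Set<string>,
  blockedTerms: string[],
): GuardrailResult {
  const reasons: string[] = [];

  // Tool validation: reject calls to tools that were never registered (hallucinated actions).
  for (const call of draft.toolCalls) {
    if (!registeredTools.has(call.name)) reasons.push(`unknown tool: ${call.name}`);
  }

  // Grounding check: cite at least one source, and only approved ones.
  if (draft.citedSourceIds.length === 0) reasons.push("answer is not grounded in any source");
  for (const id of draft.citedSourceIds) {
    if (!allowedSourceIds.has(id)) reasons.push(`unapproved source: ${id}`);
  }

  // Content filter: naive case-insensitive term match as a stand-in for a classifier.
  const lower = draft.text.toLowerCase();
  for (const term of blockedTerms) {
    if (lower.includes(term.toLowerCase())) reasons.push(`blocked content: ${term}`);
  }

  return { ok: reasons.length === 0, reasons };
}
```

Collecting every failed reason, rather than stopping at the first, gives the post-launch feedback loop more to work with.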

Quick Check

What is an "Agentic Work Unit" (AWU) as introduced by Salesforce?

Section 4 of 6

Post-Launch: Feedback Loops and Triage Categories

The teams that scaled successfully built tight feedback loops. They did not just collect failures; they categorized them so each fix went straight to the right owner.

Salesforce identified four triage categories for agent failures. Each maps to a different owner and a different fix type:

Figure: Failure funnel (where agents break down), showing failure categories by frequency in a typical distribution.

The speed of the feedback loop determined which teams scaled and which stayed in pilot. The type of fix matters, but how quickly each failure reaches the right owner is what separates the fastest-scaling teams from the rest.
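A minimal sketch of routing categorized failures to owner queues while measuring routing delay. The four category names and owner teams below are hypothetical placeholders chosen for illustration; they are not Salesforce's triage taxonomy, which this text does not spell out.

```typescript
// Hypothetical triage categories and owners; illustrative only, not Salesforce's taxonomy.
type FailureCategory = "intent" | "knowledge" | "action" | "policy";

const OWNER: Record<FailureCategory, string> = {
  intent: "conversation-design",
  knowledge: "knowledge-management",
  action: "integration-engineering",
  policy: "compliance",
};

interface FailureReport {
  id: string;
  category: FailureCategory;
  observedAt: number; // epoch ms when the failure was logged
}

interface RoutedFailure extends FailureReport {
  owner: string;
  routedAt: number;
}

// Put each failure in its owner's queue and record when it was routed.
function route(reports: FailureReport[], now: number): Map<string, RoutedFailure[]> {
  const queues = new Map<string, RoutedFailure[]>();
  for (const r of reports) {
    const owner = OWNER[r.category];
    const queue = queues.get(owner) ?? [];
    queue.push({ ...r, owner, routedAt: now });
    queues.set(owner, queue);
  }
  return queues;
}

// Average time from observation to routing: one way to track feedback-loop speed.
function averageRoutingDelayMs(routed: RoutedFailure[]): number {
  if (routed.length === 0) return 0;
  const total = routed.reduce((sum, r) => sum + (r.routedAt - r.observedAt), 0);
  return total / routed.length;
}
```

Tracking the delay as a number makes the "routing speed" claim something a team can watch trend over time rather than a general aspiration.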

Quick Check

An agent correctly understands the user's intent, but its answer cites outdated policy information. Which triage category does this fall under?

Section 5 of 6

Three Anti-Patterns That Kill Enterprise Agents

These are the recurring mistakes Salesforce observed across thousands of deployments. Each one has a clear, better alternative.

The mistake: Routing every decision through the LLM — including simple deterministic operations like order status lookups.

The fix: Salesforce built Agent Script, a TypeScript framework that enables deterministic control flow alongside probabilistic LLM reasoning. Not every step needs AI — use code where the logic is known.

Rule of thumb: if you could write an if/else to handle it, you probably should.
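A minimal sketch of the "use code where the logic is known" rule: an order-status request with a recognizable order ID is answered deterministically, and only everything else falls through to the model. The order store, the ID format (one letter plus four digits), and `callLlm` are hypothetical stand-ins; this is plain TypeScript, not Agent Script syntax.

```typescript
// Hypothetical order store standing in for the System of Work.
const ORDERS: Record<string, string> = { A1001: "shipped", A1002: "processing" };

// Assumed order ID format: one capital letter followed by four digits.
const ORDER_ID = /\b([A-Z]\d{4})\b/;

// Stand-in for a call to an LLM provider; a real agent would invoke its model here.
function callLlm(message: string): string {
  return `[LLM handles: ${message}]`;
}

function handleMessage(message: string): string {
  const id = ORDER_ID.exec(message)?.[1];
  // Deterministic path: known intent plus a parseable ID never reaches the LLM.
  if (id !== undefined && /status|where is/i.test(message)) {
    const status = ORDERS[id];
    return status ? `Order ${id} is ${status}.` : `No order found with ID ${id}.`;
  }
  // Probabilistic path: open-ended requests go to the model.
  return callLlm(message);
}
```

The deterministic branch is cheaper, faster, and gives the same answer every time; the LLM is reserved for requests no simple if/else can cover.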

The mistake: Using capitalization and emphasis ("NEVER do X", "ALWAYS check Y") to enforce business rules in system prompts. This does not reliably modify LLM behavior.

The fix: Encode business rules as conditional logic in code. Geographic restrictions, compliance rules, escalation thresholds — these belong in the System of Work layer, not as text constraints in a prompt.

The mistake: Passing full API responses and entire documents as context. An insurance company was feeding full policy PDFs into every query.

The fix: Right-size context to only what the LLM needs for this specific query. The insurance company reduced to relevant sections only — both latency and accuracy improved.
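A minimal sketch of right-sizing context: rank document sections against the query and keep only what fits a token budget. Two swapped-in simplifications are assumptions, not the insurance company's method: relevance is scored by keyword overlap (production systems typically use embeddings or vector search), and tokens are estimated at roughly four characters each instead of using the model's tokenizer.

```typescript
interface Section {
  title: string;
  body: string;
}

// Rough token estimate (~4 characters per token); a real system would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Score a section by how many distinct query words (4+ letters) it contains.
function score(section: Section, queryWords: string[]): number {
  const haystack = (section.title + " " + section.body).toLowerCase();
  return queryWords.filter((w) => haystack.includes(w)).length;
}

// Keep the highest-scoring relevant sections that fit within the token budget.
function selectContext(sections: Section[], query: string, tokenBudget: number): Section[] {
  const words: string[] = query.toLowerCase().match(/[a-z]{4,}/g) ?? [];
  const queryWords = Array.from(new Set(words));
  const ranked = sections
    .map((s) => ({ s, score: score(s, queryWords) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score);

  const chosen: Section[] = [];
  let used = 0;
  for (const { s } of ranked) {
    const cost = estimateTokens(s.title + "\n" + s.body);
    if (used + cost > tokenBudget) continue; // skip sections that would overflow the budget
    chosen.push(s);
    used += cost;
  }
  return chosen;
}
```

Irrelevant sections are dropped before they ever reach the prompt, which is where both the latency and the accuracy gains come from.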

Policy encoding, prompt vs. code: each of the following rules is unreliable as a prompt instruction and should be enforced in code instead.

Restrict agent to users in North America only
Escalate if order value exceeds $10,000
Never reveal SSN in agent responses
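A minimal sketch of the three rules above as code, assuming a hypothetical `RequestContext` with a region code and an order value in USD. The region and threshold checks run before the agent acts; SSN redaction runs on the drafted reply, so it holds no matter what the model produced. The SSN pattern only matches the `123-45-6789` form and is a simplification.

```typescript
// Hypothetical request context; field names are illustrative assumptions.
interface RequestContext {
  userRegion: string; // e.g. "NA", "EU"
  orderValueUsd: number;
}

const ALLOWED_REGIONS = new Set(["NA"]);
const ESCALATION_THRESHOLD_USD = 10000;
// Matches SSNs written as 123-45-6789 only; real detection needs more formats.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

type PolicyDecision =
  | { kind: "deny"; reason: string }
  | { kind: "escalate"; reason: string }
  | { kind: "allow" };

// Pre-action checks: these do not depend on the LLM following instructions.
function checkPolicy(ctx: RequestContext): PolicyDecision {
  if (!ALLOWED_REGIONS.has(ctx.userRegion)) {
    return { kind: "deny", reason: `region ${ctx.userRegion} is not supported` };
  }
  if (ctx.orderValueUsd > ESCALATION_THRESHOLD_USD) {
    return { kind: "escalate", reason: "order value exceeds $10,000" };
  }
  return { kind: "allow" };
}

// Post-generation check: redact SSNs from whatever the model drafted.
function redactSsn(reply: string): string {
  return reply.replace(SSN_PATTERN, "***-**-****");
}
```

Unlike "NEVER reveal an SSN" in a system prompt, these checks are testable, auditable, and behave identically on every request.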

Quick Check

Salesforce's Agent Script is designed to solve which specific anti-pattern?

Section 6 of 6

Multi-Agent Orchestration and What's Next

As individual agents mature, the next frontier is multi-agent systems — where a parent agent coordinates multiple specialized sub-agents, each owning a narrower problem.

Hierarchy: three levels, orchestrator → specialist → sub-specialist

Benefit: simpler instructions per agent, smaller context windows

Pattern: the parent routes, the children execute, the parent synthesizes
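A minimal sketch of the three-level route/execute/synthesize pattern. The `Agent` shape, the agent names, and regex-based routing are hypothetical; in practice the orchestrator would typically use an LLM to decompose the request and choose children, and "synthesis" here is just joining results. The caller supplies pre-split subtasks.

```typescript
// Hypothetical agent shape: a narrow routing test plus a handler.
interface Agent {
  name: string;
  canHandle: (task: string) => boolean;
  run: (task: string) => string;
}

// Level 3, sub-specialist: one very narrow job.
const refundCalculator: Agent = {
  name: "refund-calculator",
  canHandle: (t) => /refund/i.test(t),
  run: () => "refund amount computed",
};

// Level 2, specialist: billing, which delegates refund math to its own child.
const billingAgent: Agent = {
  name: "billing",
  canHandle: (t) => /bill|refund|invoice/i.test(t),
  run: (t) => (refundCalculator.canHandle(t) ? `billing: ${refundCalculator.run(t)}` : "billing: invoice explained"),
};

const shippingAgent: Agent = {
  name: "shipping",
  canHandle: (t) => /ship|deliver|track/i.test(t),
  run: () => "shipping: tracking link sent",
};

// Level 1, orchestrator: route each subtask, let children execute, synthesize one reply.
function orchestrate(subtasks: string[], children: Agent[]): string {
  const results = subtasks.map((task) => {
    const child = children.find((c) => c.canHandle(task));
    return child ? child.run(task) : `unrouted: ${task}`;
  });
  return results.join("; ");
}
```

Each child only ever sees its own subtask, which is what keeps individual instructions simple and context windows small.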

Beyond chat interfaces, Salesforce sees agents being deployed for multi-session workflows — tasks that span days, background automation triggered by events, and multi-channel deployments across web, phone, email, and Slack.

The disciplines that don't change: start small, measure outcomes, build tight feedback loops, encode policies in code, keep context lean. Models and tooling evolve; these principles remain constant.

Context window efficiency, insurance company case study (policy document retrieval):

Before (full document): ~50,000 tokens
After (relevant sections only): ~3,500 tokens, roughly 93% fewer tokens per query

Quick Check

In a three-level multi-agent hierarchy, what is the role of the parent (orchestrator) agent?
