Lessons from Salesforce's large-scale analysis: why most agents fail post-launch, and what the successful ones do differently.
Salesforce's framework for enterprise agents isn't a single model; it's a layered stack where each layer has a distinct job. Getting this separation right is the foundation everything else rests on.
Traditional software development spends 90% of effort before launch. Enterprise agents invert this completely. John Kucera from Salesforce: "90% of the work is after you go live to manage and improve the agent."
Real users immediately expose edge cases that never appear in demos: ambiguous phrasing, unexpected intents, requests the agent was never designed to handle. This is where trust is earned or lost.
Three disciplines separate the teams that prepare well before launch from the teams that scramble after going live.
Resist the temptation to build an agent that handles everything. A focused, narrow agent lets your team learn the feedback cycle before stakes are high. Overcommitting to broad scope early leads to agents that are mediocre everywhere.
Salesforce introduced Agentic Work Units (AWUs): discrete units that measure meaningful work completion, not just activity (messages sent, API calls made).
For a support agent, the KPI is containment rate: percentage of cases fully resolved without human follow-up. This is a business outcome, not a technical metric.
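As a rough illustration of the metric, containment rate can be computed from case records. This is a minimal sketch, not Salesforce's implementation: the `Case` record and its `resolved` and `human_follow_up` fields are assumed names for whatever your case system actually stores.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One support case handled by the agent (hypothetical record shape)."""
    case_id: str
    resolved: bool          # the customer's issue was actually resolved
    human_follow_up: bool   # a human had to step in or follow up afterwards


def containment_rate(cases: list[Case]) -> float:
    """Fraction of cases fully resolved without any human follow-up."""
    if not cases:
        return 0.0
    contained = sum(1 for c in cases if c.resolved and not c.human_follow_up)
    return contained / len(cases)


cases = [
    Case("c1", resolved=True, human_follow_up=False),
    Case("c2", resolved=True, human_follow_up=True),
    Case("c3", resolved=False, human_follow_up=True),
    Case("c4", resolved=True, human_follow_up=False),
]
print(f"Containment rate: {containment_rate(cases):.0%}")  # prints "Containment rate: 50%"
```

Note that a case resolved only after a human stepped in (`c2`) does not count as contained, which is what makes this an outcome metric rather than an activity count.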
Input guardrails protect data entering the LLM.
Output guardrails validate responses before delivery.
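The two guardrail types can be sketched as a pair of functions wrapped around the model call. Everything here is an assumption for illustration: the regex patterns, the `[EMAIL]` and `[CARD]` placeholders, and the blocked-phrase list are hypothetical, and a production system would likely rely on a vetted PII detector rather than hand-written patterns.

```python
import re

# Hypothetical patterns, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators

# Hypothetical phrases the business never wants the agent to say.
BLOCKED_PHRASES = ("guaranteed refund", "legal advice")


def mask_input(text: str) -> str:
    """Input guardrail: mask sensitive values before the text reaches the LLM."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub("[CARD]", text)


def check_output(response: str) -> tuple[bool, str]:
    """Output guardrail: return (ok, reason) before the response is delivered."""
    lowered = response.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return False, f"blocked phrase: {phrase}"
    if EMAIL_RE.search(response) or CARD_RE.search(response):
        return False, "possible PII in response"
    return True, "ok"


safe_prompt = mask_input("Email me at jane@example.com about card 4111 1111 1111 1111 please")
print(safe_prompt)
print(check_output("You are entitled to a guaranteed refund."))
```

Both checks are plain code, so they behave the same way on every request regardless of what the model does.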
The teams that scaled successfully built tight feedback loops. They did not just collect failures; they categorized them so that each fix went to the right owner immediately.
Salesforce identified four triage categories for agent failures. Each maps to a different owner and a different fix type:
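The routing idea can be sketched as a lookup from failure category to owner. The four category names and owners below are invented placeholders, since the actual Salesforce categories are not listed in this text; substitute your own taxonomy.

```python
from collections import defaultdict
from enum import Enum


class FailureCategory(Enum):
    # Placeholder names, NOT the four categories from the Salesforce analysis.
    MISSING_KNOWLEDGE = "missing_knowledge"
    BAD_INSTRUCTIONS = "bad_instructions"
    BROKEN_ACTION = "broken_action"
    OUT_OF_SCOPE = "out_of_scope"


# Hypothetical owners: each category maps to exactly one team.
OWNERS = {
    FailureCategory.MISSING_KNOWLEDGE: "content-team",
    FailureCategory.BAD_INSTRUCTIONS: "prompt-owner",
    FailureCategory.BROKEN_ACTION: "integration-engineers",
    FailureCategory.OUT_OF_SCOPE: "product-owner",
}


def triage(failures: list[tuple[str, FailureCategory]]) -> dict[str, list[str]]:
    """Group failure IDs into one queue per owner."""
    queues: dict[str, list[str]] = defaultdict(list)
    for failure_id, category in failures:
        queues[OWNERS[category]].append(failure_id)
    return dict(queues)


queues = triage([
    ("f1", FailureCategory.BROKEN_ACTION),
    ("f2", FailureCategory.MISSING_KNOWLEDGE),
    ("f3", FailureCategory.BROKEN_ACTION),
])
print(queues)
```

The point of the structure is that a failure never lands in a generic backlog: once it is labeled, its owner is already determined.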
These are the recurring mistakes Salesforce observed across thousands of deployments. Each one has a clear, better alternative.
The mistake: Routing every decision through the LLM, including simple deterministic operations like order status lookups.
The fix: Salesforce built Agent Script, a TypeScript framework that enables deterministic control flow alongside probabilistic LLM reasoning. Not every step needs AI; use code where the logic is known.
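The split between deterministic and probabilistic steps can be sketched in a few lines. This is plain Python, not the Agent Script API, and the `ORDERS` table, the order ID format, and the `llm` callable are all assumptions standing in for a real order system and model client.

```python
import re
from typing import Callable

# Stand-in for a real order system (hypothetical data).
ORDERS = {"A1001": "shipped", "A1002": "processing"}

# Hypothetical order ID format: one capital letter followed by four digits.
ORDER_ID_RE = re.compile(r"\b([A-Z]\d{4})\b")


def handle_message(message: str, llm: Callable[[str], str]) -> str:
    """Answer order status questions in code; send everything else to the LLM."""
    match = ORDER_ID_RE.search(message)
    if match and "status" in message.lower():
        order_id = match.group(1)
        status = ORDERS.get(order_id)
        if status is None:
            return f"I could not find order {order_id}."
        return f"Order {order_id} is {status}."
    # No known deterministic path: fall back to probabilistic reasoning.
    return llm(message)


def fake_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return "LLM: " + prompt


print(handle_message("What is the status of order A1001?", fake_llm))
print(handle_message("Can you recommend a gift?", fake_llm))
```

The lookup path never touches the model, so it is fast and gives the same answer every time.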
If you can write an if/else to handle it, you probably should.
The mistake: Using capitalization and emphasis ("NEVER do X", "ALWAYS check Y") to enforce business rules in system prompts. This does not reliably change LLM behavior.
The fix: Encode business rules as conditional logic in code. Geographic restrictions, compliance rules, and escalation thresholds belong in the System of Work layer, not as text constraints in a prompt.
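A minimal sketch of what rules-as-code might look like, using a refund request as the example. The region codes, the threshold value, and the `RefundRequest` shape are hypothetical; the point is only that the rule is evaluated by code before or after the LLM runs, not requested in a prompt.

```python
from dataclasses import dataclass

# Hypothetical policy values, for illustration only.
BLOCKED_REGIONS = {"XX", "YY"}
REFUND_ESCALATION_THRESHOLD = 500.00


@dataclass
class RefundRequest:
    region: str    # region code of the customer
    amount: float  # requested refund amount


def decide(request: RefundRequest) -> str:
    """Apply business rules deterministically; returns 'deny', 'escalate', or 'allow'."""
    if request.region in BLOCKED_REGIONS:
        return "deny"       # geographic restriction
    if request.amount > REFUND_ESCALATION_THRESHOLD:
        return "escalate"   # a human must approve large refunds
    return "allow"


print(decide(RefundRequest(region="US", amount=120.0)))
print(decide(RefundRequest(region="US", amount=900.0)))
print(decide(RefundRequest(region="XX", amount=10.0)))
```

Unlike "NEVER refund more than $500" in a prompt, this check cannot be talked around by an unusual user message.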
The mistake: Passing full API responses and entire documents as context. An insurance company was feeding full policy PDFs into every query.
The fix: Right-size context to only what the LLM needs for this specific query. The insurance company cut its context down to the relevant sections only, and both latency and accuracy improved.
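One simple way to right-size context is to score document sections against the query and pass along only the best matches. The sketch below uses naive keyword overlap, which is an assumption chosen for brevity and not the method the insurance company used; real systems more often use embedding-based retrieval. The `policy` sections are made-up sample text.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_sections(query: str, sections: dict[str, str], top_k: int = 2) -> list[str]:
    """Return titles of the top_k sections sharing the most words with the query."""
    query_terms = tokenize(query)
    scored = []
    for title, body in sections.items():
        overlap = len(query_terms & tokenize(title + " " + body))
        if overlap > 0:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


# Made-up policy sections standing in for a full policy document.
policy = {
    "Flood coverage": "Damage from flood water is covered up to the policy limit.",
    "Theft": "Stolen items are covered when a police report is filed.",
    "Cancellation": "The policy may be cancelled with thirty days notice.",
}

print(select_sections("Is flood damage covered?", policy, top_k=1))  # prints ['Flood coverage']
```

Only the selected sections would then be placed in the prompt, instead of the whole document.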
As individual agents mature, the next frontier is multi-agent systems, where a parent agent coordinates multiple specialized sub-agents, each owning a narrower problem.
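The parent and sub-agent relationship can be sketched as a dispatcher over narrow handlers. This is an illustrative toy, not a Salesforce design: the sub-agents are stub functions, and keyword matching stands in for whatever intent classification a real parent agent would use.

```python
from typing import Callable

SubAgent = Callable[[str], str]


def billing_agent(request: str) -> str:
    """Stub sub-agent that owns billing questions."""
    return f"billing handled: {request}"


def shipping_agent(request: str) -> str:
    """Stub sub-agent that owns shipping questions."""
    return f"shipping handled: {request}"


def general_agent(request: str) -> str:
    """Stub fallback for requests no specialist claims."""
    return f"general handled: {request}"


class ParentAgent:
    """Routes each request to the first sub-agent whose keyword appears in it."""

    def __init__(self, routes: dict[str, SubAgent], fallback: SubAgent) -> None:
        self.routes = routes
        self.fallback = fallback

    def handle(self, request: str) -> str:
        lowered = request.lower()
        for keyword, agent in self.routes.items():
            if keyword in lowered:
                return agent(request)
        return self.fallback(request)


parent = ParentAgent(
    routes={"invoice": billing_agent, "delivery": shipping_agent},
    fallback=general_agent,
)
print(parent.handle("Where is my delivery?"))
print(parent.handle("I have a question about my invoice"))
```

Each sub-agent can then be scoped, measured, and improved on its own narrow problem while the parent only decides who handles what.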
Beyond chat interfaces, Salesforce sees agents being deployed for multi-session workflows (tasks that span days), background automation triggered by events, and multi-channel deployments across web, phone, email, and Slack.