Why 8 weeks?
Eight weeks is the sweet spot between moving fast enough to show ROI before stakeholder attention wanes, and moving slow enough to build something that actually works in production. We've run implementations as short as 5 weeks and as long as 16 — and 8 weeks delivers the best balance of scope and quality for a first Agentforce deployment.
This guide covers what those 8 weeks actually look like.
Phase 1: Discovery (Weeks 1–2)
The biggest mistake we see with Agentforce implementations is skipping straight to building. Two weeks of discovery prevents four weeks of rebuilding.
Org audit checklist
- Data quality score — Agentforce is only as good as your CRM data. Run a data quality audit on the objects your agents will touch. Accounts with <80% field completion will cause agent hallucinations.
- Flow inventory — Document all active flows that touch your target objects. Agents invoke flows as actions; conflicting flows create unpredictable behavior.
- Permission set review — Agents run as a named credential with specific permissions. Map out exactly what data and actions your agent needs access to — and nothing more.
- Knowledge base audit — For service agents, your knowledge articles are the agent's primary grounding source. Articles need to be accurate, tagged correctly, and not contradictory.
Use case prioritization
Don't try to build every agent at once. Score potential use cases on two dimensions:
- Volume × Handle time — How many times per week does this task happen, and how long does it take a human?
- Complexity score — How many decision points, data sources, and systems does it touch?
Your first Agentforce deployment should target the top-right quadrant: high volume, moderate complexity. Pure FAQ deflection (low complexity) is better served by Einstein Bots. Novel research tasks (high complexity) should wait until your team has Agentforce experience.
Phase 2: Architecture (Week 3)
Agentforce architecture decisions made in week 3 are expensive to undo in week 7. The four decisions that matter most:
1. Agent topology
Will you build one generalist agent or multiple specialist agents? Our recommendation for first deployments: one specialist agent with a tightly defined scope. Generalist agents sound appealing but are harder to test, harder to audit, and more likely to produce unexpected behavior.
2. Action library design
Each action your agent can take needs to be explicitly defined. Actions are invoked by the LLM — if the action description is ambiguous, the agent will invoke it in unexpected situations. Write action descriptions like you're writing an API contract: precise, specific, with clear input/output definitions.
3. Handoff protocol
Define the exact conditions under which the agent transfers to a human. This is not a technical decision — it's a business policy decision. Get sign-off from your operations and compliance teams before you build.
4. Grounding strategy
What data does the agent have access to? We use a principle of minimum sufficient grounding: give the agent access to exactly the data it needs to do its job — no more. This improves response quality, reduces latency, and limits blast radius if something goes wrong.
Phase 3: Build & Test (Weeks 4–7)
We run build sprints in two-week cycles with a structured testing protocol at the end of each sprint.
Sprint 1 (Weeks 4–5): Core actions + happy path
Build the 3–5 most common actions first and test the agent against your 20 most common use cases. At the end of sprint 1, you should have an agent that handles the happy path correctly 90%+ of the time.
Sprint 2 (Weeks 6–7): Edge cases + adversarial testing
This is where most implementations underinvest. Adversarial testing means deliberately trying to break the agent:
- Ask ambiguous questions that could trigger multiple actions
- Submit requests that are close to — but outside — the agent's defined scope
- Test with incomplete or contradictory information
- Attempt prompt injection (asking the agent to ignore its instructions)
For every failure mode you find, either refine the agent's role description, add a guardrail, or add a new handoff condition. Document every failure and your fix — this becomes your regression test suite.
Phase 4: Launch (Week 8)
We never do a big-bang launch. Week 8 is a phased rollout:
- Day 1–2: 5% of traffic routed to agent. Human agents review every interaction in real time.
- Day 3–4: 20% traffic if no critical failures. Review flagged interactions only.
- Day 5: 50% traffic. Begin tracking deflection rate and CSAT in parallel.
- Day 7–8: 100% traffic if metrics are on target. Define your "off ramp" conditions — at what deflection rate or CSAT score you'd reduce traffic back to human agents.
The go-live checklist
- ☐ Agent tested against 100+ real historical cases
- ☐ Handoff conditions reviewed and approved by operations
- ☐ Monitoring dashboard live (deflection rate, CSAT, escalation rate, avg handle time)
- ☐ Rollback plan documented and tested
- ☐ Human agents trained on what the agent does and doesn't handle
- ☐ Compliance and legal sign-off on agent scope
- ☐ Data retention policies confirmed for agent conversation logs
What to expect in week 9 and beyond
An Agentforce deployment is not a project — it's a product. The first 8 weeks get you to production. The next 8 weeks are where you tune the agent based on real production data and start expanding its scope.
Metrics typically stabilize around week 10–12. The performance trajectory we see most often: deflection rate starts around 45–50% at launch, climbs to 60–65% by week 12 as you address the most common failure modes from production data.