UCP Playground

The AI Shopping Agent Simulator

Test AI shopping agents against real UCP-enabled stores. Debug your MCP, REST, A2A & Embedded Commerce Protocol integrations, benchmark models like Claude, ChatGPT, and Gemini, and share session replays with your team and the world.


Your Agent Debugging Toolkit

Playground

Inspect & debug

Connect to your store's MCP endpoint. Call tools manually, inspect raw JSON responses, and find exactly where your schemas break before an agent ever sees them.

Open Playground →
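
Calling tools manually means sending JSON-RPC 2.0 messages to the MCP endpoint. The minimal sketch below only builds the two request bodies MCP defines for tool work, `tools/list` and `tools/call`, so you can see what the raw JSON looks like. The tool name `search_catalog` and its arguments are hypothetical; a real store advertises its own tools in the `tools/list` response.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id per call


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request body, the envelope MCP uses."""
    body = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        body["params"] = params
    return body


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. "search_catalog" and its arguments are
# hypothetical placeholders, not a tool any particular store defines.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_catalog", "arguments": {"query": "black t-shirt"}},
)

print(json.dumps(list_tools, indent=2))
print(json.dumps(call_tool, indent=2))
```

Comparing the `arguments` you send against the input schema the server returns for each tool is usually the fastest way to spot where a schema breaks.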

Agent Benchmark

Test & compare

Run 11 AI models against your store and see where they get stuck. Compare them side by side, then share the replays with your team.

Run Shopping Agent →

Headless API

Automate & integrate

Run agent sessions programmatically via REST. Pipe results into CI/CD, regression suites, or your own dashboards with a single bearer token.

View API Docs →

Built for the Builders

MCP, REST & Embedded

All three transports are supported: MCP over JSON-RPC, REST described by OpenAPI, and Embedded Commerce Protocol. Test whichever your store exposes.

Real Developers, Real Stores

Developers building UCP integrations use the playground to validate schemas, debug tools, and iterate in real time.

Multi-Model A/B Testing

Same store, different model, different outcome. Run up to 5 AI models side by side on the same prompt and compare on the leaderboard.
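
One way to picture a side-by-side run: keep the store and the prompt fixed and vary only the model. The sketch below builds one payload per model using the `domain`, `model`, and `message` fields from the headless API example, and enforces the 5-model limit. The helper name and every model slug except `claude-opus-4-6` are assumptions made for illustration, and it does not send anything.

```python
MAX_MODELS = 5  # side-by-side limit stated above


def build_ab_payloads(domain, message, models):
    """Return one payload per model, all sharing the same store and prompt."""
    unique = list(dict.fromkeys(models))  # drop duplicates, keep order
    if not unique:
        raise ValueError("at least one model is required")
    if len(unique) > MAX_MODELS:
        raise ValueError(f"at most {MAX_MODELS} models can be compared at once")
    return [{"domain": domain, "model": m, "message": message} for m in unique]


payloads = build_ab_payloads(
    "everlane.com",
    "Find me a black t-shirt",
    # Only the first slug appears in the API example; the others are hypothetical.
    ["claude-opus-4-6", "gpt-5.2", "gemini-3.1-pro"],
)
for p in payloads:
    print(p["model"], "->", p["message"])
```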

Schema Iteration Loop

Test → find the gap → fix your server → retest. Developers go from broken schemas to full checkout in a single session. See how it works.

11 Models, 6 Providers

Every session runs through OpenRouter. Pick a model, pick a store, and watch it shop.

Model	Provider	Class
Claude Opus 4.6	Anthropic	Frontier
Claude Sonnet 4.5	Anthropic	Frontier
GPT-5.2	OpenAI	Frontier
GPT-4o	OpenAI	Mid-tier
Gemini 3.1 Pro	Google	Frontier
Gemini 2.5 Pro	Google	Mid-tier
Gemini 2.5 Flash	Google	Fast
Grok 4	xAI	Frontier
Gemini 3 Flash	Google	Fast
DeepSeek R1 Reasoning	DeepSeek	Open-weight
DeepSeek V3.2	DeepSeek	Open-weight
Llama 3.3 70B	Meta	Open-weight

Replay Every Agent Journey

Debug what worked, share what didn't. Embeddable session replays your whole team can review.

Live from the UCP Network

Real products queried from merchant storefronts via their MCP endpoints.

Surf Stoked Midi Dress

Women's Trail Runners

$87.00

MCP Connected

Roxy

Pina To My Colada Straw Panama Hat

$36.00

For Teams

Shared Workspaces for Commerce Teams

One workspace for your whole team. Shared sessions, private audits, headless API access, and usage tracking — built for consultancies and engineering teams working on AI commerce.

Shared Sessions

Every session your team runs is visible to all members. Filter by colleague, model, store, or outcome. No more sharing links manually.

Headless API

Team-scoped API tokens for CI/CD pipelines and automated evals. Run sessions programmatically, collect results, track regressions.

Usage Dashboard

Track sessions, tokens consumed, checkout rates, and per-member activity. Know exactly what your team is spending before you get the bill.

Private Audits

Store audits run by your team stay private by default. Client data never appears on the public leaderboard unless you choose to publish it.

Invite by Email

Add colleagues with a single email. They get a link, join the workspace, and immediately see all shared sessions and audits. No setup friction.

Role-Based Access

Owners manage billing, admins create API tokens and invite members, members run sessions and view results. Clear boundaries, no confusion.

Automated Evals

Benchmark your store on every deploy

Define multi-turn shopping sequences, run them across stores and models, and get PDF reports with funnel comparison and error analysis. Use the web UI, headless API, or cron schedules.

Learn about Evals · Create Collection
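
To make "multi-turn sequences across stores and models" concrete, here is a minimal sketch that expands one shopping sequence into a run plan, one run per store and model pair. The `EvalRun` type, the `plan_runs` helper, and the `gpt-5.2` slug are hypothetical and do not reflect the Playground's actual collection format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EvalRun:
    domain: str
    model: str
    turns: tuple  # ordered user messages for one multi-turn session


def plan_runs(domains, models, turns):
    """Expand a single sequence into one run per (store, model) pair."""
    return [EvalRun(d, m, tuple(turns)) for d, m in product(domains, models)]


turns = [
    "Find me a black t-shirt",
    "Add the medium to my cart",
    "Take me to checkout",
]
runs = plan_runs(["everlane.com"], ["claude-opus-4-6", "gpt-5.2"], turns)
print(len(runs), "runs planned")
```

Keeping the turns identical across every run is what makes a funnel comparison meaningful: any difference in outcome comes from the store or the model, not the prompt.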


# Run a shopping agent session
curl -X POST https://ucpplayground.com/api/v1/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "everlane.com",
    "model": "claude-opus-4-6",
    "message": "Find me a black t-shirt"
  }'
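
The same request can be sketched in Python with only the standard library. The URL, headers, and JSON fields are copied from the curl call above; the function names are illustrative, and the assumption that the response body is JSON is not confirmed here, so treat `run_session` as a starting point rather than a reference client.

```python
import json
import urllib.request

API_URL = "https://ucpplayground.com/api/v1/chat"  # endpoint from the curl example


def build_chat_request(token, domain, model, message):
    """Build the POST request without sending it."""
    payload = json.dumps(
        {"domain": domain, "model": model, "message": message}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_session(token, domain, model, message, timeout=120):
    """Send the request and decode the body, assuming the API returns JSON."""
    req = build_chat_request(token, domain, model, message)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Building the request is side-effect free, which makes it easy to unit test.
req = build_chat_request(
    "example-token", "everlane.com", "claude-opus-4-6", "Find me a black t-shirt"
)
print(req.get_method(), req.full_url)
```

In a CI job you would read the token from an environment variable and call `run_session(...)` instead of only building the request.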

Sample eval report — download full PDF

Funnel Matrix

Stages: Search, Cart, Checkout. Models: Claude Opus 4.6, GPT-5.2, Gemini 3.1 Pro, Grok 4.

Performance

75% Checkout rate

12 Sessions

18.4k Avg tokens

24.1s Avg duration

Errors Detected

WARN GPT-5.2: checkout URL missing

FAIL Grok 4: invalid variant ID

FAIL Grok 4: cart creation timeout

Build 10x Faster with UCP Playground

Enter a domain above, or jump straight to one of the tools.

Run Shopping Agent · Inspect Agent Tools


Free to use. Sign up to unlock more.
