
UCP Playground

The AI Shopping Agent Simulator

Test AI shopping agents against real UCP-enabled stores. Debug your MCP, REST, A2A & Embedded Commerce Protocol integrations, benchmark models like Claude, ChatGPT, and Gemini, and share session replays with your team and the world.

Tested across everlane.com, allbirds.com, kyliecosmetics.com, roxy.com, pier1.com, forever21.com, monos.com, and more

Built for the Builders

MCP, REST & Embedded

All three transports supported. Connect via JSON-RPC, OpenAPI, or Embedded Commerce Protocol — test whichever your store exposes.
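For the MCP transport, a tool invocation travels as a JSON-RPC 2.0 request. The sketch below builds such an envelope. The `tools/call` method and the `name`/`arguments` params follow the MCP specification; the tool name `search_products` and its arguments are hypothetical placeholders, since each store defines its own tools.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments; a real store advertises its own tools.
request = build_tool_call(1, "search_products", {"query": "black t-shirt"})
print(json.dumps(request, indent=2))
```

Inspecting the envelope this way can help when a store rejects a call: a mismatch between `arguments` and the tool's declared input schema is a common place to look first.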

Real Developers, Real Stores

Developers building UCP integrations use the playground to validate schemas, debug tools, and iterate in real time.

Multi-Model A/B Testing

Same store, different model, different outcome. Run up to 5 AI models side by side on the same prompt and compare on the leaderboard.
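A side-by-side comparison amounts to sending the same prompt once per model. The sketch below prepares one request body per model, reusing the `domain`/`model`/`message` shape from the curl example on this page. Only `claude-opus-4-6` is a model ID taken from that example; the other IDs are hypothetical placeholders, and whether the API accepts several models in a single call is not stated, so one body per model is assumed.

```python
MAX_MODELS = 5  # the page states that up to 5 models can run side by side


def build_ab_payloads(domain: str, message: str, models: list[str]) -> list[dict]:
    """Return one request body per model, all sharing the same store and prompt."""
    if not models:
        raise ValueError("at least one model is required")
    if len(models) > MAX_MODELS:
        raise ValueError(f"at most {MAX_MODELS} models can be compared at once")
    if len(set(models)) != len(models):
        raise ValueError("model IDs must be unique")
    return [{"domain": domain, "model": m, "message": message} for m in models]


payloads = build_ab_payloads(
    "everlane.com",
    "Find me a black t-shirt",
    # "example-model-b" and "example-model-c" are placeholder IDs.
    ["claude-opus-4-6", "example-model-b", "example-model-c"],
)
for payload in payloads:
    print(payload["model"], "->", payload["message"])
```

Keeping the store and prompt fixed means any difference in outcome can be attributed to the model rather than to the input.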

Schema Iteration Loop

Test → find the gap → fix your server → retest. Developers go from broken schemas to full checkout in a single session. See how it works.

DeepSeek
Anthropic
OpenAI
Google
Meta
xAI

11 Models, 6 Providers

Every session runs through OpenRouter. Pick a model, pick a store, and watch it shop.

Model                  Provider   Class
Claude Opus 4.6        Anthropic  Frontier
Claude Sonnet 4.5      Anthropic  Frontier
GPT-5.2                OpenAI     Frontier
GPT-4o                 OpenAI     Mid-tier
Gemini 3.1 Pro         Google     Frontier
Gemini 2.5 Pro         Google     Mid-tier
Gemini 2.5 Flash       Google     Fast
Grok 4                 xAI        Frontier
Gemini 3 Flash         Google     Fast
DeepSeek R1 Reasoning  DeepSeek   Open-weight
DeepSeek V3.2          DeepSeek   Open-weight
Llama 3.3 70B          Meta       Open-weight

Replay Every Agent Journey

Debug what worked, share what didn't. Embeddable session replays your whole team can review.

For Teams

Shared Workspaces for Commerce Teams

One workspace for your whole team. Shared sessions, private audits, headless API access, and usage tracking — built for consultancies and engineering teams working on AI commerce.

Shared Sessions

Every session your team runs is visible to all members. Filter by colleague, model, store, or outcome. No more sharing links manually.

Headless API

Team-scoped API tokens for CI/CD pipelines and automated evals. Run sessions programmatically, collect results, track regressions.
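Tracking regressions in a CI/CD pipeline could look like the following sketch, which computes a checkout rate from collected session results and compares it with a stored baseline. The result record shape (a `reached_checkout` boolean per session) and the 5-point tolerance are assumptions made for illustration; the actual API response format is not documented on this page.

```python
def checkout_rate(sessions: list[dict]) -> float:
    """Fraction of sessions that reached checkout."""
    if not sessions:
        raise ValueError("no sessions to evaluate")
    reached = sum(1 for s in sessions if s["reached_checkout"])
    return reached / len(sessions)


def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True when the current rate falls below the baseline by more than the tolerance."""
    return current < baseline - tolerance


# Hypothetical results: 9 of 12 sessions reached checkout.
results = [{"reached_checkout": i < 9} for i in range(12)]
rate = checkout_rate(results)
print(f"checkout rate: {rate:.0%}")

if has_regressed(rate, baseline=0.75):
    raise SystemExit("checkout rate regressed; failing the pipeline")
```

Exiting with a non-zero status is what makes most CI systems mark the step as failed, so the gate needs no CI-specific integration.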

Usage Dashboard

Track sessions, tokens consumed, checkout rates, and per-member activity. Know exactly what your team is spending before you get the bill.

Private Audits

Store audits run by your team stay private by default. Client data never appears on the public leaderboard unless you choose to publish it.

Invite by Email

Add colleagues with a single email. They get a link, join the workspace, and immediately see all shared sessions and audits. No setup friction.

Role-Based Access

Owners manage billing, admins create API tokens and invite members, members run sessions and view results. Clear boundaries, no confusion.

Automated Evals

Benchmark your store on every deploy

Define multi-turn shopping sequences, run them across stores and models, and get PDF reports with funnel comparison and error analysis. Use the web UI, headless API, or cron schedules.
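One way to picture an eval definition is as a list of conversation turns expanded over every store and model pair. The sketch below is illustrative only: the `EvalSuite` structure, its field names, and the placeholder model ID `example-model-b` are assumptions, not the playground's actual eval format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EvalSuite:
    """A multi-turn shopping sequence plus the stores and models to run it on."""
    turns: tuple[str, ...]
    stores: tuple[str, ...]
    models: tuple[str, ...]


def expand_runs(suite: EvalSuite) -> list[dict]:
    """Produce one run per (store, model) pair, each carrying the full turn sequence."""
    return [
        {"domain": store, "model": model, "turns": list(suite.turns)}
        for store, model in product(suite.stores, suite.models)
    ]


suite = EvalSuite(
    turns=("Find me a black t-shirt", "Add a medium to my cart", "Take me to checkout"),
    stores=("everlane.com", "allbirds.com"),
    models=("claude-opus-4-6", "example-model-b"),  # second ID is a placeholder
)
runs = expand_runs(suite)
print(len(runs), "runs planned")
```

Expanding the grid up front makes it easy to see how many sessions a suite will consume before it is scheduled.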

# Run a shopping agent session
curl -X POST https://ucpplayground.com/api/v1/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "everlane.com",
    "model": "claude-opus-4-6",
    "message": "Find me a black t-shirt"
  }'
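The same request can be sketched in Python using only the standard library. The URL, headers, and body fields mirror the curl example above; treating the response as JSON is an assumption, since the response format is not shown on this page, and `YOUR_TOKEN` is a placeholder.

```python
import json
import urllib.request

API_URL = "https://ucpplayground.com/api/v1/chat"  # endpoint from the curl example


def build_chat_request(token: str, domain: str, model: str, message: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one shopping agent session."""
    body = json.dumps({"domain": domain, "model": model, "message": message}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def run_session(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the body, assuming the API replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_chat_request(
    "YOUR_TOKEN", "everlane.com", "claude-opus-4-6", "Find me a black t-shirt"
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect or log; calling `run_session(request)` with a valid token would perform the actual network call.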
Sample eval report (download full PDF)

Funnel Matrix
Stages: Search, Cart, Checkout
Models: Claude Opus 4.6, GPT-5.2, Gemini 3.1 Pro, Grok 4

Performance
Checkout rate: 75%
Sessions: 12
Avg tokens: 18.4k
Avg duration: 24.1s

Errors Detected
WARN  GPT-5.2: checkout URL missing
FAIL  Grok 4: invalid variant ID
FAIL  Grok 4: cart creation timeout

Build 10x Faster with UCP Playground

Enter a domain above, or jump straight to one of the tools.