The AI Shopping Agent Simulator
Test AI shopping agents against real UCP-enabled stores. Debug your MCP, REST, A2A & Embedded Commerce Protocol integrations, benchmark models like Claude, ChatGPT, and Gemini, and share session replays with your team and the world.
Inspect & debug
Connect to your store's MCP endpoint. Call tools manually, inspect raw JSON responses, and find exactly where your schemas break before an agent ever sees them.
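To make "calling tools manually" concrete, here is a minimal sketch of the JSON-RPC 2.0 envelopes that MCP uses for listing and invoking tools. The method names `tools/list` and `tools/call` come from the MCP specification; the tool name `search_products` and its arguments are hypothetical placeholders, since each store defines its own tools. A real session also needs an `initialize` handshake first, which is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC ids should be unique within a session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope."""
    body = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        body["params"] = params
    return body


def list_tools():
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool(name, arguments):
    """Invoke one tool by name with a dict of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def tool_error(response):
    """Return a readable error string for a response dict, or None on success."""
    if "error" in response:  # protocol-level failure
        err = response["error"]
        return f"JSON-RPC error {err.get('code')}: {err.get('message')}"
    if response.get("result", {}).get("isError"):  # tool-level failure
        return "tool reported isError=true"
    return None


# "search_products" is a hypothetical tool name used only for illustration.
print(json.dumps(call_tool("search_products", {"query": "black t-shirt"}), indent=2))
```

Checking both the top-level `error` member and the `isError` flag inside `result` matters because a schema problem can surface at either level.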
Open Playground →
Test & compare
Run 11 AI models against your store and see where they get stuck. Compare them side by side, then share the replays with your team.
Run Shopping Agent →
Automate & integrate
Run agent sessions programmatically via REST. Pipe results into CI/CD, regression suites, or your own dashboards with a single bearer token.
View API Docs →
All three transports supported. Connect via JSON-RPC, OpenAPI, or Embedded Commerce Protocol; test whichever your store exposes.
Developers building UCP integrations use the playground to validate schemas, debug tools, and iterate in real time.
Same store, different model, different outcome. Run up to 5 AI models side by side on the same prompt and compare on the leaderboard.
Test → find the gap → fix your server → retest. Developers go from broken schemas to full checkout in a single session. See how it works.
Every session runs through OpenRouter. Pick a model, pick a store, and watch it shop.
| Model | Provider | Class |
|---|---|---|
| Claude Opus 4.6 | Anthropic | Frontier |
| Claude Sonnet 4.5 | Anthropic | Frontier |
| GPT-5.2 | OpenAI | Frontier |
| GPT-4o | OpenAI | Mid-tier |
| Gemini 3.1 Pro | Google | Frontier |
| Gemini 2.5 Pro | Google | Mid-tier |
| Gemini 2.5 Flash | Google | Fast |
| Grok 4 | xAI | Frontier |
| Gemini 3 Flash | Google | Fast |
| DeepSeek R1 Reasoning | DeepSeek | Open-weight |
| DeepSeek V3.2 | DeepSeek | Open-weight |
| Llama 3.3 70B | Meta | Open-weight |
Debug what worked, share what didn't. Embeddable session replays your whole team can review.
Real products queried from merchant storefronts via their MCP endpoints.
For Teams
One workspace for your whole team. Shared sessions, private audits, headless API access, and usage tracking — built for consultancies and engineering teams working on AI commerce.
Every session your team runs is visible to all members. Filter by colleague, model, store, or outcome. No more sharing links manually.
Team-scoped API tokens for CI/CD pipelines and automated evals. Run sessions programmatically, collect results, track regressions.
Track sessions, tokens consumed, checkout rates, and per-member activity. Know exactly what your team is spending before you get the bill.
Store audits run by your team stay private by default. Client data never appears on the public leaderboard unless you choose to publish it.
Add colleagues with a single email. They get a link, join the workspace, and immediately see all shared sessions and audits. No setup friction.
Owners manage billing, admins create API tokens and invite members, members run sessions and view results. Clear boundaries, no confusion.
Automated Evals
Define multi-turn shopping sequences, run them across stores and models, and get PDF reports with funnel comparison and error analysis. Use the web UI, headless API, or cron schedules.
    # Run a shopping agent session
    curl -X POST https://ucpplayground.com/api/v1/chat \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "domain": "everlane.com",
        "model": "claude-opus-4-6",
        "message": "Find me a black t-shirt"
      }'
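For CI pipelines that prefer a script over curl, the same call can be sketched in Python with only the standard library. The URL, headers, and body fields mirror the curl example above. The response format is not documented in this section, so the assumption that it is a JSON body is ours, and `run_session` is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://ucpplayground.com/api/v1/chat"  # endpoint from the curl example


def build_chat_request(token, domain, model, message):
    """Build (but do not send) the POST request for one agent session."""
    payload = {"domain": domain, "model": model, "message": message}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_session(request, timeout=120):
    """Send the request; assumes (unverified) that the response body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a real network call, so it is left commented out):
# import os
# req = build_chat_request(os.environ["TOKEN"], "everlane.com",
#                          "claude-opus-4-6", "Find me a black t-shirt")
# print(run_session(req))
```

Keeping request construction separate from sending lets a regression suite assert on the payload without touching the network.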
| Model | Search | Cart | Checkout |
|---|---|---|---|
| Claude Opus 4.6 | | | |
| GPT-5.2 | | | |
| Gemini 3.1 Pro | | | |
| Grok 4 | | | |
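A funnel comparison like the table above could be derived from raw session results roughly as follows. This is a sketch under stated assumptions: the record keys `model` and `furthest_stage` are hypothetical, not fields from the actual API, and a session is counted as having passed every stage up to the furthest one it reached.

```python
from collections import defaultdict

STAGES = ["search", "cart", "checkout"]  # funnel columns from the table above


def funnel_rates(sessions):
    """Per model, the fraction of sessions that reached each funnel stage.

    Each session is a dict with hypothetical keys 'model' and 'furthest_stage'
    (one of STAGES, or None if the agent never completed a search).
    """
    totals = defaultdict(int)
    reached = defaultdict(lambda: dict.fromkeys(STAGES, 0))
    for session in sessions:
        model = session["model"]
        totals[model] += 1
        if session["furthest_stage"] is None:
            continue
        depth = STAGES.index(session["furthest_stage"])
        # Reaching a later stage implies passing all earlier ones.
        for stage in STAGES[: depth + 1]:
            reached[model][stage] += 1
    return {
        model: {stage: reached[model][stage] / n for stage in STAGES}
        for model, n in totals.items()
    }
```

Under this definition, the checkout rate for a model is simply its `checkout` entry: sessions that reached checkout divided by all sessions run with that model.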
Enter a domain above, or jump straight to one of the tools.