CLI: Add site editor performance benchmark #3408
Add a Playwright-based performance benchmark for the Playground CLI that measures 5 site editor metrics (siteEditorLoad, templatesViewLoad, templateOpen, blockAdd, templateSave). Adapted from Automattic/studio's tools/benchmark-site-editor.

Run via: npx nx perf playground-cli

Supports --mode=unbuilt-jspi (default) and --mode=built to test against different CLI targets, --with-plugins for a plugins-loaded variant, and --rounds=N for statistical reliability via median aggregation.
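The median aggregation across rounds could look roughly like the sketch below. The `Round` shape and the function names `median` and `aggregateRounds` are assumptions made for illustration, not the code in this PR.

```typescript
// Hypothetical shape: one entry per metric name, value in milliseconds.
type Round = Record<string, number>;

// Median of a non-empty list; averages the two middle values for even lengths.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("Cannot take the median of an empty list");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Collapse several rounds into one result by taking the per-metric median.
// Assumes every round reports the same metric names as the first one.
function aggregateRounds(rounds: Round[]): Round {
  const result: Round = {};
  for (const name of Object.keys(rounds[0] ?? {})) {
    result[name] = median(rounds.map((round) => round[name]));
  }
  return result;
}
```

The median is less sensitive than the mean to a single slow outlier round, which is presumably why it was chosen for `--rounds=N`.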
Measure time from spawning the CLI process until the server responds to HTTP requests. Reported alongside the per-round site editor metrics.
Remove section separator comments carried over from the Studio source. Remove --skip-browser from CLI args since the server command doesn't support it (only the start command opens a browser).
Consume fetch response body in waitForServer to prevent undici's per-origin connection pool from being exhausted, which caused the server readiness check to hang even when the server was up. Kill processes by port in stopServer as a fallback, since the actual CLI server is a grandchild of npx and may not share the process group.
The Express server accepts TCP connections before WordPress finishes booting in WASM, causing fetch() to hang indefinitely waiting for response headers. Add AbortSignal.timeout(10s) to each individual request so the retry loop can make progress.
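A minimal sketch of such a retry loop with a per-request timeout follows. The function name `waitForServer` comes from the commit messages, but its signature, the option names, and the defaults other than the 10 s timeout are assumptions; `fetchImpl` is injectable only so the loop can be exercised without a real server.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface WaitOptions {
  attempts?: number;         // assumed default, not from the PR
  delayMs?: number;          // assumed default, not from the PR
  requestTimeoutMs?: number; // 10 s per request, as described above
  fetchImpl?: FetchLike;     // hypothetical hook for testing
}

async function waitForServer(
  url: string,
  {
    attempts = 30,
    delayMs = 500,
    requestTimeoutMs = 10_000,
    fetchImpl = fetch as FetchLike,
  }: WaitOptions = {}
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      // Each request gets its own timeout, so one hung request cannot stall the loop.
      const res = await fetchImpl(url, {
        signal: AbortSignal.timeout(requestTimeoutMs),
      });
      // Drain the body so the underlying connection is released.
      await res.arrayBuffer();
      return true;
    } catch {
      // Connection refused or timed out: fall through and retry.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

In this sketch any response that delivers headers counts as "ready"; a stricter check on the status code may be appropriate.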
Node.js fetch follows redirects by default. The Playground CLI server redirects / to / for auto-login, creating an infinite redirect loop that causes fetch to fail. Use redirect: 'manual' to see the 302 directly and treat it as server-ready.
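As a rough illustration of the readiness probe described here, assuming that any 2xx or 3xx status means the server is up (the helper names `isReadyStatus` and `probe` are hypothetical):

```typescript
// Treat success and redirect responses as "server is up".
function isReadyStatus(status: number): boolean {
  return status >= 200 && status < 400;
}

// Probe once without following redirects, so a 302 is observed directly.
async function probe(url: string): Promise<boolean> {
  const res = await fetch(url, {
    redirect: "manual",
    signal: AbortSignal.timeout(10_000),
  });
  // Consume the body so the connection can be reused.
  await res.arrayBuffer();
  return isReadyStatus(res.status);
}
```

With `redirect: "manual"`, Node's fetch returns the redirect response itself, so its status can be inspected instead of being followed.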
Failed rounds are retried up to 2x the requested round count. If not enough successful rounds are collected, the script exits with a non-zero status. Previously, partial failures were silently ignored and included in the results.
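One way to read this retry policy is "at most 2 × N attempts to collect N successful rounds". The sketch below follows that interpretation; `collectRounds` and its signature are illustrative names, not the PR's actual code.

```typescript
// Run rounds until `requested` succeed, allowing at most 2x that many attempts.
async function collectRounds<T>(
  requested: number,
  runRound: (attempt: number) => Promise<T>
): Promise<T[]> {
  const maxAttempts = requested * 2;
  const results: T[] = [];
  for (
    let attempt = 1;
    attempt <= maxAttempts && results.length < requested;
    attempt++
  ) {
    try {
      results.push(await runRound(attempt));
    } catch (error) {
      // A failed round is logged and excluded from the results.
      console.warn(`Round attempt ${attempt} failed:`, error);
    }
  }
  if (results.length < requested) {
    throw new Error(
      `Only ${results.length} of ${requested} rounds succeeded after ${maxAttempts} attempts`
    );
  }
  return results;
}
```

A caller would catch the thrown error at the top level and exit with a non-zero status, so a run with too few successful rounds never reports results.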
The iframe element can be visible in the DOM before Playwright registers it in its internal frame list. Poll page.frame() for up to 30s instead of checking once, which eliminates the intermittent 'Editor canvas frame not found' failures in headed mode.
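The polling approach can be sketched with a small generic helper. `pollFor`, its parameters, and the 100 ms interval are assumptions; only the 30 s budget comes from the description above.

```typescript
// Repeatedly call `lookup` until it returns a non-null value or the timeout elapses.
async function pollFor<T>(
  lookup: () => T | null | undefined,
  timeoutMs = 30_000,
  intervalMs = 100
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = lookup();
    if (found != null) return found;
    if (Date.now() >= deadline) {
      throw new Error(`Not found within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical usage with a Playwright `page` (the frame name is an assumption):
// const canvas = await pollFor(() => page.frame({ name: "editor-canvas" }));
```

Because `page.frame()` returns `null` until Playwright has registered the frame, a single check can race with frame attachment even when the iframe element is already visible.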
Use the same node command as the unbuilt-jspi Nx target but invoke it directly with process.execPath. This avoids cross-platform issues with npx on Windows and removes the npx/nx startup overhead from the server startup measurement.
Pull request overview
Adds a Playwright-driven performance benchmark workflow for the Playground CLI that measures site editor interaction timings and persists results as JSON artifacts.
Changes:
- Introduces a new `nx perf playground-cli` target and a root npm script to run the benchmark.
- Adds a Playwright measurement harness for 5 site editor interaction metrics, plus server startup timing.
- Adds a “with plugins” blueprint and artifacts output ignore rules for repeatable runs.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/playground/cli/project.json | Adds an Nx perf target to run the benchmark entrypoint. |
| packages/playground/cli/perf/benchmark.ts | Implements server spawn/teardown, rounds/median aggregation, and JSON/table reporting. |
| packages/playground/cli/perf/measure-site-editor.ts | Implements Playwright steps for site editor navigation and metric collection. |
| packages/playground/cli/perf/plugins-blueprint.json | Adds a plugin-heavy blueprint variant for benchmarking. |
| packages/playground/cli/perf/artifacts/.gitignore | Ensures generated benchmark artifacts aren’t committed. |
| packages/playground/cli/.eslintrc.json | Disables no-console for perf scripts. |
| package.json | Adds a convenience npm script to run the benchmark. |
Instead of specifying a port and polling for server readiness, parse the server URL from the CLI's 'Ready! WordPress is running on ...' output. This avoids port conflicts and removes the killProcessesOnPort cleanup that could kill unrelated processes.
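Parsing the URL from the CLI output might look like the sketch below. The exact text after "running on" is elided in the commit message, so the pattern here assumes an http(s) URL followed by a newline; `parseServerUrl` and `createUrlScanner` are hypothetical names.

```typescript
// Assumption: the ready line ends with an http(s) URL and a newline.
const READY_PATTERN = /WordPress is running on (https?:\/\/\S+)[ \t]*\r?\n/;

// Return the server URL if the ready line is present in `output`, else null.
function parseServerUrl(output: string): string | null {
  const match = READY_PATTERN.exec(output);
  return match ? match[1] : null;
}

// stdout arrives in arbitrary chunks, so buffer them until the full line is seen.
function createUrlScanner(): (chunk: string) => string | null {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    return parseServerUrl(buffer);
  };
}
```

Requiring the trailing newline keeps the scanner from returning a URL that was cut off at a chunk boundary. A caller would feed each `data` event from the child process's stdout into the scanner and stop the startup timer once it returns a URL.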
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The previous code searched for any button with 'Saved' in its text, which could match unrelated UI elements or never match at all (the current WordPress site editor keeps the text as 'Save' and sets aria-disabled=true when the save completes). Target the exact Save button and wait for it to become disabled.
This is pretty separate from everything else. It works when I run it locally, and all the existing tests pass. Now we can use this for better evaluation of specific Windows performance improvements. If we want, we could also post performance output for every PR that touches a Playground CLI dependency. Let's merge once the tests pass again after a small update.
note: I meant to commit with a |
Summary
- Adds a Playwright-based performance benchmark for the Playground CLI, adapted from Automattic/studio's `tools/benchmark-site-editor/`
- Supports `unbuilt-jspi` and `built` CLI modes via the `--mode` flag, with configurable rounds and an optional plugins-loaded variant

How it works

The benchmark spawns a Playground CLI server via Nx targets (`npx nx unbuilt-jspi playground-cli` or `npx nx start playground-cli`), then launches headless Chromium to navigate through the WordPress site editor and time each interaction. Results are aggregated using the median across rounds and output as both a console table and a JSON artifact.

Metrics measured:
- `serverStartup`
- `siteEditorLoad`
- `templatesViewLoad`
- `templateOpen`
- `blockAdd`
- `templateSave`

Usage:

npx nx perf playground-cli
Test plan
- Run `npx nx perf playground-cli -- --rounds=1` and verify it starts the CLI, runs measurements, prints a results table (including `serverStartup`), and saves a JSON artifact
- Run with `--mode=built` and verify it builds first, then benchmarks the built CLI
- Run with `--with-plugins` and verify both bare and with-plugins environments are benchmarked
- Run with `--headed` and verify Chromium launches visibly for debugging
- `npx nx lint playground-cli` passes (no `no-console` errors from perf files)

🤖 Generated with Claude Code