Incremental Sync refactor #52
Incremental Sync Architecture
Motivation
WP Packages currently runs a full pipeline on every sync cycle: discover packages → fetch updates → build ~140k files to disk → deploy via symlink swap → upload to R2. This worked when Composer v1 required a complete provider tree, but now that v1 support has been dropped, the build directory is vestigial overhead. Every run rewrites all files regardless of whether anything changed, and the R2 sync walks the entire build directory doing byte comparisons, so its cost is O(total packages) instead of O(changed packages).
Goal
Replace the build-directory pipeline with a DB-driven architecture where SQLite is the single source of truth. Each package gets a content_hash (a hash of its current metadata) and a deployed_hash (the hash of the version that is live on R2). Finding what needs uploading becomes a single query: WHERE content_hash != deployed_hash. No intermediate files, no filesystem walking, no manifest.
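A minimal Python sketch of the dirty-package query, using the standard-library sqlite3 module against an in-memory database. The table layout and the package names are hypothetical. One assumption worth checking: if a never-deployed package starts with a NULL deployed_hash, then content_hash != deployed_hash evaluates to NULL for that row and it is silently skipped. The sketch therefore uses SQLite's null-safe IS NOT operator instead.

```python
import sqlite3

# Hypothetical schema: only the columns named in the proposal, keyed by package name.
SCHEMA = """
CREATE TABLE packages (
    name          TEXT PRIMARY KEY,
    content_hash  TEXT,  -- hash of the package's current metadata
    deployed_hash TEXT   -- hash of what is live on R2; NULL if never deployed (assumed)
)
"""

# IS NOT is SQLite's null-safe inequality: NULL IS NOT 'h1' is true,
# whereas NULL != 'h1' evaluates to NULL and the WHERE clause drops the row.
DIRTY_QUERY = """
SELECT name FROM packages
WHERE content_hash IS NOT deployed_hash
ORDER BY name
"""

def dirty_packages(conn: sqlite3.Connection) -> list[str]:
    """Return the names of packages whose live copy on R2 is out of date."""
    return [row[0] for row in conn.execute(DIRTY_QUERY)]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO packages (name, content_hash, deployed_hash) VALUES (?, ?, ?)",
    [
        ("wp-plugin/changed", "h2", "h1"),    # changed since the last deploy
        ("wp-plugin/unchanged", "h1", "h1"),  # already live
        ("wp-theme/new", "h1", None),         # never deployed
    ],
)
print(dirty_packages(conn))  # ['wp-plugin/changed', 'wp-theme/new']
```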
How It Works
Three-step pipeline: Discover → Update → Sync
- Discover checks what packages exist and which ones changed (via SVN revision log). Cheap — no API calls.
- Update fetches full metadata from wp.org only for changed packages, normalizes versions, and computes content_hash. If the hash changed, the package is marked dirty.
- Sync queries for dirty packages, serializes their Composer JSON, uploads to R2 in parallel, then stamps deployed_hash. It is crash-safe: deployed_hash is stamped only after the upload, so if a run is interrupted, the unfinished packages stay dirty and the next run picks up where it left off.
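The Update and Sync steps can be sketched as follows. This is a minimal Python illustration, not the project's code: the table layout (including a JSON metadata column), SHA-256 over sorted-key JSON as the content_hash, the p2/{name}.json object key, and the upload callable standing in for the R2 client are all assumptions. What it is meant to show is the ordering that makes Sync crash-safe: each package's deployed_hash is written only after its own upload succeeds, so anything interrupted or failed is still selected by the dirty query on the next run.

```python
import hashlib
import json
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical table: metadata holds the normalized version list as JSON text.
SCHEMA = """
CREATE TABLE packages (
    name               TEXT PRIMARY KEY,
    metadata           TEXT NOT NULL,
    content_hash       TEXT NOT NULL,
    deployed_hash      TEXT,
    content_changed_at INTEGER NOT NULL
)
"""

def content_hash(versions: list) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so equal data hashes equally.
    canonical = json.dumps(versions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def update_package(conn: sqlite3.Connection, name: str, versions: list) -> bool:
    """Update step: store normalized metadata; return True if content_hash changed."""
    new_hash = content_hash(versions)
    row = conn.execute("SELECT content_hash FROM packages WHERE name = ?", (name,)).fetchone()
    if row is not None and row[0] == new_hash:
        return False  # unchanged: no write, content_changed_at keeps its old value
    now = int(time.time())
    if row is None:
        conn.execute(
            "INSERT INTO packages (name, metadata, content_hash, content_changed_at) "
            "VALUES (?, ?, ?, ?)",
            (name, json.dumps(versions), new_hash, now),
        )
    else:
        conn.execute(
            "UPDATE packages SET metadata = ?, content_hash = ?, content_changed_at = ? "
            "WHERE name = ?",
            (json.dumps(versions), new_hash, now, name),
        )
    conn.commit()
    return True

def sync(conn: sqlite3.Connection, upload, workers: int = 8) -> int:
    """Sync step: upload dirty packages in parallel, stamping each one after success."""
    dirty = conn.execute(
        "SELECT name, metadata, content_hash FROM packages "
        "WHERE content_hash IS NOT deployed_hash"
    ).fetchall()

    def push(name: str, metadata: str, uploaded_hash: str):
        body = json.dumps({"packages": {name: json.loads(metadata)}})
        upload(f"p2/{name}.json", body)  # expected to raise on failure
        return name, uploaded_hash

    stamped = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(push, *row) for row in dirty]
        for future in as_completed(futures):
            try:
                name, uploaded_hash = future.result()
            except Exception:
                continue  # left dirty; the next run retries it
            # Stamp on this thread: sqlite3 connections are per-thread by default.
            conn.execute(
                "UPDATE packages SET deployed_hash = ? WHERE name = ?",
                (uploaded_hash, name),
            )
            conn.commit()
            stamped += 1
    return stamped

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store = {}

def fake_upload(key: str, body: str) -> None:
    # Hypothetical stand-in for R2: fails for any key containing "broken".
    if "broken" in key:
        raise IOError("simulated R2 failure")
    store[key] = body

update_package(conn, "wp-plugin/example", [{"version": "1.0.0"}])
update_package(conn, "wp-plugin/broken", [{"version": "2.0.0"}])
print(sync(conn, fake_upload))  # 1: the failed upload stays dirty
```

Stamping the hash that was read at the start of the run, rather than re-reading content_hash afterwards, means a package updated mid-sync stays dirty and is uploaded again on the next run.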
DB-backed serving for local dev: the HTTP server serializes Composer metadata directly from SQLite on each request, eliminating the build step entirely for development.
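A minimal sketch of DB-backed serving using Python's standard-library http.server. The two routes come from the Phases list; the single metadata column, the hypothetical package name, and the exact keys of the root packages.json body are assumptions. The render function does the real work (look up one row and wrap it in a Composer v2 {"packages": {name: versions}} document), so it can be exercised without starting a server.

```python
import json
import re
import sqlite3
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

P2_ROUTE = re.compile(r"^/p2/([^/]+)/([^/]+)\.json$")

# Minimal Composer v2 root document (assumed shape; the real file may carry more keys).
ROOT = {"packages": [], "metadata-url": "/p2/%package%.json"}

def render(conn: sqlite3.Connection, path: str) -> tuple[int, bytes]:
    """Return (status, JSON body) for a request path, reading straight from SQLite."""
    if path == "/packages.json":
        return 200, json.dumps(ROOT).encode("utf-8")
    match = P2_ROUTE.match(path)
    if match is None:
        return 404, b"{}"
    name = f"{match.group(1)}/{match.group(2)}"
    row = conn.execute("SELECT metadata FROM packages WHERE name = ?", (name,)).fetchone()
    if row is None:
        return 404, b"{}"
    # Composer v2 metadata: {"packages": {"vendor/name": [version, ...]}}
    return 200, json.dumps({"packages": {name: json.loads(row[0])}}).encode("utf-8")

def make_handler(db_path: str) -> type:
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # A connection per request: ThreadingHTTPServer handles each request
            # on its own thread, and sqlite3 connections are per-thread by default.
            conn = sqlite3.connect(db_path)
            try:
                status, body = render(conn, urlsplit(self.path).path)
            finally:
                conn.close()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return Handler

# Local dev (hypothetical DB path):
# ThreadingHTTPServer(("127.0.0.1", 8080), make_handler("packages.db")).serve_forever()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, metadata TEXT NOT NULL)")
conn.execute(
    "INSERT INTO packages VALUES (?, ?)",
    ("wp-plugin/example", json.dumps([{"version": "1.0.0"}])),
)
status, body = render(conn, "/p2/wp-plugin/example.json")
print(status, body.decode())
```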
Conditional packages.json upload: the root Composer config is effectively static, so it's uploaded with If-None-Match — a no-op on most runs.
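One way the conditional upload could work, sketched in Python under stated assumptions: the client sends If-None-Match carrying the ETag of the body it is about to write, and the store follows the general HTTP rule for non-GET methods, answering 412 Precondition Failed (and writing nothing) when its current ETag matches. Whether R2 honors an ETag value here, or only `*`, should be checked against its documentation. The quoted-MD5 ETag mirrors how S3-style stores commonly tag single-part uploads and is also an assumption; the send callable and FakeStore are hypothetical stand-ins for a real HTTP client and bucket.

```python
import hashlib
from typing import Callable, Dict

# send(method, key, body, headers) -> HTTP status code (hypothetical client interface).
Send = Callable[[str, str, bytes, Dict[str, str]], int]

def etag_for(body: bytes) -> str:
    # Quoted MD5 hex digest (assumed to match the store's ETag for single-part uploads).
    return '"' + hashlib.md5(body).hexdigest() + '"'

def put_if_changed(send: Send, key: str, body: bytes) -> bool:
    """PUT body unless the stored object already has the same ETag.

    Returns True if the object was written, False if the store answered
    412 Precondition Failed because its copy already matches.
    """
    status = send("PUT", key, body, {"If-None-Match": etag_for(body)})
    if status == 412:
        return False
    if 200 <= status < 300:
        return True
    raise RuntimeError(f"unexpected status {status} for {key}")

class FakeStore:
    """In-memory bucket applying If-None-Match precondition semantics to PUT."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def send(self, method: str, key: str, body: bytes, headers: Dict[str, str]) -> int:
        assert method == "PUT"
        current = self.objects.get(key)
        if current is not None and headers.get("If-None-Match") == etag_for(current):
            return 412  # precondition failed: nothing is written
        self.objects[key] = body
        return 200

store = FakeStore()
root = b'{"metadata-url": "/p2/%package%.json"}'
print(put_if_changed(store.send, "packages.json", root))  # True: first upload
print(put_if_changed(store.send, "packages.json", root))  # False: unchanged, no-op
```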
Phases
- Phase 1: Schema + Content Hash. Add content_hash, deployed_hash, and content_changed_at columns. Extract the serialization logic into a pure composer package. Compute hashes at update time.
- Phase 2: DB-Backed Serve Layer. Serve /p2/{type}/{name}.json and /packages.json directly from SQLite. Remove the dev command in favor of Makefile-composed CLI commands.
- Phase 3: R2 Sync. The main cut-over. Replace the filesystem-based build + deploy with DB-driven sync. Combine the builds and sync_runs tables into a single pipeline_runs table. Delete ~1,200 lines of build/deploy/filesystem code.
- Phase 4: Test Infrastructure. Update the existing integration tests (the mock wp.org server and gofakes3 are already built) for the new architecture. Add a full round-trip test: seed DB → sync to fake S3 → resolve with Composer.
- Phase 5: Metadata Changes Feed. A Packagist-compatible /metadata/changes.json endpoint powered by the content_changed_at column, enabling third-party mirrors to poll for updates efficiently.
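A possible shape for the changes feed, sketched in Python against a hypothetical packages table. The response keys (actions entries with type, package, and time, plus a top-level timestamp that the client sends back as since) follow our reading of Packagist's changes.json and should be verified against Packagist's API documentation, including the units it uses for since and timestamp; this sketch uses Unix seconds throughout. Because only content_changed_at is tracked, it emits update actions only; deletions would need separate bookkeeping.

```python
import sqlite3
import time
from typing import Optional

def changes_feed(conn: sqlite3.Connection, since: int, now: Optional[int] = None) -> dict:
    """Build a changes.json-style payload from content_changed_at (Unix seconds)."""
    now = int(time.time()) if now is None else now
    # Half-open window [since, now): a change stamped later in second `now`
    # is picked up by the next poll, which uses since = now.
    rows = conn.execute(
        "SELECT name, content_changed_at FROM packages "
        "WHERE content_changed_at >= ? AND content_changed_at < ? "
        "ORDER BY content_changed_at, name",
        (since, now),
    ).fetchall()
    return {
        "actions": [{"type": "update", "package": name, "time": ts} for name, ts in rows],
        "timestamp": now,  # the mirror passes this back as `since` next time
    }

# Hypothetical table and package names for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, content_changed_at INTEGER)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("wp-plugin/old", 100), ("wp-plugin/new", 200), ("wp-plugin/newest", 300)],
)
print(changes_feed(conn, since=150, now=300))  # one action: wp-plugin/new at 200
```

The half-open window is deliberate: with one-second timestamps, an inclusive upper bound paired with an exclusive lower bound would lose a package changed later in the same second as now. An index on content_changed_at keeps this query cheap as the table grows.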
Phases are sequential — each builds on the previous — but Phase 2 can coexist with the old pipeline (the serve layer reads from DB while the old pipeline still runs), making the transition incremental.