
Fix array encoding for custom taxonomies and use term IDs for resolution #4

Open
m wants to merge 2 commits into main from fix/array-encoding-and-term-resolution

m (Owner) commented Mar 31, 2026

Resolves #2

What changed:
The bug came from Python's urllib.parse.urlencode, which stringifies lists inside nested dictionaries (e.g., terms[kb_category][]). WordPress then interpreted the payload as a single literal string instead of an array of parameters.
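A minimal reproduction of that behavior, using only the standard library. The payload shape and the category names are made up for illustration; they are not taken from the original issue.

```python
from urllib.parse import urlencode, unquote_plus

# Hypothetical payload: a nested dict whose inner value is a list.
payload = {"terms": {"kb_category": ["Billing", "Domains"]}}

# urlencode() calls str() on any value it does not know how to expand,
# so the whole inner dict is sent as its Python repr.
naive = urlencode(payload)

# Decode it again to see what the server actually receives.
print(unquote_plus(naive))
# terms={'kb_category': ['Billing', 'Domains']}
```

The server sees one parameter named terms whose value is a Python-looking string, not two kb_category entries.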

To fix this:

  1. Added a new wp_urlencode helper in lib/helpers.py that properly flattens nested dictionaries and lists into the exact query parameter format expected by WordPress and PHP (e.g., key[subkey][]=val).
  2. Updated the AI agent instructions (agents/apply.md, CLAUDE.md, and AGENTS.md) to mandate the use of wp_urlencode when building queries for the WordPress.com API, preventing this bug from recurring.
  3. Shifted the canonical identifier used when applying changes from slug to term_id, to prevent taxonomy drift across the entire analysis lifecycle.
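The flattening described in step 1 can be sketched roughly as follows. This is not the code in lib/helpers.py: the names flatten_params and wp_urlencode_sketch are hypothetical, the choice to give nested containers inside a list an explicit index is an assumption, and None and bool values are simply passed through str().

```python
from urllib.parse import quote_plus


def flatten_params(value, prefix=""):
    """Yield (key, value) pairs in PHP-style bracket notation."""
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{prefix}[{key}]" if prefix else str(key)
            yield from flatten_params(item, child)
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            if isinstance(item, (dict, list, tuple)):
                # Assumption: nested containers keep an explicit index so
                # that their fields stay grouped on the PHP side.
                yield from flatten_params(item, f"{prefix}[{index}]")
            else:
                # Scalars in a list use the empty-bracket form key[]=val.
                yield f"{prefix}[]", item
    else:
        yield prefix, value


def wp_urlencode_sketch(params):
    """Encode a nested dict as key[subkey][]=val pairs (illustrative only)."""
    return "&".join(
        f"{quote_plus(key)}={quote_plus(str(val))}"
        for key, val in flatten_params(params)
    )


print(wp_urlencode_sketch({"terms": {"kb_category": ["Billing", "Domains"]}}))
# terms%5Bkb_category%5D%5B%5D=Billing&terms%5Bkb_category%5D%5B%5D=Domains
```

Each list element becomes its own key[subkey][]=val pair, which is the form PHP parses back into an array.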

m added 2 commits March 31, 2026 10:21
- Add wp_urlencode helper in Python to properly format nested arrays/dicts for WordPress APIs, fixing issue #2.
- Update agent documentation (CLAUDE.md, AGENTS.md, agents/apply.md) to mandate the use of wp_urlencode.
- Switch from using slugs to term_ids as the canonical identifier in exports, analysis, and applying changes to prevent taxonomy drift.
silverstein added a commit to silverstein/taxonomist that referenced this pull request Apr 2, 2026
Implements a tested adapter for the WordPress.com / Jetpack REST API
at lib/adapters/wpcom_adapter.py, matching the WpCliAdapter interface.

Specific defenses against documented bugs:
- #1: delete_category() accepts only int term_id, resolves slug from
  live data before deleting. Raises TypeError on string input.
- m#2: wp_urlencode() uses doseq=True so list values produce repeated
  keys instead of being stringified.
- m#3: update_category() always includes parent in the payload and
  verifies term count before/after to detect silent duplicates.
- m#4: wp_urlencode() used consistently for all request encoding.
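
The first defense in the list above (int-only term_id, slug resolved from live data) might look something like the sketch below. The helper names and the "ID" / "slug" field names are assumptions for illustration, not the adapter's actual code.

```python
def require_term_id(term_id):
    """Reject anything that is not a plain int term ID."""
    # bool is a subclass of int in Python, so it is excluded explicitly
    # (an extra precaution assumed here, not stated in the commit).
    if isinstance(term_id, bool) or not isinstance(term_id, int):
        raise TypeError(
            f"term_id must be an int, got {type(term_id).__name__}"
        )
    return term_id


def resolve_slug(term_id, live_terms):
    """Find the current slug for term_id in freshly fetched term data."""
    require_term_id(term_id)
    for term in live_terms:
        if term["ID"] == term_id:
            return term["slug"]
    raise LookupError(f"no term with ID {term_id} in live data")


# Hypothetical live data, as if just fetched from the API.
terms = [{"ID": 42, "slug": "news"}, {"ID": 43, "slug": "guides"}]
print(resolve_slug(42, terms))
# news
```

Passing a slug such as "news" instead of 42 fails immediately with TypeError, before any request is made.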

23 unit tests with mocked HTTP covering pagination, error handling,
cache invalidation, and all four issue defenses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jeherve added a commit to jeherve/taxonomist that referenced this pull request Apr 6, 2026
The wp_urlencode helper landed in m#4 without any tests, which left the issue #2 regression unprotected and the nested-list-of-dicts follow-up in 3bcd2d8 unverified. I added coverage for:

- the exact issue #2 input (nested dict with a list value)
- bracket URL-encoding (%5B / %5D vs literal [ ])
- nested lists of dicts, the 3bcd2d8 case
- empty list, empty dict, empty input, None, bool, int, and special characters
- a regression class that asserts the broken Python-repr form never comes back, and that a naive urlencode still demonstrates the original bug
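
A rough idea of what the regression class in the last bullet could look like. It does not import the real helper: encode_stand_in is a hypothetical substitute that assumes keys are already in bracket form, and the category names are invented.

```python
import unittest
from urllib.parse import urlencode


def encode_stand_in(flat_params):
    # Stand-in for the real helper; assumes pre-flattened bracket keys.
    return urlencode(flat_params, doseq=True)


class IssueTwoRegression(unittest.TestCase):
    FLAT = {"terms[kb_category][]": ["Billing", "Domains"]}

    def test_brackets_are_percent_encoded(self):
        encoded = encode_stand_in(self.FLAT)
        self.assertIn("%5B", encoded)
        self.assertNotIn("[", encoded)

    def test_python_repr_never_comes_back(self):
        encoded = encode_stand_in(self.FLAT)
        # %27 is an encoded single quote, the telltale of a stringified list.
        self.assertNotIn("%27", encoded)
        self.assertEqual(encoded.count("="), 2)

    def test_naive_urlencode_still_shows_the_bug(self):
        naive = urlencode(self.FLAT)
        self.assertIn("%27Billing%27", naive)
        self.assertEqual(naive.count("="), 1)
```

Saved in a test module, this runs with python -m unittest. The last test documents the original failure, so a future change in the standard library's behavior would be noticed.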

jeherve (Contributor) left a comment

I started reviewing this PR and ended up creating #11, which collects the recommendations I think would make this PR a bit safer.

Development

Successfully merging this pull request may close these issues.

Incorrect array encoding creates junk categories instead of updating post terms

2 participants