close
Skip to content

Tests, docs, and strict typing for #4#11

Open
jeherve wants to merge 5 commits intom:fix/array-encoding-and-term-resolutionfrom
jeherve:add/wp-urlencode-tests-and-docs
Open

Tests, docs, and strict typing for #4#11
jeherve wants to merge 5 commits intom:fix/array-encoding-and-term-resolutionfrom
jeherve:add/wp-urlencode-tests-and-docs

Conversation

@jeherve
Copy link
Copy Markdown
Contributor

@jeherve jeherve commented Apr 6, 2026

Note

This PR is open against #4, offering some improvements to that fix

I did a review of #4 before it lands, and instead of writing a long comment I figured it would be clearer to open a PR against your branch with the changes I'd suggest. Five small, independent commits — merge the whole thing, cherry-pick what you like, or reject anything that doesn't fit. Totally up to you, happy to convert it to review comments if you'd rather.

What's in here

  1. Add unit tests for wp_urlencode: it allows us to test specifically against the regression reported in Incorrect array encoding creates junk categories instead of updating post terms #2, and also against the nested-list-of-dicts follow-up in 3bcd2d8. The new tests should cover the failure modes, edge cases, and a regression class that asserts the broken Python-repr form never comes back.
  2. Remove stray taxonomist-prompt-log.md: looks like it got committed by mistake. Nothing in the tool reads it, so I'm dropping the 417 lines of dead prompt log.
  3. Fix misleading wp_urlencode example output: the docstring and the agents/apply.md example both showed the output with literal [ ] brackets, but the real return value URL-encodes them as %5B / %5D. PHP decodes them transparently so the helper works, but anyone reading the comment to verify it would assume the helper was broken. Also documents the scalar-only assumption (None/bool get str()'d rather than omitted).
  4. Reject digit-string cats in apply-changes.php: the Python validator in validate_suggestions is strict about integer cats, but the PHP apply step was lenient and also accepted digit-strings via ctype_digit. That mismatch meant a type-confusion bug in an agent's output could slip through if it only ever hit the PHP side. I flipped the PHP check to match Python so both sides enforce the same invariant.
  5. Reject non-dict input in wp_urlencode: a top-level list or string silently produced malformed output. Added a type check at the entry point plus four tests to lock it in.

What's not in here

There was something else I thought we could improve, but I think it deserves its own PR. I opened #10 to address that.

Happy to fold that change into this PR if necessary. Let me know :)

Test plan

  • python3 -m unittest tests.test_helpers — 75 tests, all green (20 new)
  • php -l lib/apply-changes.php — no syntax errors (no phpcs configured in the repo)
  • Spot-checked the updated curl example in agents/apply.md

jeherve added 5 commits April 6, 2026 18:40
The wp_urlencode helper landed in m#4 without any tests, which left the issue #2 regression unprotected and the nested-list-of-dicts follow-up in 3bcd2d8 unverified. I added coverage for:

- the exact issue #2 input (nested dict with a list value)
- bracket URL-encoding (%5B / %5D vs literal [ ])
- nested lists of dicts, the 3bcd2d8 case
- empty list, empty dict, empty input, None, bool, int, and special characters
- a regression class that asserts the broken Python-repr form never comes back, and that a naive urlencode still demonstrates the original bug
This file got committed by mistake alongside the original fix in 1c59f54. Nothing in the tool reads it and it's not an output of any step, so there's no reason to keep 417 lines of dead prompt log in the repo. Dropping it.
The docstring and the agents/apply.md example both showed the encoded output as `terms[kb_category][]=General&terms[kb_category][]=Settings`, but the real return value has the brackets URL-encoded as %5B / %5D. PHP decodes those transparently so the helper works, but anyone reading the comment to verify it would look at the actual output and assume the helper was broken.

Update both spots to show the real wire form, with a note that PHP decodes the brackets back. Also document the scalar-only assumption while I'm there — None and bool values get passed through str() rather than omitted.
The Python validator in lib/helpers.py is strict about this — validate_suggestions requires "cats" values to be real ints, not digit-strings. But apply-changes.php was lenient and accepted both ints and digit-strings via ctype_digit. So if an agent accidentally emitted "42" instead of 42, the Python side would catch it at aggregation time, but a suggestions file that only ever passed through the PHP apply step would get silently coerced.

Flip the PHP pre-flight check to match Python: is_int only, everything else becomes an invalid_refs error. Once that's in place, the resolution loop lower down can drop its (int) cast too — the pre-flight already guarantees the values are integers.

TAXONOMIST_REMOVE_CATS parsing is unchanged. That one comes from an environment variable, which is a string by construction, so digit-string is the only form it can ever arrive as.
Passing a list, tuple, or scalar to wp_urlencode silently produced malformed output. A top-level list would emit []=value with no key. A string would get iterated character-by-character by the inner flatten() helper. Both of these made it easy to mask a bug at the call site.

Add a type check at the entry point that raises TypeError with a clear message. Four new tests cover list, tuple, str, and None to lock the behavior in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant