Taxonomist

I’m really excited to introduce a project I worked on with various AI agents the other night, which I think represents a new way we might build things in the future.

First, the problem: My WordPress site has 5,600+ posts going back decades, and I had some categories that were old and I didn’t really use anymore, and I wasn’t happy with the structure. Every time I made a new post, it irked me a little, and I had this long-standing itch to go back and clean up all my categories, but I knew it was going to be a slog.

Let me present Taxonomist, a new open-source tool you can run with one copy-and-paste command line that solves this problem. Here’s the idea:

  1. You run this code in your terminal, and it spins up a Claude Code instance that asks you for your URL.
  2. Then it takes that and figures out what type of site you have, which APIs are available, and starts downloading all your posts locally for analysis.
  3. Sub-agents analyze every post against your current categories and think about suggesting new ones.
  4. It previews all the changes.
  5. Tries a variety of ways to authenticate against your site and make all the changes.
  6. Logs everything locally, so anything is reversible later.
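
As a rough illustration of step 2, here is a minimal Python sketch of paging through a site's public WordPress REST API to pull posts down locally. It is not Taxonomist's actual code (much of that is generated on the fly by the agent). The function names posts_url, http_fetch, and iter_posts are hypothetical; the /wp-json/wp/v2/posts endpoint, the per_page, page, and _fields parameters, and the X-WP-TotalPages header are standard WordPress REST API features.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def posts_url(site_url: str, page: int, per_page: int = 100) -> str:
    """Build the URL for one page of the public posts endpoint."""
    query = urlencode(
        {"per_page": per_page, "page": page, "_fields": "id,title,categories"}
    )
    return f"{site_url.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def http_fetch(url: str) -> tuple[list[dict], int]:
    """Fetch one page and return (posts, total_pages).

    WordPress reports the page count in the X-WP-TotalPages response header.
    """
    with urlopen(url, timeout=30) as resp:
        total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total_pages


def iter_posts(
    site_url: str,
    fetch: Callable[[str], tuple[list[dict], int]] = http_fetch,
) -> Iterator[dict]:
    """Yield every post, one page at a time, until the last page is read."""
    page, total_pages = 1, 1
    while page <= total_pages:
        posts, total_pages = fetch(posts_url(site_url, page))
        yield from posts
        page += 1
```

Something like `for post in iter_posts("https://distributed.blog"): ...` would then walk every public post. The fetch parameter is injectable only so the paging logic can be exercised without touching the network.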

THIS IS VERY ALPHA. PROBABLY BUGGY. BE CAREFUL WITH IT. PATCHES WELCOME. MAYBE MAKE A BACKUP OF YOUR SITE BEFORE YOU CHANGE IT.

It kind of just worked. I ran it live against ma.tt and it cleaned up a ton of stuff pretty much exactly how I wanted. But there’s a lot of weird stuff happening here, so I don’t know quite what this is yet.

  1. It’s very non-deterministic! There is some pre-written code, and there probably could be more, but a lot of the code is generated on the fly by your agent. This creates interesting bugs: people testing with less powerful models saw some odd behavior.
  2. I kind of want a directory of these useful AI agents on WordPress.org, but also, there’s something a little strange about trusting a remote shell script to run on your machine.
  3. I tested this with Claude, but there’s no reason Codex couldn’t use the repo in the exact same way, and I’d love to improve the quick start script to start by detecting all the agents you have, asking which you’d like to use, and also which directory you’d like to work in. I think we could kill the cd taxonomist-main && claude "start" part of it.
  4. Because much of the code and commands are generated on the fly from prompts, it’s very resilient! I’ve seen people try it, and it ran into errors with libraries or whatever, but it just figured out how to work around them.
  5. I’d love it if, at the end of every session, there was a moment for self-reflection where the agent would take the repository and suggest upstream issues and PRs based on anything that went wrong. Then this could recursively self-improve very quickly.
  6. There are some obvious improvements to this, for example, doing this for tags. Sometimes it creates too many categories when you might only want 3-5 for your theme.
  7. One fun thing is a bunch of the work of this just uses public WordPress APIs, so you can run it against any site! I like using distributed.blog as a demo. It’ll still do all the fun downloading and analysis and everything, you just won’t be able to make changes.
  8. I now have a local cache of all my WordPress posts I can do other interesting things with, and that’s cool.
  9. The logging and reverting probably still have some bugs in them.
  10. You can riff with it along the way, so for example, it suggested I get rid of my Audrey category because it didn’t have enough posts, and I asked it to look at all the companies on the Audrey.co website and categorize any posts that talk about them as Audrey, which created like 50 more.
  11. I want to check the GitHub repo for any updates before it starts, and maybe periodically, because it’s iterating and improving really fast.
  12. It’s not the default, but the entire thing is way more pleasant if you run it with skip-permissions. So when testing, I usually run the one-liner, exit, and resume with skip.
  13. You can see some of my prompt history in the GitHub repo, but I apologize that it’s not comprehensive. I also used Gemini and Codex with this and got lots of value from them.
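
To make the "logs everything locally, so anything is reversible" idea concrete, here is a minimal Python sketch of an append-only change log and the revert plan derived from it. This is an assumption about how such a log could work, not Taxonomist's actual log format; log_change, revert_plan, and the JSON field names are all hypothetical.

```python
import json
from pathlib import Path


def log_change(
    log_path: Path, post_id: int, before: list[int], after: list[int]
) -> None:
    """Append one category change as a JSON line, ideally before applying it."""
    entry = {"post_id": post_id, "before": before, "after": after}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def revert_plan(log_path: Path) -> list[dict]:
    """Return the updates that would undo the log, newest change first.

    Each item restores a post to the category IDs it had before that change.
    """
    lines = log_path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    return [
        {"post_id": e["post_id"], "categories": e["before"]}
        for e in reversed(entries)
    ]
```

Replaying the plan in order walks each post back through its earlier states, so the last item applied for any post is its original category list. Writing the log entry before making the change means a crash mid-run still leaves a record of what was attempted.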

So, not sure what this is, but please check it out, play with it, submit improvements or ideas, and think about what’s next. Might host a Zoom or something to brainstorm.

The final thing I’ll say is that this was a very different process of writing software for me. Instead of staying at the computer the entire time, I found myself going away for a bit, napping and dreaming about the code, coming back with new ideas and riffing on them. Maybe I’ll return to my Uberman polyphasic sleep days? Nap-driven development?

BTW I have lots of thoughts and feedback for Emdash, but I thought this was more interesting. I will try to get that out later tonight. One preview: TinyMCE is a regression; they should use Gutenberg! We designed it for other CMSes, and it would be fun to have some common ground to jam on.

15 thoughts on “Taxonomist”

    1. Hi,

      What we have done for years is create new taxonomies to help classify content.

      Some example to explain it:
      1- Travel blog. We create “series” to classify posts by country.
      2- Clothes shop. We create “collections” because “brands” (finally a taxonomy in Woo Core) usually have collections as “Summer 2026” or “Levi’s x Beyoncé”

      From my point of view, these examples are not “traditional” categories, and they help classify content more accurately. They keep you from ending up with hundreds of categories that get outdated, because the more categories we create, the harder it is to find a post. Too much information becomes disinformation.

      What do you think about this?

      Best
      Jairo

  1. Awesome! Let me know when this is ready for the non-coder smarty-pants in this shell of creative spinners of imagination materialization! Respect and Gratitude, ilsa bartlett

  2. That’s interesting. I like the idea of having a place on .org for scripts (or tools) like this. It doesn’t necessarily have to be a plugin, but can be anything: bash script, Python, PHP, JS, anything that can do (for now) one specific thing. The idea makes me think about the good old days of plugins!

  3. Took a look at Emdash.

    A JavaScript successor for WordPress?

    There have been countless attempts to build a true WordPress alternative, and so far none have succeeded.

    PHP still holds several advantages over JavaScript on the server side, but even using PHP isn’t enough on its own.

    We’ve seen many Laravel based CMS projects claim technical superiority, yet most feel like toys compared to WordPress when it comes to features, extensibility, and overall usability.

    My bet so far for an alternative is Vvveb CMS, written from scratch in vanilla PHP, built around the same philosophy and core ideas that made WordPress great in the first place.

    Hopefully we’ll see some healthy, real competition emerge in this space beyond the AI wave.

    But right now, I don’t see anything coming out of the increasingly complex JavaScript ecosystem that can genuinely compete with WordPress’s philosophy of simplicity.

  4. Interesting implementation, empowering admins to make non-breaking changes, something that would otherwise require endless collaboration in Google Sheets. For the non-determinism, I’m wondering if a “Skills” approach could offer more reliability? It would be so exciting to see such AI-assisted speed builds at WordCamps.

  5. Now this is awesome, I think this would be extremely useful for legacy websites. I was also wondering if this could be extended to other LLMs, atm it seems tightly coupled to Claude, but yeah… great stuff!

  6. Waiting for the Emdash opinion. I don’t see it becoming too widely used, but those SEO functions included in core and the ability to edit CPTs without plugins are incredible.

  7. I’ve always felt the one place where AI really shines is in helping to parse and wrangle data using normal human-language chat prompts coupled with generative contextual text.

  8. As you might imagine, I’m itching to try Taxonomist but still feeling a bit angsty about two things:
    – I haven’t yet started running any AI desktop stuff (I’m getting ready to, so that would be a good first thing to do with it)
    – I’m not sure I’m comfortable giving it the keys to my blog. Can you give me any sense of how tricky it would be to tweak it so it does a dry run? (I’m guessing easy but… not sure, and wary about going down AI-ADHD rabbit holes, especially on Easter Sunday!)
