
Make WordPress Core

Opened 6 weeks ago

Last modified 34 hours ago

#64696 new defect (bug)

Real time collaboration effectively disables persistent post caches while anyone edits a post

Reported by: peterwilsoncc Owned by:
Milestone: 7.0 Priority: normal
Severity: normal Version: trunk
Component: Posts, Post Types Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description (last modified by peterwilsoncc)

When real time collaboration is enabled, any time the post editor is open, persistent caching for WP_Query is effectively turned off due to the frequent updates to the wp_sync_awareness meta key.

Any time post data is updated, the posts last_changed time is updated in persistent caches to indicate that the entire WP_Query cache (and various other post caches) needs to be invalidated.

To reproduce:

  1. Configure a site with a persistent cache
  2. Enable real time collaboration
  3. Without an editor open, run the WP CLI command wp eval "var_dump( wp_cache_get_last_changed( 'posts' ) );" a couple of times, a few seconds apart
  4. Observe that the value does not change between calls.
  5. Edit a post, any post type will work.
  6. Run the WP CLI command wp eval "var_dump( wp_cache_get_last_changed( 'posts' ) );" a couple of times, a second apart
  7. Observe that the value changes every second

As objects cached in the post-queries group use the outcome of wp_cache_get_last_changed( 'posts' ) to salt their caches, this means that leaving the editor open will effectively prevent the caching of post queries throughout the site.
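For context, the salting pattern works roughly like this (a simplified sketch of what the post query caching does, not the exact core code; the $query_vars and $request variables are stand-ins):

```php
// Simplified sketch of how post query results are cached, salted with the
// 'posts' last_changed value. Variable names are illustrative.
$last_changed = wp_cache_get_last_changed( 'posts' );
$cache_key    = 'wp_query:' . md5( serialize( $query_vars ) ) . ':' . $last_changed;

$post_ids = wp_cache_get( $cache_key, 'post-queries' );
if ( false === $post_ids ) {
	$post_ids = $wpdb->get_col( $request ); // Run the expensive query.
	wp_cache_set( $cache_key, $post_ids, 'post-queries' );
}
// Every call to wp_cache_set_posts_last_changed() produces a new
// $last_changed, so all previously cached keys become unreachable.
```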

If that's possible, reducing the frequency with which the wp_sync_awareness meta data is updated would reduce the severity of the problem.

To an extent this is an inevitable effect while the RTC syncing data is stored in the post and post meta tables. Ideally, invalidation would be limited to occasions when an edit is actually made, so that leaving a browser tab open doesn't have negative implications for the performance of a site.

I've put this on the 7.0 milestone for the purposes of investigating the issue to see if mitigation is an option.

Attachments (1)

64696-late-meta-flush.diff (2.1 KB) - added by peterwilsoncc 2 weeks ago.


Change History (149)

#1 follow-up: @dd32
6 weeks ago

For discussion purposes, how much does this differ from the previous behaviour?

I believe the previous editor / post-locking heartbeat also resulted in a post edit, but I recall that being every 30-60 seconds instead of almost every second.

#2 @westonruter
6 weeks ago

Oh, that's not good. It seems this is due to WP_Sync_Post_Meta_Storage::add_update() calling add_post_meta(), which ends up triggering an added_post_meta action, and in default-filters.php the wp_cache_set_posts_last_changed() function is hooked to run when added_post_meta fires.

Similarly, when WP_Sync_Post_Meta_Storage::set_awareness_state() is called, it calls update_post_meta() which will end up triggering the updated_post_meta action, and wp_cache_set_posts_last_changed() also runs when that happens.

To work around this, we could temporarily unhook these when the post meta is added or updated:

add_action( 'added_post_meta', 'wp_cache_set_posts_last_changed' );
add_action( 'updated_post_meta', 'wp_cache_set_posts_last_changed' );

This presumes that changes to this postmeta will not impact any related caches (which I presume they wouldn't).
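A sketch of that workaround, for illustration only (whether suppressing the bump is safe is exactly the presumption above):

```php
// Hypothetical sketch: suppress the last_changed bump for this one write.
remove_action( 'added_post_meta', 'wp_cache_set_posts_last_changed' );
remove_action( 'updated_post_meta', 'wp_cache_set_posts_last_changed' );

update_post_meta( $post_id, 'wp_sync_awareness', $awareness_state );

// Restore the default invalidation behaviour immediately afterwards.
add_action( 'added_post_meta', 'wp_cache_set_posts_last_changed' );
add_action( 'updated_post_meta', 'wp_cache_set_posts_last_changed' );
```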

#3 in reply to: ↑ 1 ; follow-up: @peterwilsoncc
6 weeks ago

Replying to dd32:

For discussion purposes, how much does this differ from the previous behaviour?

I believe the previous editor / post-locking heartbeat also resulted in a post edit, but I recall that being every 30-60 seconds instead of almost every second.

Heartbeat:

  • fires every ten seconds
  • throttles to two minutes once the browser loses focus, via the visibility API
  • throttles to two minutes if the browser has focus but there's no mouse/keyboard activity for five minutes
  • suspends itself after ten minutes if suspension is enabled; suspends itself after sixty minutes if suspension is disabled

RTC:

  • fires once per second without collaborators active
  • fires four times per second with collaborators active
  • will back off after network errors; once a network request succeeds it goes back up to full speed
  • doesn't slow down if the browser window loses focus
  • I can't find any code indicating a slow-down due to a lack of activity. I've had a browser window open in the background for around ten minutes while typing this and it's still firing each second.

Replying to westonruter:

To work around this, we could temporarily unhook these when the post meta is added or updated:

That will end up breaking the cache for post queries using meta queries, eg WP_Query( [ 'meta_key' => 'looney_tunes', 'meta_value' => 'tom and jerry' ] );.

Last edited 6 weeks ago by peterwilsoncc (previous) (diff)


This ticket was mentioned in PR #11002 on WordPress/wordpress-develop by @westonruter.


6 weeks ago
#4

  • Keywords has-patch added

Trac ticket: https://core.trac.wordpress.org/ticket/64696

## Use of AI Tools

None

#5 in reply to: ↑ 3 ; follow-up: @westonruter
6 weeks ago

Replying to peterwilsoncc:

Replying to westonruter:

To work around this, we could temporarily unhook these when the post meta is added or updated:

That will end up breaking the cache for post queries using meta queries, eg WP_Query( [ 'meta_key' => 'looney_tunes', 'meta_value' => 'tom and jerry' ] );.

I don't mean for all post meta additions and updates. I mean specifically in WP_Sync_Post_Meta_Storage::set_awareness_state() and WP_Sync_Post_Meta_Storage::add_update().

Here's what I have in mind: https://github.com/WordPress/wordpress-develop/pull/11002

The only post meta involved would be wp_sync_awareness and wp_sync_update. I don't believe there are any queries introduced which involve querying by these post meta.


@czarate commented on PR #11002:


6 weeks ago
#6

I can't speak to whether this is the correct performance mitigation, but the code change looks good in the context of the sync provider. 👍 Thank you!

#7 follow-up: @mindctrl
6 weeks ago

I'll have to measure before and after to confirm, but a cursory look at Weston's PR seems to reduce meta db ops by a meaningful amount.

  • doesn't slow down if the browser window loses focus
  • I can't find any code indicating a slow-down due to a lack of activity. I've had a browser window open in the background for around ten minutes while typing this and it's still firing each second.

Probably should be a separate ticket, but I can confirm this behavior. It would be nice to include some throttling detection for when browsers lose focus and/or don't have typing/activity in some period of time. I left a couple clients open for ~7 hours for testing and this caused over 230k post meta writes.

#8 in reply to: ↑ 7 @czarate
6 weeks ago

Replying to mindctrl:

Probably should be a separate ticket, but I can confirm this behavior. It would be nice to include some throttling detection for when browsers lose focus and/or don't have typing/activity in some period of time. I left a couple clients open for ~7 hours for testing and this caused over 230k post meta writes.

I created this Gutenberg issue since polling intervals are controlled client-side.

https://github.com/WordPress/gutenberg/issues/75829


@mindctrl commented on PR #11002:


6 weeks ago
#9

I'll try to confirm the level of accuracy of the screenshots included here to be sure there are no bugs in the way I'm tracking this, but here's a quick look at the difference before and after this PR:

## Before
https://github.com/user-attachments/assets/523a411b-4a77-482e-ab53-95a497fb475c

## After
https://github.com/user-attachments/assets/a30bd562-94bc-4ebd-8976-ac78500ea8c5


@westonruter commented on PR #11002:


6 weeks ago
#10

@mindctrl it might be helpful to also have a client that is continuously doing reads via a function that uses wp_cache_get_last_changed( 'posts' ) as the salt for a return value cached in the object cache, to simulate the impact of having this on a live site getting a lot of traffic. This would be in addition to having sync clients open. For example, get_archives() would continuously have its cache impacted by sync changes.

It seems like there are a lot of examples where unrelated post and post meta changes are causing object cache invalidations needlessly. (Cache invalidation is hard!)

#11 in reply to: ↑ 5 ; follow-up: @peterwilsoncc
6 weeks ago

Replying to westonruter:

I don't mean for all post meta additions and updates. I mean specifically in WP_Sync_Post_Meta_Storage::set_awareness_state() and WP_Sync_Post_Meta_Storage::add_update().

Here's what I have in mind: https://github.com/WordPress/wordpress-develop/pull/11002

The only post meta involved would be wp_sync_awareness and wp_sync_update. I don't believe there are any queries introduced which involve querying by these post meta.

Yep, that makes more sense, but I'd be hesitant to do so for a couple of reasons:

  • it adds a bug trap for our future selves, ie WordPress contributors, due to the snowflake code
  • it blocks third party developers from using the proper APIs to monitor usage

Initially I think it would be better to investigate back-off in the Gutenberg repo and monitor how that goes before messing with caching algorithms.

#12 in reply to: ↑ 11 ; follow-up: @westonruter
6 weeks ago

Replying to peterwilsoncc:

Initially I think it would be better to investigate back-off in the Gutenberg repo and monitor how that goes before messing with caching algorithms

That too, but even with back-off we definitely don't want syncing logic to have any impact on the frontend. Any changes to a wp_sync_storage post type or any of its post meta should be isolated from impacting any frontend queries since they are totally unrelated. The only thing which should impact the frontend is when the post being modified actually gets an update.

#13 @peterwilsoncc
6 weeks ago

I've requested an assist on GB#75829, as my PR (GB PR#75843) works fine until embeds are added to the post.

#14 in reply to: ↑ 12 ; follow-up: @peterwilsoncc
6 weeks ago

Replying to westonruter:

...even with back-off we definitely don't want syncing logic to have any impact on the frontend.

As Dion indicates above, with the heartbeat API there is some amount of effect on the front end while editing. The same is true for revisions, autosaves and probably a few more things.

There may be a solution for this by changing how we salt the cache for WP_Query to reduce invalidation to a subset of queries. A problem for another ticket.

Under no circumstances do I think the solution is to deliberately store out of date data in the cache. To do so would replace one bug with another.

#15 in reply to: ↑ 14 @westonruter
6 weeks ago

Replying to peterwilsoncc:

Replying to westonruter:

...even with back-off we definitely don't want syncing logic to have any impact on the frontend.

As Dion indicates above, with the heartbeat API there is some amount of effect on the front end while editing. The same is true for revisions, autosaves and probably a few more things.

True, but the problem now seems exponentially worse, since cache invalidations can happen four times a second for every user syncing changes. In contrast, the most frequent heartbeat interval is once every 15 seconds, and an autosave only happens once every 60 seconds. So while a user is typing continuously with real-time collaboration enabled, this can result in 240 invalidations of posts in the object cache per minute, whereas with the autosave heartbeat it only happens once a minute. On a highly dynamic, high-traffic site without page caching, but with persistent object caching to deal with scaling issues, this can result in the object cache essentially being disabled and database load increasing substantially.

I don't necessarily like my workaround for this issue, and I would like there to be a deeper solution to the problem of cached posts in the object cache being invalidated needlessly. But I'm also mindful that we're now in beta, and this seems like a much bigger performance problem to solve for the general case, which may be more appropriate for 7.1.


This ticket was mentioned in Slack in #core by johnparris. View the logs.


6 weeks ago


This ticket was mentioned in Slack in #hosting by johnparris. View the logs.


6 weeks ago


This ticket was mentioned in PR #11067 on WordPress/wordpress-develop by @mindctrl.


5 weeks ago
#18

  • Keywords has-unit-tests added

This is a proof of concept for moving awareness state out of wp_postmeta into the cache, using the Transients API, which will use the object cache if available and fall back to wp_options if not.

I'm looking for feedback on whether this is a viable idea in general. If so, I can iterate on it to get it into better shape.

Trac ticket: https://core.trac.wordpress.org/ticket/64696

## Use of AI Tools

I used Claude Opus 4.6 to help with this, mostly for the phpunit test.


This ticket was mentioned in PR #11068 on WordPress/wordpress-develop by @JoeFusco.


5 weeks ago
#19

The real-time collaboration sync layer currently stores messages as post meta, which works but creates side effects at scale. This moves it to a dedicated wp_sync_updates table purpose-built for the workload.

The beta1 implementation stores sync messages as post meta on a private wp_sync_storage post type. Post meta is designed for static key-value data, not high-frequency transient message passing. This mismatch causes:

  1. Cache thrashing — Every sync write triggers wp_cache_set_posts_last_changed(), invalidating site-wide post query caches unrelated to collaboration.
  2. Compaction race condition — The "delete all, re-add some" pattern in remove_updates_before_cursor() loses messages under concurrent writes. The window between delete_post_meta() and the add_post_meta() loop is unprotected.
  3. Cursor race condition — Timestamp-based cursors (microtime() * 1000) miss updates when two writes land within the same millisecond.

A purpose-built table with auto-increment IDs eliminates all three at the root: no post meta hooks fire, compaction is a single atomic DELETE, and auto-increment IDs guarantee unique ordering. The WP_Sync_Storage interface and WP_HTTP_Polling_Sync_Server are unchanged.

Also adds a wp_sync_storage filter so hosts can substitute alternative backends (Redis, WebSocket) without patching core, and includes a beta1 upgrade path that cleans up orphaned wp_sync_storage posts.
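To illustrate the compaction difference, a sketch (the table and column names follow this PR's description and are assumptions, not final core code):

```php
// Beta1 pattern (racy): read all meta, delete all, re-add survivors; a
// concurrent write landing in the delete/re-add window is lost.
//
// Dedicated-table pattern (sketch): compaction is one atomic statement.
$wpdb->query(
	$wpdb->prepare(
		"DELETE FROM {$wpdb->prefix}sync_updates WHERE room = %s AND id <= %d",
		$room_id,
		$cursor
	)
);
```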

### Credits

Trac ticket: https://core.trac.wordpress.org/ticket/64696

## Use of AI Tools

Co-authored with Claude Code (Opus 4.6), used to synthesize discussion across related tickets and PRs into a single implementation. All code was reviewed and tested before submission.


@peterwilsoncc commented on PR #11067:


5 weeks ago
#20

@mindctrl The Transients API is designed in such a way that persistence isn't guaranteed. Data can be removed at any time, either by the cache being flushed or by the transients being cleaned up.

Sites with multiple servers may use a shared cache or a per-server cache, for example, and even on single-server sites the data may be dropped during a deploy.

As it's inherently unreliable, I don't think this is an approach we can take.

#21 @JoeFusco
5 weeks ago

I opened a PR which moves sync storage out of post meta entirely into a dedicated table. Details can be found on GitHub.

https://github.com/WordPress/wordpress-develop/pull/11068


@JoeFusco commented on PR #11067:


5 weeks ago
#22

Hey @mindctrl, I opened #11068 which builds on this same direction. Moved all sync storage to a dedicated table, with awareness in transients like your approach here.

@peterwilsoncc's point about transient reliability is worth discussing — awareness data is transient by nature (cursor positions, selections) and repopulates on the next poll, so a cache flush just means a brief flicker rather than data loss. Happy to explore alternatives if that's still a concern.

#23 @peterwilsoncc
5 weeks ago

WordPress/Gutenberg@0013814ce has been merged to throttle syncing when the browser tab is inactive: ie, when the screen saver is active or the user has switched to another tab.

I initially attempted to use document.hasFocus() to slow down as well, but that became super flaky with embeds, as it's not possible to determine if a cross-origin iframe has focus.


@mindctrl commented on PR #11067:


5 weeks ago
#24

@peterwilsoncc thanks for reviewing. To be clear, I don't love this idea, but given how late we are in the cycle, I wanted to try to reduce the amount of postmeta db operations without refactoring a lot of code. I'd much prefer a custom table.

Awareness syncing is the biggest "offender" here, because even when no content is changing and clients are idle, it's still updating the updated_at value for each client.

Since this is awareness state and not content data, it wouldn't be a lossy, destructive thing if the cache were cleared temporarily before the next awareness state sync, which would happen less than 1 second later. It would just appear that someone dropped offline very briefly.


@JoeFusco commented on PR #11068:


5 weeks ago
#25

@peterwilsoncc Thanks for the thorough review — please don't apologize, this is exactly the kind of institutional knowledge that's invaluable.

As not all sites run the database upgrade routine on a regular basis (I think wordpress.org is often behind) as it can be quite a burden on a site, if this approach is to be taken it would need to include a check for the table's existence in a couple of locations:

Completely agree. I believe I've audited every code path that reads wp_enable_real_time_collaboration to map out what actually needs protection.

  • The option_{$option} hook

When the setting is saved, the flow through options.php → update_option() writes to the wp_options table. No code is hooked onto that option change that would interact with the sync_updates table, so toggling the setting is safe regardless of table state.

  • Before displaying the options field on the writing page

The checkbox in options-writing.php calls get_option() to check/uncheck itself — it doesn't query the sync_updates table, so it renders safely whether or not the upgrade has run.

The code path that touches the table is the REST route registration in rest-api.php:433, which instantiates WP_Sync_Table_Storage. That's already gated with a db_version >= 61697 check (c13f1cd), so the table is never accessed on sites that haven't run the upgrade.

Some form of filtering would need to be available for sites that are using custom data stores.

The wp_sync_storage filter (line 443) should cover this — it lets sites swap out WP_Sync_Table_Storage for any custom WP_Sync_Storage implementation (Redis, etc.).
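Usage would look something like this (My_Redis_Sync_Storage is a made-up class name for illustration; it would need to implement the WP_Sync_Storage interface):

```php
// Hypothetical example of swapping the storage backend via the
// wp_sync_storage filter described above.
add_filter( 'wp_sync_storage', function ( $storage ) {
	return new My_Redis_Sync_Storage();
} );
```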

Let me know if I've missed anything or if you think additional safeguards would be worthwhile!


@westonruter commented on PR #11068:


5 weeks ago
#26

@peterwilsoncc:

As not all sites run the database upgrade routine on a regular basis (I think wordpress.org is often behind) as it can be quite a burden on a site, if this approach is to be taken it would need to include a check for the table's existence in a couple of locations:

  • The option_{$option} hook
  • Before displaying the options field on the writing page

Some form of filtering would need to be available for sites that are using custom data stores.

Yeah, this has been an issue, but I think it's primarily/only really an issue for database tables that are used for frontend features. Given that the proposed wp_sync_updates table here is exclusively used in the post editor which is in the admin, and access to the admin requires first doing a DB upgrade screen, then I think this wouldn't be a concern.

I know Matt has been reluctant in the past to add new tables to the database schema. I guess (and Matt will be able to correct me) that's to avoid ending up with a database schema of dozens and dozens of tables.

I haven't touched custom tables for a long time, so I'm not aware of the state of the art here. But I do know that WooCommerce made the switch to using custom tables instead of postmeta for the sake of scalability, simplicity, and reliability. A quick search in WPDirectory shows that Jetpack, Yoast, Elementor, WPForms, Wordfence, and many others use custom tables as well.

#28 @JoeFusco
5 weeks ago

Thanks everyone for the reviews and discussion here. The feedback has been clear and easy to act on.

Worth noting that the dedicated table approach also benefits hosts running alternative transports like WebSockets. Beta1's post meta storage caused cache thrashing regardless of how updates were delivered. Moving to a purpose-built table with the wp_sync_storage filter means any transport gets clean storage with zero cache side effects, rather than having to work around post meta internals.

@peterwilsoncc's visibility throttling in the client complements this nicely too - it reduces request frequency on the client side while the table change makes each request cheaper on the server side. Both fixes stack.

The PR includes full test coverage but would of course love more eyes testing this across different hosting setups.


@peterwilsoncc commented on PR #11068:


5 weeks ago
#29

Yeah, this has been an issue, but I think it's primarily/only really an issue for database tables that are used for frontend features. Given that the proposed wp_sync_updates table here is exclusively used in the post editor which is in the admin, and access to the admin requires first doing a DB upgrade screen, then I think this wouldn't be a concern.

This isn't quite correct: on multisite installs, site admins (ie, non-super admins) don't get prompted to upgrade the database tables, and globally upgrading tables can be prevented using a filter:

add_filter( 'map_meta_cap', function( $caps, $cap ) {
        if ( 'upgrade_network' === $cap ) {
                $caps[] = 'do_not_allow';
        }
        return $caps;
}, 10, 4 );

To reproduce:

  1. Install MS
  2. Create a sub site
  3. Add a user to the network
  4. Add the new user as an admin on the sub site
  5. Login as sub site admin
  6. Bump DB version in version.php
  7. Visit dashboard as sub site admin
  8. Observe lack of db upgrade prompt
  9. Login as super admin
  10. Observe but don't click db upgrade prompt
  11. You can ignore the prompt as long as you wish as a super admin
  12. Add the code above as a mu-plugin
  13. Prompt disappears
  14. Visit /wp-admin/network/upgrade.php
  15. Observe permission denied message
  16. Remove mu-plugin added above to avoid confusion later.

Keep in mind that the sub site admin can enable site RTC on the writing page, even though they can't update the tables.

I haven't touched custom tables for a long time, so I'm not aware of the state of the art here. But I do know that WooCommerce made the switch to using custom tables instead of postmeta for the sake of scalabilitiy, simplicity, and reliability. Doing a quick search in WPDirectory and I see that Jetpack, Yoast, Elementor, WPForms, WordFence, and many others use custom tables as well.

Not really relevant for the core schema.


@westonruter commented on PR #11068:


5 weeks ago
#30

Not really relevant for the core schema.

Seems relevant to me, as plugins may opt to introduce new tables if the core schema doesn't provide an efficient way to store the desired data. In the same way, if a new feature for core can't be represented efficiently in the core schema, then so too should a new table be considered.


@peterwilsoncc commented on PR #11068:


5 weeks ago
#31

@westonruter Well, no, because my point wasn't about whether the tables would be more performant (I said it's probably a good idea in my comment); my point was that Matt has a reluctance to add more tables to the database schema. Whether that reluctance still remains is worth asking, but if it does then the plugin actions aren't relevant to core.


This ticket was mentioned in PR #11119 on WordPress/wordpress-develop by @peterwilsoncc.


5 weeks ago
#32

Trac ticket: https://core.trac.wordpress.org/ticket/64696

## Use of AI Tools

Nil. Nada. Null.

#33 @peterwilsoncc
5 weeks ago

I've created PR#11119 for moving the awareness state to transients as a much simplified approach:

  • WP_Query cache accuracy is retained
  • the transient cache key uses the post ID rather than the room ID; the existing post meta is single use only (ie, only one item of that name is stored per post), so I think the post ID is fine
  • I've set the transient expiry to one hour as a pretty arbitrary number.

This reduces the WP_Query cache invalidation back to the following:

  • actual edits are made to a document
  • heartbeat triggers
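Roughly, the approach could be sketched as follows (the transient key name and payload shape here are illustrative, not necessarily the PR's exact code):

```php
// Illustrative sketch of awareness state stored in a transient keyed by
// post ID with a one-hour expiry. Writes no longer touch post meta, so
// wp_cache_set_posts_last_changed() never fires for awareness updates.
function sync_set_awareness_state( int $post_id, array $state ): void {
	set_transient( 'wp_sync_awareness_' . $post_id, $state, HOUR_IN_SECONDS );
}

function sync_get_awareness_state( int $post_id ): array {
	$state = get_transient( 'wp_sync_awareness_' . $post_id );
	return is_array( $state ) ? $state : array();
}
```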

#34 @matt
5 weeks ago

I'm generally against new tables, but this is a useful primitive for all our future real-time collab and sync work. I like the plain room name instead of hash, which also makes LIKE possible. Hopefully we can leverage this for Playground sync in the future as well.


@peterwilsoncc commented on PR #11068:


5 weeks ago
#35

@westonruter @josephfusco As mentioned in Slack and on the ticket, Matt has approved an additional table for RTC.

As this is a first design and wasn't discussed architecturally, I would like to slow right down and consider the architecture on the ticket prior to proceeding here. As you'll have seen, I've posted in Slack but will comment on the ticket once I've thought about things further.

Prior to the discussion, I think it's best if this PR is put on hold so we can focus on any other RTC issues. I think there are a few things in the JavaScript client and the polling class that need work in the meantime.

#36 @peterwilsoncc
5 weeks ago

As Matt has authorized a new table for RTC to resolve the performance impacts, we (contributors working on this ticket) can move on to an architectural discussion.

Primarily, we need to figure out the table and index structure:

  • WP_HTTP_Polling_Sync_Server has been written for a future in which additional object types have real time collaboration
  • Core objects are posts, comments, terms and users. Posts include post types as sub objects
  • Plugins frequently add their own objects, eg WooCommerce has orders, email plugins have email campaigns, etc
  • What data types are needed for various aspects of the table
  • What data currently stored in the meta_value blob ought to be stored in its own column

Miscellaneous questions for consideration:

  • Should awareness be managed via the table or just syncing?
  • Related to the above, on the product call (I couldn't attend), I understand presence indicators were discussed for other pages, eg list tables. Is this something that can be stored here too?
  • Multisite requires manual intervention to upgrade tables; how best to manage RTC prior to these being added?
  • What is the permissions model: is it based on the object being collaborated on, new meta caps, some combination of the two?


@czarate commented on PR #11067:


5 weeks ago
#37

@mindctrl @westonruter Even as we explore a new table as our path forward, resolving the race condition in get_updates_after_cursor is extremely valuable to increase confidence in beta testing. Do you think we can merge this for beta 3?


This ticket was mentioned in Slack in #core by amykamala. View the logs.


5 weeks ago


This ticket was mentioned in Slack in #hosting by amykamala. View the logs.


5 weeks ago


@peterwilsoncc commented on PR #11068:


5 weeks ago
#40

As I mentioned yesterday, I'd strongly prefer, and think it essential, that the new table structure be discussed on the ticket rather than on this pull request. Basing the architecture discussion on code leads to a code-first decision, but it's essential we make an architecture-first decision.

We have one chance to get this right; following best practices will make this far more likely to happen.


@westonruter commented on PR #11068:


5 weeks ago
#41

We have one chance to get this right; following best practices will make this far more likely to happen.

Is the concern so grave? The data here is ephemeral. If a WordPress upgrade requires a table schema change and the existing sync updates are dropped, this doesn't seem like such a big deal to me.


@peterwilsoncc commented on PR #11068:


5 weeks ago
#42

Is the concern so grave? The data here is ephemeral. If a WordPress upgrade requires a table schema change and the existing sync updates are dropped, this doesn't seem like such a big deal to me.

dbDelta() runs prior to the upgrade routine, so there is no opportunity to truncate before altering the table. The table will have A LOT of data very quickly, so any changes will lock the table for a while (I ended up with about 100 rows across posts and post meta after editing Hello World for about five minutes).

#43 follow-up: @peterwilsoncc
5 weeks ago

Here are my thoughts after Chris caught me up on the background and technical requirements of RTC.

---

Items to move to the table:

  • wp_sync_update post meta for rooms
  • Awareness

Remaining where they currently are:

  • CRDT document - it's only updated when a user saves, not when a sync occurs

Table name: wp_collaboration

Columns:

  • ID (BIGINT, Primary key) -- Used as cursor in Post Meta storage class
  • room_id (VARCHAR, 60, Indexed) - hash, with a bit of room for plugin authors to do things
  • event_type (VARCHAR, 255, Indexed) -- "awareness" OR "sync_update" OR future type
  • gmt_timestamp (date/time, Indexed) - when the event happened; this may be updated for items such as the awareness API, which updates on an ongoing basis
  • event_data (LONGTEXT) -- blob, client id, etc as appropriate

WP_Sync_Post_Meta_Storage to be renamed as appropriate for the new dedicated table.

The WP_Sync_Storage interface to be updated to allow for awareness to be stored in multiple rows:

public function get_awareness_states_since( string $room, int $timestamp ): array;
public function add_awareness_state( string $room, array $awareness_state ): void;
public function remove_awareness_states_before( string $room, int $timestamp ): array;

Props @chriszarate for giving this an edit and brainstorming prior to publication.

#44 in reply to: ↑ 43 @JoeFusco
4 weeks ago

The pull request we've been working on renames everything from “sync” to “collaboration” - classes, REST routes, table name. A deprecated wp-sync/v1 route alias sticks around for backward compatibility with the Gutenberg plugin during its transition to wp-collaboration/v1.

Here is how the pull request lines up with the proposed schema:


Items moved to the table:

  • wp_sync_update post meta - done, sync updates go into wp_collaboration rows instead of post meta. This gets rid of the cache invalidation noise from wp_cache_set_posts_last_changed() firing on every write.

Remaining where they currently are:

  • CRDT document - agreed, it’s only written on save, not on sync. The pull request doesn’t touch that path.
  • Awareness - stored in transients rather than in the table. Awareness data (cursor positions, selections) is high-frequency and short-lived - every active user writes it every few seconds, and entries go stale after 30 seconds. Transients handle this well: 60 second expiry, automatic cleanup when someone closes their tab. Moving awareness into the table and adding an event_type column is worth exploring, but it adds write volume and index considerations we’d want to benchmark first. Proposing deferment for 7.1.

Table name: wp_collaboration - matches. Registered in wpdb as $wpdb->collaboration.

Columns:

  • id (BIGINT, unsigned, auto-increment, Primary key) - cursor for polling. Auto-increment instead of gmt_timestamp because the beta 1-3 post meta implementation uses millisecond timestamps as message IDs. When two editors send updates in the same millisecond, they get the same ID and one is silently dropped. Auto-increment can’t collide, so that class of bug goes away.
  • room (VARCHAR, 255) - room identifier, unhashed since room strings like postType/post:42 are short
  • update_value (LONGTEXT) - JSON-encoded update payload
  • created_at (DATETIME) - timestamp for the 7-day cleanup cron (matches autodrafts). Expanding its use (admin UI, audit queries) fits naturally in 7.1.

Composite index KEY room (room, id) - every hot query (idle poll, catch-up, compaction) filters by room first, then scans by id. Without this index those queries hit the full table. Worth calling out since it’s not in the proposed schema but all three operations depend on it.
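
For illustration, the idle-poll query that the (room, id) index is meant to cover would look something like this (literal values are placeholders):

    -- Idle poll: fetch everything in the room since the caller's last cursor.
    -- The (room, id) composite index satisfies both the filter and the ordering.
    SELECT id, update_value
      FROM wp_collaboration
     WHERE room = 'postType/post:42'
       AND id > 1337  -- cursor from the previous poll
     ORDER BY id ASC;
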

WP_Sync_Storage renamed to WP_Collaboration_Storage. Awareness methods are get_awareness_state() and set_awareness_state().

The PR includes a runnable proof of the beta 1-3 data loss bug and a performance benchmark covering idle poll, catch-up, and compaction from 100 to 100,000 rows.

Pull request description has been updated with the latest commands for testing:
https://github.com/WordPress/wordpress-develop/pull/11068

Props @mindctrl for flagging the timestamp collision risk and the importance of append-only writes.

#45 @peterwilsoncc
4 weeks ago

@JoeFusco There's a race condition in the current awareness implementation that remains with the switch to transients: as the awareness is stored in a single blob (code ref), near-simultaneous requests can drop client IDs from the room.

Chris and I were thinking that putting each client's ID in the new table as a separate row would allow us to avoid that condition. With four requests per second when multiple collaborators are around, we have a lot of near-simultaneous requests.

#46 follow-up: @mindctrl
4 weeks ago

Putting each client's awareness in its own row makes sense. What I don't understand with the current iteration is why we want client_id and type in the update_value blob. They're both necessary for filtering results, which we currently do by decoding every update_value blob and comparing on type and client_id in PHP.


@westonruter commented on PR #11068:


4 weeks ago
#47

@josephfusco It looks like the types can be made more specific. For example:

  • src/wp-includes/collaboration/class-wp-collaboration-table-storage.php

    diff --git a/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php b/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
    index 894777e608..8c62cc1ab0 100644
    --- a/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
    +++ b/src/wp-includes/collaboration/class-wp-collaboration-table-storage.php
    @@ -15,6 +15,8 @@
      * @since 7.0.0
      *
      * @access private
    + *
    + * @phpstan-import-type AwarenessState from WP_Collaboration_Storage
      */
     class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
     	/**
    @@ -73,7 +75,8 @@ class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
     	 *
     	 * @param string $room    Room identifier.
     	 * @param int    $timeout Seconds before an awareness entry is considered expired.
    -	 * @return array<int, array{client_id: int, state: mixed, wp_user_id: int}> Awareness entries.
    +	 * @return array<int, array> Awareness entries.
    +	 * @phpstan-return array<int, AwarenessState>
     	 */
     	public function get_awareness_state( string $room, int $timeout = 30 ): array {
     		global $wpdb;
    @@ -241,10 +244,10 @@ class WP_Collaboration_Table_Storage implements WP_Collaboration_Storage {
     	 *
     	 * @global wpdb $wpdb WordPress database abstraction object.
     	 *
    -	 * @param string $room       Room identifier.
    -	 * @param int    $client_id  Client identifier.
    -	 * @param array  $state      Serializable awareness state for this client.
    -	 * @param int    $wp_user_id WordPress user ID that owns this client.
    +	 * @param string               $room       Room identifier.
    +	 * @param int                  $client_id  Client identifier.
    +	 * @param array<string, mixed> $state      Serializable awareness state for this client.
    +	 * @param int                  $wp_user_id WordPress user ID that owns this client.
     	 * @return bool True on success, false on failure.
     	 */
     	public function set_awareness_state( string $room, int $client_id, array $state, int $wp_user_id ): bool {
  • src/wp-includes/collaboration/interface-wp-collaboration-storage.php

    diff --git a/src/wp-includes/collaboration/interface-wp-collaboration-storage.php b/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
    index 9550384da5..5223e89a63 100644
    --- a/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
    +++ b/src/wp-includes/collaboration/interface-wp-collaboration-storage.php
    @@ -10,6 +10,8 @@
      * Interface for collaboration storage backends used by the collaborative editing server.
      *
      * @since 7.0.0
    + *
    + * @phpstan-type AwarenessState array{client_id: int, state: mixed, wp_user_id: int}
      */
     interface WP_Collaboration_Storage {
     	/**
    @@ -32,7 +34,8 @@ interface WP_Collaboration_Storage {
     	 *
     	 * @param string $room    Room identifier.
     	 * @param int    $timeout Seconds before an awareness entry is considered expired.
    -	 * @return array<int, array{client_id: int, state: mixed, wp_user_id: int}> Awareness entries.
    +	 * @return array<int, array> Awareness entries.
    +	 * @phpstan-return array<int, AwarenessState>
     	 */
     	public function get_awareness_state( string $room, int $timeout = 30 ): array;
    
    @@ -86,10 +89,10 @@ interface WP_Collaboration_Storage {
     	 *
     	 * @since 7.0.0
     	 *
    -	 * @param string $room       Room identifier.
    -	 * @param int    $client_id  Client identifier.
    -	 * @param array  $state      Serializable awareness state for this client.
    -	 * @param int    $wp_user_id WordPress user ID that owns this client.
    +	 * @param string               $room       Room identifier.
    +	 * @param int                  $client_id  Client identifier.
    +	 * @param array<string, mixed> $state      Serializable awareness state for this client.
    +	 * @param int                  $wp_user_id WordPress user ID that owns this client.
     	 * @return bool True on success, false on failure.
     	 */
     	public function set_awareness_state( string $room, int $client_id, array $state, int $wp_user_id ): bool;

#48 in reply to: ↑ 46 @peterwilsoncc
4 weeks ago

Replying to mindctrl:

Putting each client's awareness in its own row makes sense. What I don't understand with the current iteration is why we want client_id and type in the update_value blob. They're both necessary for filtering results, which we currently do by decoding every update_value blob and comparing on type and client_id in PHP.

I agree, this is why I was proposing the type column in comment 43.


@peterwilsoncc commented on PR #11068:


4 weeks ago
#49

I am running out of ways to ask politely, can we please discuss and agree on the architecture of the new table _on the trac ticket_ before continuing to work on code.

As I said last week, discussing architecture on a pull request leads to a code first decision whereas discussing the architecture on the ticket focuses the architecture alone, without the influence of a first draft pull request.

To be 100% clear: any work on code without an agreed plan forward is a waste of everyone's time and simply delays the work. I am closing this PR so discussion can continue on the ticket.

#50 @czarate
4 weeks ago

Putting each client's awareness in its own row makes sense. What I don't understand with the current iteration is why we want client_id and type in the update_value blob. They're both necessary for filtering results, which we currently do by decoding every update_value blob and comparing on type and client_id in PHP.

Both client_id and type are internal implementation details that are specific to Yjs and the polling provider. They could change in the future. Keeping the storage mechanism opaque to these implementation details greatly reduces the risk of breaking changes.

#51 follow-up: @mindctrl
4 weeks ago

Replying to peterwilsoncc:

I've been thinking about the schema proposal and whether a single table with an event_type column is the right approach, or whether awareness and content updates are different enough to warrant separate storage.

The access patterns and lifecycles are different:

  • Content updates are append-only messages that must be reliably delivered to every client exactly once. They're ordered by cursor (auto-increment ID), compacted periodically, and may persist for minutes to hours during an editing session. Reliable delivery matters, as a missed or duplicated update means a diverged document.
  • Awareness state is per-client, last-write-wins type data (cursor position, selection, user info). It's overwritten on every poll cycle (~250ms–1s), expires after a short period of inactivity, and losing it briefly just means a collaborator's cursor flickers. There's no ordering requirement, we just want the latest state per client. If awareness is expanded beyond the editor screen, as mentioned by Matt elsewhere (I remember reading it but can't find where he said it at the moment), the need to save awareness state will increase, and it will exacerbate any tradeoffs we make.

Combining them in one table means awareness writes (which are the highest-frequency operations involving every active client, every poll cycle) inflate the auto-increment ID space and add rows that the compaction/cleanup logic needs to work around. The indexing needs are also different: content updates need (room, id) for cursor-based polling, while awareness needs (room, client_id) for upsert-by-client lookups.

Separate storage would let each be optimized for its actual workload. Content updates could stay in the wp_collaboration table with auto-increment cursors and compaction, and that would keep it lean and fast with no need to scan past expired or current awareness rows. Awareness could use a second table with an INSERT ... ON DUPLICATE KEY UPDATE keyed on (room, client_id). Each client only ever writes its own row, so concurrent updates can't overwrite each other, and it doesn't add write volume to the content updates table.
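
As a sketch of that upsert pattern (table and column names here are hypothetical, matching the proposal rather than any committed schema):

    -- Hypothetical upsert against a dedicated awareness table with a
    -- UNIQUE KEY on (room, client_id). Each client only ever touches its
    -- own row, so concurrent writers cannot clobber each other's state.
    INSERT INTO wp_awareness ( room, client_id, update_value, created_at )
    VALUES ( 'postType/post:42', 'client2', '{"cursor":[12,4]}', UTC_TIMESTAMP() )
    ON DUPLICATE KEY UPDATE
        update_value = VALUES( update_value ),
        created_at   = VALUES( created_at );
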

Replying to czarate:

Both client_id and type are internal implementation details that are specific to Yjs and the polling provider. They could change in the future. Keeping the storage mechanism opaque to these implementation details greatly reduces the risk of breaking changes.

Regardless of the backing implementation (Yjs or otherwise), any collaborative editing system needs to identify its clients and distinguish between types of sync data. If the table is holding multiple types of data, we'll always need to know what type we have, and preferably query for only the data needed at any given time.

  • client_id — Any collaboration protocol needs to identify which participant originated an update. Without a client identifier, we can't filter a client's own updates out of poll responses, we can't do per-client awareness state, and we can't attribute changes. The concept of a client identity is inherent to multiuser editing.
  • type — The distinction between document updates and awareness/presence information exists in every collaboration system. These are conceptually different operations with different lifecycles. Whether we call it type, event_type, channel, or whatever, we need to distinguish them.

Optimizing tables for the use case we have, and know we will have soon, seems like a good win and doesn't saddle us with more schemas that are less than ideal. There seem to be several tradeoffs associated with trying to put both in a single table. Is there an actual mandate to limit to one table? Is there a technical/deployment type concern of having two new tables instead of one? Do we think there would be a higher rate of failure or some other kind of ecosystem fallout if we had two new tables?

#52 in reply to: ↑ 51 @czarate
4 weeks ago

Replying to mindctrl:

Content updates are append-only messages that must be reliably delivered to every client exactly once. ... Reliable delivery matters, as a missed or duplicated update means a diverged document.

Small point of clarity: At-least-once delivery is OK, if not ideal. Merging an already-applied update is effectively a no-op (assuming the update has not been modified in any way). A missed update, on the other hand, is destructive.

Otherwise I agree with your case for a second table.

Regardless of the backing implementation (Yjs or otherwise), any collaborative editing system needs to identify its clients and distinguish between types of sync data. If the table is holding multiple types of data, we'll always need to know what type we have, and preferably query for only the data needed at any given time.

While I don't think the overhead of filtering results from an "over-query" is that bad, I agree with this in general—as long as we represent both client_id and type flexibly in the schema. Both should probably be VARCHARs with reasonably future-proof sizes?

#53 @JoeFusco
4 weeks ago

Building on the discussion here and @mindctrl's analysis in comment 51 - sync updates and awareness are as different as posts and sessions. One is append-only content ordered by cursor, the other is ephemeral per-user state that gets overwritten every second. Storing both in one table with an event_type column means the UNIQUE KEY (room, client_id) that prevents simultaneous awareness writes from overwriting each other can't coexist with sync update rows, where a single client writes many rows per room.

This is what led us to two tables, each a general-purpose primitive:

wp_collaboration - append-only message passing

Column Type Notes
id BIGINT unsigned, auto_increment, Primary Key Cursor for polling
room VARCHAR(191) Unhashed room identifier ($max_index_length for utf8mb4 index compatibility)
type VARCHAR(32) update, sync_step1, sync_step2, compaction
client_id VARCHAR(32) Originating client
update_value LONGTEXT Opaque payload
created_at DATETIME For cron cleanup (7-day TTL)

Indexes: KEY room (room, id), KEY created_at (created_at)

wp_awareness - per-client presence

Column Type Notes
id BIGINT unsigned, auto_increment, Primary Key
room VARCHAR(191) Unhashed room identifier ($max_index_length for utf8mb4 index compatibility)
client_id VARCHAR(32) One row per client per room
wp_user_id BIGINT unsigned WordPress user ID
update_value LONGTEXT Awareness payload
created_at DATETIME Updated on every write, used for expiry

Indexes: UNIQUE KEY room_client (room, client_id), KEY room_created_at (room, created_at), KEY created_at (created_at)

The multisite upgrade path is gated - wp_is_collaboration_enabled() checks both the option and db_version, so the tables are never touched before the upgrade runs.

Cleanup is a single cron: collaboration rows older than 7 days, awareness rows older than 60 seconds.

Last edited 4 weeks ago by westonruter (previous) (diff)


This ticket was mentioned in Slack in #core by joefusco. View the logs.


4 weeks ago


This ticket was mentioned in Slack in #hosting by amykamala. View the logs.


4 weeks ago

#56 follow-up: @amykamala
4 weeks ago

Hosts have provided some brief feedback in this week's hosting meeting on RTC performance and defaults (as they stand in Beta 4) https://wordpress.slack.com/archives/C3D6T7F8Q/p1773252483953569

#57 follow-ups: @peterwilsoncc
4 weeks ago

@JoeFusco My reading of Matt's comment above is that the addition of one table has been authorized for the current and future RTC features. It's a very rare occurrence so we're not going to be able to slide in two when it's possible to use one. The last time the table structure changed was to introduce term meta, and links were deprecated to prep for the eventual removal of wp_links.

No matter where awareness is stored, it's going to generate a lot of rows. If we use transients, they get added to the options table, if we use the RTC table, they get added there.

I think it better to put awareness in the RTC table but maybe we can lean on the transients pattern and use a persistent cache if it's available.

Re: over-queries, anything that allows us to limit the queries is an improvement over the current post meta implementation. My understanding is the query will be something along the lines of SELECT * FROM table WHERE cursor > /* cursor position on last request, 0.25 seconds ago */ AND event = sync. In most cases, that means the over-query is one row, and PHP is probably more efficient for filtering than adding a column and an index for said column.
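
Spelled out as a sketch (names are illustrative; this assumes the $wpdb->collaboration property from the PR), the PHP-side filtering would look roughly like:

    // Fetch everything past the cursor, then drop non-sync rows in PHP
    // rather than adding an indexed event-type column.
    $rows = $wpdb->get_results( $wpdb->prepare(
        "SELECT * FROM {$wpdb->collaboration} WHERE room = %s AND id > %d ORDER BY id",
        $room,
        $cursor
    ) );
    $sync_rows = array_filter(
        $rows,
        static function ( $row ) {
            return 'awareness' !== $row->type;
        }
    );
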

#58 in reply to: ↑ 56 @peterwilsoncc
4 weeks ago

Replying to amykamala:

Hosts have provided some brief feedback in this week's hosting meeting on RTC performance and defaults (as they stand in Beta 4) https://wordpress.slack.com/archives/C3D6T7F8Q/p1773252483953569

#64845 has been opened to discuss defaults so we can keep this ticket focused on the tables/persistent cache issue.

#59 in reply to: ↑ 57 ; follow-ups: @JoeFusco
3 weeks ago

Replying to peterwilsoncc:

@JoeFusco My reading of Matt's comment above is that the addition of one table has been authorized for the current and future RTC features. It's a very rare occurrence so we're not going to be able to slide in two when it's possible to use one.

Agreed - one table for sync, no argument there.

No matter where awareness is stored, it's going to generate a lot of rows. If we use transients, they get added to the options table, if we use the RTC table, they get added there.

This is the key question. Awareness generates rows no matter where it lives - the difference is what guarantees that storage provides and what it costs the host table.

I think it better to put awareness in the RTC table but maybe we can lean on the transients pattern and use a persistent cache if it's available.

Persistent object cache as the primary store for awareness - fully agree. It's fast, atomic, and ephemeral by nature. No DB writes for awareness when cache is available.
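
A cache-first write might look like this sketch (group and key names are assumptions, not committed code):

    // When an external object cache is present, awareness never touches the DB.
    if ( wp_using_ext_object_cache() ) {
        // 60 second expiry matches the awareness TTL discussed above.
        wp_cache_set( "awareness:{$room}:{$client_id}", $state, 'collaboration', 60 );
    } else {
        // No persistent cache: fall back to one of the storage options weighed below.
    }
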

The open question is the fallback when persistent object cache isn't available. Transients fall back to wp_options, and for awareness that means:

  • One write per client per room every poll interval (~1 second)
  • Short TTL (30-60 seconds), so rows expire and accumulate until lazy garbage collection runs
  • On multisite, these writes hit per-site wp_X_options tables - already the most bloated table on most installs

At 10 sub-sites with 2 editors each, that's 20 transient writes/sec across 10 wp_options tables. At 50 sub-sites, it's 100+. These fall directly out of the polling interval.

Putting awareness in the single collaboration table has a different cost. Sync needs multiple rows per (room, client_id), so a UNIQUE KEY on those columns can't exist in a shared table. Without it, concurrent awareness writes produce duplicate rows, requiring application-level deduplication rather than prevention at the schema level.

Re: Over queries, anything that allows us to limit the queries is an improvement over the current post meta implementation. My understanding is the query will be something along the lines of SELECT * FROM table WHERE cursor > /* cursor position on last request, 0.25 seconds ago */ AND event = sync. In most cases, that means that over query is one row and PHP is probably more efficient for filtering than adding a column and an index for said column.

Agreed - PHP-side filtering makes sense given the row counts from a cursor query. No need for an index on event type.

So for no-cache sites, the options are:

Option Issue
Transients → wp_options High-frequency ephemeral writes to a table designed for low-frequency config data
Single collaboration table No UNIQUE KEY — concurrent awareness writes produce duplicates instead of being prevented
Require persistent object cache Excludes a significant portion of the install base
Small dedicated awareness table Correct by constraint, bounded, trivial cron cleanup

This isn't a preference for two tables — it's that every fallback path for awareness without cache makes something else worse. If there's a fifth option I'm not seeing, I'd welcome it.

Does the original one table guidance apply to the no-cache fallback path specifically? The scope is narrow: bounded rows (one per client per room), 60-second TTL, @access private, feature-gated.

#60 in reply to: ↑ 59 @westonruter
3 weeks ago

Replying to JoeFusco:

Require persistent object cache: Excludes a significant portion of the install base

Note that in #56040 for WP 6.1, a Persistent Object Cache test was added to Site Health. It appears in the production environment. It has certain thresholds for whether the test fails:

Threshold Value
alloptions_count 500
alloptions_bytes 100,000
comments_count 1,000
options_count 1,000
posts_count 1,000
terms_count 1,000
users_count 1,000

If a persistent object cache becomes a requirement for RTC, the thresholds could be eliminated in favor of it always being recommended if RTC is enabled and the polling transport is being used.

Nevertheless, shared hosting environments likely wouldn't provide Redis/Memcached to be able to enable a persistent object cache, so site owners would be stuck unless they upgrade. And for hosts that do provide Redis/Memcached, they may make available a WebSocket transport anyway.

#61 in reply to: ↑ 57 @czarate
3 weeks ago

There is another potential use case for this table that is distinct from the default HTTP polling provider: CRDT document persistence. Persisting the CRDT doc alongside the entity record (e.g., post) is essential for preventing the "initialization problem" where two peers have different initial state for their respective CRDT documents and therefore cannot apply updates consistently.

Currently, we only persist CRDT documents for post entities, but that will change in the future as we sync additional entity types. CRDT documents for post entities are stored in post meta, but other entity types may not have an equivalent storage target (e.g., taxonomies, post types, etc.) Being able to persist CRDT documents in this new table (even when another sync provider is used) is appealing. It would simply be a row with a distinct type (persisted_crdt_doc).

#62 @dd32
3 weeks ago

Require persistent object cache

I was asked if WordPress.org stats collects data about this, and as far as I know we do not. We do not collect mu-plugin data either as far as I'm aware.

However, we do collect data about PHP extension availability: redis ~42%, memcached ~28%, apcu ~20%.
I don't think it's relevant, as we'd use WPDB for tables, but: sqlite3 ~94%

#63 in reply to: ↑ 59 ; follow-up: @peterwilsoncc
3 weeks ago

Replying to JoeFusco:

Replying to peterwilsoncc:

@JoeFusco My reading of Matt's comment above is that the addition of one table has been authorized for the current and future RTC features. It's a very rare occurrence so we're not going to be able to slide in two when it's possible to use one.

Agreed - one table for sync, no argument there.

I really don't like dealing with disingenuous comments. My comment is perfectly clear that my understanding is there is approval to add one (1) table in total to WordPress 7.0, bringing the total number of tables in WordPress to thirteen (13) in a single site install. To pretend to interpret my comment as anything else is very unhelpful.

It took a week of delaying to even get this discussion started, let's not waste further time with deliberate misinterpretations.

For the purposes of awareness updates, I don't see that a unique index is absolutely needed. Helpful, sure, but not something that warrants another table.

For most requests, the table will be updated so we can run something along the lines of

UPDATE `wp_collaboration` 
   SET event_time = NOW()
   WHERE room = 'room1' AND event = 'awareness' AND client_id = 'client2'

If no rows are affected, that implies that the client is signing on and an insert is required.

Yes, if two concurrent requests occur there is a slim chance two rows will be created, but that should be an exceptionally rare circumstance: the polling client waits for the apiFetch promise to complete before scheduling the next polling event. You can see this in effect by throttling network requests.
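
A sketch of that UPDATE-then-INSERT flow (names follow the schemas discussed on this ticket; this is not committed code):

    // Touch the client's existing awareness row; insert only when none exists yet.
    $updated = $wpdb->query( $wpdb->prepare(
        "UPDATE {$wpdb->collaboration}
            SET update_value = %s, date_gmt = UTC_TIMESTAMP()
          WHERE room = %s AND type = 'awareness' AND client_id = %s",
        wp_json_encode( $state ),
        $room,
        $client_id
    ) );
    
    if ( 0 === $updated ) {
        // No row affected implies the client is signing on: insert its row.
        $wpdb->insert(
            $wpdb->collaboration,
            array(
                'room'         => $room,
                'type'         => 'awareness',
                'client_id'    => $client_id,
                'update_value' => wp_json_encode( $state ),
                'date_gmt'     => gmdate( 'Y-m-d H:i:s' ),
            )
        );
    }

Note that $wpdb->query() returns the number of affected rows for an UPDATE; keeping date_gmt in the SET clause means the row counts as changed on every poll even when the state payload is identical.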

As @czarate mentions above, there is a chance that other items will need to be considered as part of the real time collaboration feature. Rather than targeted, endpoint specific tables, we need to write a table that meets the requirements of what is known today and the unknown of tomorrow.


Re: requiring a persistent cache: I don't think this is a good idea, I just see it as a handy place to shift awareness if it's available. Similar to transients but without the options table.

#64 follow-up: @ellatrix
3 weeks ago

A dumb question: why can't we allow registering post meta keys that are not queryable (through WP_Query etc.) and then don't invalidate cache? For these keys it would only be possible to retrieve the values directly through get_post_meta. This doesn't have to be a one-off special-case hack but rather a built-in way to do it.

I'm a bit concerned with adding a table this late in the release cycle. RC1 is Thursday, so this would have to be added before then. Correct me if I'm wrong, but we're also setting something in stone now that becomes hard to change or undo later.
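
A hypothetical registration for what this describes might look like the following. The 'queryable' argument does not exist in register_post_meta() today; it is the proposed addition:

    // Hypothetical: an argument that opts the key out of WP_Query meta
    // queries and therefore out of 'last_changed' cache invalidation.
    register_post_meta(
        'post',
        'wp_sync_awareness',
        array(
            'type'      => 'string',
            'single'    => true,
            'queryable' => false, // NOT a real argument — the proposal.
        )
    );
    
    // Values would remain readable directly:
    $state = get_post_meta( $post_id, 'wp_sync_awareness', true );
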

#65 @zieladam
3 weeks ago

Hopefully we can leverage this for Playground sync in the future as well.

For Playground<->Playground, where we can log every database query, potentially. For WordPress<->Playground and WordPress<->WordPress we may need a dedicated, optimized data structure.

For synchronizing the files, we can get away with a simple text-based index file. The site import tool I'm building right now can synchronize file deltas based on a TSV file with this layout: base64 file path | last modified date | size. It doesn't version the database rows, though; it just overwrites the entire database on every sync.

@dmsnell and I explored synchronizing the databases. The solution is in knowing when something in the database changes, e.g. a table is created, a column is added, these specific rows are updated. In an unconstrained environment, we could keep track of it with a write-ahead log using a database extension or a clever set of triggers. When that's not an option, the next best thing is using WordPress hooks before every SQL query and, independently, running a slow-paced indexer via wp-cron. That indexer would need a data backend where the indexed rows can be found with a very low cost. The best solution we've found for that was using five different tables, each with a schema resembling indexed_table_name | indexed_primary_key | last_modified_date | hash | version_number.

@dmsnell explored an improved search feature that distinguished rendered text from HTML syntax. That search index might make a better fit for the synchronization work as it uses a similar schema and tries to answer a similar question: "did this row change after it was indexed?"

Last edited 3 weeks ago by zieladam (previous) (diff)

#66 in reply to: ↑ 64 @joehoyle
3 weeks ago

Replying to ellatrix:

A dumb question: why can't we allow registering post meta keys that are not queryable (through WP_Query etc.) and then don't invalidate cache? For these keys it would only be possible to retrieve the values directly through get_post_meta. This doesn't have to be a one-off special-case hack but rather a built-in way to do it.

I'm bit concerned with adding a table this late in the release cycle. RC1 is Thursday, so this would have to be added before then. Correct me if I'm wrong, but we're also setting something in stone now that becomes hard to change or undo later.

I've seen many cases where essentially a "non-autoloaded" post meta registration would be valuable, which could have different semantics for the last_changed date. Perhaps there are many good reasons to have custom tables for the collaborative data types, but if this is primarily about caches on post objects, a way to circumvent that problem directly seems like it could be good.


This ticket was mentioned in PR #11256 on WordPress/wordpress-develop by @JoeFusco.


3 weeks ago
#67

The real-time collaboration sync layer currently stores messages as post meta, which works but creates side effects at scale. This moves it to a single dedicated wp_collaboration table purpose-built for the workload.

### Table Definition

CREATE TABLE wp_collaboration (
    id bigint(20) unsigned NOT NULL auto_increment,
    room varchar(191) NOT NULL default '',
    type varchar(32) NOT NULL default '',
    client_id varchar(32) NOT NULL default '',
    user_id bigint(20) unsigned NOT NULL default '0',
    update_value longtext NOT NULL,
    date_gmt datetime NOT NULL default '0000-00-00 00:00:00',
    PRIMARY KEY  (id),
    KEY room (room, id),
    KEY date_gmt (date_gmt)
);

### Testing

npm run env:cli -- core update-db

Testers who already had the local env running before checking out the branch will hit the upgrade.php redirect without it. A fresh npm run env:install would also work, but core update-db is non-destructive and faster.

# Unit tests
npm run test:php -- --filter WP_Test_REST_Collaboration_Server

# E2E tests — 3 concurrent users (presence, sync, undo/redo)
npm run test:e2e -- tests/e2e/specs/collaboration/

_Note for beta testers: If upgrading from a previous beta, the old wp_sync_updates table is no longer used and can be dropped: npm run env:cli -- db query "DROP TABLE IF EXISTS wp_sync_updates"_

Trac ticket: https://core.trac.wordpress.org/ticket/64696

## Use of AI Tools

Co-authored with Claude Code (Opus 4.6), used to synthesize discussion across related tickets and PRs into a single implementation. All code was reviewed and tested before submission.

#68 in reply to: ↑ 63 @JoeFusco
3 weeks ago

@peterwilsoncc You're right, your comment:57 was clear — I misread the scope and should have treated awareness as part of the same one-table decision. Apologies for the confusion.

Thank you for your input throughout this discussion.


Per Matt's request (https://wordpress.slack.com/archives/C18723MQ8/p1773557762033729) here is the table definition for review.

wp_collaboration

One table for sync updates and awareness. Replaces the post meta approach from prior Betas.

CREATE TABLE wp_collaboration (
   id bigint(20) unsigned NOT NULL auto_increment, -- also serves as the polling cursor
   room varchar(191) NOT NULL default '',          -- e.g. postType/post:123, LIKE-able
   type varchar(32) NOT NULL default '',           -- update, sync_step1, sync_step2, compaction, awareness
   client_id varchar(32) NOT NULL default '',      -- one per browser tab
   user_id bigint(20) unsigned NOT NULL default '0',
   update_value longtext NOT NULL,
   date_gmt datetime NOT NULL default '0000-00-00 00:00:00',
   PRIMARY KEY  (id),
   KEY room (room,id),                             -- covers polling: WHERE room = %s AND id > %d
   KEY date_gmt (date_gmt)                         -- covers cron cleanup: WHERE date_gmt < %s
);

No UNIQUE KEY - sync rows need multiple entries per room+client. Awareness deduplication uses UPDATE-then-INSERT at the application level (per comment:63). A future Awareness API could move awareness to its own table with a proper UNIQUE KEY, eliminating this application-level workaround.
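
For reviewers skimming, the UPDATE-then-INSERT pattern could look roughly like this (a sketch against `$wpdb` using the column names above, not the actual implementation; without a UNIQUE KEY the fallback INSERT can still race):

```php
// Sketch only: try to update the existing awareness row for this room/client.
$updated = $wpdb->update(
	$wpdb->prefix . 'collaboration',
	array(
		'update_value' => $awareness_state,
		'date_gmt'     => gmdate( 'Y-m-d H:i:s' ),
	),
	array(
		'room'      => $room,
		'type'      => 'awareness',
		'client_id' => $client_id,
	)
);

// No matching row yet: insert one. Two first-time polls from the same
// client can interleave here and insert duplicate awareness rows.
if ( ! $updated ) {
	$wpdb->insert(
		$wpdb->prefix . 'collaboration',
		array(
			'room'         => $room,
			'type'         => 'awareness',
			'client_id'    => $client_id,
			'user_id'      => get_current_user_id(),
			'update_value' => $awareness_state,
			'date_gmt'     => gmdate( 'Y-m-d H:i:s' ),
		)
	);
}
```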

Compaction periodically collapses sync rows (threshold: 50), so the table stays bounded during a session. A daily cron (wp_delete_old_collaboration_data) removes sync rows older than 7 days and awareness rows older than 60 seconds.

When a persistent object cache is available, awareness reads skip the DB entirely.

https://github.com/WordPress/wordpress-develop/pull/11256

#69 @czarate
3 weeks ago

@JoeFusco

  1. Don't we need indexes on type and client_id given the proposed awareness update query in comment:63?
  2. I don't have a specific use case in mind, but an index on user_id might be wise in order to allow queries like "show me all the rooms where user 123 is collaborating."
  3. Nitpick, but update_value might be better named as data or collaboration_data given potential future use cases.


@JoeFusco commented on PR #11068:


3 weeks ago
#70

Please see https://github.com/WordPress/wordpress-develop/pull/11256/ for the single table implementation.

#71 @davidbaumwald
3 weeks ago

Couple of thoughts.

  1. Would medium|longblob work better for the blob content on the update_value column?
  2. I am wondering if a covering index for KEY room_type_id(room,type,id) would be tighter for polling since there might be queries that either include/exclude event types?

Also, +1 to abstracting the update_value column name to something more general purpose, like data.

#72 follow-up: @dmsnell
3 weeks ago

This is a very active ticket and I recognize that a lot is going on, that several different important discussions are mushed together, and that everyone else here has spent more time on these problems than I have, so please take this with the awareness that I’m not going to be as knowledgeable as others.

As I read through this to try and get up to speed, I think one salient point was lost: the role of update frequency. When this conversation started it seemed to contrast the infrequent updates from the existing heartbeat with the frequent updates from collaboration.

  1. If these sync and awareness/presence updates occurred less frequently, on par with the heartbeat, would this be the problem it is?

And that paired with another question I had, which was something that seemed surprising to me.

  2. Why are there so many updates occurring when the post isn’t changing?

On this second point I must be misunderstanding because I can’t imagine why we would have four updates per second when the editor is sitting open and idle. We are grabbing changes from the post and I would expect that we cull any and all updates which present no changes.

And presence or awareness isn’t something I’ve seen people want a history for, so at a minimum it seems more suited for transients than database storage. Especially if clients are reporting in as frequently as it seems like they are, then a complete loss of this information should be restored instantly, for all practical definitions of “instant.”


Given the timeline and still-broad discussion of scope, design, and even table counts or names, can this be resolved inside a temporary structure now and gracefully updated to a table once we have more time to iron out the basics?


Answering here for completeness and because it was brought up, but I don’t see these comments being super related here, and I wouldn’t want them to delay or distract from the issues at hand.

explored synchronizing the databases…knowing when something in the database changes…

The work we have done concerning a wp_sync_state table is designed to add a kind of versioning to all WordPress objects, efficiently track updates across an entire site, and self-heal in the presence of changes which are outside of WordPress's purview. It’s more concerned with knowing whether a resource has (probably or definitely) changed since a given moment in time than with assessing what those changes are exactly.

It dovetails well into this work, though it’s not likely directly applicable to shared transient editor state. The tie-in we did work on, however, was building a single-writer worker in WordPress to handle all of the inbound changes. That would leave one PHP process per document being concurrently edited and could hold this awareness/presence information without tables or transients — just PHP variables. That work does depend on having another new table though, an inbound work queue.

an improved search feature that distinguished rendered text from HTML syntax. That search index might make a better fit for the synchronization work as it uses a similar schema and tries to answer a similar question: "did this row change after it was indexed?"

The generalized change-tracking system has a lot of potential utility because every operation (search indexing, database upgrades, cache regeneration, WordPress syncing) is just another “sync client.” It breaks down, though, for the transitional states which appear while edit sessions are in progress; those clients only perform their job once a post revision is actually created.

#73 @peterwilsoncc
3 weeks ago

Mega comment incoming...


@czarate just had a quick call to discuss the proposed structure above.

With the inevitable naming things discussion, I agree update_value would be best named data to cover potential future use cases (yep, I suggested the name initially, sorry). @davidbaumwald I'd like to keep it as LONGTEXT to match the structure used in the meta tables.

Otherwise, we think KEY type_client (type,client_id) would be handy for some of the required queries. type comes first as it can be queried alone, whereas client_id queries should always require a type.

@JoeFusco are you happy with the following?

CREATE TABLE wp_collaboration (
   id bigint(20) unsigned NOT NULL auto_increment, -- also serves as the polling cursor
   room varchar(191) NOT NULL default '',          -- e.g. postType/post:123, LIKE-able
   type varchar(32) NOT NULL default '',           -- update, sync_step1, sync_step2, compaction, awareness
   client_id varchar(32) NOT NULL default '',      -- one per browser tab
   user_id bigint(20) unsigned NOT NULL default '0',
   data longtext NOT NULL,
   date_gmt datetime NOT NULL default '0000-00-00 00:00:00',
   PRIMARY KEY  (id),
   KEY type_client_id (type,client_id),
   KEY room (room,id),                             -- covers polling: WHERE room = %s AND id > %d
   KEY date_gmt (date_gmt)                         -- covers cron cleanup: WHERE date_gmt < %s
);

@ellatrix @joehoyle: maybe it would help to have non auto-loaded meta data, but I think that would be a larger change and warrant another ticket. While I opened the ticket because of the effects on cache, I think the new table is a better solve for the issue at hand and allows us to avoid altering the Query and meta APIs.


@dmsnell I think it would be possible to slow the updates down without collaborators in the room. With collaborators in the room, I don't think we'd want to do so to a heartbeat frequency.

FWIW, #64845 is covering frequency of updates after the concern was raised by some hosting providers. Are you able to post some thoughts there?

Given the timeline and still-broad discussion of scope, design, and even table counts or names, can this be resolved inside a temporary structure now and gracefully updated to a table once we have more time to iron out the basics?

The table will very quickly grow in size, and the way database updates work with dbDelta() prevents us from truncating the table before altering it, so I'd really like to get it as robust as possible for 7.0.

Last edited 3 weeks ago by peterwilsoncc (previous) (diff)

#74 @peterwilsoncc
3 weeks ago

  • Description modified (diff)

#75 in reply to: ↑ 72 @czarate
3 weeks ago

Replying to dmsnell:

Hey @dmsnell, carving out the first two questions:

If these sync and awareness/presence updates occurred less frequently, on par with the heartbeat, would this be the problem it is?

I haven't heard any proposed alternate defaults for the polling intervals. These intervals of course affect the perceived responsiveness of the feature, but the defaults can be easily changed. They will also be filterable soon.

Why are there so many updates occurring when the post isn’t changing?

We only send updates when the post entity has been changed. However, we still frequently poll for updates from the server because we don't know when other remote peers have updated the post in their editors.

If all editors are sitting idle, these are very light requests: no updates sent, none received, and no real work to do.

We also poll for awareness state, which helps us display the presence of other collaborators within the block editor. Even an idle block editor might be observed; the observing user can benefit from an understanding of "who is doing what" as they plan their next edit.

We use browser visibility APIs to suspend polling when the block editor is truly unused and unobserved.

#76 follow-up: @dmsnell
3 weeks ago

I haven't heard any proposed alternate defaults for the polling intervals.

@czarate was long-polling explored? my proposed alternative is no polling intervals. even without the optimization of holding a single-writer process, that should still eliminate the network activity even though it wouldn’t also eliminate the database activity.

However, we still frequently poll for updates from the server because we don't know when other remote peers have updated the post in their editors…We also poll for awareness state

Sorry for my misunderstanding. The discussion led me to believe that we are still sending pings to the server and updating post_meta for awareness even when the cursor hasn’t moved; literally when the human editor is sitting idle with the block editor open on the post.

With collaborators in the room, I don't think we'd want to do so to a heartbeat frequency…Are you able to post some thoughts there?

@peterwilsoncc I will take a look. My questions here though are not about how many milliseconds between polling but rather event-based polling and listening for updates. Even with an HTTP “polling” API we should be able to have two-way pseudo-realtime event streaming between the server and each client, eliminating the need for all this noise.

The tradeoff is that these approaches require a persistent PHP process on the server per open editor session (without any further optimizations). Maybe that was dismissed because of an expectation for sites with thousands of concurrent editing sessions…?

#77 in reply to: ↑ 76 @czarate
3 weeks ago

Replying to dmsnell:

@czarate was long-polling explored?

It was. Ultimately, we sought a true lowest-common-denominator approach that avoided a continuous connection (a scarce commodity on some hosts). In addition, some proxies and hosts can interfere with long-polling. More discussion here: https://github.com/WordPress/gutenberg/issues/74085

However, we still frequently poll for updates from the server because we don't know when other remote peers have updated the post in their editors…We also poll for awareness state

Sorry for my misunderstanding. The discussion led me to believe that we are still sending pings to the server and updating post_meta for awareness even when the cursor hasn’t moved; literally when the human editor is sitting idle with the block editor open on the post.

No misunderstanding. We do update awareness post_meta on every poll interval, even when there is no change in awareness state. We do so in order to establish the user's "last seen" date, which is eventually used to prune the user when they leave the session.

It's a good call-out. We can probably implement some throttling so that the post_meta is only updated every n seconds or when awareness state changes. Most Yjs providers update awareness state every second.

#78 @JoeFusco
3 weeks ago

@peterwilsoncc Thanks - that makes sense to me. Updated the PR with both changes (column rename and new index), and bumped $wp_db_version accordingly.

https://github.com/WordPress/wordpress-develop/pull/11256

Thank you to everyone who has contributed to the discussion around naming and the broader architecture questions.


@JoeFusco commented on PR #11256:


3 weeks ago
#79

Carrying over props from original PR:

Props joefusco, peterwilsoncc, mindctrl, westonruter, paulkevan, dd32, czarate.

#80 follow-up: @ellatrix
3 weeks ago

@ellatrix @joehoyle: maybe it would help to have non auto-loaded meta data, but I think that would be a larger change and warrant another ticket. While I opened the ticket because of the effects on cache, I think the new table is a better solve for the issue at hand and allows us to avoid altering the Query and meta APIs.

How would it be a larger change than adding a whole new table? 🤔 I quickly tried this locally and I have 113 insertions(+), 14 deletions(-). But I may be missing something. I think the only thing we need there is a new register_post_meta setting, and some early returns in wp_cache_set_posts_last_changed and WP_Meta_Query. What am I missing?

Also, I'm not sure I understand why would it require a new ticket? It seems directly applicable here. Why is adding a few lines to Query and meta APIs bad?

Last edited 3 weeks ago by ellatrix (previous) (diff)

#81 @zieladam
3 weeks ago

@ellatrix would you be able to share that patch? What would it take to compare the performance of this post_meta approach to the performance of a dedicated table on a medium-to-large site?

Last edited 3 weeks ago by zieladam (previous) (diff)

#82 @zieladam
3 weeks ago

@ellatrix here's a few more potential problems with post meta:

  • Large sites can have millions of post meta entries; those would impact every read and write, and RTC may write multiple times every second. We'd need to benchmark this to check whether that's a problem in practice on a live site.
  • Plugins may hook into update_post_meta and similar, and run computations that would slow down RTC.
  • For MyISAM tables, each update would lock the entire post_meta table. For InnoDB, I think we're fine locking-wise.
Last edited 3 weeks ago by zieladam (previous) (diff)

#83 @ellatrix
3 weeks ago

@zieladam I just created something quick with Claude/AI so it's only theoretical, I haven't tested. That's probably stuff I'm missing, I'm not very familiar with this side of core. It's pretty much what I shared before in pseudo code. I saw people were discouraging PRs and preferred to discuss here, but since you're asking I'll just link to a commit: https://github.com/ellatrix/wordpress-develop/commit/86bfd1ed2185f2ec867b8fb24c4627083915e595

#84 @griffbrad
3 weeks ago

I'd personally feel more comfortable with the custom table. While Ella's approach avoids the cache invalidation issue itself, that issue pointed to a real mismatch in the needs of collaboration and the post/postmeta tables. Posts are rarely written to and read very frequently. Collaboration state, on the other hand, has a lot of churn with frequent writes and deletes. When I see that kind of mismatch, it's generally a good indication that a separate table optimized for the differing expectations would be worthwhile.

#85 @czarate
3 weeks ago

Compaction race condition — The "delete all, re-add some" pattern in remove_updates_before_cursor() loses messages under concurrent writes. The window between delete_post_meta() and the add_post_meta() loop is unprotected.

I don't want to lose sight of this critical bug that is de-facto fixed with the custom table approach.
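
For anyone catching up, the unprotected window could be pictured like this (an illustrative timeline using the core meta functions, not the actual implementation):

```php
// Process A (compaction)                  Process B (incoming sync update)
// -------------------------------------   --------------------------------
// $rows = get_post_meta( $id, $key );
//                                          add_post_meta( $id, $key, $new );
// delete_post_meta( $id, $key );           // <- B's row is deleted too
// foreach ( $rows as $row ) {
//     add_post_meta( $id, $key, $row );    // <- B's update is never re-added
// }
```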

It is also addressed by this PR that continues to use post meta:

https://github.com/WordPress/wordpress-develop/pull/11067

This bug has been held in limbo for a few weeks now by this discussion, but we need to get it fixed regardless of the direction we go.

#86 in reply to: ↑ 80 @peterwilsoncc
3 weeks ago

Replying to ellatrix:

How would it be a larger change than adding a whole new table? 🤔 I quickly tried this locally and I have 113 insertions(+), 14 deletions(-). But I may be missing something. I think the only thing we need there is a new register_post_meta setting, and some early returns in wp_cache_set_posts_last_changed and WP_Meta_Query. What am I missing?

It would require modifications and extensions to multiple components:

  • Meta API would need to be extended to allow the registration of non-queryable, uncached post meta
  • Meta API functions would need to be updated to respect new setting, for getting, setting and updating meta
  • Potential cache API changes along with the above
  • Query API would need to trigger errors for developers attempting to query via the forbidden meta keys for post, term, user and comment query classes
  • WP_Meta_Query would need to be updated to exclude the forbidden meta keys in any meta_queries (not sure how this would work; probably altering meta_query to nest the passed parameters).
  • There's probably other things to consider too

The new table is also useful if the awareness API is expanded to other screens within the dashboard, eg showing how many users are on a particular screen.


@peterwilsoncc commented on PR #11256:


3 weeks ago
#87

@josephfusco Sorry, I had a dumb moment -- we will need the back compat shim until the gutenberg merge is done so the version included in WordPress-Develop will work until the GB reference is updated.

#88 @ellatrix
3 weeks ago

@peterwilsoncc Did you check the patch? https://github.com/WordPress/wordpress-develop/compare/trunk...ellatrix:wordpress-develop:fix/meta-query-cache-invalidation

My patch is merely a suggestion and I'm not going to pretend I know enough about these APIs. I'm also not sure how to properly test this. I believe your points above are present in the patch but there may be things I've missed.

Last edited 3 weeks ago by ellatrix (previous) (diff)

#89 @jorbin
3 weeks ago

Something that should be considered is that table creation isn't a guarantee. We don't know if the mysql user that WordPress is using has CREATE for existing sites. Additionally on multisite, it's very possible the DB upgrade will not have run before people are running the rest of the code. For instance, I think it took WPVIP a rather long time to complete term splitting and add the term meta table.

#90 follow-up: @griffbrad
3 weeks ago

With RC1 so close, we've had some more discussions about the custom DB table, and it just seems risky to introduce a schema change so late in the process. We do think there are likely benefits to having it in place ultimately, but the likelihood of getting the schema right without sufficient testing in production is low. Assuming Ella's patch is a viable approach to avoid the cache invalidation issue that inspired this issue originally, we'd like to take that route as it seems lower risk. Overall, our primary goal with RTC in 7.0 is to gather as much information as possible from hosts and plugin authors about how RTC affects the broader WP ecosystem. Once we have that information, we can more confidently add the custom table(s) that will make the feature the best it can be.

#91 @joehoyle
3 weeks ago

Perhaps the simplest solution here would be to add a flag to register_meta for causes_last_changed_update which determines if we bump the wp_cache_get_last_changed value on updating the meta.

As @ellatrix mentioned, a WP_Query should then not allow querying by meta_value for those types of registered meta to prevent accidental cache use.

In the future I think it would be good to be able to register a meta key as non-autoloaded too so prime_post_caches doesn't pull those meta keys from the DB and store them in the single object-cache entry for all meta. I'm not sure that's needed immediately here, but I would imagine it could become a problem: currently, storing anything more than ~1 MB in a post's meta can cause significant performance issues with persistent object caches.

#92 @joehoyle
3 weeks ago

Having reviewed @ellatrix's patch in https://github.com/WordPress/wordpress-develop/compare/trunk...ellatrix:wordpress-develop:fix/meta-query-cache-invalidation I think that's essentially what I'm suggesting (but with better naming!). I think that paves the way for prime_meta_cache not to include those keys to address the secondary issue of sync data "clogging up" the in-memory caches for "read only" front end performance etc.

#93 @ellatrix
3 weeks ago

I updated the patch a bit: added unit tests and made sure it checks the post type. For WP_Query checks, it only refuses when there is a post type available to check. With type set to any or no WP_Query context, it allows the query since we can't be sure. I guess it's possible to have multiple meta key registrations of the same key for different post types with different settings, so I'm trying to take that into account. It's not a problem at all for the cache invalidation itself since we always know the post type, this is just about the WP_Query enforcement.


This ticket was mentioned in PR #11290 on WordPress/wordpress-develop by @ellatrix.


3 weeks ago
#94

Alternative approach to the cache invalidation problem: instead of a new table, allow post meta keys to be registered as non-query-cacheable via register_meta() / register_post_meta().

### What it does

Adds an invalidates_query_cache parameter to register_meta() (post meta only). Keys registered with false:

  • Do not bump the last_changed timestamp in the posts cache group when added, updated, or deleted
  • Are refused in WP_Meta_Query clauses (with _doing_it_wrong) when the post type is known, to prevent stale cached results
  • Can still be read/written normally via get_post_meta() / update_post_meta() / etc.
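
As a usage sketch (the meta key here is the RTC awareness key named on this ticket; the argument name is as proposed in this PR and may change before merge):

```php
// Opt the RTC awareness key out of query-cache invalidation.
register_post_meta(
	'post',
	'wp_sync_awareness',
	array(
		'invalidates_query_cache' => false,
	)
);

// Reads and writes behave as normal; only the last_changed bump is skipped.
update_post_meta( $post_id, 'wp_sync_awareness', $awareness_state );

// Using the key in a meta_query triggers _doing_it_wrong() when the post
// type is known, since cached results for such a query could be stale.
```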

### Design

Cache correctness relies on two complementary mechanisms:

  1. Cache invalidation (wp_cache_set_posts_last_changed): the authoritative check. Meta write hooks always provide the object ID, so the exact post type is always known. Non-cacheable meta never incorrectly bumps last_changed.
  2. Meta query blocking (WP_Meta_Query): a conservative safety net. Only refuses non-cacheable keys when the post type is known from the parent WP_Query context. Without context, allows the query through — better to allow a query than incorrectly block a legitimate one.

Supports per-post-type registration: a key can be non-cacheable for one post type but cacheable for another.

### Testing

15 unit tests covering registration, cache invalidation (add/update/delete), meta query blocking (with/without context, per-post-type, multi-post-type), and the _doing_it_wrong notice.

phpunit --filter Tests_Meta_InvalidatesQueryCache

## Use of AI Tools

Co-authored with Claude Code (Opus 4.6).

#95 @JoeFusco
3 weeks ago

With the decision to ship RTC as opt-in for RC1 (Slack announcement), I'd like to share some context on the remaining open question: whether to proceed with the dedicated table.

@ellatrix, thank you for putting together a patch to explore the post meta direction. Before adding a table, we should be sure post meta can't do the job. The cache invalidation fix addresses 1 of the 3 known issues that surfaced during development. The other two sit deeper in how post meta handles this access pattern: the compaction race condition that @czarate documented in comment:85 and PR #11067 can silently drop updates when concurrent writes overlap, and cursor ID collisions can occur when those writes land within the same millisecond. @peterwilsoncc walked through the scope of what the Meta API and WP_Meta_Query changes would require in comment:86, each touching shared infrastructure that every plugin in the ecosystem relies on.

The table resolves all three issues in one place. Append-only writes with auto-increment IDs eliminate both the compaction race and the collision window structurally. The blast radius is limited to RTC - nothing else in core is affected. And the compaction bug that @czarate identified needs to be fixed regardless of direction. With the table, it already is.

The data stored in this table is entirely ephemeral - session state, not user content. If the schema needs to evolve in 7.1, we can alter or rebuild the table without any meaningful data loss. wp_is_collaboration_enabled() requires both the site option and db_version >= 61841, so if the database upgrade hasn't run yet, on multisite or anywhere else, RTC simply doesn't activate. Sites that don't opt in never create or query the table.


The implementation is tested (unit + E2E), feature-gated, and isolated from core's shared APIs. Could use feedback on the supporting tests to ensure we captured all relevant scenarios.

https://github.com/WordPress/wordpress-develop/pull/11256

#96 @peterwilsoncc
3 weeks ago

@ellatrix I had not (I was AFK for most of yesterday due to family commitments) but was simply focusing on why I thought it was a riskier change to alter the meta API than to add the new table. Both have risks attached, so it's a relative analysis.

I'll drop a test on the PR with an example of the risk but my view remains that for the issue at hand the least risky option is to go with the new table.

As RTC creates a lot of data, the custom table presents an additional advantage: a developer can simply(tm) truncate the new table if they wish to do a reset.

I'm going to focus on @JoeFusco's PR for now as I think it offers the best chance of getting RTC into core, and Matt has already approved the new table to treat RTC as a first-class object.


@peterwilsoncc commented on PR #11290:


3 weeks ago
#97

Dropping this here rather than on the ticket as it's an implementation consideration rather than an architectural one. One of the risks we'd need to consider is demonstrated by these two tests:

public function test_wp_query_side_effects_one() {
        register_post_meta(
                'post',
                'uncached_meta_key',
                array( 'invalidates_query_cache' => false )
        );

        add_post_meta( self::$post_id, 'uncached_meta_key', 'uncached_meta_value' );
        $query1 = new WP_Query(
                array(
                        'fields'     => 'ids',
                        'meta_value' => 'uncached_meta_value',
                )
        );
        $this->assertEmpty( $query1->posts, 'WP_Query should not query non-cacheable meta data.' );
}

public function test_wp_query_side_effects_two() {
        register_post_meta(
                'post',
                'uncached_meta_key',
                array( 'invalidates_query_cache' => false )
        );

        add_post_meta( self::$post_id, 'uncached_meta_key', 'uncached_meta_value' );
        $query1 = new WP_Query(
                array(
                        'fields'     => 'ids',
                        'meta_value' => 'uncached_meta_value',
                )
        );

        delete_post_meta( self::$post_id, 'uncached_meta_key' );
        $query2 = new WP_Query(
                array(
                        'fields'     => 'ids',
                        'meta_value' => 'uncached_meta_value',
                )
        );
        $this->assertEmpty( $query2->posts, 'WP_Query should not cache non-cacheable meta queries.' );
}

Results:

$ phpunit --filter test_wp_query_side_effects
Installing...
Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
Not running ajax tests. To execute these, use --group ajax.
Not running ms-files tests. To execute these, use --group ms-files.
Not running external-http tests. To execute these, use --group external-http.
PHPUnit 9.6.34 by Sebastian Bergmann and contributors.

Warning:       Your XML configuration validates against a deprecated schema.
Suggestion:    Migrate your XML configuration using "--migrate-configuration"!

FF                                                                  2 / 2 (100%)

Time: 00:00.206, Memory: 235.92 MB

There were 2 failures:

1) Tests_Meta_InvalidatesQueryCache::test_wp_query_side_effects_one
WP_Query should not query non-cacheable meta data.
Failed asserting that an array is empty.

/vagrant/wordpress-develop/tests/phpunit/tests/meta/invalidatesQueryCache.php:391

2) Tests_Meta_InvalidatesQueryCache::test_wp_query_side_effects_two
WP_Query should not cache non-cacheable meta queries.
Failed asserting that an array is empty.

/vagrant/wordpress-develop/tests/phpunit/tests/meta/invalidatesQueryCache.php:417


@ellatrix commented on PR #11290:


2 weeks ago
#98

@peterwilsoncc Right, there's a decision to be made there for those queries that only provide a meta value to query without a key. Either we alter the query (commit), which tbh I find a little too aggressive and risky, or we are ok with this edge case. Either way, it's probably rare to have queries by meta value only?


This ticket was mentioned in Slack in #core by desrosj. View the logs.


2 weeks ago


@czarate commented on PR #11290:


2 weeks ago
#100

I just tested this and compared against trunk. I can confirm that this PR prevents query cache invalidation for the two RTC post meta keys.

### Before

https://github.com/user-attachments/assets/c10cae11-adf2-4aba-8584-e8db61c52ef8

### After

https://github.com/user-attachments/assets/e3b378ed-2653-4bcc-8c99-0065df232799

#101 follow-up: @JoeFusco
2 weeks ago

Two open architecture questions from PR #11256 that would benefit from input here.

Index coverage for the polling query

@czarate noted on the PR that the polling query filters on type != 'awareness':

SELECT data FROM wp_collaboration WHERE room = %s AND type != 'awareness' AND id > %d AND id <= %d ORDER BY id ASC

The current KEY room (room, id) doesn't cover the type filter. A composite KEY (room, type, id) would let MySQL satisfy the query entirely from the index. @pkevan agreed this seems sensible given the queries. @peterwilsoncc, thoughts on whether this is worth adding now or deferring?

Awareness write frequency

@peterwilsoncc suggested bucketing awareness timestamps to 5-second intervals so that most awareness polls skip the DB write entirely. Combined with the existing short-circuit (skip UPDATE when date_gmt hasn't changed), this would reduce awareness write volume by roughly 80%. He also suggested a filter for allowing hosts to set awareness granularity. Input from @czarate on how this affects the client experience would be helpful.
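
The bucketing could be sketched as follows (the filter name is hypothetical until named in the PR; the 80% figure assumes a 1-second poll against the 5-second bucket):

```php
// Sketch only: round the awareness timestamp down to a 5-second bucket so
// polls within the same bucket produce an identical date_gmt, letting the
// existing short-circuit skip the UPDATE.
$granularity = apply_filters( 'wp_collaboration_awareness_granularity', 5 ); // hypothetical filter
$bucket      = (int) floor( time() / $granularity ) * $granularity;
$date_gmt    = gmdate( 'Y-m-d H:i:s', $bucket );

// With 1-second polls, roughly 4 of every 5 polls see an unchanged
// date_gmt and skip the write entirely.
```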

#102 in reply to: ↑ 101 ; follow-up: @davidbaumwald
2 weeks ago

Replying to JoeFusco:

Two open architecture questions from PR #11256 that would benefit from input here.

Index coverage for the polling query

@czarate noted on the PR that the polling query filters on type != 'awareness':

SELECT data FROM wp_collaboration WHERE room = %s AND type != 'awareness' AND id > %d AND id <= %d ORDER BY id ASC

The current KEY room (room, id) doesn't cover the type filter. A composite KEY (room, type, id) would let MySQL satisfy the query entirely from the index. @pkevan agreed this seems sensible given the queries. @peterwilsoncc, thoughts on whether this is worth adding now or deferring?

Yes, this is why I asked about a covering index above.

I am wondering if a covering index for KEY room_type_id(room,type,id) would be tighter for polling since there might be queries that either include/exclude event types?

#103 in reply to: ↑ 102 @JoeFusco
2 weeks ago

Replying to davidbaumwald:

Yes, this is why I asked about a covering index above.

I am wondering if a covering index for KEY room_type_id(room,type,id) would be tighter for polling since there might be queries that either include/exclude event types?

Thank you, I apologize for not including that above. Ready to add the index once @peterwilsoncc confirms.

#104 follow-up: @davidbaumwald
2 weeks ago

Just thinking more about using post meta here, and IMHO it seems like the wrong path. Happy to be corrected if any of the below is wrong.

  1. Architecturally, the post meta table is a key–value metadata store, not a system for managing rapidly mutating application state.
  2. Compaction further increases the churn.
  3. Due to the nature of what's being stored, post meta cannot be adequately indexed to improve the throughput of a write-heavy system.
  4. Customizations in plugins that hook into the add_post_meta/added_post_meta hooks will run for every RTC sync.

For a long time now, "meta bloat" has been a concern in the ecosystem, and this would not improve the situation or the perception. In my own day-to-day work I regularly encounter sites with millions of wp_postmeta rows.

WooCommerce moved to a denormalized, custom table storage for orders (HPOS) as a better fit.

The custom table solves many problems, and seems to me to be the right tool for the job here:

  • Isolates RTC workload from core post/query APIs.
  • Enables proper indexing and benefits from auto-increment.
  • Allows safe compaction and predictable performance characteristics.
  • Abstracted to be useful for entities/screens other than posts.

#105 in reply to: ↑ 104 @sc0ttkclark
2 weeks ago

Replying to davidbaumwald:

Just thinking more about using post meta here, and IMHO it seems like the wrong path. Happy to be corrected if any of the below is wrong.

Totally agree! I thought the custom table was a done deal already but now I see it may be flipping back to postmeta, which I think would be a mistake. Adding my +1 on wanting the custom table.

#106 @maxschmeling
2 weeks ago

The database change isn't going to make it in for 7.0 per discussion with @matveb, @annezazu, @ellatrix, @griffbrad, and others. All of the hard work on it to this point is very much appreciated, and we definitely are going to want to continue to validate it for the 7.1 cycle. But the urgency is reduced now that we're moving forward with the fix @ellatrix implemented to prevent the cache invalidation.

I think we're all in agreement that it's the better overall long-term approach and will likely be a big part of the continued improvement of RTC.

#107 @peterwilsoncc
2 weeks ago

@maxschmeling I would have liked to have been involved in that discussion (along with @czarate and @JoeFusco).

As I mentioned earlier, the alternative approach is making some big changes to two fundamental WordPress APIs with very little architectural discussion: WP_Query (the most fundamental WordPress API) and the meta data APIs.

I was able to demonstrate an edge case in some tests yesterday that bypasses the blocking of the queries that are intended to be prevented.

A search of the plugin repo for meta_value queries shows that they are used in pretty much all of the most heavily used plugins in the repo.

More significantly, I agree with @davidbaumwald's comment that post meta is the incorrect location to store the ephemeral data. With the existing WP Query/meta APIs it's very difficult to clear the post and post meta tables of expired data (it basically becomes irrelevant after a few hours).

With the changes proposed to the WP Query/meta APIs it will become impossible to use these APIs to remove the ephemeral data, as doing so requires querying by the very meta data that is blocked from querying.

we definitely are going to want to continue to validate it for the 7.1 cycle

This concerns me most of all. Rushing changes to WP Query and the meta APIs for a situation that we're to reevaluate in a few short months is problematic. Given the choice, I'd rather allow the current issue on this ticket to remain than to make any changes.

cc @matt, @matveb, @annezazu, @ellatrix, @griffbrad

#108 @davidbaumwald
2 weeks ago

+1 to @peterwilsoncc's comment, mainly agreeing with the last point.

It seems the schedule of the release is dictating shipping a worse implementation (as acknowledged above), rather than the feature's readiness being evaluated to determine *if* it can meet the release schedule. This feature has tested the resolve of multiple WordPress Core Philosophies during 7.0, which might be hinting at questions we should be asking.

My specific concern, to add to Peter's, is that even in a vacuum, completely removed from discussions about RTC, the WP_Query and meta API changes in the workaround would be changes to land early in a release cycle, just based on ecosystem impact alone. I'd be hesitant to ship that change right at the Beta 1 deadline, much less at RC1.

Last edited 2 weeks ago by davidbaumwald (previous) (diff)

#109 @JoeFusco
2 weeks ago

I'd like to understand the reasoning better. I may be missing context from the discussion. The dedicated table is feature-gated, isolated from shared APIs, and stores only ephemeral data that can be truncated at any time. The alternative requires changes to WP_Meta_Query and WP_Query cache invalidation - APIs the entire plugin ecosystem depends on and for a storage approach we expect to revisit in 7.1. I want to make sure I'm not missing something that makes the latter the lower-risk path.

#110 @griffbrad
2 weeks ago

Essentially, we did a review with Systems and talked over both options. This late in the game, they felt that the likelihood of getting the schema and performance characteristics of a new table right was more risky. You're totally right that WP_Query is a central API, but the patch was viewed as easy to reason about and easier to adjust if needed. Systems folks also felt the API in the patch was valuable outside the RTC context for other cases where post meta is written to in a way that too frequently causes cache invalidation, which they've seen from a variety of plugins in the ecosystem. Like I mentioned in this thread, I can definitely see the value in the custom table. I'd love to land it in 7.1. Both with the fixes to the cache invalidation issue and race condition from Ella and Chris, the post meta approach currently in use seems sufficient for the 7.0 release.

#111 follow-up: @peterwilsoncc
2 weeks ago

@griffbrad I fundamentally disagree that the WP_Query patch is easy to reason about. Along with many others, I've spent four or five years working on caching within the API. It's complicated and has a large number of edge cases.

I first asked for an architectural discussion on the new table two weeks ago; a solution was proposed a week ago and there hasn't been much in the way of feedback. Instead, we're going with a change that has had little-to-no architectural discussion and was submitted two days ago.

There will be edge cases that haven't been considered.

A fundamental question needs to be considered: the current patch simply drops meta queries that are considered invalid, i.e.:

WP_Query( [meta_key => key_set_to_bypass_cache] ) will return the results of WP_Query( [post_type => post] ). This seems like something that should be discussed: should posts be returned in that case, should the WHERE clause be modified to WHERE 1 = 0, or something else entirely?

In 64696-late-meta-flush.diff I have made what I think is an MVP for avoiding cache invalidation caused by post meta updates:

  • Post meta updates do not invalidate the cache
  • When post meta is updated, a cache key is added to record that the WP_Query meta queries are out of date
  • The next time a post meta query is made, the cache gets invalidated
  • "Naming things is hard" issues remain in the patch
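
The deferred-invalidation flow in those bullets can be modeled abstractly. A minimal Python sketch of the idea, where a counter stands in for the microtime-based last_changed value and the flag key name is hypothetical:

```python
import itertools

_tick = itertools.count(1)          # stands in for microtime-based last_changed values
cache = {'posts': next(_tick)}      # persistent cache, keyed like wp_cache_*

def on_post_meta_update():
    # Post meta writes no longer bump the 'posts' last_changed directly;
    # they only record that meta-dependent query caches are out of date.
    cache['posts_meta_stale'] = True

def before_meta_query():
    # The next meta query performs the deferred invalidation.
    if cache.pop('posts_meta_stale', False):
        cache['posts'] = next(_tick)

salt_before = cache['posts']
on_post_meta_update()
assert cache['posts'] == salt_before   # plain post queries stay cached

before_meta_query()                    # the next meta query flushes the salt
assert cache['posts'] != salt_before
```

This mirrors the reproduction steps below: `wp post list` (no meta query) leaves last_changed alone, while `wp post list --meta_key=hi` triggers the deferred flush.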

With the patch applied, the reproduction steps in the original description no longer occur:

  1. Enable a persistent cache
  2. Enable RTC
  3. Open a post in the block editor and keep it open for all the remaining steps
  4. Run wp eval "var_dump( wp_cache_get_last_changed( 'posts' ) );" repeatedly, a few seconds apart
  5. Observe the value no longer changes on every run
  6. Run wp post list --meta_key=hi
  7. Re-run wp eval "var_dump( wp_cache_get_last_changed( 'posts' ) );"
  8. Observe that the value has changed
  9. Run wp post list
  10. Re-run wp eval "var_dump( wp_cache_get_last_changed( 'posts' ) );"
  11. Observe the value has not changed

I think there are two competing factors:

  • there is broad agreement that RTC both warrants and needs a custom table
  • there are concerns that the custom table needs further discussion

If that is the case then the safest solution as the release candidate approaches is to drop the feature and continue development in the Gutenberg plugin, extending the plugin to include the custom table.

Based on my experience over the last few years, rushing through major changes to WP_Query is far riskier than any of the alternatives.

#112 @ellatrix
2 weeks ago

https://core.trac.wordpress.org/ticket/64696#comment:107

@peterwilsoncc I'm confused by your comment. You know that the edge case you brought up is about querying by meta value only, which we concluded we can't easily search for, yet you link here to search results for any mention of meta_value, including those with a meta_key (which I think is the overwhelming majority of them). If a query uses a value only, it's querying all meta regardless of the key, so these do seem to be rare and not "used by pretty much all plugins"?

Instead, we're going with a change that has had little-to-no architectural discussion as was submitted two days ago.

The feature has been using meta until now, so there must have been plenty of discussion around meta. The patch is about cache invalidation only.

Last edited 2 weeks ago by ellatrix (previous) (diff)

#113 in reply to: ↑ 111 @sippis
2 weeks ago

Dropping my $0.02 from the perspective of a developer who works with big enterprise clients. I'll be quite frank in this comment, sorry for that.

Let me remind you, the current post meta workaround was brought to the table three days ago accompanied by this introductory comment:

Replying to ellatrix:

I just created something quick with Claude/AI so it's only theoretical, I haven't tested it. There's probably stuff I'm missing, I'm not very familiar with this side of core.

Replying to peterwilsoncc:

I think there are two competing factors:

  • there is broad agreement that RTC both warrants and needs a custom table
  • there are concerns that the custom table needs further discussion

If that is the case then the safest solution as the release candidate approaches is to drop the feature and continue development in the Gutenberg plugin, extending the plugin to include the custom table.

+1 this

I'd very much prefer a thoroughly vetted and well-discussed feature over a rushed, half-baked, vibe-coded workaround that everyone already knows is temporary.

It's worth keeping in mind that if non-query-cacheable post meta keys are introduced, people will start using that feature out in the wild. There's no taking it back or removing it later on, even when a better approach for RTC is introduced. Such a major new feature should've been discussed and tested way earlier in the process, on a separate ticket, rather than introduced as a byproduct of a rushed fix.

RC1 simply isn't the right point in release work to be debating such a high-stakes decision.

I'm sure ya'll are aware of the WordPress philosophy, but let's remind ourselves about it:

The more frequent and regular releases are, the less important it is for any particular feature to be in this release. If it doesn’t make it for this one, it’ll just be a few months before the next one.

Last edited 2 weeks ago by sippis (previous) (diff)

#114 in reply to: ↑ 90 @sippis
2 weeks ago

Replying to griffbrad:

With RC1 so close, we've had some more discussions about the custom DB table, and it just seems risky to introduce a schema change so late in the process. We do think there are likely benefits to having it in place ultimately, but the likelihood of getting the schema right without sufficient testing in production is low. Assuming Ella's patch is a viable approach to avoid the cache invalidation issue that inspired this issue originally, we'd like to take that route as it seems lower risk. Overall, our primary goal with RTC in 7.0 is to gather as much information as possible from hosts and plugin authors about how RTC affects the broader WP ecosystem. Once we have that information, we can more confidently add the custom table(s) that will make the feature the best it can be.

Replying to griffbrad:

Essentially, we did a review with Systems and talked over both options. This late in the game, they felt that the likelihood of getting the schema and performance characteristics of a new table right was more risky. You're totally right that WP_Query is a central API, but the patch was viewed as easy to reason about and easier to adjust if needed. Systems folks also felt the API in the patch was valuable outside the RTC context for other cases where post meta is written to in a way that too frequently causes cache invalidation, which they've seen from a variety of plugins in the ecosystem. Like I mentioned in this thread, I can definitely see the value in the custom table. I'd love to land it in 7.1. Both with the fixes to the cache invalidation issue and race condition from Ella and Chris, the post meta approach currently in use seems sufficient for the 7.0 release.

Please define who you are referring to when you speak about "we", "folks" and "they".

It's not helpful to have this kind of comment, nor is it possible to give it any weight, when "we", "folks" and "they" could be basically anyone off the street.


This ticket was mentioned in PR #11320 on WordPress/wordpress-develop by @czarate.


2 weeks ago
#115

This is a rebased version of @westonruter's https://github.com/WordPress/wordpress-develop/pull/11002 with conflicts fixed.


Trac ticket: https://core.trac.wordpress.org/ticket/64696

### Use of AI Tools
Gemini CLI authored a commit to help add phpdoc for the with_suspended_posts_last_changed_update() method.


@zieladam commented on PR #11320:


2 weeks ago
#116

Also, this change touches a pretty central data flow – can we add a few tests for this to be sure it doesn't regress over time?

#117 follow-up: @JoeFusco
2 weeks ago

I could be wrong, but add_post_meta() unconditionally calls wp_cache_delete( $object_id, 'post_meta' ) inside add_metadata() (meta.php:145). update_post_meta() does the same.

Since sync updates and awareness state share the same storage post per room, every sync write and every awareness write (1-4x/sec per editor) invalidates the entire meta cache for that post - including all other cached meta keys. The next get_post_meta() call for awareness misses cache and hits the database, even when awareness data hasn't changed. I don't believe this has been addressed yet.

After r62064, 2 of 5 operations already bypass the Meta API with raw SQL. Fixing this would require bypassing add_post_meta() and update_post_meta() as well - bringing it to 4 of 5. At that point the remaining get_post_meta() would be reading from a cache that nothing is priming, so it would also need to be replaced. That's 5 of 5 operations on raw SQL against wp_postmeta.


@czarate commented on PR #11320:


2 weeks ago
#118

Also, this change touches a pretty central data flow – can we add a few tests for this to be sure it doesn't regress over time?

Unit tests added in https://github.com/WordPress/wordpress-develop/pull/11320/commits/dc85bf74a91b2023572fc422ee7ac81060b82106

#119 @JoeFusco
2 weeks ago

Opened #64916 for the per-object meta cache invalidation as a separate tracked issue.


This ticket was mentioned in PR #11325 on WordPress/wordpress-develop by @czarate.


2 weeks ago
#120

Opening this for conversation: What if we used direct database operations instead of post meta functions? Does that alleviate concerns about changing the APIs? What are the risks, aside from the security of the operations themselves?


@zieladam commented on PR #11325:


2 weeks ago
#121

@chriszarate almost there! I've left one more comment to resolve a race condition issue. The rest of the change looks good to me!

#122 @czarate
13 days ago

I'd appreciate some additional eyes on https://github.com/WordPress/wordpress-develop/pull/11325. I believe it solves for the cache invalidation issues in this ticket and #64916 without introducing any API changes.

While I think a custom table approach is a better long-term solution, I understand the hesitancy from hosts and others. With an off-by-default stance in 7.0, I think we can provide a "feature preview" storage mechanism powered by post meta and work towards an improved approach in the future.

#123 @zieladam
12 days ago

In 62099:

Real-time collaboration: Use prepared queries instead of *_post_meta functions.

Replaces add_post_meta/update_post_meta with wpdb->insert/wpdb->update.

This prevents a real-time editing session from invalidating WP_Query and various other post caches every few seconds. RTC stores awareness and sync information in post meta with high frequency, but every call to the *_post_meta functions invalidated post caches.

This commit avoids this frequent invalidation by removing the direct *_post_meta calls in favor of $wpdb calls.

Props czarate, mukesh27, paulkevan.

Developed in https://github.com/WordPress/wordpress-develop/pull/11325.
See #64696.


@zieladam commented on PR #11325:


12 days ago
#124

I see the feedback is all addressed, I went ahead and merged this in https://core.trac.wordpress.org/changeset/62099. Thank you @chriszarate!

#125 @zieladam
12 days ago

I've committed PR 11325 (use prepared queries instead of `*_post_meta`) in revision 62099 to unblock the RC1 today. I still think transients would be a very worthwhile direction to explore for awareness data, as they are often stored in object cache without ever touching the database @czarate.


This ticket was mentioned in Slack in #core by mukeshpanchal27. View the logs.


12 days ago


This ticket was mentioned in PR #11348 on WordPress/wordpress-develop by @peterwilsoncc.


12 days ago
#128

Trac ticket:

## Use of AI Tools

#129 in reply to: ↑ 117 @peterwilsoncc
12 days ago

Replying to JoeFusco:

I could be wrong, but add_post_meta() unconditionally calls wp_cache_delete( $object_id, 'post_meta' ) inside add_metadata() (meta.php:145). update_post_meta() does the same.

update_post_meta() will bypass flushing the cache if no rows are updated in the resulting update query, i.e. if update_post_meta( 1, 'key', 'value' ) is called and the value is already value then no flushing occurs. If the value was anything else then the cache is flushed.

add_post_meta() is unconditional as the row is always inserted.

To be honest, I am not overly concerned about the single object's cache being flushed as the database query is SELECT /* data */ FROM wp_postmeta WHERE post_id = # and the post ID column is indexed.
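
For illustration, the short-circuit described above can be modeled in a few lines of self-contained Python (the dict stands in for wp_postmeta and the list for cache flush calls; the names are hypothetical, not WordPress APIs):

```python
def update_meta(store, cache_flushes, post_id, key, value):
    """Model of update_post_meta()'s short-circuit: if the stored value
    already equals the new value, no row changes and no cache flush occurs."""
    if store.get((post_id, key)) == value:
        return False                   # no-op update, cache left untouched
    store[(post_id, key)] = value
    cache_flushes.append(post_id)      # models wp_cache_delete( $post_id, 'post_meta' )
    return True

store, flushes = {}, []
update_meta(store, flushes, 1, 'key', 'value')   # new value: flushes
update_meta(store, flushes, 1, 'key', 'value')   # unchanged value: skipped
update_meta(store, flushes, 1, 'key', 'other')   # changed value: flushes
print(len(flushes))  # -> 2
```

By contrast, add_post_meta() in this model would append to `flushes` unconditionally, since the insert always writes a row.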

(I know this reply is delayed, I had some time AFK due to a stomach bug that does not respect software release cycles.)

#130 @peterwilsoncc
11 days ago

As we reach 130 posts on this ticket, I'd like to acknowledge that many of the contributors here have worked on WordPress for between one and two decades. Each has a different contribution history and brings a huge knowledge base from that experience.

With a deeply technical discussion such as the one above, I just want to pause to acknowledge that we are all working with positive intent towards the best solution.

I understand that Jon and Lynn are working on a Make/Core post to be published within the next 24 hours to get a summary of the above down in a much shorter version.

#131 @peterwilsoncc
11 days ago

A few weeks ago I posted this on a PR linked to this ticket:

dbDelta() runs prior to the upgrade routine so there is no opportunity to truncate before altering the table.

I just want to acknowledge that this was incorrect.

I've since discovered that pre_schema_upgrade() exists and can be used for this purpose. This will make it much easier to remove/truncate the ephemeral data from a new table if it needs to be altered in the future.

#132 follow-ups: @westonruter
10 days ago

Something that just came to mind: With the sync data populating post meta, will this cause problems for WordPress export/import? Would this greatly increase the size of WXR files? Wouldn't all of the sync data be completely irrelevant when importing on another site?

#133 in reply to: ↑ 132 @czarate
10 days ago

Replying to westonruter:

Something that just came to mind: With the sync data populating post meta, will this cause problems for WordPress export/import? Would this greatly increase the size of WXR files? Wouldn't all of the sync data be completely irrelevant when importing on another site?

Good call out @westonruter! It will definitely increase the size of WXR files. How much depends on how much RTC is used.

Since WXR imports can result in reassigned database IDs if there are conflicts, it's possible that imported sync data would be orphaned or be assigned to the wrong post. That won't result in data loss since RTC never trusts sync data over the actual post data, but it's an avoidable issue.

@griffbrad pointed out that we can simply add the sync post meta to the existing wxr_export_skip_postmeta filter:

https://github.com/WordPress/wordpress-develop/blob/4ac84565c5ce44b917ba88e6556c9434ff7d46a5/src/wp-admin/includes/export.php#L492-L498

If that seems agreeable, I can contribute a patch.

#134 in reply to: ↑ 132 @peterwilsoncc
10 days ago

Replying to westonruter:

Something that just came to mind: With the sync data populating post meta, will this cause problems for WordPress export/import? Would this greatly increase the size of WXR files? Wouldn't all of the sync data be completely irrelevant when importing on another site?

Reading the code I think registering the post type with can_export => false will prevent this but it will need some manual testing.

It would be good to open another ticket to investigate this and determine whether code changes are required.

Edit: #64964 opened to work on the export issue.

Last edited 10 days ago by peterwilsoncc (previous) (diff)


@peterwilsoncc commented on PR #11348:


10 days ago
#135

I've pushed a few changes following reviews:

Following @aaronjorbin's review and suggestions in Slack

  • 8020652656201aba505cf15fecc8a57fa23e8f85 and some follow up commits: Introduces JIT meta invalidation to the register meta API so the JIT approach is only taken for specific meta data rather than all meta data. This is to help avoid unintentional consequences outside of RTC. As part of this change, I added _edit_lock for JIT invalidation. Discuss.
  • 6948cdeb9a1bdcdf756d8a63ce9f82fc508b9652: Reuse a variable instead of recalling time()

Following @chriszarate's review:

  • 25c571e043df8c0708ace1f3a4d7e01095d24bab: Restored a few changes in the original commit that were not related to switching to direct DB calls
  • f7e25c22d1683fc22f0489bfe255b2b3becb4787: expanded docs on granularity filter

Additionally:

#136 @peterwilsoncc
7 days ago

Following in-person and Slack discussions, I think there is broad agreement to go with the approach in PR #11348 for the 7.0 release:

Meta/Query API changes:

  • Just-in-time cache invalidation introduced to the post meta API
  • This introduces the option jit_cache_invalidation to register_meta() and wrapping functions
  • No changes to what can be queried in WP_Query
  • No changes to WP_Meta_Query
  • WP_Query will self-flush its cache on the next meta query following the modification of JIT-invalidated meta data
  • @ellatrix I've taken this approach as it maintains back compat for third parties; reduces the need for NOT db queries; and bypasses a number of edge cases concerning post types (all/any keywords, multiple post type queries), avoiding having to decide whether meta queries should be dropped, replaced with WHERE 1 = 0, or handled by a potentially more complex solution

JIT cache invalidation applied to:

  • _edit_lock (heartbeat API, all post types)
  • wp_sync_update_data (RTC, wp_sync_storage post type)

Sync storage updates:

  • Awareness: stored in transients to avoid thrashing post meta & the WP_Query cache. As time-limited transients, the alloptions cache is unaffected.
  • Awareness: granularity is reduced to 10 seconds, rounded up, to avoid updating the option/cache entry every time an HTTP request is made. As a result, awareness lasts between 30 and 39 seconds, which no one using RTC will ever notice.
  • Awareness: the transient is stored for a day to allow re-use when people return to a room, on the basis that the more recently a room has been used, the more likely it is to be used again soon.
  • Updates: adding an update uses the add_meta_data API
  • Updates: deleting updates for compaction still uses a direct DB query but triggers the need for JIT cache invalidation
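
A small self-contained Python model of the round-up granularity in the awareness bullets above (assuming a 30-second awareness timeout, which is my assumption rather than a value stated in the PR) reproduces the 30-39 second lifetime:

```python
import math

GRANULARITY = 10   # seconds, as described above
TIMEOUT = 30       # assumed awareness timeout, not confirmed in the PR

def rounded_up(ts, granularity=GRANULARITY):
    """Round a timestamp up to the next granularity boundary."""
    return math.ceil(ts / granularity) * granularity

def lifetime(now):
    """Seconds until an awareness entry recorded at `now` expires."""
    return rounded_up(now) + TIMEOUT - now

# Over any 10-second window, the observed lifetime spans 30..39 seconds.
lifetimes = [lifetime(t) for t in range(100, 110)]
print(min(lifetimes), max(lifetimes))  # -> 30 39
```

Because every request inside the same 10-second bucket produces the same rounded-up timestamp, the option/cache entry is rewritten at most once per bucket.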

I'd prefer to use the get_post_meta() and delete_post_meta() APIs rather than direct DB queries, I've asked Chris to help out with a tl;dr on the issue so I can consider potential solutions.

I've followed up on the source code changes/questions by @jorbin, @czarate, @zieladam and @desrosj. I've redirected a question about the test suite to Chris. In addition to the reviewers, I've run the PR past @griffbrad for a logic check.

I don't know that anyone will be 100% satisfied with using post meta but I think we have a solution that most people will be mostly happy with that avoids the cache thrashing in the description of this ticket.

The steps for verifying the fix in comment #111 still largely apply but you need to open the post in TWO tabs and make some changes now.

Last edited 7 days ago by peterwilsoncc (previous) (diff)


This ticket was mentioned in Slack in #core-committers by zieladam. View the logs.


6 days ago


This ticket was mentioned in Slack in #core-committers by dmsnell. View the logs.


6 days ago


This ticket was mentioned in Slack in #core by joefusco. View the logs.


4 days ago


This ticket was mentioned in PR #11427 on WordPress/wordpress-develop by jonnynews.


4 days ago
#140

  • Introduced wp_cache_set_post_meta_last_changed to manage the 'post_meta' cache group.
  • Updated relevant functions and queries to include 'post_meta' in cache invalidation logic where applicable.
  • Ensured proper salting of cache keys when post meta data is involved in queries.

Trac ticket:

## Use of AI Tools

#141 @spacedmonkey
4 days ago

Sorry if I’m a bit late to this discussion. Since I worked on the WP_Query caching, I wanted to add a thought.

If the concern is post meta invalidating caches, I think there’s a more targeted approach we could take.

At the moment, we invalidate the entire post cache group whenever post meta is updated, which has always felt quite wasteful. In practice, post meta changes only matter for queries that actually join against the post meta table.

Instead, we could introduce a separate “last changed” value specifically for post meta (and potentially other meta types in future). We would then only include this in the cache key when the query performs a join on the post meta table. For all other queries, the existing post “last changed” value should remain sufficient.
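
The idea can be sketched in self-contained Python (the function and key names here are illustrative, not the POC's actual API): salt a query's cache key with the meta "last changed" value only when the query joins the post meta table.

```python
import itertools

_tick = itertools.count(1)  # stands in for microtime-based last_changed values
last_changed = {'posts': next(_tick), 'post_meta': next(_tick)}

def query_cache_key(sql, joins_postmeta):
    """Salt with the posts value always, and the post-meta value
    only when the query actually joins wp_postmeta."""
    salt = [last_changed['posts']]
    if joins_postmeta:
        salt.append(last_changed['post_meta'])
    return (sql, tuple(salt))

plain_before = query_cache_key("SELECT ...", joins_postmeta=False)
meta_before = query_cache_key("SELECT ... JOIN wp_postmeta ...", joins_postmeta=True)

last_changed['post_meta'] = next(_tick)   # a post meta write occurs

# Plain post queries keep their cache key; only meta-joining queries miss.
assert query_cache_key("SELECT ...", False) == plain_before
assert query_cache_key("SELECT ... JOIN wp_postmeta ...", True) != meta_before
```

Under this scheme an RTC awareness write would only invalidate queries that join against post meta, leaving the rest of the site's query caches warm.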

I put together a POC PR of this idea here.

Would be interested to hear your thoughts on this.


@dmonad commented on PR #11256:


3 days ago
#142

I'm not very familiar with WordPress conventions, but I want to share my perspective as the Yjs author.

Yjs uses 53bit uint client-ids. I think BIGINT UNSIGNED would be a more appropriate, more efficient, choice. If you have to use char for some reason, then I'd go with char(16).

Also note that client-ids in Yjs are reusable (even by different users). They are assigned randomly, and might change during a session. They are part of the Yjs algorithm, and probably should be kept private. If you want to keep track of who created content, user-id is much more appropriate. client-id is most likely not something you need to store in the table.

Yjs encodes data using binary encoding. I assume you want to base64-encode Yjs updates into data longtext NOT NULL. Just note that base64 encoding has a 33% storage overhead compared to binary blobs. I believe that LONGBLOB is a more appropriate choice here.
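
The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters; a quick Python check:

```python
import base64
import os

payload = os.urandom(3000)            # stand-in for a binary Yjs update
encoded = base64.b64encode(payload)   # what would land in a longtext column
overhead = len(encoded) / len(payload) - 1
print(f"{overhead:.0%}")  # -> 33%
```

Storing the raw bytes in a LONGBLOB column avoids that overhead entirely, at the cost of the data no longer being human-readable in the table.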

Yjs encodings

Yjs has different encoding methods for the same data. You are currently using v1 encoding (Y.encodeStateAsUpdate). v2 encoding has better compression (Y.encodeStateAsUpdateV2). I will add more encoding strategies in the future. It might make sense for your update-table to be aware of encoding strategies. An encoding field would allow you to use different encoding strategies in the future, while remaining backwards compatible.

Gutenberg bindings
Currently, Gutenberg is implementing a 1-1 mapping of HTML<->Y.XmlElements. In the future, you might want to improve this binding strategy. For example, you might want to use a flatter binding strategy in the future to get better diffs, when splitting blocks. It might make sense that your update-table is aware of the binding "version". A binding_version: 'gutenberg:v1' | 'gutenberg:v2' field would allow you to add different "binding approaches" in the future, while being backwards compatible.


@czarate commented on PR #11256:


3 days ago
#143

Hi @dmonad! Thanks for this info!

Yjs uses 53bit uint client-ids. I think BIGINT UNSIGNED would be a more appropriate, more efficient, choice. If you have to use char for some reason, then I'd go with char(16).

I believe the goal here was to be flexible and accommodate other use cases and "clients" outside of Yjs, but I'll let others speak to that.

They are assigned randomly, and might change during a session.

Could you describe the scenarios by which they might change during a session?

You are currently using v1 encoding (Y.encodeStateAsUpdate). v2 encoding has a better compression (Y.encodeStateAsUpdateV2).

We switched to Y.encodeStateAsUpdateV2 early on and use v2 methods consistently now.

It might make sense that your update-table is aware of the binding "version". A binding_version: 'gutenberg:v1' | 'gutenberg:v2' field would allow you to add different "binding approaches" in the future, while being backwards compatible.

Good suggestion. We have discussed namespacing the type fields in order to implement versioning, but a separate version column is not a bad idea either. For the current implementation, the version could just be null.

#144 @spacedmonkey
3 days ago

Sorry for the ping, but I would like eyes on my PR. @davidbaumwald @peterwilsoncc @westonruter @zieladam

TL;DR of the PR: it keeps the existing "last changed" value for the posts group changing on clean post cache / post update / post meta add/update/delete, to maintain backwards compatibility with third-party code that uses this cache key for invalidation / salts. But core itself uses two new "last changed" caches, one for post queries and one for post meta.

So to be clear.

  • Post - last changed is updated when both posts and meta are updated.
  • Post-queries - last changed is updated when posts are updated.
  • Post-meta - last changed is updated when meta is updated.

If this change works for posts, we should roll this out to other query classes.

#145 @peterwilsoncc
3 days ago

@spacedmonkey The 7.0 delay is to allow time for a custom table to be used so we won't need to make any immediate and last minute changes to the Query or Meta APIs.

FWIW, I've been silently pondering x_meta last changed values as something to consider in a future release, are you able to open a ticket?

@westonruter commented on PR #11427:


3 days ago
#146

(I likely won't be able to review this before May.)

@czarate commented on PR #11256:


2 days ago
#147

They are assigned randomly, and might change during a session.

Could you describe the scenarios by which they might change during a session?

Following up here (while still welcoming a reply from @dmonad), it's my understanding that this would only happen if the client ID were explicitly overwritten by application (Gutenberg) code, or if the Yjs document were destroyed and recreated within a session; we do neither.

@dmonad commented on PR #11256:


34 hours ago
#148

Could you describe the scenarios by which they might change during a session?

When a client randomly creates a client-id that was used before, which can happen in rare cases. Yjs detects when a client-id is reused and creates a new one.

Generating (and using) client-ids safely for CRDTs is a complex topic.

  • Yjs v13 supports 53-bit client-ids, but generates 32-bit integers for compatibility with other Yjs implementations.
  • Yjs v14 generates 53-bit client-ids by default.
  • Yjs v13 (32-bit): after 9300 sessions, there is a 1% chance that a client creates a client-id that was used before (triggering the creation of a new client-id).
  • Yjs v14: recreation is extremely unlikely.
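The 1% figure is consistent with the birthday bound: for n draws from a space of d ids, the collision probability is roughly 1 - exp(-n(n-1)/(2d)). A quick check (my own arithmetic, not from the Yjs docs):

```python
import math

def collision_probability(n, bits):
    # Birthday-bound approximation for n uniform draws from 2**bits ids.
    d = 2 ** bits
    return 1 - math.exp(-n * (n - 1) / (2 * d))

p32 = collision_probability(9300, 32)  # Yjs v13: 32-bit client-ids
p53 = collision_probability(9300, 53)  # Yjs v14: 53-bit client-ids
assert 0.009 < p32 < 0.011             # ~1%, matching the figure above
assert p53 < 1e-8                      # "extremely unlikely"
```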

I believe the goal here was to be flexible and accommodate other use cases and "clients" outside of Yjs, but I'll let others speak to that.

I'd note that as a red flag, because I don't think it makes sense for WP to interact with Yjs' client-ids. I think of them as CRDT-internals. Using user-id, or session-id, would be more appropriate.

We switched to Y.encodeStateAsUpdateV2 early on and use v2 methods consistently now.

Fair, but please keep in mind that I plan to add more encoding approaches in the future. For example, future encodings *might* also include attributions (who did what/when). Without an "encoding format", you are bound to the specific v2 encoding. This is not a big issue, as I plan to support it in the future.

@dmonad commented on PR #11256:


34 hours ago
#149

Out of curiosity, what is the reason for sticking with data longtext instead of longblob?

Note: See TracTickets for help on using tickets.