feat: add js_trim() and mb_trim() compat #9519

Open
USERSATOSHI wants to merge 8 commits into WordPress:trunk from USERSATOSHI:try/add-js-trim
@USERSATOSHI

PHP’s trim() function, by default, only strips a limited set of ASCII whitespace characters, and mb_trim(), introduced in PHP 8.4, does not behave identically to JavaScript’s String.prototype.trim().

This PR implements js_trim(), a PHP function that replicates JavaScript’s String.prototype.trim() behavior.

It works by defining a set of $js_trimmables characters, which are passed to mb_trim() with UTF-8 encoding.
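
For reference, the set of characters that JavaScript's String.prototype.trim() strips (the ECMAScript WhiteSpace and LineTerminator code points) can be sketched as follows. This is only an illustration and not the code in this PR: the function name js_trim_sketch() is hypothetical, it uses preg_replace() instead of mb_trim() so that it runs on PHP versions below 8.4, and it assumes the input is already valid UTF-8.

```php
<?php
/**
 * Hypothetical sketch: trims the code points that JavaScript's
 * String.prototype.trim() treats as whitespace or line terminators.
 * Assumes $text is valid UTF-8.
 */
function js_trim_sketch( string $text ): string {
	// ECMAScript WhiteSpace + LineTerminator code points:
	// TAB, LF, VT, FF, CR, SPACE, NBSP, OGHAM SPACE MARK, U+2000..U+200A,
	// LINE SEPARATOR, PARAGRAPH SEPARATOR, NARROW NBSP, MEDIUM MATHEMATICAL
	// SPACE, IDEOGRAPHIC SPACE, and ZERO WIDTH NO-BREAK SPACE (BOM).
	$ws = '\x{0009}-\x{000D}\x{0020}\x{00A0}\x{1680}\x{2000}-\x{200A}'
		. '\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}\x{FEFF}';

	$trimmed = preg_replace( '/^[' . $ws . ']+|[' . $ws . ']+\z/u', '', $text );

	// preg_replace() returns null when the input is not valid UTF-8;
	// in that case this sketch returns the input unchanged.
	return null === $trimmed ? $text : $trimmed;
}

$input = "\u{00A0}\u{202F} hello world \u{3000}\n";

var_dump( js_trim_sketch( $input ) );         // string(11) "hello world"
var_dump( 'hello world' === trim( $input ) ); // bool(false): trim() leaves the non-ASCII spaces in place
```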

In addition, this PR adds a polyfill for mb_trim() in compat.php to support PHP versions below 8.4, along with unit tests for both js_trim() and mb_trim().

Trac ticket: https://core.trac.wordpress.org/ticket/63804


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props tusharbharti.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@github-actions

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@dmsnell requested review from dmsnell and removed request for dmsnell on August 31, 2025 05:29
@dmsnell self-assigned this on Aug 31, 2025
}

if ( 'UTF-8' !== $encoding ) {
	$characters = mb_convert_encoding( $characters, 'UTF-8', $encoding );

I believe this will unintentionally corrupt the list of characters in every case that the code runs. is the $characters string not already UTF-8 by construction in the PHP source code?

so if we convert it from anything else we’ll be telling PHP to misunderstand the string and double-convert it?

I would imagine that if the $encoding is ISO-8859-1, for instance, that we would get something like � instead of NARROW NO-BREAK SPACE U+202F.
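
The double conversion described above can be reproduced in isolation. The following is a minimal sketch, assuming the character list is stored as UTF-8 in the source file and that a caller passes ISO-8859-1 as $encoding; the variable names are illustrative and not taken from the PR.

```php
<?php
// U+202F NARROW NO-BREAK SPACE, written out as its three UTF-8 bytes.
$nnbsp = "\xE2\x80\xAF";

// Assumed scenario: the already-UTF-8 string is "converted" from ISO-8859-1.
// Each byte is read as a separate Latin-1 character and re-encoded as UTF-8.
$converted = mb_convert_encoding( $nnbsp, 'UTF-8', 'ISO-8859-1' );

echo bin2hex( $nnbsp ), "\n";     // e280af
echo bin2hex( $converted ), "\n"; // c3a2c280c2af

// One character has become three (U+00E2, U+0080, U+00AF), so the
// converted list no longer contains NARROW NO-BREAK SPACE at all.
var_dump( mb_strlen( $converted, 'UTF-8' ) ); // int(3)
```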

@dmsnell left a comment

@USERSATOSHI although this looks sound from the function-call arguments, I would like to hear your thoughts on some of the ways it could interact with actual site data and the encodings of strings coming into it.

there could be an argument for requiring that all incoming strings be converted into UTF-8 before reaching this function.


if ( 'UTF-8' !== $encoding ) {
	$characters = mb_convert_encoding( $characters, 'UTF-8', $encoding );
	$str = mb_convert_encoding( $str, 'UTF-8', $encoding );

this line is a heavy lifter, and I generally encourage folks to disregard content if it’s not UTF-8 because the conversion here is more than likely to introduce corruption.

it may be less risky to check if the string is valid in its own encoding first…

if (
	! is_utf8_charset( $encoding ) &&
	mb_check_encoding( $str, $encoding )
) {
	$str = mb_convert_encoding( $str, 'UTF-8', $encoding );
} else {
	// REJECT!
}

but even in this case we run a large risk because most strings will validate as any of the single-byte encodings likely to be set on a real site, if not UTF-8.
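
To illustrate why the validity check offers little protection, here is a small sketch using ISO-8859-1 as the assumed site encoding. Every byte value is defined in that encoding, so UTF-8 text passes the check and would then be converted a second time. The sample strings are made up for the demonstration.

```php
<?php
// Valid UTF-8 text containing "é" and NARROW NO-BREAK SPACE.
$utf8_text = "caf\u{00E9}\u{202F}";

var_dump( mb_check_encoding( $utf8_text, 'UTF-8' ) );      // bool(true)
var_dump( mb_check_encoding( $utf8_text, 'ISO-8859-1' ) ); // bool(true): also "valid" as Latin-1

// The check is only informative in the other direction:
// a lone 0xE9 byte is not a complete UTF-8 sequence.
$latin1_text = "caf\xE9";

var_dump( mb_check_encoding( $latin1_text, 'UTF-8' ) );    // bool(false)
```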

the primary source of non-UTF-8 is from legacy database tables, and it’s best to convert encodings at the point of demarcation when reading from the database. any other string sent here is almost certainly going to be in a different encoding than what is set for $encoding.

also, I would guess that there is an extremely low likelihood that mb_internal_encoding() matches a site’s blog_charset or the encoding of the incoming text unless they are all UTF-8.


2 participants