Optionally keep original title headers for main content extraction accuracy by mcPear · Pull Request #1006 · mozilla/readability

mcPear · 2026-04-17T14:21:48Z

Summary

Reader extraction currently rewrites all in-article h1 elements to h2 so the article title can remain the sole top-level heading in the reader UI. It also removes the first heading after the title that closely resembles the title. These normalizations improve classic “reader mode” presentation but weaken the semantic outline of the page: crawlers, SEO tooling, and systems that infer structure from HTML (including retrieval and “reverse engineering” of how a page is organized) rely on stable heading levels that match the publisher’s markup.

This change preserves the original heading tag names and levels in the extracted content wherever we are not explicitly removing noise, so the serialized article HTML stays closer to the source document’s hierarchy. The new behavior is gated behind an option.
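
To make the two normalizations and the gating concrete, here is a minimal sketch. It is not the code in this PR and not Readability's implementation: the option name `keepOriginalHeadings`, the helper names, and the exact-match title comparison are all assumptions made for illustration, and headings are modelled as plain objects rather than DOM nodes so the snippet runs without a DOM.

```javascript
// Hypothetical sketch: not Readability's actual code, API, or option name.
// Nodes are plain objects ({ tag, text }) so the example runs without a DOM.

// Lowercase and collapse whitespace: a crude stand-in for a real
// title-similarity check (assumption, for illustration only).
function normalizeText(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// nodes: flat list of article nodes in document order.
// options.keepOriginalHeadings: assumed name for the opt-in flag.
function normalizeHeadings(nodes, articleTitle, options = {}) {
  if (options.keepOriginalHeadings) {
    // Opt-in path: keep tag names and the title-like heading as published.
    return nodes.map(node => ({ ...node }));
  }

  const title = normalizeText(articleTitle);
  let duplicateRemoved = false;
  const result = [];

  for (const node of nodes) {
    const isTopHeading = node.tag === "h1" || node.tag === "h2";
    // Default path, step 1: drop the first heading that repeats the title.
    if (!duplicateRemoved && isTopHeading && normalizeText(node.text) === title) {
      duplicateRemoved = true;
      continue;
    }
    // Default path, step 2: demote every remaining h1 to h2.
    result.push({ ...node, tag: node.tag === "h1" ? "h2" : node.tag });
  }
  return result;
}

const nodes = [
  { tag: "h1", text: "Release Notes" },
  { tag: "p", text: "Intro paragraph." },
  { tag: "h1", text: "Breaking changes" },
];

const defaultTags = normalizeHeadings(nodes, "Release notes").map(n => n.tag);
const preservedTags = normalizeHeadings(nodes, "Release notes", {
  keepOriginalHeadings: true,
}).map(n => n.tag);

console.log(defaultTags);   // [ 'p', 'h2' ]
console.log(preservedTags); // [ 'h1', 'p', 'h1' ]
```

With the flag off, the title-like heading disappears and the remaining h1 is demoted; with it on, the publisher's outline survives unchanged, which is the property that downstream structure-inferring tools depend on.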

What changes (high level)

Stop blanket h1 → h2 replacement in article content
Stop duplicate-title header removal
Add unit tests

chore: optionally keep original title headers

1d7e86c

mcPear changed the title from "chore: optionally keep original title headers" to "Optionally keep original title headers for main content extraction accuracy" on Apr 17, 2026

mcPear wants to merge 1 commit into mozilla:main from surferseo:mg/title
