close
Skip to content

Handle article:author meta tag. Fixes #938#942

Merged
gijsk merged 2 commits intomozilla:mainfrom
danielnixon:938
Jan 2, 2025
Merged

Handle article:author meta tag. Fixes #938#942
gijsk merged 2 commits intomozilla:mainfrom
danielnixon:938

Conversation

@danielnixon
Copy link
Copy Markdown
Contributor

@danielnixon danielnixon commented Jan 1, 2025

Fixes #938

@@ -0,0 +1,10 @@
{
"title": "If You Can Picture A Tarot Card, It's Because of These 3 People",
"byline": "Laura June Topolsky",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key line. Without this change, this byline would contain something like "Laura June Topolsky July 10, 2015".

Comment thread Readability.js Outdated
// property is a space-separated list of values
var propertyPattern =
/\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|creator|description|published_time|title|site_name)\s*/gi;
/\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|article:author|creator|description|published_time|title|site_name)\s*/gi;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure it's me but why is this not already being matched? The first capturing subgroup has article, then there's optional whitespace followed by a colon followed by optional whitespace, and then this bit has author. After this patch this will also work for e.g. dc:article:author (and article:article:author etc. etc.) but that doesn't seem to be the intent and isn't what's in the testcase... so I'm a bit lost! Sorry for being obtuse.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying locally, the new test passes without this part of the PR. It does fail the BBC test for me locally, where it now finds a byline that it did not find before ("BBC News"). Fixing that, this seems to pass tests without this change, so I'll just merge without this change? If I've missed something, do let me know.

Copy link
Copy Markdown
Contributor

@gijsk gijsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woop!

@gijsk gijsk merged commit b6ff1b6 into mozilla:main Jan 2, 2025
mislav added a commit to mislav/go-readability that referenced this pull request Jun 23, 2025
This ports "Handle article:author meta tag" (mozilla/readability#942)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Suggestion: consider article:author meta tag as a source of author name metadata

2 participants