Handle article:author meta tag. Fixes #938#942
Conversation
| @@ -0,0 +1,10 @@ | |||
| { | |||
| "title": "If You Can Picture A Tarot Card, It's Because of These 3 People", | |||
| "byline": "Laura June Topolsky", | |||
There was a problem hiding this comment.
This is the key line. Without this change, this byline would contain something like "Laura June Topolsky July 10, 2015".
| // property is a space-separated list of values | ||
| var propertyPattern = | ||
| /\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|creator|description|published_time|title|site_name)\s*/gi; | ||
| /\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|article:author|creator|description|published_time|title|site_name)\s*/gi; |
There was a problem hiding this comment.
I'm sure it's me but why is this not already being matched? The first capturing subgroup has article, then there's optional whitespace followed by a colon followed by optional whitespace, and then this bit has author. After this patch this will also work for e.g. dc:article:author (and article:article:author etc. etc.) but that doesn't seem to be the intent and isn't what's in the testcase... so I'm a bit lost! Sorry for being obtuse.
There was a problem hiding this comment.
Trying locally, the new test passes without this part of the PR. It does fail the BBC test for me locally, where it now finds a byline that it did not find before ("BBC News"). Fixing that, this seems to pass tests without this change, so I'll just merge without this change? If I've missed something, do let me know.
This ports "Handle article:author meta tag" (mozilla/readability#942)
Fixes #938