Replies
This is great, thanks Twitter team.
Will there be any additional documentation or insight as to how the Medium and High filters are applied to streams? (Or is that Secret Sauce stuff?) ;)
Do you mean how Tweets are tagged, or which Tweets you'll receive if you set the filter_level parameter?
Looks like a great addition to the API.
One question: How do these extra filters apply when hitting a streaming API limit?
Is this extra filtering (both on language and filter_level) applied before checking the number of allowed tweets, or after?
An example: let's say I only want English tweets, I currently miss (get rate limited on) 1 in 2 tweets, and half the tweets I get are English. With the new filter for English, should I now get all of the English tweets? Or will I still get rate limited, and just get half of them?
These filters will be applied before the Tweet volume rate limit is applied to a filtered stream. So in your example you should be able to get all the Tweets you're looking for.
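To make the ordering in the answer above concrete, here is a toy model in Python. It is only a sketch: the real rate limiter's mechanics are not described in this thread, so the fixed cap, the alternating-language stream, and the function names are all assumptions made for illustration.

```python
from itertools import islice

def filter_then_limit(tweets, lang, cap):
    """Apply the language filter first, then the volume cap."""
    matching = (t for t in tweets if t["lang"] == lang)
    return list(islice(matching, cap))

def limit_then_filter(tweets, lang, cap):
    """Apply the volume cap first, then the language filter."""
    return [t for t in islice(tweets, cap) if t["lang"] == lang]

# Toy stream: 100 tweets, every other one tagged as English.
stream = [{"id": i, "lang": "en" if i % 2 == 0 else "fr"} for i in range(100)]
CAP = 50  # hypothetical cap: half of the unfiltered stream

before = filter_then_limit(stream, "en", CAP)
after = limit_then_filter(stream, "en", CAP)
print(len(before), len(after))  # 50 25
```

With the filter applied first, all 50 English tweets fit under the cap. With the cap applied first, only 25 of them would survive, which is the situation described in the question.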
If you filter on medium/high, are the Tweets delayed until they hit some threshold to be considered medium/high, or do they get "ranked" as soon as they are posted?
Also, any chance of getting this data on the REST side? It could be useful.
If you're asking whether adding filter_level to a stream will delay the stream compared to a stream without that parameter, the answer is no - there won't be a delay, as even filter_level-less streams will get the attribute in the Tweet body.
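Since the attribute arrives in the Tweet body either way, an application could also do the filtering on its own side. A minimal Python sketch follows, assuming each Tweet has already been parsed into a dict. The ordering none < low < medium is an assumption based on the values named in this thread, and `meets_level` is a hypothetical helper, not part of any library.

```python
# Assumed ordering of filter_level values, lowest to highest.
LEVELS = {"none": 0, "low": 1, "medium": 2}

def meets_level(tweet, minimum):
    """Return True if the tweet's filter_level is at or above `minimum`."""
    level = tweet.get("filter_level", "none")  # treat a missing field as "none"
    # Unknown values are treated as the lowest level.
    return LEVELS.get(level, 0) >= LEVELS[minimum]

tweets = [
    {"id": 1, "filter_level": "none"},
    {"id": 2, "filter_level": "medium"},
    {"id": 3},  # field absent
    {"id": 4, "filter_level": "low"},
]
highlighted = [t["id"] for t in tweets if meets_level(t, "medium")]
print(highlighted)  # [2]
```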
There are no current plans for delivering this field via REST, sorry :(
I'd also really love to see this on the REST side. Is there any possibility of movement in that direction? Or is it really locked in as a Streaming API special feature, with the point being to encourage more adoption of the Streaming API or something like that?
Also (sorry for the barrage of questions!), is filter_level going to be supported in Site Streams? Assume so but just want to double-check...
+1 for adding filter metadata to the REST side!
So what's the difference between the new lang tag and the current metadata/iso_language_code tag? Will the current tag be disappearing?
The lang tag should actually match the metadata/iso_language_code tag which appears in search. We will probably deprecate the iso_language_code further down the road.
+1... What can we do to help in including other languages? (Catalan, for instance).
Unfortunately this isn't the kind of thing which can benefit from crowdsourcing. I know that the team is interested in adding new languages in the future, but we don't have any announcements about which languages or when they'll be added.
Hi Arne,
What was the basic motivation behind the new metadata, and how will "promoted tweets" be included/regarded in such a ranking?
Kind Regards from Cologne
Christian
The new metadata is meant to allow consumers of the streaming api to highlight a subset of tweets from a stream for display purposes. We don't stream promoted Tweet info, so it is unrelated to this change.
So this new addition will not be filtering tweets on the Timeline, like an EdgeRank for Twitter? It's just for the streaming API and search results?
The new attribute will only be present on streaming, not search. No filtering will be done unless the application adds the filter_level parameter.
Not being overly familiar with much "techy" language, I'd also like some straightforward plain English clarification on this. I just read this post on Social Media Today, which implies you're basically bringing in your own version of Facebook's EdgeRank and that tweets will no longer be displayed chronologically. I take it that's not true?
http://socialmediatoday.com/slrupert/1243696/will-mark-decline-small-business-success-twitter
This is only a developer-facing change. We're adding a piece of metadata to streamed Tweets. No changes will be made to twitter.com or any of our clients, and developers must choose to use this field if they want to filter on it.
Great news, is there a schedule for adding new supported languages? Someone has already mentioned Catalan, what about Cymraeg (Welsh)?
I replied above. Unfortunately I don't have anything to announce around new languages.
Glad to see Nepali on the language list :)
Sorry, I am not exactly a 'techy'. Just to clarify, does this only apply to Twitter applications, or will this be the case for Twitter.com as well?
Does this mean that Tweets with a higher value will show above others or will chronological order still apply?
Could you add some shallow information at least in the response to retweets_of_me? I believe that, most of the time, all we really want to know about retweets are the following:
- the user ID of the people who retweeted
- the ID of each retweet
- the timestamp of each retweet
As it stands, the current API requires that I make an extra API call to retrieve that kind of information for each tweet that got retweeted.
Hi everyone,
The Streaming API is beginning to stream the new filter_level field now. You can also use the filter_level parameter while filtering. Enjoy!
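For anyone wondering what passing the parameter might look like, here is a small Python sketch that only builds the form-encoded parameters for a filtered stream request. The endpoint URL and the `track` parameter are assumptions not stated in this thread, `build_filter_query` is a hypothetical helper, and OAuth signing plus the actual HTTP connection are left out entirely.

```python
from urllib.parse import urlencode

# Assumed endpoint for the v1.1 filtered stream; check the current documentation.
STREAM_URL = "https://stream.twitter.com/1.1/statuses/filter.json"

def build_filter_query(track, filter_level=None):
    """Build the form-encoded parameters for a filtered stream request."""
    params = {"track": track}
    if filter_level is not None:
        # Only sent when the application opts in to filtering.
        params["filter_level"] = filter_level
    return urlencode(params)

query = build_filter_query("#example", filter_level="medium")
print(query)  # track=%23example&filter_level=medium
```

Leaving `filter_level` out of the request should mean no filtering is done, which matches the reply earlier in the thread.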
Sorry I am not exactly a 'techy'. Just to clarify does this only apply to Twitter applications?
Does this mean that Tweets with a higher value will show above others or will chronological order still apply?
This is just more data that third parties can consume and do with as they want. It's not generally a user-facing feature, though some third parties may leverage the feature in a user-facing way. How they use the data is mostly up to them.
Ok then I think I get it. Thanks for the explanation, it's really appreciated.
How can we know how reliable the language field is? And which languages will it support?
You can look at the metadata/iso_language_code value returned by the search API to get an idea of how reliable this field is. The languages are the same as those you can choose to use in an "advanced search" on twitter.com.
My Twitter app for reading tweets has not been working since 20th Feb 2013. Is this somehow related to these changes? If yes, what should be done at the app level?
It's unlikely; the change rolled out on the 21st.
If you can provide more information about the error you see, please start a new discussion thread.
My Twitter site has gone all weird: I cannot post any tweets after pressing the Tweet button. Help?
Please contact @support or visit support.twitter.com
It looks like the 1.1 streaming API always sets the filter_level to "medium" in the public and user streams. When can we expect to see filter_level "none" and "low" in these streams?
Is that true? The documentation says that the default is 'none'.
I need ALL TWEETS for my hashtag: #quellocheimieigenitorinonsanno (2012-11-14)
How can I do that?
Help me, please.
P.S. It's for my future.
Hello.
When can we expect the 'lang' param?
Is there any concrete date for it?
Any update regarding addition of the lang param?
You should see "lang" in the REST API starting today, still working on getting it into streams. We'll post an update once we have an official date.
Hi, I'm very happy to say that we've finally shipped "lang" for streamed Tweets. Thank you for your patience!
Great work on the lang parameter; it does work well. However, specifying it does stop any retweets from returning on the stream. Any idea why this is?
I've noticed that the streaming API rarely returns data about language for retweets. You may check this spreadsheet and you'll see what I mean: http://bit.ly/10QIHBd. If your database is set not to allow "null" as a value, maybe RTs are actually returned by the API but ignored by your database. Just a thought.
sitos
I've compared Twitter's language detection with human language detection for tweets we collect from the streaming API. Out of 86 tweets in English, French and Japanese, Twitter detected the language correctly 44 times, an accuracy of about 51%. You may check the detailed report here: http://bit.ly/10QIHBd.
I'd like to hear from the Twitter team on these results.
There seems to be a problem with getting the language from retweets.
If you do not count the retweets, Twitter guesses the language correctly more often.
Is there any background information on what kind of algorithm Twitter uses to determine the language?
Please, how can I collect tweets from the streaming API in French and Arabic that share the same hashtag?
Thanks
Let Twitter serve all Tweets for the hashtag and sort the languages on your end.
Sorting the Arabic tweets out should be fairly easy with a regex that looks for Arabic characters.
If the tweet is composed mainly of characters from the Arabic Unicode block, it is probably Arabic.
For French it is a bit more difficult, since it uses nearly the same alphabet as other languages that use a Latin alphabet or a derivation thereof.
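A rough Python sketch of the Arabic check suggested above. It assumes the tweet text is already available as a string and only looks at the basic Arabic block (U+0600 to U+06FF); the 0.5 threshold and the function names are arbitrary choices for illustration. Note that other languages written in Arabic script, such as Persian or Urdu, would pass this test too.

```python
import re

# Basic Arabic Unicode block (U+0600 to U+06FF); other Arabic blocks are ignored here.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")

def arabic_ratio(text):
    """Fraction of alphabetic characters that fall in the Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if ARABIC_CHAR.match(ch))
    return arabic / len(letters)

def looks_arabic(text, threshold=0.5):
    """Heuristic: True if at least `threshold` of the letters are Arabic."""
    return arabic_ratio(text) >= threshold

print(looks_arabic("مرحبا بالعالم #hello"))  # True
print(looks_arabic("Bonjour le monde"))       # False
```

Counting only letters keeps hashtags, URLs, punctuation and digits from skewing the ratio too much, though a Latin-script hashtag still counts against the Arabic share.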
Hello! Could you please explain how I can get the filter_level attribute through the Status interface? There is no appropriate getter for that attribute:
http://twitter4j.org/oldjavadocs/3.0.4-SNAPSHOT/twitter4j/Status.html