Rise of the Twitter Scrapers

Jonathan Bailey, February 13, 2009

It was an inevitability. As Twitter has grown in popularity, both as a networking and as a promotion tool, it has become an increasingly enticing target for spammers.

To date, most Twitter spam has been of the auto-follow variety. A spammer sets up an account, links it with a site they want to promote and then proceeds to follow hundreds, if not thousands, of strangers. Those strangers not only get the follow notification, which turns it into a form of email spam, but are also forced to click the link to the Twitter account to determine if it is one they want to follow back, which exposes them to the advertisements.

As frustrating as these accounts can be, for the most part, these spammers have had little interest in creating a legitimate-looking Twitter presence. They typically post only a few tweets, usually filled with links to the destination site, and they attract almost no followers.

However, a new breed of Twitter users seems to be changing that. These users are creating Twitter accounts that aren’t spammers in the traditional sense, but are actually Twitter scrapers. These accounts grab results from Twitter search feeds and republish them.

The question, however, is whether these new bots are legitimate forms of Twitter expression or a new form of spam that needs to be stopped. Also, if it does need to be stopped, how can it be done?

From Haikus to Shut Ups

If you mention the word “Haiku” in your tweet, it is almost certainly going to wind up on the @haikutwaiku account. It doesn’t matter if you’re posting your latest haiku creation, discussing haikus or just using a hashtag with Haiku in it; the account picks it up and, currently, neither attributes the tweet to its author nor indicates that it is a retweet.

Every tweet in the account is, originally, from another user. For example, this tweet on the @haikutwaiku account is actually from @jennar. Likewise, this @haikutwaiku tweet is from @CobWebsStir.

The @haikutwaiku account is both very active, with nearly 200 tweets per day, and relatively popular, with over 700 followers as of this writing.

Twitter users, for the most part, seem to either tolerate or be oblivious to the copying of the @haikutwaiku account. Most of the discussion with the account has been positive. However, a few Twitter users, such as @timtfj, have expressed displeasure.

This isn’t to say that all Twitter scrapers are plagiarizing their tweets. Another scraper, @shutupmeg, targets tweets containing the keyword “shut up” and gives attribution for them, though it uses “(@username)” rather than the “RT @username” format.

However, the response to @shutupmeg has been much more hostile. This may be because the attribution alerts more Twitter users that their tweets are being copied, or because the keyword in question attracts a more hostile kind of Twitter user.

Either way, these are just two of the wide variety of Twitter bots that are scraping search results and republishing them in a new account. It seems likely that the controversy has just begun.

Copyright, Plagiarism and More

The next obvious question is whether any of these scrapers can be accused of copyright infringement, as many spam blogs can. As I pointed out during the Tweetbacks controversy, most tweets don’t rise to the level of creativity required for copyright protection. As a result, it is likely that these services don’t raise any direct copyright issues.

However, the @haikutwaiku service may be an exception. Since it targets haiku poetry, a format of literature that is both tweetable and has been ruled protected in the past, it is easy to see how one could reach the conclusion that its activity is an infringement, even though there may still be fair use issues.

Beyond the copyright issues, it is unclear what could be done to stop Twitter scrapers if it were so desired. The current terms of use at Twitter make no mention of auto-posting bots, since a ban on them would likely have outlawed the WordPress plugins and other tools bloggers use to get posts into their Twitter stream.

The end result is that these scraper bots may be here to stay. Unless Twitter users are able to motivate Twitter itself to take some kind of special action, that doesn’t seem likely to change.

Conclusions

Though Twitter scraping is likely annoying, especially when the tweets are republished without attribution, the nature of Twitter works against resolving these issues through traditional means. Copyright claims on tweets will be dubious, and any Twitter rules that target these bots would likely ensnare other, more accepted uses of the service.

The real question is how Twitter users will react as these bots become more common. Right now the response is rather mixed: some users are expressing outrage and blocking the bots in question, while others are tolerating or even enjoying their presence.

The real test will be how these bots are accepted after the novelty has worn off and after spammers begin to use them for more devious purposes. Right now the bots are fairly benign, linking only back to themselves or to nothing at all. Once they are used for promotion of sites or products, attitudes will likely change.

In short, we’ve only seen the very beginning of both the Twitter scrapers and the battle over them. Over the next few months, this will likely be a space where things get very interesting, very quickly.
