Workfriendly: Yet Another Issue

Back in November of last year, I wrote an article about Workfriendly, calling it an “accidental scraper” and accusing the site of allowing search engines to index pages containing scraped content.

The site is simply a script that restyles other sites to look like a document in Microsoft Word, so that one can surf the Web at work without raising suspicion. It has nearly a quarter of a million URLs referenced in Google, even though only one page, the home page, contains original content.

However, I recently discovered that Workfriendly has another issue, one that in some cases causes both users and the search engines to seek out nonexistent URLs, producing 404 errors in very large numbers.

Though it is a problem caused by Workfriendly, it is one that Webmasters and bloggers need to take action to correct if they are vulnerable. Otherwise, the search engines could be steered toward hundreds of non-working URLs on your site, potentially hurting your ranking in them.

Discovering the Problem

I discovered the problem with Workfriendly over the weekend by accident. I logged into my Google Webmaster Tools account to check on any errors I had and was stunned to find over 150 file not found errors.

WordPress typically does a pretty good job of avoiding file not found errors, so discovering so many on my site, especially with no other errors found, was surprising.

Thinking that, perhaps, my recent update had caused an issue with my permalinks, I looked at the errors themselves. One was caused by me changing the date on a post and another was a server error where the URL worked fine, but the other 149 pointed to a directory that does not exist, and has never existed, on this server: “/browse/Office2003Blue/”.

I remembered that Workfriendly used a similar link structure when you browsed the Web through it. I hopped onto the site, entered Plagiarism Today and watched as Workfriendly loaded the site successfully. Clearly, the ban I had put in place a few months ago had stopped working, likely because the plugin I was using was not compatible with newer versions of WordPress.

After pulling up Plagiarism Today in Workfriendly, I hovered my mouse over one of the links and looked at the URL. Indeed, it was pointing to a URL on this server in the nonexistent “browse” directory. Clicking the link resulted in chaos in Workfriendly and, in most cases, led to the site loading without Workfriendly’s obfuscation.

I immediately set out to block Workfriendly, this time using a hand-coded .htaccess block, but not before trying to figure out what was causing the problem.

Understanding the Issue

What made the problem perplexing was that it seemed to only be this site that was having the issue. Other sites I tested with Workfriendly worked fine.

However, after I looked at the source code for the page that Workfriendly created, the problem became almost immediately clear.

Plagiarism Today uses a “base” tag, an HTML element placed in the head of the page. It tells search engines and Web browsers what the “base” URL of your site is so that, when you use relative links (links that do not begin with an “https://”), the browser knows what URL you are pointing to.

It is a good practice for SEO reasons and to help with preventing 302 hijacking. Still, most sites do not have one and, in many cases, it isn’t necessary.

The problem was that Workfriendly, despite having manipulated all of the links on my site, was using relative links for everything. Rather than saying “https://www.workfriendly.net/browse/…” the links simply said “/browse/…”.

When the browser combined those relative links with the base tag, it converted all of them to “https://plagiarismtoday.com/browse/…”, an address that does not exist.
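The resolution step described above can be reproduced with Python's standard URL-joining function, which follows the same rules a browser applies to relative links. This is only a sketch: the page path "example-page" is a made-up placeholder, and the base value is assumed to be the site's root, as the resulting links suggest.

```python
from urllib.parse import urljoin

# Address the visitor's browser actually loaded.
# "example-page" is a hypothetical placeholder, not a real Workfriendly path.
page_url = "https://www.workfriendly.net/browse/Office2003Blue/example-page"

# Assumed value of the base tag carried over from the original site.
base_href = "https://plagiarismtoday.com/"

# A relative link of the kind Workfriendly writes into the page (placeholder path).
link = "/browse/Office2003Blue/another-page"

# Without a base tag, the link resolves against the page's own address.
without_base = urljoin(page_url, link)

# With a base tag, the link resolves against the base URL instead.
with_base = urljoin(base_href, link)

print(without_base)  # https://www.workfriendly.net/browse/Office2003Blue/another-page
print(with_base)     # https://plagiarismtoday.com/browse/Office2003Blue/another-page
```

The first result stays on Workfriendly's server, while the second lands in a “browse” directory on the original site, which is the source of the 404 errors.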

The combination of the base tag and Workfriendly’s use of relative links was causing the site to throw back URLs that did not exist and, due to the poor use of robots.txt, causing the search engines to pick up those bad links as well.
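For illustration, here is a rough sketch of how a robots.txt rule on Workfriendly's side could keep well-behaved crawlers away from the generated pages. The two rules below are an assumption about what such a file might contain, not the site's actual robots.txt, and the page path is a placeholder. Python's standard robots.txt parser is used to check the effect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the proxy site: keep all crawlers
# out of the directory that holds the dynamically generated pages.
rules = [
    "User-agent: *",
    "Disallow: /browse/",
]

parser = RobotFileParser()
parser.parse(rules)

# The home page, the only page with original content, stays crawlable.
home_allowed = parser.can_fetch("Googlebot", "https://www.workfriendly.net/")

# A generated page under /browse/ (placeholder path) is off limits.
proxied_allowed = parser.can_fetch(
    "Googlebot",
    "https://www.workfriendly.net/browse/Office2003Blue/example-page",
)

print(home_allowed)     # True
print(proxied_allowed)  # False
```

A rule like this only affects crawling of Workfriendly's own pages. It would not repair the broken links themselves.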

An Inconsiderate Script

My issue with Workfriendly has never been the service itself. Though some could argue that it creates a derivative work of the sites it processes, since the works are never saved, but are rather created dynamically, it is a difficult case to make.

However, more to the point, I am not upset about sites that want to remix or alter the site to make it easier to read. I would not oppose a version better suited for the visually impaired, for mobile browsers or other formats as needed, so long as the site showed basic respect for the content it was displaying.

And that is the problem with Workfriendly. The service shows no consideration for the Webmasters whose content it uses.

For one, the site allows the search engines to index the scraped pages, even though the pages do not exist and are, instead, dynamically generated.

Second, sloppy programming on the site causes it to generate artificial 404 errors that could hurt Webmasters when dealing with the search engines. Fortunately though, since the bad links are on an external site, they likely won’t have much impact.

However, if Workfriendly had simply used a correct link format, either including the “https://www.workfriendly.net” before each link or stripping out the base tag, there would be no problem at all.
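Either fix is small. The sketch below shows both on a sample snippet, using simple regular expressions for brevity. It is not Workfriendly's code: the function names, the sample HTML and the "example-page" path are all hypothetical, and a real implementation would more likely use a proper HTML parser.

```python
import re
from urllib.parse import urljoin

# The proxy's own address, as named in the article.
PROXY_ORIGIN = "https://www.workfriendly.net"


def strip_base_tags(html: str) -> str:
    """Remove any base tag so it cannot redirect the proxy's relative links."""
    return re.sub(r"<base\b[^>]*>", "", html, flags=re.IGNORECASE)


def absolutize_links(html: str) -> str:
    """Turn href values that start with "/" into full URLs on the proxy."""
    def repl(match: re.Match) -> str:
        return match.group(1) + urljoin(PROXY_ORIGIN, match.group(2)) + match.group(3)

    return re.sub(r'(href=")(/[^"]*)(")', repl, html)


# Hypothetical page as the proxy might emit it today.
sample = (
    '<head><base href="https://plagiarismtoday.com/"></head>'
    '<body><a href="/browse/Office2003Blue/example-page">Post</a></body>'
)

stripped = strip_base_tags(sample)
absolute = absolutize_links(sample)

print(stripped)
print(absolute)
```

With the base tag removed, the relative link resolves against Workfriendly again. With the link made absolute, the base tag no longer matters for it.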

But what is perhaps strangest of all is that Workfriendly offers you a script that you can put on your site to direct your visitors to their version of your site. However, in addition to letting your visitors use the Workfriendly service, you may be helping the search engines find your content in their links.

It seems unlikely that is worth the trade-off.

Conclusions

Personally, I decided it was time to be done with Workfriendly. I edited my .htaccess file and have banned the server from accessing this site. So far it is the only IP to be completely banned from this domain. If you attempt to access the site from Workfriendly, you will get the message displayed to the right.

If anyone is looking for the code I added to my .htaccess file, I simply put this before any of my WordPress code:

# Apache 2.2-style access rules: block Workfriendly's server, allow everyone else
Order Allow,Deny
Deny from 66.226.27.21
Allow from all

This certainly isn’t the type of step I wanted to take, but it was what I felt I was forced to do and, sadly, what I have to encourage others to look at doing too.

But the problem is that, in their bid to create something simple and fun, the creators of Workfriendly made something that poses a real danger to Webmasters and bloggers. Though simple changes to the system could remedy these problems easily, the authors have either neglected or refused to do so.

The result, on this site at least, is that Workfriendly is banned. I have attempted to contact the creators several times in the past but have never received a response. Considering all of the attention that has been paid to the scraping issue, it seems that the creators are either ignoring the criticism or have abandoned the project.

Either way, right now Workfriendly is just another problem for Webmasters and bloggers to worry about.
