Automatic Paraphrasing: A Problem for Academia?

There's no substitution for the real thing...

A recent paper in the International Journal for Educational Integrity looked at the use of what they called “internet based paraphrasing tools”.

According to authors Ann M. Rogerson and Grace McCarthy, such tools were successful in at least partially defeating plagiarism detection software and were a great concern, in large part because traditional tools used to find unoriginal text will struggle with such manipulated text.

Furthermore, the authors said that such tools encourage or enable poor writing skills, in particular when it comes to paraphrasing. After all, the kind of paraphrasing these automated tools provide is a poor substitute for actual paraphrased text.

They went on to say that, as students increasingly use and trust these tools, they may find the services lacking, since the tools create incoherent sentences that, while not directly matching the source, are almost unreadable.

The study recently gained the attention of Inside Higher Ed and that has caused many in academia to take notice of these tools.

However, as the original study indicated, there is nothing new about this technology. I’ve been writing about it here for nearly a dozen years.

What is new is that these tools are increasingly being marketed to and used by students who are hoping to fool plagiarism detection software, a use for which the tools are ill-suited.

Understanding Spinning/Automatic Paraphrasing

Article spinning, to put it in its most basic format, is an automated process that takes a piece of text content and swaps out words for close synonyms.

This can turn a simple sentence such as:

The cat rode on the boat.

Into any of the following:

The feline used the ship.
The tabby floated on the trawler.
The calico sailed on the vessel.

The problem should be very clear. None of the spun sentences are as clear as the original and some, such as the last, barely make sense.

However, the power of this idea isn’t that it produces great text; it’s that it produces a great deal of seemingly original text. For example, that one sentence above, using a very small thesaurus and only a few substitutions (three options for each of three words, or 3 × 3 × 3), can be trivially converted into 27 different sentences.
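To make that arithmetic concrete, here is a minimal sketch of a spinner. It is not the code behind any real spinning tool; the THESAURUS table and the spin function are hypothetical names invented for this illustration, and the substitutes are simply lifted from the three spun sentences above.

```python
from itertools import product

# Hypothetical mini-thesaurus (an assumption for this sketch), built from
# the substitutes that appear in the spun example sentences.
THESAURUS = {
    "cat": ["feline", "tabby", "calico"],
    "rode on": ["used", "floated on", "sailed on"],
    "boat": ["ship", "trawler", "vessel"],
}


def spin(sentence, thesaurus):
    """Return every variant made by swapping each known phrase for a substitute."""
    # Only spin phrases that actually occur in the sentence.
    phrases = [phrase for phrase in thesaurus if phrase in sentence]
    variants = []
    # product() walks every combination: one substitute per phrase.
    for choice in product(*(thesaurus[phrase] for phrase in phrases)):
        spun = sentence
        for phrase, substitute in zip(phrases, choice):
            spun = spun.replace(phrase, substitute, 1)
        variants.append(spun)
    return variants


variants = spin("The cat rode on the boat.", THESAURUS)
print(len(variants))  # 27, i.e. 3 * 3 * 3
for line in variants[:3]:
    print(line)
```

Adding a fourth substitute to each slot would yield 64 variants, which is why even a tiny thesaurus produces so much seemingly original text. Note that the sketch makes no attempt to check whether a variant still makes sense, which is exactly the weakness described above.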

This technique was not developed for the purpose of defeating plagiarism checkers. Instead, it was developed by spammers who sought to quickly generate a large amount of “original” content for search engines.

Thus, article spinning was born and it has been a topic on this site for more than a decade.

I first wrote about article spinning back in December of 2005. At that time I referred to it as “synonymized plagiarism” but by 2007, when I was looking at the issue of such plagiarism in conjunction with content scraping, the term “spinning” was beginning to stick. In 2009 it came up again with Twitter spammers though, by 2014, it had largely lost favor with spammers due to Google getting better and better at detecting spun content.

However, what’s new with article spinning isn’t the technology; it’s the way it is being used. With fewer spammers taking it up, some of the developers have begun repackaging the technology as “automatic paraphrasing” and pitching it to authors, including students, as a way to quickly paraphrase.

Unfortunately, what it does isn’t actually paraphrasing. Instead, it’s a crude substitution that, on the whole, produces terrible results.

Testing Automatic Paraphrasing

Since the technology behind “automatic paraphrasing” is similar, if not identical, to article spinning, it’s safe to assume that the quality of the spun content will be sub-par. However, I decided to put several of the free ones to the test.

As my test content, I chose the first paragraph from my recent article about academic plagiarism cases spilling into courts of law.

That original paragraph reads:

When students are caught plagiarizing, whether they are in high school, college or doing post-graduate work, they usually have their fates decided by their school.

I then took that text and ran it through three separate automated paraphrasing services. Here is what the first one came up with:

At the point when understudies are discovered copying, regardless of whether they are in secondary school, school or doing post-graduate work, they normally have their destinies chose by their school.

And the second one:

When scholars would found plagiarizing, if they are to helter skelter school, school or finishing post-graduate work, they Typically need their fathead concluded by their one school.

And the third:

At the point when understudies are discovered copying, regardless of whether they are in secondary school, school or doing post-graduate work, they as a rule have their destinies chose by their school.

Though I don’t believe my paragraph to be the greatest ever written, I also believe that it holds up well against the generated ones, which are confusing, use odd word choices and are generally more muddled.

This is a theme that continued when I tested larger blocks of text. The results ranged from “poorly written” to “word salad”.

If students are widely using such tools to avoid plagiarism detection tools, they are likely to be disappointed in the results. While their content may not be trivially detected by plagiarism software (though the study showed that it still can be with a close analysis), their writing will not be of high quality.
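As a rough illustration of why spun text is harder, but not impossible, to match, the sketch below compares my original paragraph with the first tool’s output using overlapping three-word sequences. This is an assumption-laden toy: the shingles and overlap functions are hypothetical names, and nothing here describes how any actual plagiarism detection product works.

```python
import re

ORIGINAL = (
    "When students are caught plagiarizing, whether they are in high school, "
    "college or doing post-graduate work, they usually have their fates "
    "decided by their school."
)
SPUN = (
    "At the point when understudies are discovered copying, regardless of "
    "whether they are in secondary school, school or doing post-graduate "
    "work, they normally have their destinies chose by their school."
)


def shingles(text, n=3):
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=3):
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    set_a, set_b = shingles(a, n), shingles(b, n)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


print(f"original vs. itself: {overlap(ORIGINAL, ORIGINAL):.2f}")
print(f"original vs. spun:   {overlap(ORIGINAL, SPUN):.2f}")
for phrase in sorted(shingles(ORIGINAL) & shingles(SPUN)):
    print(" ".join(phrase))
```

The synonym swaps break most of the exact word sequences, yet fragments such as “whether they are” and “by their school” survive untouched. Leftover matches of that kind are consistent with the study’s finding that a close analysis can still catch spun text.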

However, the study wasn’t primarily concerned that students were using these tools to successfully cheat. It was concerned with the idea that students might lean on this technology and struggle because of it.

The Immediate Problem

The real problem with this tech, as was pointed out in the study, is not that it enables students to cheat with impunity; it’s that students may attempt to rely upon it and pick up bad habits.

With many students sadly unclear on the concept of paraphrasing, an internet-based tool promising “automated paraphrasing” may be a temptation they cannot avoid. However, what these tools provide is nothing like paraphrasing, which is best achieved by writing in a cleanroom environment.

In short, students who turn to these tools for help with a writing skill may be disappointed with the outcome in nearly every way.

That’s because such tools are poor at avoiding plagiarism detection, with the original study showing that their output shows evidence of “patchwork” plagiarism, and they are even worse at producing good writing.

In short, these tools promise something they can’t deliver: help with paraphrasing. But in making that promise, they may steer students away from resources that could actually help them, such as tutors and instructors who might be able to give them the ability to paraphrase correctly.

It’s already challenging enough to get students to seek out the help that they need. These tools simply provide an extra obstacle that educators must overcome to help students truly grasp paraphrasing and proper citation.

Bottom Line

Are automated paraphrasing tools a serious threat? Not in their current form.

The tools as they exist now do a poor job of paraphrasing and fail to either completely fool plagiarism detection tools or to produce high quality writing. It’s unlikely any student will successfully employ such tools to cheat their way to a better grade.

The reason is that the technology, as it exists right now, is too simple. It isn’t capable of understanding the nuances of language and producing adequate writing.

However, the day will likely come when technology can be effectively used as a substitute for human writers. There have already been experiments with automated writing in journalism and the technology will only get better.

That being said, in the past decade, automated paraphrasing tools have not improved drastically and the only significant change has been the marketing.

However, that change in marketing should definitely put it on the radar of instructors, not as something that can trivially defeat plagiarism detection tools, but as something that can prevent students from getting the help and instruction they need.

As such, I agree with the paper’s authors that this needs to be something educators are aware of and discuss with their students. Explaining what these tools actually are and why they don’t work may help many students avoid getting sucked in by them.

Because, while such tools may not be the next escalation in the war on plagiarism, they are an escalation in the fight to get students the help they need. That’s because, while the quick fix they promise won’t work, it’s a temptation many students will find hard to avoid.