
Copyleaks report that some 60% of GPT-3.5 outputs are plagiarized

A study by Copyleaks found that a staggering 60% of the outputs from OpenAI’s GPT-3.5 exhibited signs of plagiarism.

Copyleaks, who develop plagiarism and AI content evaluation tools, highlight AI-generated text’s questionable originality and reliability, particularly in light of recent copyright infringement and plagiarism controversies. 

The study analyzed 1,045 outputs from GPT-3.5, spanning 26 academic and creative subjects, including but not limited to physics, chemistry, computer science, psychology, law, and the humanities, with each output averaging 412 words in length.

The findings of the Copyleaks report include the following:

  • Approximately 59.7% of all GPT-3.5 generated texts were found to contain plagiarized content to some degree.
  • 45.7% of outputs contained exact text matches, 27.4% included slight modifications, and 46.5% involved paraphrasing from pre-existing sources.
  • Notably, computer science saw the highest individual output “Similarity Score” at 100%, highlighting a significant concern in fields heavily reliant on technical and specialized language.
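To put the percentages above in concrete terms, the short sketch below converts them into approximate output counts. It assumes each percentage is a share of all 1,045 outputs; the variable and function names are illustrative, and the report itself may round differently.

```python
# Figures quoted in the article; the names here are illustrative only.
TOTAL_OUTPUTS = 1045

shares = {
    "any plagiarized content": 59.7,
    "exact text matches": 45.7,
    "slight modifications": 27.4,
    "paraphrasing": 46.5,
}

def approx_count(percent, total=TOTAL_OUTPUTS):
    """Convert a percentage of all outputs into a rounded count."""
    return round(total * percent / 100)

counts = {label: approx_count(p) for label, p in shares.items()}

for label, n in counts.items():
    print(f"{label}: about {n} of {TOTAL_OUTPUTS} outputs")

# The three match types sum to more than the total number of outputs,
# so a single output can evidently fall into more than one category.
category_sum = sum(
    counts[k] for k in ("exact text matches", "slight modifications", "paraphrasing")
)
print(category_sum > TOTAL_OUTPUTS)
```

Under that assumption, roughly 624 of the 1,045 outputs contained some plagiarized content, and the overlap between categories explains why the three match types add up to more than 59.7%.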

The study’s “Similarity Score” is a proprietary metric designed by Copyleaks to quantify the degree of originality in content. It combines several factors, such as identical text and paraphrasing.
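Copyleaks has not published how its score is calculated, so the following is only a toy illustration of the general idea of measuring text overlap. It is not the Copyleaks method. It assumes a simple word-trigram comparison, and the function names are hypothetical; it captures identical runs of text but not paraphrasing.

```python
def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def toy_similarity(candidate, source, n=3):
    """Percentage of the candidate's word n-grams that also appear in the source.

    A hypothetical stand-in for a similarity score, not Copyleaks' metric.
    """
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0  # text too short to form a single n-gram
    shared = cand & word_ngrams(source, n)
    return 100 * len(shared) / len(cand)

generated = "energy cannot be created or destroyed only transformed"
reference = "the law states that energy cannot be created or destroyed"
print(f"{toy_similarity(generated, reference):.1f}%")
```

Even this crude measure shows why subjects built on fixed definitions and laws can produce higher overlap than open-ended ones: there are only so many ways to state a conservation law.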

Physics recorded the highest mean Similarity Score at 31.3%, with Psychology not far behind at 27.7% and General Science at 26.7%. At the other end of the spectrum, Theater had the lowest mean score at just 0.9%, followed by Humanities at 2.8% and the English Language at 5.4%.

The spread of similarity scores across subjects isn’t particularly surprising. There are near-limitless ways to interpret a Shakespeare play and far fewer to explain a well-established mathematical theorem, for instance.

Alon Yamin, CEO and Co-founder of Copyleaks, said subjects like physics, chemistry, computer science, and psychology warrant closer scrutiny for plagiarism because of their higher scores. 

“For example, Physics, Chemistry, Mathematics, and Psychology might require a more in-depth look to discover plagiarized text, while other subjects, including Theater and Humanities, may require less scrutiny,” said Yamin.

However, educators must acknowledge that some subjects naturally lend themselves to high similarity scores.

Yamin also stated, “Furthermore, the data underscores the need for organizations to adopt solutions that detect the presence of AI-generated content and provide the necessary transparency surrounding potential plagiarism within the AI content.”

That’s a fair point. If educational organizations allow AI to draft and generate content (and some already do), students could still be exposed to plagiarism.

It must also be said that GPT-4-generated content would likely have shown lower plagiarism scores.

While the majority of AI-generated content is probably still created with GPT-3.5 (since it’s free), GPT-4 is generally considered better at generating original work.

However, this introduces another layer of complexity.

Since GPT-4 is part of the paid version of ChatGPT, accepting or encouraging AI use in education could discriminate against GPT-3.5 users unless subscriptions are subsidized.

A delicate balance

As generative AI tools become embedded in academic settings, both educators and students are confused about their use.

Content evaluation companies like Copyleaks and Turnitin have developed AI detection tools that predict when a string of words is likely AI-generated. However, these have evident weaknesses and risk false positives.

Further, AI detection software has been shown to heavily favor native English writing, which often contains a higher concentration of diverse vocabulary and idioms that sway AI detectors towards labeling text as ‘human-written.’

Curbing the use of AI technology in academia won’t be easy. Generative AI is billed as the ultimate productivity tool, and many argue that if you can use it, you should.

Students often argue that if these tools are pervasive in the real world, they should also be allowed in educational settings.

Plus, as many would attest, education is often about finding inventive shortcuts to get things done.

Can you really expect students to leave generative AI untouched on the table?

