The best guide to recognizing AI writing comes from Wikipedia

We've all felt the sneaking suspicion that something we're reading was written by a large language model, but it's remarkably difficult to pin down. For a few months last year, it was widely believed that certain words like “delve” or “underscore” could give models away, but the evidence is thin, and as models have become more sophisticated, the telltale words have become harder to track down.

But as it turns out, the folks at Wikipedia have gotten pretty good at flagging AI-written prose, and the group's public guide, “Signs of AI writing,” is the best resource I've found for figuring out whether your suspicions are justified. (Thanks to poet Jameson Fitzpatrick for pointing out the document on X.)

Since 2023, Wikipedia editors have been working to get a handle on AI submissions, a project they call Project AI Cleanup. With hundreds of thousands of edits coming in every day, there is plenty of material to draw on, and in classic Wikipedia-editor style, the group has produced a field guide that's both detailed and backed by evidence.

First of all, the guide confirms what we already know: automated tools are basically useless. Instead, the guide focuses on habits and turns of phrase that are rare on Wikipedia but common on the internet at large (and therefore common in the model's training data). According to the guide, AI submissions spend a lot of time emphasizing why a subject is important, usually in generic terms like “a defining moment” or “a broader movement.” AI models will also spend a lot of time detailing minor media appearances to make the subject seem notable, the kind of thing you'd expect from a personal bio but not from an independent source.

The guide points out a particularly interesting quirk involving trailing clauses that make vague claims of importance. Models will say that an event or detail is “emphasizing the importance” of something or other, or “reflecting the continuing relevance” of some general idea. (Grammar nerds will know this as the “present participle.”) It's a little hard to pin down, but once you can recognize it, you'll see it everywhere.

There is also a tendency toward vague marketing language, which is prevalent on the internet. Landscapes are always picturesque, views are always breathtaking, and everything is clean and modern. As the editors put it, “it sounds more like the transcript of a TV commercial.”

The guide is worth reading in full, and I came away very impressed. Previously, I would have said that LLM prose was changing too quickly to pin down. However, the habits discussed here are deeply embedded in the way AI models are trained and deployed. They can be disguised, but it will be difficult to eliminate them completely. And if the general public gets better at identifying AI prose, it could have all kinds of interesting consequences.
