How will the Internet evolve in the coming decades?
Fiction writers have explored a couple of possibilities.
In his 2019 novel “Fall; or, Dodge in Hell,” science fiction author Neal Stephenson imagined a near future in which the Internet still exists. But it has become so contaminated with misinformation, disinformation and advertising that it is largely useless.
The characters in Stephenson's novel deal with this problem by subscribing to “edit streams” – human-curated news and information that can be considered trustworthy.
The downside is that only the rich can afford such tailored services, leaving most of humanity to devour low-quality, uncurated online content.
To some extent, this has already happened: many news organizations, such as The New York Times and The Wall Street Journal, have placed their curated content behind paywalls. Meanwhile, misinformation proliferates on social media platforms like X and TikTok.
Stephenson's track record as a prognosticator is impressive: he anticipated the metaverse in his 1992 novel “Snow Crash,” and a central plot element of “The Diamond Age,” published in 1995, is an interactive primer that functions much like a chatbot.
On the surface, chatbots seem like an answer to the misinformation epidemic. By dispensing factual content, chatbots could offer alternative sources of high-quality information that are not locked behind paywalls.
Ironically, however, the output of these chatbots may pose the greatest danger to the future of the web – a danger foreshadowed decades earlier by the Argentine writer Jorge Luis Borges.
The rise of chatbots
Today, a significant portion of the Internet still consists of factual and ostensibly truthful content, such as articles and books that have been peer-reviewed, fact-checked, or otherwise verified.
The developers of large language models, or LLMs – the engines that power bots like ChatGPT, Copilot and Gemini – have taken advantage of this resource.
However, to work their magic, these models must ingest immense amounts of high-quality text for training. A vast amount of verbiage has already been scraped from online sources and fed to the fledgling LLMs.
The problem is that the Internet, as vast as it is, is a finite resource. High-quality text that has not already been strip-mined is becoming scarce, leading to what The New York Times has described as a looming content crisis.
This has pushed companies like OpenAI to strike agreements with publishers to obtain even more raw material for their voracious bots. But according to one forecast, a shortage of additional high-quality training data could arrive as early as 2026.
When chatbot output ends up online, these second-generation texts – complete with fabricated information known as “hallucinations,” as well as outright errors, such as suggestions to put glue on your pizza – will further pollute the web.
And if a chatbot hangs out with the wrong people online, it can pick up their repugnant views. Microsoft discovered this the hard way in 2016, when it had to pull the plug on Tay, a bot that began parroting racist and sexist content.
Over time, all of these issues could make online content even less trustworthy and less useful than it is today. In addition, LLMs fed a diet of such low-quality content can produce even more problematic output that also ends up on the web.
An endless – and useless – library
It's not hard to imagine a feedback loop that results in a continuous process of degradation as the bots feed on their own flawed output.
A paper published in Nature in July 2024 examined the consequences of training AI models on recursively generated data. It showed that “irreversible defects” lead to “model collapse” in systems trained this way – much as a copy of an image, and a copy of that copy, and a copy of that copy, loses fidelity to the original image.
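To make that feedback loop concrete, here is a minimal, hypothetical Python sketch (an illustration, not code from the Nature study). Each “generation” of a toy model is trained only on a finite sample produced by the previous generation; tokens that happen to be missing from a sample can never reappear, so diversity drains away irreversibly.

```python
import numpy as np

rng = np.random.default_rng(42)

VOCAB_SIZE = 10      # a toy "vocabulary" of 10 distinct tokens
SAMPLE_SIZE = 100    # each generation is trained on 100 sampled tokens

# Generation 0: the "real" data uses every token equally often.
probs = np.full(VOCAB_SIZE, 1.0 / VOCAB_SIZE)

for generation in range(1, 501):
    # Each new model sees only a finite sample produced by the previous model...
    sample = rng.choice(VOCAB_SIZE, size=SAMPLE_SIZE, p=probs)

    # ...and its "knowledge" is just the empirical frequencies of that sample.
    counts = np.bincount(sample, minlength=VOCAB_SIZE)
    probs = counts / counts.sum()

    if generation % 100 == 0:
        surviving = int((probs > 0).sum())
        print(f"generation {generation}: {surviving} of {VOCAB_SIZE} tokens survive")
```

Run it and the count of surviving tokens dwindles toward one: rare outputs vanish from the samples and never return, a toy analogue of how recursive training erodes the tails of a model's output distribution.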
How bad could this get?
Consider Borges' 1941 short story “The Library of Babel.” Fifty years before computer scientist Tim Berners-Lee created the architecture for the web, Borges had already imagined an analog equivalent.
In his 3,000-word story, the writer imagines a world consisting of a vast and possibly infinite number of hexagonal rooms. The bookshelves in each room hold volumes of uniform format that, as their inhabitants surmise, contain every possible permutation of the letters of their alphabet.
This realization initially sparks joy: by definition, there must exist books that describe the future of humanity and the meaning of life in detail.
The inhabitants search for such books, only to find that the vast majority contain nothing but meaningless combinations of letters. The truth is out there – but so is every conceivable falsehood. And all of it is embedded in an unimaginably vast amount of gibberish.
Even after centuries of searching, only a few meaningful fragments are found. And even then, there is no way to determine whether these coherent texts are truths or lies. Hope turns to despair.
Will the Internet become so polluted that only the wealthy can afford accurate, reliable information? Or will an endless array of chatbots produce so much tainted verbiage that finding accurate information online becomes like searching for a needle in a haystack?
The Internet is often described as one of humanity's great achievements. But like any other resource, it demands serious thought about how it is maintained and managed – lest we end up confronting the dystopian vision Borges imagined.