Open-source artificial intelligence has been one of the most surprising tech stories of the past year. While companies such as OpenAI and Google have poured billions of dollars into building ever more powerful AI, “open” models that are freely available for developers to use and adapt have closed the performance gap.
There is just one problem: most of these open source systems are not very open. Critics accuse their backers of “open washing” – trying to benefit from the halo effect of open source, which frees users from the restrictions of normal commercial software products, without living up to the name.
Efforts to create a truly open source version of AI are finally gaining momentum. But there is no guarantee that its development will match that of open source software, which has played a vital role in the technology world over the past 20 years. With traditional open source software, such as the Linux operating system, the code is freely available for developers to inspect, use and adapt. So-called open source AI is a very different matter, not least because most modern AI systems learn from data rather than having their logic programmed in code.
Take Meta's Llama, for example. Only the “weights” that determine how the model responds to queries are released. Users can take the model and customise it, but they cannot see the underlying data it was trained on and do not have enough information to reproduce it from scratch.
For many developers, however, this has clear benefits: they can adapt and train quasi-open models on their own information without needing to hand sensitive internal data over to another company.
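To make that concrete, here is a minimal, hypothetical sketch of what "taking the weights and customising them" can look like in practice, using the Hugging Face transformers and datasets libraries. The model name and the internal_docs.jsonl file are illustrative placeholders, not details from this article; the point is simply that the published weights are downloaded once and fine-tuned locally, so the sensitive training data never leaves the developer's own machines.

```python
# Hypothetical sketch: fine-tuning an open-weights model on private data, entirely locally.
# The model name and data file below are illustrative placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "meta-llama/Llama-3.1-8B"  # released weights only; the training data is not included

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Sensitive internal documents stay on local disk throughout.
dataset = load_dataset("json", data_files="internal_docs.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-internal",
        num_train_epochs=1,
        per_device_train_batch_size=1,
    ),
    train_dataset=dataset,
    # mlm=False means standard next-token (causal) language modelling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # adapts the published weights; nothing is sent back to the model's maker
```

The asymmetry described above is visible here: the weights can be downloaded and adapted at will, but nothing in this workflow reveals what the model was originally trained on or how to rebuild it from scratch.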
But the lack of openness has a price. According to Ayah Bdeir, a senior adviser to the Mozilla Foundation, only truly open source technology would give people a full understanding of the systems that increasingly affect every aspect of our lives, while ensuring that innovation and competition cannot be stifled by a handful of dominant AI companies.
One answer has come from the Open Source Initiative, which laid out the definition of open source software more than 20 years ago. This week it published a near-final definition of open source AI that could help shape how the field develops.
It would require not only the weights of a model to be released, but also enough information about the data it was trained on for someone else to reproduce it, as well as all the code behind the system. Other groups, such as Mozilla and the Linux Foundation, are pushing similar initiatives.
Such moves are already leading to greater segmentation of the AI world. Many companies are being more careful with their terminology – perhaps conscious that the OSI owns the trademark on the term “open source” and could sue to prevent it being used for AI models that do not meet its own definition. Mistral, for instance, calls its Nemo model “open weights”.
Alongside the partially open systems, fully open source models are starting to appear, such as the Olmo large language model developed by the Allen Institute for AI. Yet it is far from clear that this form of AI will have as big an impact in the AI world as open source has had in traditional software. For that, two things would be needed.
First, the technology needs to meet a large enough need to attract a critical mass of users and developers. With traditional software, the Linux server operating system offered a clear alternative to Microsoft's Windows, giving it a large user base and strong backing from Microsoft's rivals, including IBM and Oracle. The AI world has no equivalent to Linux. The market is already more fragmented, and many users will find quasi-open LLMs such as Llama good enough.
Supporters of open-source AI also need to make a stronger case for its safety. The prospect of such a powerful, general-purpose technology being released for anyone to use understandably raises widespread concerns.
Oren Etzioni, former director of the Allen Institute, says many of the fears are overblown. On the question of researching how to build a bomb or a bioweapon online, he says: “You can't really get more out of these [AI models] than you can get out of Google. There's plenty of it out there – it's just packaged differently.” He concedes there are some areas where making AI more freely available could cause harm, such as automating the creation of more online disinformation.
“Closed” AI carries risks too. But until the extra risks of open source technology, and its potential benefits, have been more thoroughly examined, the fears will remain.