There is a new AI model family, and it's one of the few that can be reproduced from scratch.
On Tuesday, Ai2, the nonprofit AI research organization founded by the late Paul Allen, released OLMo 2, the second family of models in its OLMo series. (OLMo is short for "Open Language Model.") While there's no shortage of "open" language models to choose from (see: Meta's Llama), OLMo 2 meets the Open Source Initiative's definition of open source AI, meaning the tools and data used to develop it are publicly available.
The Open Source Initiative, the long-running institution that aims to define and "steward" all things open source, finalized its open source AI definition in October. But the first OLMo models, released in February, met that criterion as well.
"OLMo 2 [was] built start to finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more," Ai2 wrote in a blog post. "By openly sharing our data, recipes, and findings, we hope to provide the open source community with the resources needed to discover new and innovative approaches."
There are two models in the OLMo 2 family: one with 7 billion parameters (OLMo 7B) and one with 13 billion parameters (OLMo 13B). Parameters roughly correspond to a model's problem-solving abilities, and models with more parameters generally perform better than those with fewer.
Like most language models, OLMo 2 7B and 13B can perform a range of text-based tasks, such as answering questions, summarizing documents, and writing code.
To train the models, Ai2 used a dataset of 5 trillion tokens. Tokens represent bits of raw data; 1 million tokens is equal to roughly 750,000 words. The training set included websites "filtered for high quality," academic papers, Q&A discussion boards, and math workbooks "both synthetic and human-generated."
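For a rough sense of scale, the token-to-word ratio quoted above can be applied directly to the 5-trillion-token figure. The snippet below is just that back-of-the-envelope arithmetic; the ratio itself varies by tokenizer and text, so the output is an estimate, not a measurement.

```python
# Back-of-the-envelope conversion using the article's quoted ratio:
# 1 million tokens ~= 750,000 words.
TOKENS = 5_000_000_000_000            # 5 trillion training tokens
WORDS_PER_TOKEN = 750_000 / 1_000_000  # 0.75 words per token

approx_words = TOKENS * WORDS_PER_TOKEN
print(f"~{approx_words:.2e} words")   # ~3.75e+12, about 3.75 trillion words
```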
Ai2 claims the result is models that are performance-competitive with open models like Meta's Llama 3.1 release.
"Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo model, but, notably, OLMo 2 7B outperforms Llama 3.1 8B," Ai2 writes. "OLMo 2 [represents] the best fully open language models to date."
The OLMo 2 models and all of their components can be downloaded from Ai2's website. They're available under the Apache 2.0 license, meaning they can be used commercially.
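As a minimal sketch (not Ai2's official quickstart), open-weight models like these are typically also mirrored on Hugging Face, where they can be loaded with the standard transformers API. The repository ID below is an assumption and should be checked against Ai2's release page.

```python
# Minimal sketch of loading an OLMo 2 checkpoint via Hugging Face
# transformers. Assumption: the model is published under the "allenai"
# org with the repo ID below -- verify against Ai2's release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```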
There has been some debate recently over the safety of open models, with Llama models reportedly being used by Chinese researchers to develop defense tools. When I asked Ai2 engineer Dirk Groeneveld in February whether he was worried about OLMo being misused, he told me he believes the benefits ultimately outweigh the harms.
"Yes, it's possible for open models to be used inappropriately or for unintended purposes," he said. "However, [this] approach also promotes technical advancements that lead to more ethical models; is a prerequisite for verification and reproducibility, as these can only be achieved with access to the full stack; and reduces a growing concentration of power, creating more equitable access."