Scientists are drowning in data. With millions of research papers published every year, even the most dedicated experts struggle to keep up with the latest findings in their field.
A new artificial intelligence system called OpenScholar promises to rewrite the rules for how researchers access, evaluate, and synthesize scientific literature. Built by the Allen Institute for AI (Ai2) and the University of Washington, OpenScholar combines state-of-the-art retrieval systems with a fine-tuned language model to deliver citation-backed, comprehensive answers to complex research questions.
“Scientific progress depends on researchers’ ability to synthesize the growing body of literature,” the OpenScholar researchers wrote in their paper. But that ability is increasingly strained by the sheer volume of publications. OpenScholar, they argue, offers a path forward – one that not only helps researchers cope with the flood of papers but also challenges the dominance of proprietary AI systems like OpenAI’s GPT-4o.
How OpenScholar's AI brain processes 45 million research papers in seconds
At the heart of OpenScholar is a retrieval-augmented language model connected to a datastore of more than 45 million open-access scientific papers. When a researcher asks a question, OpenScholar doesn’t just generate an answer from pre-trained knowledge, as models like GPT-4o often do. Instead, it actively retrieves relevant papers, synthesizes their findings, and generates a response grounded in those sources.
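For readers who think in code, here is a minimal sketch of what “retrieve first, then generate” means in practice. It illustrates the general retrieval-augmented pattern described above, not OpenScholar’s actual implementation; `Passage`, `search_datastore`, and `generate_answer` are hypothetical stand-ins for the retriever over the paper datastore and the fine-tuned model.

```python
# Minimal retrieval-augmented answering sketch (illustrative only).
# `search_datastore` and `generate_answer` are hypothetical stand-ins.
from dataclasses import dataclass


@dataclass
class Passage:
    paper_id: str  # identifier the model can cite in its answer
    text: str      # passage text retrieved from the paper


def search_datastore(query: str, k: int = 10) -> list[Passage]:
    """Hypothetical dense retrieval over ~45M open-access papers."""
    raise NotImplementedError


def generate_answer(query: str, passages: list[Passage]) -> str:
    """Hypothetical call to the fine-tuned LM, conditioned on retrieved passages."""
    raise NotImplementedError


def answer_question(query: str) -> str:
    # Retrieve first, then generate: the answer is grounded in real papers
    # rather than in whatever the model memorized during pre-training.
    passages = search_datastore(query)
    return generate_answer(query, passages)
```

The design choice that matters here is the ordering: because generation is conditioned on retrieved passages with identifiers attached, every claim in the output can be traced back to a specific paper.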
This grounding in real-world literature is a key differentiator. In tests with ScholarQABench, a new benchmark designed specifically to evaluate AI systems on open-ended scientific questions, OpenScholar stood out. The system demonstrated superior factuality and citation accuracy, even outperforming much larger proprietary models such as GPT-4o.
One particularly damning finding involved GPT-4o’s tendency to generate fabricated citations – hallucinations, in AI parlance. When answering biomedical research questions, GPT-4o cited papers that don’t exist more than 90% of the time. OpenScholar, by contrast, remained firmly anchored in verifiable sources.
That anchoring in real, retrieved papers is fundamental. The system uses what the researchers call a “self-feedback inference loop” and “refines its results iteratively through natural language feedback, which improves quality and adaptively incorporates complementary information.”
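In rough code terms, such a loop might look like the sketch below. This is an interpretation of the idea as described in the quote, not the team’s implementation; all three helper functions are hypothetical stand-ins for the retriever and the fine-tuned model.

```python
# Rough sketch of a self-feedback inference loop (interpretation, not
# OpenScholar's actual code). All helpers below are hypothetical.

def search_datastore(query: str) -> list[str]:
    """Hypothetical retrieval of relevant passages from the paper datastore."""
    raise NotImplementedError


def generate_answer(query: str, passages: list[str], feedback: str = "") -> str:
    """Hypothetical LM call: drafts an answer, or revises it if feedback is given."""
    raise NotImplementedError


def generate_feedback(query: str, draft: str) -> str:
    """Hypothetical LM call that critiques the draft in natural language,
    e.g. 'claim X needs a citation'. Returns an empty string when satisfied."""
    raise NotImplementedError


def answer_with_self_feedback(query: str, max_rounds: int = 3) -> str:
    passages = search_datastore(query)
    draft = generate_answer(query, passages)
    for _ in range(max_rounds):
        feedback = generate_feedback(query, draft)
        if not feedback:
            break  # the model found no remaining issues with its own draft
        # Fetch complementary passages that address the critique, then revise.
        passages += search_datastore(feedback)
        draft = generate_answer(query, passages, feedback=feedback)
    return draft
```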
The implications for researchers, policymakers, and business leaders are significant. OpenScholar could become an essential tool for accelerating scientific discovery, enabling experts to synthesize knowledge more quickly and with greater confidence.
David versus Goliath: Can open source AI compete with Big Tech?
OpenScholar's debut comes at a time when the AI ecosystem is increasingly dominated by closed, proprietary systems. Models like OpenAI's GPT-4o and Anthropic's Claude offer impressive capabilities, but they are expensive, opaque, and inaccessible to many researchers. OpenScholar flips that model on its head by being completely open source.
The OpenScholar team has released not only the code for the language model but also the entire retrieval pipeline, a specialized 8-billion-parameter model tailored to scientific tasks, and the datastore of scientific papers. "To our knowledge, this is the first open release of a complete pipeline for a research assistant LM – from data to training recipes to model checkpoints," the researchers wrote in the blog post announcing the system.
This openness isn't just a philosophical stance; it's also a practical advantage. OpenScholar's smaller size and streamlined architecture make it far cheaper to run than proprietary systems. The researchers estimate, for example, that OpenScholar-8B is 100 times cheaper to operate than PaperQA2, a concurrent system built on GPT-4o.
This cost-effectiveness could democratize access to powerful AI tools for smaller institutions, underfunded labs, and researchers in developing countries.
However, OpenScholar is not without limitations. Its datastore is restricted to open-access papers, leaving out the paywalled research that dominates some fields. While this restriction is legally necessary, it means the system may miss critical findings in areas such as medicine or engineering. The researchers acknowledge this gap and hope future iterations can responsibly incorporate closed-access content.
The new scientific method: When AI becomes your research partner
The OpenScholar project raises important questions about the role of AI in science. While the system's ability to synthesize literature is impressive, it is not infallible. In expert evaluations, OpenScholar's responses were preferred over human-written answers 70% of the time, but the remaining 30% highlighted areas where the model fell short – such as failing to cite foundational papers or selecting less representative studies.
These limitations underscore a broader truth: AI tools like OpenScholar are meant to augment human expertise, not replace it. The system is designed to help researchers with the time-consuming task of literature synthesis, freeing them to focus on interpreting and advancing knowledge.
Critics may point out that OpenScholar's reliance on open-access papers limits its immediate usefulness in high-stakes fields such as pharmaceuticals, where much research sits behind paywalls. Others note that while the system's performance is strong, it still depends heavily on the quality of what it retrieves: if the retrieval step fails, the entire pipeline risks producing suboptimal results.
But despite its limitations, OpenScholar represents a watershed moment in scientific computing. While earlier AI models impressed with their conversational abilities, OpenScholar demonstrates something more fundamental: the ability to process, understand, and synthesize scientific literature with near-human accuracy.
The numbers tell a compelling story. OpenScholar's 8-billion-parameter model outperforms GPT-4o while being orders of magnitude smaller. It matches human experts on citation accuracy, where other AIs fail 90% of the time. And perhaps most tellingly, experts prefer its answers to those written by their peers.
These achievements suggest we are entering a new era of AI-assisted research, where the bottleneck in scientific progress may no longer be our ability to process existing knowledge, but rather our ability to ask the right questions.
The researchers have published everything – code, models, data, and tools – betting that openness will accelerate progress faster than keeping their breakthroughs behind closed doors.
In doing so, they may have answered one of the most pressing questions in AI development: can open-source solutions compete with Big Tech's black boxes?
The answer, it seems, is hiding in plain sight among 45 million papers.