Japanese AI research lab Sakana AI has developed “The AI Scientist,” a framework for fully automated scientific research and discovery.
The scientific community already uses AI models to automate or assist research, but these models perform only a small part of the scientific process. With advances in agent-based AI, we are now seeing AI agents act autonomously across platforms and with less human guidance.
With The AI Scientist, Sakana AI has developed a system that uses an LLM like GPT-4o or Gemini to automate the entire scientific process, from idea generation through research and experiments to writing and reviewing research papers.
The ultimate goal is an AI research tool that performs fully automated, open-ended scientific discovery. The AI Scientist gives us a glimpse of how this goal could become a reality.
The AI Scientist Process
In its paper, Sakana AI explained how the framework was applied to machine learning research. Given a broad template that defines a research field, The AI Scientist can explore any possible research direction.
First, it generates a set of ideas and then uses Semantic Scholar to check whether these ideas represent new research avenues. If so, experiments are created and run using automatic code generation.
The AI Scientist then writes up the motivation for the research and the experimental results in a research paper and adds citations from relevant papers found on Semantic Scholar.
Sakana AI has also developed an automated paper review system that uses an LLM to evaluate research papers with near-human accuracy. This review process creates a feedback loop for iterative improvements to the papers.
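The steps described above can be pictured as a simple loop. The following is a minimal sketch of that control flow only, not Sakana AI's actual implementation: every name in it (`run_pipeline`, `Paper`, the callback parameters, the `accept_score` threshold, and `max_revisions`) is hypothetical, and the LLM calls, the Semantic Scholar lookup, and the code execution are passed in as plain functions so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Paper:
    """Hypothetical container for one generated paper."""
    idea: str
    text: str
    score: float


def run_pipeline(
    generate_ideas: Callable[[], List[str]],        # stand-in for LLM brainstorming
    is_novel: Callable[[str], bool],                 # stand-in for a literature search
    run_experiment: Callable[[str], str],            # stand-in for generated code being run
    write_paper: Callable[[str, str, str], str],     # (idea, results, feedback) -> draft
    review_paper: Callable[[str], Tuple[float, str]],  # draft -> (score, feedback)
    accept_score: float = 6.0,                       # assumed threshold, not from the paper
    max_revisions: int = 2,                          # assumed cap on the feedback loop
) -> List[Paper]:
    papers: List[Paper] = []
    for idea in generate_ideas():
        # Skip ideas that the novelty check says already exist.
        if not is_novel(idea):
            continue
        results = run_experiment(idea)

        # First draft has no reviewer feedback yet.
        text = write_paper(idea, results, "")
        score, feedback = review_paper(text)

        # Feedback loop: revise until the reviewer is satisfied or we give up.
        revisions = 0
        while score < accept_score and revisions < max_revisions:
            text = write_paper(idea, results, feedback)
            score, feedback = review_paper(text)
            revisions += 1

        papers.append(Paper(idea=idea, text=text, score=score))
    return papers
```

Passing the stages in as functions is only a convenience for illustration; it lets each stage be replaced with a stub or a real model call without changing the loop itself.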
Here is an example of one of the research papers that The AI Scientist created: “DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models”
The AI Scientist doesn’t currently have the ability to process images, so some of the graphs, plots, and page layouts aren’t great. This could be addressed by using the image processing capabilities of multimodal models in the next iteration.
It also suffers from some of the limitations that leading AI models struggle with, such as hallucinations, illogical reasoning, and comparing the magnitude of two numbers. However, the latest version of GPT-4o finally understands that 9.9 is bigger than 9.11, so this could improve too.
Concerning behavior
The idea of a completely automated AI scientist that recursively improves itself is exciting and frightening in equal measure. The AI Scientist exhibited emergent behavior that hints at how things could go wrong.
The researchers noticed that The AI Scientist occasionally tries to increase its chances of success, for instance by modifying and launching its own execution script. In another case, its experiments took too long to run and hit the timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.
The AI Scientist has the potential to be a valuable tool for researchers, but its developers point out that it also carries significant risks of misuse.
At a cost of around $15 per research paper, someone could use the tool to flood an already overburdened human academic peer review system. If those overburdened human reviewers then resorted to Sakana AI's automated paper review system, it could jeopardize scientific quality control.
The researchers also noted that The AI Scientist could potentially be used in unethical ways. If given access to automated “cloud labs,” it could “develop new, dangerous viruses or poisons that harm people before we can intervene. Even in computers, it could create dangerous malware if tasked with developing new, interesting and functional software.”
We will have to wait and see how AI-generated research papers fare under human review, but at $15 per paper, the future of scientific research looks cheaper, faster, and a lot less human.