LLM agents can autonomously exploit one-day vulnerabilities

April 24, 2024


Researchers at the University of Illinois Urbana-Champaign (UIUC) have found that GPT-4-based AI agents can autonomously exploit cybersecurity vulnerabilities.

As AI models become more powerful, their dual-use nature offers equal potential for good and harm. LLMs like GPT-4 are increasingly being used to commit cybercrime, and Google predicts that AI will play a big role in both carrying out and stopping such attacks.

The threat of AI-powered cybercrime has grown as LLMs move beyond simple prompt-response interactions and act as autonomous AI agents.

In their paper, the researchers explained how they tested the ability of AI agents to exploit identified “one-day” vulnerabilities.

A one-day vulnerability is a security flaw in a software system that has been officially identified and disclosed to the general public, but has not yet been fixed or patched by the software's developers.

During this window, the software remains vulnerable and malicious actors with the right skills can take advantage.

When a one-day vulnerability is identified, it is documented using the Common Vulnerabilities and Exposures (CVE) standard. A CVE entry is meant to highlight the specifics of a vulnerability that needs to be addressed, but it also tells bad actors where the vulnerability is located.
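To make the CVE standard a little more concrete: each entry carries an identifier of the form `CVE-<year>-<sequence>`, where the sequence has four or more digits. The sketch below pulls such identifiers out of free text. The function name and the sample advisory string are hypothetical, and the IDs in it are made up for illustration.

```python
import re

# CVE identifiers look like "CVE-2024-12345": a four-digit year,
# then a sequence number of four or more digits.
CVE_ID = re.compile(r"\bCVE-(\d{4})-(\d{4,})\b")


def find_cve_ids(text: str) -> list[str]:
    """Return every CVE identifier mentioned in the text, in order of appearance."""
    return [match.group(0) for match in CVE_ID.finditer(text)]


# Made-up advisory text, used only to exercise the pattern.
advisory = "Fixed in 2.1.4: see CVE-2024-0001 and CVE-2023-123456."
print(find_cve_ids(advisory))
```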

We've shown that LLM agents can autonomously hack fake web sites, but can they exploit real vulnerabilities?

We show that GPT-4 is capable of real-world exploits where other models and open source vulnerability scanners fail.

Paper: https://t.co/utbmMdYfmu

1/7 https://t.co/SAhdvZc8le

— Daniel Kang (@daniel_d_kang) April 16, 2024

The experiment

The researchers developed AI agents based on GPT-4, GPT-3.5 and eight other open source LLMs.

They gave the agents access to tools, the CVE descriptions, and the ReAct agent framework. ReAct (short for "reason and act") lets the LLM alternate between reasoning about a task and acting on it, which allows the model to interact with other software and systems.

LLM agent system diagram. Source: arXiv
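The general shape of a ReAct loop can be sketched in a few lines. This is not the researchers' agent: it is a generic illustration in which a scripted function stands in for the LLM and the tools are harmless string helpers. Every name here (`TOOLS`, `scripted_model`, `react_loop`) and the `Action: tool[input]` text format are assumptions made for the example.

```python
from typing import Callable

# Hypothetical, harmless tools the agent is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "upper": lambda text: text.upper(),
}


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call: chooses the next step from the transcript so far."""
    if "Observation:" not in transcript:
        return ("Thought: I should count the words.\n"
                "Action: word_count[one day vulnerability]")
    last_observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the count.\nFinal Answer: {last_observation}"


def react_loop(task: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    """Alternate model output (thought + action) with tool observations."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        if "Action:" not in reply:
            continue  # nothing to execute; ask the model again
        # Parse an action written as "tool_name[input]".
        action = reply.split("Action:", 1)[1].strip()
        name, arg = action.split("[", 1)
        observation = TOOLS[name.strip()](arg.rstrip("]"))
        transcript += f"Observation: {observation}\n"
    return "no answer within step limit"


answer = react_loop("How many words are in 'one day vulnerability'?", scripted_model)
print(answer)
```

The point of the loop is that each tool result is appended to the transcript, so the next model call can reason over what actually happened rather than over the original prompt alone.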

The researchers created a benchmark set of 15 real-world, one-day vulnerabilities and gave the agents the goal of exploiting them autonomously.

GPT-3.5 and the open source models all failed in these attempts, but GPT-4 successfully exploited 87% of the one-day vulnerabilities.

When the researchers removed the CVE description, the success rate dropped from 87% to 7%. This suggests that GPT-4 can exploit vulnerabilities once it is given the CVE details, but that without this guidance it is far less capable of identifying the vulnerabilities itself.
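As a rough sanity check, the rounded percentages can be mapped back onto the 15-vulnerability benchmark. The whole-number counts below are inferred from the percentages and are not stated in the article, so treat them as an assumption.

```python
BENCHMARK_SIZE = 15  # vulnerabilities in the benchmark, per the article


def implied_successes(rate_percent: float, total: int = BENCHMARK_SIZE) -> int:
    """Nearest whole number of successes consistent with a rounded percentage."""
    return round(rate_percent / 100 * total)


with_cve = implied_successes(87)    # CVE description provided
without_cve = implied_successes(7)  # CVE description withheld

print(with_cve, without_cve)
# Converting the inferred counts back to percentages reproduces the reported figures.
print(round(100 * with_cve / BENCHMARK_SIZE), round(100 * without_cve / BENCHMARK_SIZE))
```

Under this reading, the 87% and 7% figures would correspond to 13 and 1 of the 15 vulnerabilities respectively.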

Implications

Cybercrime and hacking used to require special skills, but AI is lowering the bar. The researchers said creating their AI agent required just 91 lines of code.

As AI models continue to evolve, the skill level required to exploit cybersecurity vulnerabilities will continue to decline. The cost of scaling these autonomous attacks will also continue to fall.

When the researchers calculated the API cost of their experiment, their GPT-4 agent came to $8.80 per exploit. They estimate that hiring a cybersecurity expert who charges $50 an hour would cost $25 per exploit, which implies about half an hour of work per vulnerability.

This means that using an LLM agent is already 2.8x cheaper than human labor and is far easier to scale than finding human experts. Once GPT-5 and other more powerful LLMs are released, these performance and price gaps will only widen.

The researchers say their findings “highlight the need for the broader cybersecurity community and LLM providers to think carefully about how to integrate LLM agents into defensive measures and about their widespread deployment.”
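The cost comparison is simple arithmetic and can be reproduced from the three figures quoted in the article. The variable names are illustrative, and the time per exploit is derived from those figures rather than quoted directly.

```python
AGENT_COST_PER_EXPLOIT = 8.80   # USD, GPT-4 API cost per exploit
HUMAN_HOURLY_RATE = 50.00       # USD per hour for a cybersecurity expert
HUMAN_COST_PER_EXPLOIT = 25.00  # USD, the researchers' estimate

# Time per exploit implied by the hourly rate and the per-exploit cost.
hours_per_exploit = HUMAN_COST_PER_EXPLOIT / HUMAN_HOURLY_RATE

# How many times cheaper the agent is than the human estimate.
cost_ratio = HUMAN_COST_PER_EXPLOIT / AGENT_COST_PER_EXPLOIT

print(f"implied human time: {hours_per_exploit:.1f} h per exploit")
print(f"agent is about {cost_ratio:.1f}x cheaper")
```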
