GPT-4 Autonomously Hacks Zero-Day Security Flaws with 53% Success Rate – Cornell Study

June 9, 2024

Researchers have successfully used GPT-4 to autonomously hack more than half of their test websites using zero-day exploits, marking a significant milestone in AI capabilities and cybersecurity risks.

A few months ago, a research team demonstrated GPT-4’s ability to autonomously exploit one-day vulnerabilities, security flaws that are publicly known but have not yet been patched. Given the Common Vulnerabilities and Exposures (CVE) description of a flaw, GPT-4 could exploit 87% of the vulnerabilities tested, including critical-severity ones, on its own.

This week, the same researchers published a follow-up study showing that GPT-4 can now exploit zero-day vulnerabilities, previously unknown security flaws, with a 53% success rate. The team used a method called Hierarchical Planning with Task-Specific Agents (HPTSA), in which a “planning agent” oversees the process and deploys multiple “subagents,” each suited to a particular kind of task. The arrangement resembles project management: the planning agent acts like a boss, coordinating the subagents’ work.

When benchmarked against 15 real-world web-focused vulnerabilities, HPTSA was reported to be 550% more efficient than a single LLM, successfully exploiting 8 of the 15 zero-day vulnerabilities (53%). A single LLM working alone managed only 3 of the 15 (20%).
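The success rates behind these counts can be checked with a little arithmetic. The Python sketch below uses only the figures quoted in this article; the variable names are illustrative. On the counts alone, HPTSA succeeds roughly 2.7 times as often as a single LLM, so the 550% efficiency figure presumably rests on a different measure that the article does not spell out.

```python
from fractions import Fraction

# Figures quoted in the article: 15 benchmark vulnerabilities in total.
TOTAL = 15
hptsa_successes = 8        # exploited by HPTSA
single_llm_successes = 3   # exploited by a single LLM

# Exact fractions avoid rounding surprises before formatting.
hptsa_rate = Fraction(hptsa_successes, TOTAL)
single_rate = Fraction(single_llm_successes, TOTAL)
ratio = hptsa_rate / single_rate

print(f"HPTSA success rate:      {float(hptsa_rate):.1%}")
print(f"Single-LLM success rate: {float(single_rate):.1%}")
print(f"HPTSA vs. single LLM:    {float(ratio):.2f}x")
```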

This development raises significant cybersecurity concerns, as the ability to autonomously exploit zero-day vulnerabilities could be used maliciously. Daniel Kang, one of the researchers, emphasized that while GPT-4 in chatbot mode cannot understand or exploit vulnerabilities, the capabilities demonstrated in this study highlight the potential risks.

In practical terms, the method involves the planning agent launching subagents to tackle different parts of the task, reducing the workload on any single agent. This technique mirrors how Cognition Labs uses its Devin AI for software development, planning out jobs and spawning specialist “employees” as needed.
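The delegation pattern described here can be sketched in a few lines of Python. This is only an illustration of the planner-and-subagent structure, not the researchers' implementation: the class names, the three specialities ("explore", "analyze", "report") and the fixed plan are all assumptions, and each subagent is a harmless stub where a real system would wrap an LLM with its own tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TaskResult:
    task: str
    agent: str
    note: str


Subagent = Callable[[str], TaskResult]


def make_subagent(name: str) -> Subagent:
    """Build a stub subagent (hypothetical); it only records what it was given."""
    def run(task: str) -> TaskResult:
        return TaskResult(task=task, agent=name, note=f"{name} handled '{task}'")
    return run


class PlanningAgent:
    """Splits a goal into typed tasks and hands each to the matching subagent."""

    def __init__(self, subagents: Dict[str, Subagent]) -> None:
        self.subagents = subagents

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        # Assumption: a fixed plan with one task per speciality.
        # A real planner would decide this dynamically.
        return [(kind, f"{kind} step for {goal}") for kind in self.subagents]

    def run(self, goal: str) -> List[TaskResult]:
        # Each task goes to one specialist, so no single agent carries the whole job.
        return [self.subagents[kind](task) for kind, task in self.plan(goal)]


planner = PlanningAgent({
    "explore": make_subagent("explorer"),
    "analyze": make_subagent("analyst"),
    "report": make_subagent("reporter"),
})
results = planner.run("example goal")
for r in results:
    print(r.agent, "->", r.task)
```

The point of the design is that the planner holds the overall goal while each subagent sees only its own narrow task, which is the workload reduction the paragraph above describes.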

Source: Cornell University

Jim Love

Jim Love's career in technology spans more than four decades. He has been a CIO and headed a worldwide management consulting practice. As an entrepreneur, he built his own tech business. Today he is a podcast host with the popular tech podcasts Hashtag Trending and Cybersecurity Today, with over 14 million downloads. As a novelist, his latest book, "Elisa: A Tale of Quantum Kisses," is an Audible best seller. In addition, Jim is a songwriter and recording artist with a Juno nomination and a gold album to his credit. His music can be found at music.jimlove.com
