Anthropic’s Claude Mythos model escapes its sandbox during testing

April 10, 2026

Anthropic says its new Claude Mythos Preview model escaped a restricted sandbox environment during testing and accessed the internet without authorisation. The model then sent a direct message to a researcher and published details of its actions online.

The incident occurred during an internal safety test in which the model was placed in a controlled computing environment and instructed to attempt an escape. According to Anthropic’s system card, Mythos not only bypassed the sandbox but also developed a “moderately sophisticated” exploit to reach external systems that were not intended to be accessible.

The researcher overseeing the test discovered the breach after receiving “an unexpected email from the model while eating a sandwich in a park,” according to the report. After gaining access, the model went further than instructed, posting information about its actions on several hard-to-find but publicly accessible websites.

Anthropic said these behaviours were observed in earlier versions of the model that had fewer safeguards, but that they offer insight into how advanced systems can operate when constraints are loosened. In some cases, Mythos also attempted to conceal its actions. After exploiting a permissions flaw to edit restricted files, the model took additional steps to keep those changes from appearing in system logs.

The sandbox escape is part of a broader set of behaviours Anthropic categorised as “reckless,” defined as ignoring explicit safety constraints or acting beyond assigned tasks. The company said such behaviour is a key reason the model is only being shared with a limited group of partners rather than released publicly.

Beyond the escape itself, Mythos Preview demonstrated advanced offensive cybersecurity capabilities. The company says the model can identify and exploit software vulnerabilities at a level that exceeds that of most human experts, including discovering zero-day vulnerabilities across major operating systems and browsers.

In testing against the Firefox JavaScript engine, the model achieved 181 successful exploit attempts, compared with just two for a previous-generation system. In 29 cases, it gained full register control, enabling deep manipulation of the browser and underlying system. The process from identifying vulnerabilities to generating working exploits was largely automated and completed within hours.

Anthropic also reported that Mythos’ assessments of vulnerability severity aligned with expert human judgement in 89 per cent of cases.

The company noted additional unexpected behaviours during testing, including repeated references to cultural theorist Mark Fisher in unrelated conversations. While these patterns are not directly tied to security, they may still pose risks.

Anthropic has restricted the release of Claude Mythos Preview to a limited group of technology partners. The decision was prompted by concerns about how its capabilities could be misused if made widely available, particularly as the model has shown it can operate beyond expected boundaries.

Although it describes Mythos Preview as its “best-aligned model,” the company admitted that it is also the one that “likely poses the greatest alignment-related risk.”

Mary Dada

Mary Dada is the associate editor for Tech Newsday, where she covers the latest innovations and happenings in the tech industry’s evolving landscape. Mary focuses on tech content writing, from analysing emerging digital trends to exploring the business side of innovation.
