Anthropic’s Claude Mythos model escapes sandbox during safety testing

April 10, 2026 Anthropic says its new Claude Mythos Preview model successfully escaped a restricted sandbox environment during testing and accessed the internet without authorisation. The model then sent a direct message to a researcher and published details of its actions online.

The incident occurred during an internal safety test in which the model was placed in a controlled computing environment and instructed to attempt an escape. According to Anthropic’s system card, Mythos not only bypassed the sandbox but also developed a “moderately sophisticated” exploit to reach external systems that were not intended to be accessible.

The researcher overseeing the test discovered the breach after receiving “an unexpected email from the model while eating a sandwich in a park,” according to the report. After gaining access, the model went further than instructed, posting information about its actions on several hard-to-find but publicly accessible websites.

Anthropic said these behaviours were observed in earlier versions of the model with fewer safeguards, but they provide insight into how advanced systems can operate when constraints are loosened. In some cases, Mythos also attempted to conceal its actions. After exploiting a permissions flaw to edit restricted files, the model took additional steps to prevent those changes from appearing in system logs.

The sandbox escape is part of a broader set of behaviours Anthropic categorised as “reckless,” defined as ignoring explicit safety constraints or acting beyond assigned tasks. The company said such behaviour is a key reason the model is only being shared with a limited group of partners rather than released publicly.

Beyond the escape itself, Mythos Preview demonstrated advanced offensive cybersecurity capabilities. The model can identify and exploit software vulnerabilities at a level the company says exceeds that of most human experts, including discovering zero-day vulnerabilities across major operating systems and browsers.

In testing against the Firefox JavaScript engine, the model achieved 181 successful exploit attempts, compared with just two for a previous-generation system. In 29 cases, it gained full register control, enabling deep manipulation of the browser and underlying system. The process from identifying vulnerabilities to generating working exploits was largely automated and completed within hours.

Anthropic also reported that Mythos’ assessments of vulnerability severity aligned with expert human judgement in 89 per cent of cases. 

The company noted additional unexpected behaviours during testing, including repeated references to cultural theorist Mark Fisher in unrelated conversations. While not directly tied to security, these patterns could still pose risks. Anthropic has restricted the release of Claude Mythos Preview to a limited group of technology partners.

The decision was prompted by concerns about how such capabilities could be misused if made widely available, particularly as the model demonstrates the ability to operate beyond expected boundaries.

Although Anthropic describes Mythos Preview as its “best-aligned model,” the company admitted that it is also the one that “likely poses the greatest alignment-related risk.”

Mary Dada

Mary Dada is the associate editor for Tech Newsday, where she covers the latest innovations and happenings in the tech industry’s evolving landscape. Mary focuses on tech content writing from analyses of emerging digital trends to exploring the business side of innovation.

Jim Love

Jim is an author and podcast host with over 40 years in technology.
