SAN FRANCISCO — Anthropic is expected to publish a formal safety and capability evaluation report on Thursday for its Claude Mythos model, following CEO Dario Amodei's decision — first reported Wednesday — to withhold the system from release due to its advanced cybersecurity exploitation capabilities. The disclosure would represent one of the most detailed public accounts of a deliberate decision to suppress a frontier AI model on safety grounds.

The report, according to sources familiar with Anthropic's internal review process, details how Claude Mythos demonstrated the ability to generate novel attack vectors, identify zero-day-class vulnerabilities in test environments, and produce functional exploit code at a level the company's safety team classified as posing unacceptable dual-use risk. Amodei has framed the decision as a proof of concept for Anthropic's "responsible scaling policy," which commits the company to halting deployment when models cross predefined capability thresholds.

The move comes at a sensitive moment for the AI safety debate. Competitors including OpenAI and Google DeepMind have faced criticism for accelerating releases under commercial pressure, and Anthropic's public disclosure is widely expected to read as a pointed contrast, reinforcing the San Francisco company's identity as a safety-first lab. Industry observers note the timing is also strategic ahead of anticipated Congressional hearings on AI and critical infrastructure security scheduled for May.

Cybersecurity researchers and policy advocates have already begun weighing in on the preliminary reports circulating since Wednesday. Some experts at organizations such as the Atlantic Council's Cyber Statecraft Initiative have cautiously welcomed the transparency, while others have raised concerns that even publishing capability benchmarks from a suppressed model could function as a roadmap for adversaries seeking to replicate the results in less constrained environments.

Anthropic's decision is likely to intensify calls within the EU and among US National Institute of Standards and Technology officials for mandatory pre-deployment capability disclosures for frontier models. The company has signaled it will brief select government partners, including the UK AI Safety Institute and the US AI Safety Institute, before the public report goes live on Thursday morning.