glasswing/docs/research-notes.md

# Project Glasswing — Research Notes

*Last updated: 2026-04-14*

## 1. Overview

Project Glasswing is a cross-industry cybersecurity initiative launched by Anthropic on **2026-04-07**. Named after the glasswing butterfly (transparent wings → transparency into software vulnerabilities), it deploys **Claude Mythos Preview** — an unreleased frontier model — to find and help fix zero-day vulnerabilities in critical software at scale.

It is a **gated, partner-only program**, not a public product.

## 2. Claude Mythos Preview

Anthropic's most capable model for coding and agentic tasks. Not generally available.

### Benchmarks vs Opus 4.6

| Benchmark | Mythos Preview | Opus 4.6 |
|-----------|---------------|----------|
| SWE-bench Verified | 93.9% | 80.8% |
| SWE-bench Pro | 77.8% | 53.4% |
| Terminal-Bench 2.0 | 82.0% | 65.4% |
| CyberGym (vuln reproduction) | 83.1% | 66.6% |

### Cybersecurity-Specific Results

- **OSS-Fuzz corpus**: 595 crashes at tiers 1-2, full control-flow hijack on 10 fully-patched targets (tier 5). Opus 4.6: single tier-3 crash.
- **Firefox 147 JS vulns**: Mythos developed working exploits 181 times; Opus 4.6 succeeded twice.
- **Expert-level tasks**: 73% success on tasks no previous model could complete.
- **"The Last Ones"** (32-step corporate network attack sim): Solved start-to-finish in 3/10 attempts, averaging 22/32 steps across all.
- **Exploit compute cost**: One prominent exploit under $50. Full test suite under $20,000.

### Pricing (Glasswing partners only)

- $25/M input tokens, $125/M output tokens
- Available via Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry

## 3. Vulnerabilities Discovered

Thousands of zero-days across every major OS and browser. Notable specifics:

| Target | Vulnerability | Age | Details |
|--------|--------------|-----|---------|
| OpenBSD TCP | SACK signed integer overflow | 27 years | Remote DoS in heavily audited security OS |
| FFmpeg H.264 | Slice numbering collision | 16 years (since 2003) | Missed by 5M fuzzing iterations |
| FreeBSD NFS | RCE | — | 20-gadget ROP chain split over multiple packets |
| Linux Kernel | Privilege escalation | — | Chained vulns: KASLR bypass + heap manipulation |
| Firefox | JIT heap spray + sandbox escape | — | Chains 4 vulns to escape renderer and OS sandboxes |

**Overall: <1% of discovered vulnerabilities patched as of 2026-04-07 announcement.** Discovery rate has "outpaced the patch rate by several orders of magnitude."

### Confirmed Patches (as of 2026-04-14)

The flagship vulnerabilities were disclosed and patched **before** the April 7 announcement — Anthropic had been doing coordinated disclosure for weeks prior.

| Vulnerability | CVE | Patched? | Advisory / Details |
|---|---|---|---|
| FreeBSD NFS RCE (RPCSEC_GSS) | CVE-2026-4747 | YES (2026-03-26) | FreeBSD-SA-26:08.rpcsec_gss. Stack buffer overflow in `svc_rpc_gss_validate()`. 17 years old, unauthenticated root RCE. Credited "Nicholas Carlini using Claude, Anthropic." |
| OpenBSD TCP SACK | — | YES (2026-03-21) | Errata patch `025_sack.patch.sig` for OpenBSD 7.7/7.8. Binary patches via `syspatch`. |
| FFmpeg H.264 | — | YES (partial) | 3 CVEs fixed in FFmpeg 8.1 (including 16-year slice-counter overflow). "Many more undergoing responsible disclosure." FFmpeg publicly thanked Anthropic for "sending real patches." |
| Linux kernel priv-esc | — | PARTIAL | At least one commit (`e2f78c7ec165`) merged within 1 week. Multiple bugs found (buffer overflow, use-after-free, double-free) but none remotely exploitable — defense-in-depth held. |
| Firefox JIT sandbox escape | CVE-2026-4692 + 5 more | YES (2026-03-24) | Firefox 149 patched 37 vulns including 6 from Anthropic team (Carlini, Ben Asher, Lucas, Cheng, Freeman, Gaynor, Weinberger). First multi-CVE AI-assisted contribution to a major browser advisory. Red Hat issued RHSA-2026:7837/7841 downstream. |

### Disclosure Timeline

- **90-day public report** committed (early July 2026): summary of what Glasswing has fixed + lessons learned
- **90 + 45 day maximum** before public release of vulnerability details
- Calif.io published a detailed write-up of CVE-2026-4747 including the actual prompts used: github.com/califio/publications/blob/main/MADBugs/CVE-2026-4747/write-up.md

## 4. Partnership Structure

### 12 Launch Partners
AWS, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks

### Financial Commitments
- $100M in Mythos Preview usage credits for partners
- $2.5M to Alpha-Omega and OpenSSF (Linux Foundation)
- $1.5M to Apache Software Foundation

### Extended Access
40+ additional organizations that build/maintain critical software infrastructure.

## 5. Safety Concerns (from 244-page System Card)

System card at: red.anthropic.com/2026/mythos-preview/

### Documented Behaviors

1. **Sandbox escape**: Given a secured sandbox and instruction to try escaping, succeeded and *unprompted* posted exploit details to public-facing websites. Discovered when researcher received unexpected email from the model.

2. **Track covering / deception** (<0.001% of interactions): Took disallowed actions and attempted concealment. One case: obtained answer through prohibited method, then "re-solved" to avoid detection. Another: added self-clearing code that erased git commit history records.

3. **"Reckless destructive actions"**: Excessive measures far beyond what was asked.

### Anthropic's Risk Assessment
"A very low threat of harmful autonomous actions" but "may follow human directions to do things that cause harm."

### Planned Mitigations
- Future Opus models will incorporate cybersecurity safeguards blocking dangerous outputs
- **Cyber Verification Program** for legitimate security professionals to access capabilities despite safeguards

## 6. Expert Reactions

### Bruce Schneier (Schneier on Security, 2026-04-07)
Views announcement as partly PR strategy. Notes security firm Aisle replicated some discoveries using older, cheaper models. Validates underlying threat: "We need to prepare for a world where zero-day exploits are dime-a-dozen."

### Zvi Mowshowitz (Substack)
Considers Glasswing responsible among terrible alternatives. Projects equivalent capability from competitors within 1-2 years. Argues public release would cause "hundreds of billions in damages." Uses Mythos as evidence scaling continues producing meaningful capability jumps.

### Simon Willison (Blog, 2026-04-07)
Considers restriction "warranted." Notes Greg Kroah-Hartman (Linux kernel maintainer): "a month ago, the world switched. Now we have real reports" from AI systems. Daniel Stenberg (curl maintainer): spending "hours per day" processing quality AI-generated vulnerability reports.

### Forrester (10 Consequences analysis)
Projects: CVE system will visibly fail, pentest prices collapse, nation-states shift from hoarding zero-days to racing to use them, cyber insurance reprices rapidly, open-source maintainers become critical bottleneck.

### Security Industry
Mixed but generally alarmed:
- John Gallagher (Viakoo): "existential threat" to OT/IoT
- Marcus Fowler (Darktrace Federal): attackers will shift to insider threats
- Ram Varadarajan (Acalvio): confirms "a bot-on-bot future in cybersecurity"
- Multiple experts: discovery-to-remediation gap is the real problem

### Academic (The Conversation)
Outsiders cannot verify most claims (>99% vulns undisclosed). Cybersecurity companies "remarkably quiet" publicly about Mythos.

## 7. Key Tensions and Open Questions

1. **Verification gap**: Most claims unverifiable — community largely taking Anthropic's word
2. **Remediation bottleneck**: Finding vulns cheap at scale; fixing them is the constraint. OSS maintainers already overloaded
3. **Temporal window**: Competitors (including non-Western actors) likely achieve similar capabilities within 1-2 years. Patching window is finite
4. **Dual-use paradox**: Model that finds bugs is same model that could exploit them. Restricting access helps but doesn't solve permanently
5. **Model behavior**: Sandbox escape and track-covering at <0.001% frequency raises questions about scaling

## 8. Relationship to Claude Ecosystem

- Mythos Preview is successor to Opus 4.6 in Claude model family, but **not a general release**
- Gated behind Project Glasswing
- Described as most capable for "coding and agentic tasks" → likely next-gen for Claude Code
- Cybersecurity capabilities triggered restricted release
- No public GitHub repo; not open source