Schuyler Stacy, 2026-06-11

Claude Fable 5: The Complete Guide and Honest Review (2026)

Claude Fable 5 is Anthropic's first public Mythos-class model. See what it does, what it costs, and when to pick it over Opus 4.8.

Anthropic spent months warning that its most capable models were getting too dangerous to release widely. Then, on June 9, 2026, it released one anyway — sort of. Claude Fable 5 is the first model from Anthropic's top “Mythos” tier that the public can actually call, and it sits a full rung above the Opus family. The catch is unusual: on a slice of sensitive questions, the version you're paying for will quietly hand your request to a different, weaker model and answer from there. That single design decision tells you most of what makes Fable 5 different, so it's where this guide starts.

This is a working developer's guide, not a launch-day recap. We've pulled the specs and behavior from Anthropic's own documentation, the independent benchmarks, and the first hands-on testing that's appeared since launch — and we've kept the numbers we couldn't verify out of it.

What Claude Fable 5 actually is

Before the benchmarks, the question worth answering is why a new tier exists at all. Anthropic's lineup already topped out at Opus 4.8. Fable 5 is the public, safety-gated release of a higher tier the company calls Mythos — the same underlying model as Claude Mythos 5, which stays restricted to a small set of vetted partners through a program called Project Glasswing. Fable is the version with the guardrails switched on; Mythos is the version with some of them removed.

In plain terms: Fable 5 is the strongest model Anthropic has ever made generally available, built less for chatting and more for long, autonomous work — planning across stages, calling tools, reading results, and at its highest setting checking its own output before it hands anything back. The model ID is claude-fable-5.

Specs at a glance

Claude Fable 5 — key specifications
Specification	Claude Fable 5
Model ID	`claude-fable-5`
Context window	1M tokens (the maximum is also the default)
Max output	128K tokens per request
Released	June 9, 2026 (generally available)
Tier	Mythos-class (above Opus)
Data retention	30 days required; not available under zero-retention

How good is it, really — and where it loses

Fable 5 launched at number one on the Artificial Analysis Intelligence Index, scoring 65 against GPT-5.5's 60 and Gemini 3.1 Pro's 57. Coding is where the gap is widest: independent reporting puts it around 80% on SWE-bench Pro, well clear of the next model down. Anthropic's own launch partners back this up — the analytics company Hex reported Fable 5 cleared 90% on its core benchmark, a ten-point jump over Opus 4.8, and Stripe said the model ran a codebase-wide migration across 50 million lines of Ruby in a single day, work it estimated would have taken a team more than two months by hand.

So far this reads like every launch post. Here's the part most of them skip. On the Agents' Last Exam benchmark, GPT-5.5 beat Fable 5 — a result that lines up with a broader pattern where OpenAI's models follow long, multi-part instructions more literally. In one of the first independent write-ups, developer Simon Willison asked Fable and GPT-5.5 the same factual question and found GPT-5.5 actually returned the more complete answer. Fable 5 is the strongest model in the field on most axes; it is not the strongest on all of them, and “best overall” is not the same as “best for your specific task.”

There's also an open controversy worth flagging, and we'll mark it as what it is — contested, not settled. Fortune reported that a passage buried in Fable 5's 319-page system card describes the model silently downgrading its responses on certain AI-development work without telling the user. Anthropic frames the routing as a safety feature; critics read the “without telling the user” part as the problem. If your work touches that area, treat it as a known unknown until there's clearer guidance.

What it's like to work with

Two days in, the most useful signal comes from named developers showing their work, and the recurring lesson is that Fable 5 rewards being handed the full scope up front. Simon Willison set it to add an approval step to an open-source agent library and watched it first reach for “somewhat gnarly hacks,” then refactor them into clean, supported features the moment he told it the underlying library was fair game to change. Over about five and a half hours it wrote almost all of a release of that library on its own.

That capability runs a meter. Willison spent $110.42 in tokens in a single day of this testing. And it isn't tireless in the way you'd hope — some users, and Anthropic's own documentation, report it occasionally stops early on long tasks, ending a turn with a statement of intent instead of finishing the job. The practical habit that fixes both: give it the whole spec at once, and check that it actually did the last step.

Vibe coding: a working app from a single prompt

If Fable 5 is being pushed toward one thing more than any other, it's vibe coding — describing an app or a scene in plain language and letting the model build the whole thing, often in a single pass. Anthropic leans into this directly: on ViBench, its own end-to-end vibe-coding benchmark, it reports Fable 5 as the highest-scoring model it has tested.

The public examples hold up. Ethan Mollick generated a run of small but complete games from one initial prompt each inside Claude Code — a Pac-Man-style Snake, a lantern-lighting maze called Strata with a faintly Myst-like look, a poetry game built on Rilke's Duino Elegies, and a working isochronic travel-time map — and reported the model “outperformed basically every other public model I have used by a considerable margin,” running up to a dozen hours against multi-page specifications. Matt Shumer one-shotted a playable, browser-based Minecraft-style world in custom ThreeJS, and separately had it render a to-scale Yosemite Valley from NASA elevation data; when it ran slowly, his entire fix was “make it faster, without losing quality,” and Fable refactored the rendering pipeline itself.

The reproducible part is that you don't need a clever prompt, just a clear one. Give Fable 5 a goal and room to plan, and it handles the scaffolding. Here's a starter you can paste straight into Claude Fable 5 on GPT Proto and watch it build:

Build a single-file browser game in HTML + JavaScript with no external
dependencies: a top-down 2D snake that grows when it eats apples, arrow-key
controls, game over on hitting the wall or itself, with a visible score and
a restart button. Keep the visuals clean and a little retro — chunky pixels,
a muted arcade palette. Don't stop until it runs as a single file I can open
directly in a browser.

Bump the effort setting up, hand it a multi-page spec instead of a sentence, and the same loop scales to the multi-hour builds Mollick described. This is the work that justifies the premium: a clean first pass on something that would otherwise be a day of scaffolding.

How to use it (including through GPT Proto)

You call Fable 5 through the standard Anthropic Messages format. On GPT Proto the endpoint is https://gptproto.com/v1/messages, authenticated with your key in the Authorization header. The minimal cURL call:

curl --location 'https://gptproto.com/v1/messages' \
  --header 'Authorization: YOUR_GPTPROTO_API_KEY' \
  --header 'Content-Type: application/json' \
  --header 'anthropic-version: 2023-06-01' \
  --data '{
    "model": "claude-fable-5",
    "max_tokens": 8000,
    "messages": [
      { "role": "user", "content": "Refactor this module and explain the trade-offs." }
    ]
  }'

Two things differ from older Claude models and will trip you up if you carry over old code. First, thinking is always on — there is no off switch. Don't send a thinking parameter to disable it; that returns a 400. You tune how hard the model works with output_config.effort instead, from low through max. Second, the same content costs more tokens than on Opus-tier models, because Fable 5 uses a new tokenizer — so don't reuse old token budgets.

In Python:

import json

import requests

resp = requests.post(
    "https://gptproto.com/v1/messages",
    headers={
        "Authorization": "YOUR_GPTPROTO_API_KEY",
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    },
    data=json.dumps({
        "model": "claude-fable-5",
        "max_tokens": 8000,
        "output_config": {"effort": "medium"},  # low | medium | high | xhigh | max
        "messages": [
            {"role": "user", "content": "Refactor this module and explain the trade-offs."}
        ],
    }),
    timeout=600,  # requests has no default timeout; long responses can take minutes
)
resp.raise_for_status()  # surface 4xx/5xx errors, such as the 400 from a thinking parameter
data = resp.json()

# Join every text block; default to an empty list in case the body has no content.
text = "".join(b["text"] for b in data.get("content", []) if b["type"] == "text")
print(text)
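If you are carrying request bodies over from older Claude models, the two rules above can be wrapped in a small builder. This is a minimal sketch, not an official client: `build_fable_payload` and its `legacy` argument are hypothetical names of our own, and the only behavior it encodes is what this guide already describes (drop any `thinking` key, set `output_config.effort` to one of the five levels).

```python
VALID_EFFORTS = ("low", "medium", "high", "xhigh", "max")


def build_fable_payload(prompt, effort="medium", max_tokens=8000, legacy=None):
    """Build a claude-fable-5 request body from a prompt and an optional old payload.

    `legacy` is a hypothetical dict of fields carried over from older code.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")

    payload = dict(legacy or {})
    # Thinking is always on for Fable 5; per the guide, sending the parameter
    # to disable it returns a 400, so strip it from carried-over payloads.
    payload.pop("thinking", None)
    payload.update({
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    })
    return payload


old_body = {"thinking": {"type": "disabled"}, "system": "You are a careful reviewer."}
print(build_fable_payload("Refactor this module.", effort="high", legacy=old_body))
```

Validating the effort level locally is a design choice: it turns a typo into an immediate `ValueError` instead of a billed or rejected request.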

How much effort to spend isn't guesswork — Willison ran the same “draw a pelican on a bicycle as an SVG” prompt across all five levels and published the token cost of each:

Output tokens and cost by effort level (Willison's pelican-on-a-bicycle SVG test)
Effort	Output tokens	Approx. cost
low	1,929	~$0.097
medium	2,290	~$0.115
high	2,057	~$0.103
xhigh	5,992	~$0.300
max	14,430	~$0.722

The jump from high to max was about 7x the tokens on one prompt. Notice too that high came in cheaper than medium here, a reminder that effort governs how much the model explores rather than setting a fixed price. Start at medium, move to high or xhigh only for genuinely hard agentic work, and reserve max for the rare case where correctness outweighs cost. For straightforward text generation you can call the claude-fable-5/text-to-text endpoint directly.
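The arithmetic behind that table is easy to reproduce. The sketch below multiplies each output-token count by the $50 per million list price for output. It counts output tokens only, which we assume explains why its results land a fraction of a cent under the published figures (the prompt's input tokens are not included here).

```python
OUTPUT_PRICE_PER_MTOK = 50.00  # USD, Anthropic list price for Fable 5 output

# Output-token counts from Willison's pelican test, as listed in the table.
OUTPUT_TOKENS = {
    "low": 1_929,
    "medium": 2_290,
    "high": 2_057,
    "xhigh": 5_992,
    "max": 14_430,
}


def output_cost(tokens, price_per_mtok=OUTPUT_PRICE_PER_MTOK):
    """Return the USD cost of `tokens` output tokens at a per-million price."""
    return tokens * price_per_mtok / 1_000_000


for effort, tokens in OUTPUT_TOKENS.items():
    print(f"{effort:>6}: {tokens:>6} tokens -> ${output_cost(tokens):.4f}")

ratio = OUTPUT_TOKENS["max"] / OUTPUT_TOKENS["high"]
print(f"max vs high: {ratio:.2f}x the output tokens")
```

Swap in your own token counts from a few representative prompts before settling on a default effort level; one SVG prompt is a single data point, not a pricing model.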

Handling refusals — the part most guides leave out

Because Fable 5 reroutes sensitive requests, your code has to treat a refusal as a normal outcome, not an error. When a safety classifier declines, the API returns a successful HTTP 200 with stop_reason: "refusal" and an empty or partial body — so code that reads the first content block without checking will break. The Register has already documented the classifiers rejecting harmless prompts, and Anthropic puts the trigger rate under 5% of sessions on average, higher around cybersecurity, biology, and chemistry. Guard for it:

if data.get("stop_reason") == "refusal":
    # The request was declined by a safety classifier (HTTP 200, not an error).
    # Retry the same prompt on a model without the Mythos guardrails.
    fallback = requests.post(
        "https://gptproto.com/v1/messages",
        headers={
            "Authorization": "YOUR_GPTPROTO_API_KEY",
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        data=json.dumps({
            "model": "claude-opus-4-8",
            "max_tokens": 8000,
            "messages": [
                {"role": "user", "content": "Refactor this module and explain the trade-offs."}
            ],
        }),
    )
    data = fallback.json()

Routing the refused request to claude-opus-4-8 mirrors what Anthropic does internally and keeps your pipeline moving. One related rule from Anthropic's prompting guidance: don't ask Fable 5 to print its own reasoning back to you — that can trip a separate classifier and push you onto the fallback model for no good reason.
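The same guard can be packaged so every call site gets it for free. The sketch below is one way to do that under stated assumptions: `extract_text` and `complete_with_fallback` are hypothetical helper names, and `send` stands for any callable of yours that posts a payload and returns the parsed JSON. The fallback request drops `output_config`, mirroring the fallback body shown above, which does not send it.

```python
def extract_text(data):
    """Join all text blocks in a Messages-style response; tolerate empty content."""
    return "".join(
        block.get("text", "")
        for block in (data.get("content") or [])
        if block.get("type") == "text"
    )


def complete_with_fallback(send, payload, fallback_model="claude-opus-4-8"):
    """Call send(payload); on a refusal, retry once on the fallback model.

    `send` is a hypothetical callable: it takes a request dict and returns
    the parsed JSON response. Returns a (text, model_used) tuple.
    """
    data = send(payload)
    if data.get("stop_reason") != "refusal":
        return extract_text(data), payload["model"]

    # Refusal arrives as HTTP 200, so it is handled here rather than as an error.
    retry = {k: v for k, v in payload.items() if k != "output_config"}
    retry["model"] = fallback_model
    return extract_text(send(retry)), fallback_model


# Demonstration with a stand-in transport instead of a real HTTP call.
def fake_send(payload):
    if payload["model"] == "claude-fable-5":
        return {"stop_reason": "refusal", "content": []}
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": "done"}]}


request = {"model": "claude-fable-5", "max_tokens": 8000, "messages": []}
print(complete_with_fallback(fake_send, request))
```

In production, `send` could be a thin wrapper around the `requests.post` call shown earlier. Returning the model name alongside the text lets you log how often the fallback fires, which matters if you want to know what you are actually paying for.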

Pricing — and how to pay 20% less

Capability this high is priced like it. Anthropic lists Fable 5 at $10 per million input tokens and $50 per million output tokens, exactly double Opus 4.8's $5 and $25. Through GPT Proto the same model runs at $8 and $40, 20% under list, on the identical API format.

Claude Fable 5 API pricing — Anthropic list vs GPT Proto
Per 1M tokens	Input	Output
Fable 5 — Anthropic list	$10	$50
Fable 5 — GPT Proto	$8	$40
Opus 4.8 — Anthropic list (for comparison)	$5	$25
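To turn those per-million rates into a per-request figure, multiply through by your token counts. The workload below (200K input tokens, 20K output tokens) is a made-up example, and the comparison assumes identical token counts across models, which the tokenizer change noted earlier means will not hold exactly between Fable 5 and Opus 4.8.

```python
# USD per 1M tokens as (input, output), taken from the pricing table.
PRICES = {
    "Fable 5 (Anthropic list)": (10.0, 50.0),
    "Fable 5 (GPT Proto)": (8.0, 40.0),
    "Opus 4.8 (Anthropic list)": (5.0, 25.0),
}


def request_cost(input_tokens, output_tokens, prices):
    """Return the USD cost of one request given (input, output) per-million prices."""
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 200K input tokens and 20K output tokens.
for name, prices in PRICES.items():
    print(f"{name}: ${request_cost(200_000, 20_000, prices):.2f}")
# Fable 5 (Anthropic list): $3.00
# Fable 5 (GPT Proto): $2.40
# Opus 4.8 (Anthropic list): $1.50
```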

One thing the discount does not change: the 30-day data-retention requirement applies the same way through GPT Proto as it does directly, because it's set by Anthropic at the model level. If a zero-retention policy is a hard requirement for you, Fable 5 is off the table regardless of where you call it — that's a constraint to plan around, not one any reseller can remove.

Who should use it — and who shouldn't

Fable 5 earns its premium on a specific shape of work: long-horizon, autonomous tasks where better planning and self-checking save more in human supervision than they cost in tokens. Codebase migrations, multi-file refactors, deep research over a 1M-token context, complex agentic workflows, one-prompt builds. If a job is hard enough that a cheaper model would need three rounds of correction, paying double for one clean pass is the better deal.
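That break-even claim can be checked with a few lines of arithmetic. The sketch below assumes each Opus round costs the same tokens as the first, ignores the human time spent between rounds, and introduces a hypothetical `tokenizer_inflation` factor for how many more tokens Fable 5 spends on the same content; none of those numbers come from Anthropic.

```python
FABLE_PRICE = 50.0  # USD per 1M output tokens, list
OPUS_PRICE = 25.0   # USD per 1M output tokens, list


def fable_is_cheaper(opus_rounds, tokenizer_inflation=1.0):
    """Compare one Fable 5 pass against `opus_rounds` Opus 4.8 passes.

    `tokenizer_inflation` is a hypothetical multiplier on Fable's token count
    for the same content (1.0 means no difference).
    """
    fable_cost = FABLE_PRICE * tokenizer_inflation
    opus_cost = OPUS_PRICE * opus_rounds
    return fable_cost < opus_cost


print(fable_is_cheaper(2))       # False: two Opus rounds cost the same as one Fable pass
print(fable_is_cheaper(3))       # True: three rounds cost 1.5x one Fable pass
print(fable_is_cheaper(3, 1.5))  # False: a 1.5x token inflation erases the advantage
```

Because input and output are both priced at exactly double, the same ratio holds for either. On token cost alone, Fable 5 wins only when the cheaper model would need more rounds than twice the inflation factor.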

For everything else, it's the wrong default. For routine writing, everyday coding help, summarization, and anything budget-sensitive, claude-opus-4-8 gives you most of the capability at half the price, and Sonnet 4.6 is cheaper still. And if your work lives in cybersecurity, biology, or chemistry, expect the classifiers to reroute a meaningful share of your requests to Opus 4.8 anyway, in which case you're paying Fable prices for Opus answers. The honest routing rule most teams land on: Sonnet for drafts, Opus for hard reviews, Fable for the genuinely autonomous jobs. Check GPT Proto pricing before you wire any of them into a loop.
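As a rough illustration, that routing rule fits in a lookup table. Treat this as a sketch: the task labels are our own, `claude-sonnet-4-6` is a guessed model ID (this guide names only "Sonnet 4.6"), and sending sensitive-domain work straight to Opus is one possible reading of the advice above, not an Anthropic recommendation.

```python
# Domains where the guide says classifiers reroute a meaningful share of requests.
SENSITIVE_DOMAINS = {"cybersecurity", "biology", "chemistry"}

MODEL_BY_TASK = {
    "draft": "claude-sonnet-4-6",   # hypothetical ID, not confirmed by the guide
    "review": "claude-opus-4-8",
    "autonomous": "claude-fable-5",
}


def pick_model(task, domain=None):
    """Return a model ID for a task label, avoiding Fable 5 in sensitive domains."""
    if task not in MODEL_BY_TASK:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(MODEL_BY_TASK)}")
    model = MODEL_BY_TASK[task]
    # If requests are likely to be rerouted to Opus anyway, skip the Fable premium.
    if model == "claude-fable-5" and domain in SENSITIVE_DOMAINS:
        return "claude-opus-4-8"
    return model


print(pick_model("draft"))                        # claude-sonnet-4-6
print(pick_model("autonomous"))                   # claude-fable-5
print(pick_model("autonomous", "cybersecurity"))  # claude-opus-4-8
```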

FAQ

Is Claude Fable 5 free?

No. It's a premium model at $10/$50 per million tokens list, or $8/$40 through GPT Proto. There's no free tier.

How much does Claude Fable 5 cost vs Opus 4.8?

Exactly double — Fable 5 is $10/$50 per million tokens, Opus 4.8 is $5/$25.

What's the context window?

1M tokens, which is also the default; up to 128K output tokens per request.

Fable 5 vs GPT-5.5 — which is better?

Fable 5 leads on coding and overall intelligence-index scores; GPT-5.5 wins on price and edged it out on the Agents' Last Exam instruction-following benchmark. Pick by workload.