AI Frontlines

Claude Sonnet 5 Default Closes Opus Gap

Claude Sonnet 5 Default Closes Opus Gap

Anthropic launched Claude Sonnet 5 on June 30, 2026, making it the default AI model for Free and Pro subscription tiers and replacing Sonnet 4.6. The company says the new model narrows the performance gap with its Opus flagship line while remaining substantially cheaper to operate.

For developers running automated, multi-step workflows, the upgrade matters less because of benchmark numbers and more because of what those numbers enable architecturally: a single model that covers a continuous cost-performance curve from light tasks to near-Opus-grade autonomous work, depending on how much compute a developer assigns to each call.

It describes Sonnet 5 as its most agentic Sonnet yet — capable of making multi-step plans, using tools like browsers and terminals, and running through complex tasks that previous Sonnet models would stall on before completing.

Sonnet 5 arrives as Anthropic prepares for possible IPO

The launch comes as the firm prepares for what could become one of the largest technology IPOs in history. Anthropic confidentially filed a draft S-1 registration statement with the Securities and Exchange Commission on June 1, 2026, following a $65 billion Series H round that valued it at $965 billion.

Sonnet 5 is the first major model release since that filing and since the June 12 Commerce Department order that suspended Claude Fable 5 and Mythos 5 for all users worldwide. Those models remain offline for general customers. Sonnet 5 and Opus 4.8 are now the effective ceiling of what most developers can access.

How effort levels create a sliding cost-performance curve

The single most consequential architectural feature of Sonnet 5 is not a benchmark score. It is the model’s position on a tunable cost-performance curve that overlaps meaningfully with Opus 4.8.

AI models typically charge a flat rate per token regardless of how hard the model works on a request.

The effort level system changes that.

Developers can instruct the model to apply more or less compute to a task — choosing from low, medium, high, xhigh, or max — trading cost against output quality.

Related: Base Editing Reveals NANOG Builds All Tissues

What the Sonnet 5 launch charts show — based on the BrowseComp agentic search benchmark and the OSWorld-Verified computer use evaluation — is that the model at high or extra-high effort levels achieves performance comparable to Opus 4.8 on some task categories. At medium effort, it is substantially cheaper than the flagship while still outperforming the prior version at any setting.

The practical implication: teams that previously used it for routine tasks and Opus for complex ones may now be able to route a wider share of complex work through Sonnet 5 at raised effort levels, reserving Opus 4.8 only for tasks that specifically require its stronger agentic search or computer use performance.

Opus 4.8 remains the better choice for the highest-accuracy requirements on those specific tasks and for cybersecurity work that requires reduced guardrails.

On the benchmark that targets agentic coding — the measure most relevant to automated pipeline deployments — Sonnet 5 scores 63.2 percent compared to Opus 4.8’s 69.2 percent and the outgoing Sonnet 4.6’s 58.1 percent.

On knowledge work tasks, the model slightly edges ahead of the flagship.

The full evaluation data is in the Sonnet 5 system card.

What early developers saw: completed tasks where prior Sonnets stalled

Early access developers described a consistent improvement in what Anthropic calls follow-through: the ability to complete multi-step tasks without stalling partway through. Daniel Shepard, a senior engineer at Zapier, said his team handed it a combined Salesforce account update and email launch task — a workflow that previously required human intervention at the halfway point — and the model completed it without a stop. Fabian Hedin, co-founder of Lovable, noted a quality that is rarely benchmarked but matters significantly in consumer-facing deployments: consistent, clean refusal of unsafe requests. A model deployed at scale that refuses appropriately and reliably is, in Hedin’s framing, as operationally important as raw capability when putting powerful tools in the hands of millions of users.

Sonnet 5 also checks its own outputs without being explicitly prompted to do so — a behavior change from the prior version that reduces the rate of compounding errors in automated pipelines, where a mistake in one step propagates through all subsequent steps.

What the tokenizer change means for your bill

The company offers an introductory API rate of $2 per million input tokens and $10 per million output tokens through August 31, 2026. After that date, pricing moves to standard rates: $3 per million input and $15 per million output. For comparison, Opus 4.8 costs $5 per million input and $25 per million output — meaning the new model at standard pricing is 40 percent cheaper on both inputs and outputs.

Related: DeepSeek launches open source AI project

There is a meaningful asterisk.

Sonnet 5 uses an updated tokenizer — the same revision introduced with Opus 4.7 — that changes how the model processes text. The same input can map to 1.0 to 1.35 times as many tokens as it would have under the previous tokenizer, depending on content type. The introductory pricing is designed to keep the transition roughly cost-neutral for Sonnet 4.6 users. But developers should audit their actual token consumption against the new tokenizer before assuming the upgrade is free after September 1.

Agentic workflows are particularly exposed to this dynamic. A model that plans, verifies, and iterates across multiple tool calls generates far more tokens per completed task than a single-turn chatbot response. The effort level architecture compounds this: higher effort settings mean more tokens spent per inference call.

Safety and prompt injection resistance

Anthropic’s pre-deployment safety evaluations found Sonnet 5 improved on the prior version on the behaviors most relevant to agentic deployment: lower rates of hallucination, lower rates of sycophancy — the tendency to agree with incorrect premises — and improved resistance to prompt injection attacks.

Prompt injection is a cyberattack category specific to language models deployed in automated contexts. When a model processes external content — a webpage, an email, a document retrieved by a tool — that content may contain adversarial instructions designed to override the model’s original instructions and redirect it toward a different goal. As models like Sonnet 5 are deployed in agentic pipelines that regularly fetch untrusted external content, this attack surface expands significantly.

Sonnet 5’s improved resistance here is a specific engineering gain.

In a live bug bounty hosted with Gray Swan, only 0.19 percent of unique attacks succeeded against the model — matching Opus 4.8 and outperforming GPT-5.5 at 3.08 percent. On broader safety, it carries the same real-time cybersafeguard classifiers as Opus 4.7 and 4.8 — systems that detect and block dangerous cybersecurity requests in real time. These safeguards are less strict than those deployed with the currently-suspended Fable 5, which blocked a wider range of security tasks.

The firm did not deliberately train Sonnet 5 on cybersecurity tasks. In evaluations testing the ability to develop working software exploits for Firefox 147, it scored zero percent — the same result as the earlier version. A slight increase in partial-success rates, attributed to general intelligence improvements rather than specific cybersecurity training, prompted Anthropic to enable the real-time safeguards. Sonnet 5 shows somewhat higher rates of misaligned behavior than Opus 4.8 and Claude Mythos Preview, both of which remain the safer choice for high-stakes or sensitive deployments.

Sonnet 5 in a crowded agentic AI field

The launch arrives four days after OpenAI released GPT-5.6 Sol in preview, which OpenAI also framed as its most agentic offering — capable of distributing work across subagents for extended autonomous runs. Google’s Gemini 3.5 Flash, launched in May, carried a similar pitch: a shift from conversational chatbot to autonomous planning and execution tool.

Related: Redmi Unveils Note 14 Pro Plus Smartphone

Sonnet 5 is priced below Opus 4.8, GPT-5.5, and Google Gemini 3.1 Pro at both introductory and standard rates.

It remains more expensive than Gemini 3.5 Flash.

The more significant distinction is the effort-level architecture: rather than choosing between a capable-but-expensive frontier model and a cheaper-but-limited mid-tier option, developers can tune a single model across a wider cost-performance range within a single API call.

The company also launched Claude Science on June 30 — a desktop application for scientific research that integrates tools and packages commonly used by researchers, produces auditable research artifacts, and provides flexible access to computing resources. It framed the release as consolidating fragmented research tooling into a single environment, with pre-configured support for genomics, single-cell analysis, proteomics, and cheminformatics. Rate limits have been increased across Chat, Cowork, Claude Code, and the Claude Platform to accommodate the higher token volumes that come with more capable agentic tasks — which plan, iterate, and call tools across longer sessions. The firm simplified its API tier structure to three levels — Start, Build, and Scale — in April 2026.

Current limits are visible in the Claude Console.

Sonnet 5 does not directly replace Fable 5 for users who lost access when the US government suspended it.

They are different products with different capability profiles.

For the overwhelming majority of development tasks — coding, automation, knowledge work — the model covers much of what Fable 5 could do. For frontier cybersecurity and advanced reasoning tasks that were Fable 5’s specific differentiator, Sonnet 5 is not a substitute; Opus 4.8 remains the highest-capability model currently available to general users.

Leave a Comment

Your email address will not be published. Required fields are marked *