Skip to main content
News

Claude Opus 4.8 Retakes #1 from GPT-5.5: Dynamic Workflows Let Claude Code Run Hundreds of Subagents in One Session

On 28 May 2026, Anthropic released Claude Opus 4.8, reclaiming the #1 position on the Artificial Analysis Intelligence Index at 61.4, ahead of GPT-5.5 by 1.2 points, and scoring 69.2% on SWE-Bench Pro for coding. The headline new feature is Dynamic Workflows in Claude Code, which runs hundreds of parallel subagents in a single session and handles codebase-scale migrations across hundreds of thousands of lines. Code flaw rate is roughly four times lower than the predecessor. This piece covers the release, what changed, and what it means for teams using Claude Code daily.

ClaudeAnthropicOpus 4.8AI ModelDynamic WorkflowsClaude CodeCoding

TL;DR

On 28 May 2026, Anthropic released Claude Opus 4.8, taking back the #1 position on the Artificial Analysis Intelligence Index at 61.4, up from Opus 4.7 at 57.3, and ahead of GPT-5.5 (xhigh) at 60.2.

Numbers that matter for development teams. SWE-Bench Pro at 69.2%, ahead of GPT-5.5 and Gemini 3.1 Pro. Online-Mind2Web at 84% for browser agents. The Legal Agent Benchmark, where Opus 4.8 is the first model to cross the 10% all-pass mark.

The biggest new feature is Dynamic Workflows in Claude Code, which runs hundreds of parallel subagents in a single session and handles codebase-scale migrations across hundreds of thousands of lines. Alongside it, Effort Control lets users select response effort and Messages API now accepts mid-task system entries without breaking prompt cache.

Pricing is unchanged from Opus 4.7. Five dollars per million input tokens and twenty-five per million output. Fast Mode at ten and fifty, three times cheaper than the previous Fast Mode.

For teams using Claude Code as a daily tool, like Enersys, this is a structural step up, not an incremental update.


What Happened on 28 May 2026

Anthropic announced Claude Opus 4.8 through its official newsroom, with immediate availability through the Claude API as claude-opus-4-8.

TechCrunch reported that the release came with a new tool, Dynamic Workflows in Claude Code, which Anthropic launched as a research preview for Enterprise, Team, and Max plans of existing Claude Code customers.

Artificial Analysis, the independent publication tracking AI model benchmarks, confirmed the next day that Opus 4.8 reclaimed the #1 position on its Intelligence Index. GPT-5.5 (xhigh) had held that position for several months.


The Benchmark Numbers That Matter

Numbers reported by Anthropic and Artificial Analysis.

Artificial Analysis Intelligence Index. 61.4, ahead of Opus 4.7 by 4.1 and ahead of GPT-5.5 by 1.2.

SWE-Bench Pro. 69.2% on the benchmark that tests end-to-end bug fixing in real codebases. Opus 4.8 beat GPT-5.5 and Gemini 3.1 Pro. GPT-5.5 still leads on the terminal-coding benchmark specifically.

GDPval-AA. 1,890 Elo for agentic performance on knowledge work, implying roughly a 67% win rate against GPT-5.5.

Online-Mind2Web. 84% on browser agent performance.

Legal Agent Benchmark. Opus 4.8 is the first model to break 10% on the all-pass standard.

Super-Agent benchmark. The only model to complete every case end-to-end.

These numbers matter when choosing a model for a specific task. They are not a substitute for evaluating the model on the team's own workload. One benchmark does not reflect every kind of work.


Dynamic Workflows Is the Biggest Feature

In the Claude Code research preview, Anthropic opened Dynamic Workflows, which changes how Claude Code operates in a meaningful way.

Before Dynamic Workflows, Claude Code operated as a single agent running tasks sequentially. Subagents existed but at limited scale, and orchestration was not at the workflow level.

Anthropic now describes Dynamic Workflows as "enabling Claude to run hundreds of parallel subagents in a single session" and handle "codebase-scale migrations across hundreds of thousands of lines of code."

What this means for developers.

First, work that is parallel by nature, such as a migration applying one pattern across many files, gets a significant speedup. Instead of Claude moving through files sequentially, subagents work in parallel under orchestration.

Second, work that needs multi-directional exploration, such as a refactor with several possible designs, can be tried in parallel by different subagents.

Third, verification of cross-cutting concerns, such as a repo-wide security audit or test coverage check, can happen inside a single session.

Where to be careful. Each subagent consumes its own tokens. A session using Dynamic Workflows can cost several times more than a single agent session. Budget and approval workflows in an enterprise setting need consideration before rolling out to large teams.


Effort Control Balances Quality and Cost

The second feature, Effort Control, is in claude.ai and Cowork.

Users choose the effort level for a response, balancing quality, token usage, and speed. For exploration or quick Q&A, lower effort saves tokens. For a critical task such as a production code change that needs careful reasoning, higher effort is appropriate.

For enterprise administrators, this is a cost-control lever that does not require touching prompts or switching models.


Messages API Accepts Mid-Task System Entries

The third feature looks small at the user level. It is significant at the developer integration level.

The new Messages API accepts system entries inside the messages array. When a system needs to inject an instruction mid-task, the API supports it. Examples include a user changing project context, or an orchestrator adjusting agent behaviour mid-task.

The key point Anthropic emphasises. The mid-task system update does not break the prompt cache. For teams paying the much lower cached input token price (versus uncached), this materially lowers long-run cost.


The Safety Number Worth Noting

Anthropic emphasised in the announcement that Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked."

This matters because AI-generated code flaws that ship to production are a significant source of technical debt and security incidents. Anthropic's alignment assessment also reported that Opus 4.8 "reaches new highs on our measures of prosocial traits" with "substantially lower" rates of misaligned behaviour against Opus 4.7.

In practice, a 4x lower code flaw rate means lower review effort by the team, lower probability of bugs shipping, and higher trust in AI-generated code overall.

Even so, human review at critical points remains a discipline worth holding. Karpathy in his April 2026 Sequoia interview maintained that AI agents remain intern entities that need mentoring. Opus 4.8 being better does not retire that requirement.


Pricing and Access

Pricing is unchanged from Opus 4.7.

  • Standard. Five dollars per million input tokens, twenty-five per million output.
  • Fast Mode. Ten dollars per million input tokens, fifty per million output. Three times cheaper than the previous Fast Mode.
  • Context. One million tokens by default.
  • Max output. 128k tokens.
  • Adaptive thinking and mid-conversation system messages.
  • Workflows (planning plus parallel subagents) in Claude Code as a research preview, for Enterprise, Team, and Max plans.

API name. claude-opus-4-8.


What This Means for the Enersys Team

Enersys uses Claude Code as a primary tool in daily development. Opus 4.8 lands in three ways.

Higher velocity. Dynamic Workflows opens up parallel work that previously took days, such as codebase-scale refactors or migrations, to complete within a single session. For client ERP migration projects with legacy systems running hundreds of thousands of lines, this changes the economics of the work.

Higher code quality. The four-times-lower flaw rate cuts review burden on senior developers. The team can spend more time on architecture decisions and less on syntax issues that AI introduced.

Better cost predictability. Effort Control plus cached input via the Messages API gives administrators tighter budget control across a development team.

In the next steering committee, the Enersys team will revisit the next quarter's Claude API budget and assess which client workloads should move to Dynamic Workflows as a default.


Closing

Claude Opus 4.8 is not just a benchmark step. It is a tool category shift. Dynamic Workflows opens up large-scale work in Claude Code that previously required human orchestration.

For teams that use Claude Code every day, now is the time to experiment with Dynamic Workflows on workloads that parallelise naturally, and measure the velocity gain against the cost increase. For organisations evaluating new AI coding tools, Opus 4.8 reclaiming #1 on the AA Index and a 4x improvement on code flaw rate are worth putting on the shortlist.


Sources

"Empowering Innovation,
Transforming Futures."

ติดต่อเราเพื่อทำให้โปรเจกต์ของคุณเป็นจริง