PUBLISHED: April 26, 2026 | LAST UPDATED: April 26, 2026

GPT-5.5 Released: What's Actually New, Who Gets It, and Whether It's Worth It

OpenAI just shipped its sixth major model in under nine months, and this time it's not a minor update. GPT-5.5 — codenamed "Spud" internally, which is a refreshingly honest name for something OpenAI is calling "a new class of intelligence" — landed on April 23, 2026, less than seven weeks after GPT-5.4 dropped in March.

That release cadence alone tells you something. This is not a company carefully maturing a single platform. It's a company in a full sprint, shipping models the way startups ship features — fast, iterative, and with enough marketing language to make a copywriter blush. So what actually changed? Let's get into it.

What Is GPT-5.5?

GPT-5.5 is OpenAI's latest frontier model, released April 23, 2026, and positioned as the new default model in ChatGPT and Codex. OpenAI calls it their "smartest and most intuitive to use model yet" — a sentence every new model gets, so take that accordingly.

What separates this one from the recent string of decimal upgrades is under the hood. Unlike GPT-5.1 through GPT-5.4, which shared the same foundational pre-training and layered improvements on top, GPT-5.5 is built on a completely new pre-train. That means its base knowledge and reasoning patterns are structurally different — not just tuned differently. That structural difference shows up downstream in exactly the areas where the model improves most: long-horizon reasoning, agentic multi-step task completion, and system-level understanding of large codebases.

It ships in two variants: standard GPT-5.5 and GPT-5.5 Pro, the latter applying parallel test-time compute for harder tasks. A one million token context window comes standard — a first for OpenAI's API.

The Agentic Shift: What It Actually Means

The word "agentic" gets thrown around a lot right now. Here's what it actually means in the context of GPT-5.5: you can hand the model a messy, multi-step task and not babysit it. It plans, uses tools, checks its own work, navigates ambiguity, and keeps going. According to OpenAI's official announcement (openai.com/index/introducing-gpt-5-5), the gains are concentrated in four specific areas:

Agentic coding — writing, debugging, refactoring, and validating code across long sessions without losing context. GPT-5.5 holds context across large system architectures and can predict downstream impacts when it fixes something upstream.

Computer use — the model can interpret software interfaces, take actions inside applications, and move between tools. It scored 78.7% on OSWorld-Verified, up from 75.0% on GPT-5.4. That's not perfect, but it's meaningfully closer to useful in production.

Knowledge work — document analysis, data synthesis, spreadsheet creation, multi-tool research. OpenAI reports that over 85% of its own employees now use Codex weekly across teams including finance, marketing, and communications — not just engineering.

Scientific research — more on this below, because it deserves its own section.

Greg Brockman, OpenAI's co-founder and president, called it "a big step towards more agentic and intuitive computing" during a press briefing, per TechCrunch's coverage (techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp). He also described GPT-5.5 as "a faster, sharper thinker for fewer tokens compared to something like 5.4" — the claim in this launch most worth paying attention to, because token efficiency maps directly to cost.

Token Efficiency: The Real Story

GPT-5.5 delivers higher-quality outputs with fewer tokens and fewer retries than GPT-5.4. On Artificial Analysis's Coding Agent Index, OpenAI reports it delivers state-of-the-art intelligence at roughly half the cost of competing frontier coding models on a per-task basis. It also matches GPT-5.4's per-token latency in real-world serving — meaning this larger, smarter model doesn't slow you down.

One more notable detail: GPT-5.5 runs on NVIDIA GB200 and GB300 NVL72 systems, and the model itself wrote custom heuristic algorithms to balance work across GPU cores, increasing token generation speed by over 20% during serving. The AI optimized its own inference infrastructure. That's either impressive or unsettling, depending on your disposition.

Benchmark Breakdown: The Numbers Worth Knowing

Let's go through the numbers that actually matter. Benchmarks are vendor-reported and often contested, so treat these as directional signals rather than settled truth.

Terminal-Bench 2.0 (agentic coding): GPT-5.5 scores 82.7%, up from GPT-5.4's 75.1%. Claude Opus 4.7 sits at 69.4%.

SWE-Bench Pro (GitHub issue resolution): GPT-5.5 reaches 58.6%. Claude Opus 4.7 scores higher here, though OpenAI has flagged potential memorization artifacts in that score — treat the comparison with caution.

OSWorld-Verified (computer use): GPT-5.5 at 78.7%, GPT-5.4 at 75.0%, Claude Opus 4.7 at 78.0%.

GDPval (agent tasks across 44 occupations): GPT-5.5 scores 84.9%.

MRCR v2 (long-context retrieval at 1M tokens): GPT-5.5 hits 74.0%. GPT-5.4 was at 36.6%. Claude Opus 4.7 is at 32.2%.

Tau2-bench Telecom (customer service workflows): GPT-5.5 at 98.0%, up from GPT-5.4's 92.8%.

Sources: OpenAI's official launch page and BenchLM's live rankings (benchlm.ai/models/gpt-5-5), where GPT-5.5 currently sits at #2 out of 112 models in agentic tool use and computer task benchmarks.

The long-context retrieval jump is the most striking figure in that list. Going from 36.6% to 74.0% on the 512K–1M token retrieval benchmark is not incremental — it changes what the model can actually do. Entire-codebase reasoning and multi-document research become genuinely more reliable at that scale.

Where Claude Still Wins

Claude Opus 4.7 retains advantages on SWE-Bench Pro, MCP Atlas, and Humanity's Last Exam with Tools. Gemini 3.1 Pro leads on ARC-AGI-1. For teams building codebase-first agents — PR review, multi-language refactoring, IDE-integrated coding — Claude Opus 4.7 is still the stronger default per current evaluations.

The honest read: GPT-5.5 wins on terminal-first, pipeline-heavy, and long-context agentic tasks. Claude Opus 4.7 wins on certain software engineering benchmarks and day-one multi-cloud availability. This is not a clean knock-out. It's a routing decision.
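That routing decision can be made concrete. The sketch below is a toy default-model router: the category-to-model mapping mirrors this article's read of the benchmarks, but the category names themselves are invented for illustration, and any real deployment would route on richer signals than a single label.

```python
# Toy model router reflecting the split described above. The mapping is
# this article's reading of the benchmarks; category names are invented.
ROUTES = {
    "terminal_agent": "gpt-5.5",        # Terminal-Bench 2.0 lead
    "long_context": "gpt-5.5",          # MRCR v2 at 1M tokens
    "computer_use": "gpt-5.5",          # OSWorld-Verified edge
    "pr_review": "claude-opus-4.7",     # codebase-first strengths
    "ide_refactor": "claude-opus-4.7",  # SWE-Bench Pro / MCP Atlas
}

def route_task(category: str) -> str:
    """Pick a default model for a task category; fall back to GPT-5.5."""
    return ROUTES.get(category, "gpt-5.5")
```

The point is less the specific mapping than the shape of the decision: neither model is the unconditional default anymore.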

The Science Surprise No One Is Talking About

The coding and computer-use numbers are getting most of the attention. The science numbers are getting far less — and they're arguably more significant for where AI is actually headed.

As reported by gHacks (ghacks.net/2026/04/24/openai-releases-gpt-5-5-with-stronger-agentic-coding-computer-use-and-scientific-research-capabilities), GPT-5.5 scores 25.0% on GeneBench — a multi-stage scientific data analysis benchmark in genetics and quantitative biology — up from GPT-5.4's 19.0%. GPT-5.5 Pro pushes that to 33.2%. On BixBench, a bioinformatics benchmark, GPT-5.5 reaches 80.5% compared with 74.0% for GPT-5.4.

Those percentages don't sound dramatic in isolation. But GeneBench tests multi-stage workflows in genetics and quantitative biology — not simple retrieval. A six-point improvement in that domain reflects something structurally different in how the model reasons through scientific problems. Mark Chen, OpenAI's chief research officer, said at the launch briefing that GPT-5.5 could "help expert scientists make progress" and specifically mentioned drug discovery.

Perhaps the most striking detail: an internal version of GPT-5.5 with a custom harness contributed to a new proof about Ramsey numbers in combinatorics, later verified in Lean. That's a peer-review-adjacent result, not a benchmark score. It suggests the model's scientific utility is moving past pattern matching into something that at least resembles collaborative reasoning.

Who Gets Access and When

GPT-5.5 is rolling out now to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users. As of April 24, 2026, both variants are also available in the API, confirmed in OpenAI's updated announcement.

Free ChatGPT users are not in the initial rollout. OpenAI's pattern suggests free-tier access typically arrives weeks to months after the paid rollout — and given the model's compute requirements, don't hold your breath for an immediate trickle-down.

The official ChatGPT release notes (help.openai.com/en/articles/6825453-chatgpt-release-notes) confirm GPT-5.5 becomes the new default model in ChatGPT and Codex. API model IDs are gpt-5.5 and gpt-5.5-pro, on the Responses and Chat Completions endpoints. Context window is one million tokens for both; Codex sessions run at 400K tokens.
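For API users, switching is mostly a model-ID swap. The sketch below builds a minimal request body for the Responses endpoint using the model IDs above; the field names follow the shape of OpenAI's existing Responses API, so treat them as an assumption to verify against current documentation rather than a guaranteed contract.

```python
import json

# Minimal request body for the Responses endpoint, using the model IDs
# from the release notes. Field names assume the existing Responses API
# shape ("model", "input"); verify against current OpenAI docs.
def build_request(prompt: str, pro: bool = False) -> dict:
    return {
        "model": "gpt-5.5-pro" if pro else "gpt-5.5",
        "input": prompt,
    }

body = build_request("Summarize the open issues in this repo.")
payload = json.dumps(body)  # POST this to /v1/responses with your API key
```

If you were already on GPT-5.4, nothing else in the call needs to change; the million-token context window applies to both variants.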

What It Costs

API pricing for GPT-5.5 is $5 per million input tokens and $30 per million output tokens, double GPT-5.4's $2.50/$15 on both input and output. OpenAI says the effective price increase is closer to 20% once token efficiency is factored in — the model completes the same tasks in fewer passes. That math is plausible, but verify it against your actual workloads before committing.

GPT-5.5 Pro sits at a different tier entirely: $30 per million input tokens and $180 per million output tokens. At that pricing, it's designed for tasks where a single correct answer justifies the compute cost — legal analysis, financial modeling, scientific research. Running a general-purpose chatbot on GPT-5.5 Pro would be like using a scalpel to cut a birthday cake.

For comparison, Claude Opus 4.7 is priced at $5/$25 per million tokens — slightly cheaper on output, with day-one availability across Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry. Teams with existing cloud spend commitments on AWS or GCP may find Anthropic's multi-cloud rollout more convenient than OpenAI's API-only launch.
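Checking the "effective 20%" claim against your own workloads is simple arithmetic. The prices below come from this article; the 40% token reduction for GPT-5.5 is an assumption, chosen because it is exactly the reduction that makes OpenAI's 20% figure come out — your real reduction may be smaller.

```python
# Per-million-token prices (input, output) in USD, as quoted above.
PRICES = {
    "gpt-5.4": (2.50, 15.0),
    "gpt-5.5": (5.00, 30.0),
    "gpt-5.5-pro": (30.0, 180.0),
    "claude-opus-4.7": (5.00, 25.0),
}

def task_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one task at the quoted per-million-token prices."""
    p_in, p_out = PRICES[model]
    return tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out

# Example task: 10k input / 2k output tokens on GPT-5.4. Assume GPT-5.5
# finishes the same task in 40% fewer tokens (the reduction that makes
# OpenAI's "~20% effective increase" claim exact).
old = task_cost("gpt-5.4", 10_000, 2_000)  # $0.055
new = task_cost("gpt-5.5", 6_000, 1_200)   # $0.066
print(f"effective increase: {new / old - 1:.0%}")
```

Swap in token counts from your own logs to see where your workloads actually land; retry-heavy tasks will benefit more than single-shot ones.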

The Bigger Picture: Super App or Super Hype?

The GPT-5.5 launch can't be read in isolation. Brockman used the briefing to push a concept OpenAI has been quietly building toward: a "super app" that combines ChatGPT, Codex, and an AI browser into a single unified service for enterprise workflows. As CNBC noted (cnbc.com/2026/04/23/openai-announces-latest-artificial-intelligence-model.html), the vision is one interface that handles complex work end-to-end without routing between separate tools.

That's a real strategic bet, and GPT-5.5's computer use capabilities are a functional prerequisite for it. A model that can operate software, navigate browsers, read interfaces, and move between apps autonomously is more than a better chatbot — it's the kernel of an operating environment. Whether OpenAI can ship that as a coherent product is a different question.

Jakub Pachocki, OpenAI's chief scientist, said at the briefing that he considers the last two years "surprisingly slow" and expects "extremely significant improvements in the medium term." That's either a genuine forecast or a pointed message to Anthropic and Google. Probably both.

OpenAI has shipped GPT-5, GPT-5.1, GPT-5.2, GPT-5.3, GPT-5.4, and now GPT-5.5 since August 2025 — six major named releases in under nine months. The cadence signals that OpenAI has no intention of waiting for a perfect GPT-6 moment. It's building a living stack and retiring old layers as new ones ship.

Bottom Line

GPT-5.5 is a real upgrade, not a branding exercise. Three things actually matter here.

First, the new pre-train is structurally different from its predecessors, and that shows up most clearly in long-context retrieval — jumping from 36.6% to 74.0% on the 1M-token benchmark is not incremental.

Second, the token efficiency gains make the API price increase less painful than the sticker suggests. If GPT-5.5 genuinely completes tasks in fewer passes, the 20% effective cost increase claim becomes defensible — but test it on your own workloads.

Third, the scientific research benchmarks are a quiet signal that AI for research is moving past marketing into measurable capability. A contribution to a verified combinatorics proof is not a chatbot party trick.

What doesn't change: if your workflow is codebase-heavy and IDE-integrated, Claude Opus 4.7 still competes seriously. If you're running at scale on a tight budget, the open-source tier has genuinely closed the gap on many workloads. GPT-5.5 is the best agentic coding model available today. It is not automatically the right choice for every team.

OpenAI is releasing a meaningful new model every six to eight weeks. Most people find out about these shifts after the fact — after their competitors have already integrated the new capability. BitBiased covers every model drop, tool update, and practical implication in a weekly newsletter built for people who'd rather spend five minutes reading than two hours catching up. Subscribe free at bitbiased.ai/subscribe.