
OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in the Codex developer environment. The launch marks a significant step forward in AI-assisted software engineering, offering real-time interactive capabilities, improved efficiency, and long-term reasoning. GPT-5.1-Codex-Max now replaces GPT-5.1-Codex as the default model on all surfaces built into Codex.
The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.
It comes on the heels of Google launching its powerful new Gemini 3 Pro model yesterday, yet it beats or matches that model on key coding benchmarks:
- On SWE-Bench Verified, GPT-5.1-Codex-Max achieved 77.9% accuracy with extra high reasoning effort, surpassing Gemini 3 Pro's 76.2%.
- On Terminal-Bench 2.0, it led with 58.1% accuracy compared to 54.2% for Gemini.
- On LiveCodeBench Pro, a competitive-programming Elo benchmark, it matched Gemini's score of 2,439.
Even against Gemini 3 Pro's most advanced configuration, its Deep Think mode, Codex-Max holds a slight edge on agentic coding benchmarks.
Performance Benchmarks: Incremental Gains on Key Tasks
GPT-5.1-Codex-Max demonstrates measurable improvements over GPT-5.1-Codex on a variety of standard software engineering benchmarks.
On SWE-Lancer IC SWE, it scored 79.9%, a significant increase over GPT-5.1-Codex's 66.3%. On SWE-Bench Verified (n=500), it reached 77.9% accuracy with extra high reasoning effort, surpassing GPT-5.1-Codex's 73.7%.
Performance on Terminal-Bench 2.0 (n=89) showed a more modest improvement: GPT-5.1-Codex-Max achieved 58.1% accuracy compared to 52.8% for GPT-5.1-Codex.
All evaluations were performed with compaction and extra high reasoning effort enabled.
These results indicate that the new model raises the ceiling for both correctness and real-world usability under extended reasoning loads.
Technical architecture: long-term reasoning through compaction
A major architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively during extended input and output sessions using a mechanism called compaction.
This allows the model to retain key contextual information while discarding irrelevant details as it approaches the limit of its context window, effectively enabling continuous work on millions of tokens without performance degradation.
The model has been observed internally to complete tasks lasting longer than 24 hours, including multi-step refactorings, test-driven iterations, and autonomous debugging.
Compaction also improves token efficiency. At medium reasoning effort, GPT-5.1-Codex-Max used approximately 30% fewer thinking tokens than GPT-5.1-Codex to achieve comparable or better accuracy, which has implications for both cost and latency.
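OpenAI has not published how compaction is implemented, but the general idea can be illustrated with a minimal sketch: when an agent's message history approaches the context limit, older turns are folded into a compact summary while the most recent turns are kept verbatim. Everything below (the Message class, estimate_tokens, summarize, and compact) is a hypothetical illustration, not an OpenAI API.

```python
# Illustrative sketch of compaction-style context management (not OpenAI's implementation).
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str

def estimate_tokens(messages: list[Message]) -> int:
    # Rough proxy: roughly 4 characters per token (assumption for this sketch).
    return sum(len(m.content) for m in messages) // 4

def summarize(messages: list[Message]) -> str:
    # Hypothetical helper: in practice this would ask a model to condense the
    # older turns while preserving decisions, file paths, and open TODOs.
    return "Summary of earlier work: " + " | ".join(m.content[:40] for m in messages)

def compact(history: list[Message], limit: int = 100_000, keep_recent: int = 20) -> list[Message]:
    """When the history nears the context limit, fold older turns into one
    summary message and keep only the most recent turns verbatim."""
    if len(history) <= keep_recent or estimate_tokens(history) < limit:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [Message("assistant", summarize(older))] + recent
```

Run before each model call, a loop like this keeps the working context bounded while carrying a condensed record of everything done so far, which is the behavior OpenAI describes at a high level.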
Platform integration and use cases
GPT‑5.1-Codex-Max is currently available in multiple Codex-based environments, meaning OpenAI's own tools and interfaces built specifically for code-centric AI agents. These include:
- Codex CLI, the official OpenAI command-line tool (@openai/codex), where GPT‑5.1-Codex-Max is now available.
- IDE extensions, likely developed or maintained by OpenAI, although no specific third-party IDE integrations were named.
- Interactive coding environments, such as those used to demonstrate frontend simulation apps like CartPole or the Snell's Law Explorer.
- Internal code review tools used by OpenAI engineering teams.
For now, GPT-5.1-Codex-Max is not yet available via the public API, although OpenAI says it will be available soon. Users who wish to work with the model in terminal environments today can do so by installing and using the Codex CLI.
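For those taking the CLI route, a typical setup looks like the sketch below. The package name @openai/codex comes from the announcement; the --model flag and the model key in ~/.codex/config.toml follow earlier Codex CLI releases and should be checked against the current documentation.

```bash
# Install the Codex CLI globally (package name per the announcement), then start a session.
npm install -g @openai/codex
codex

# Pinning the model explicitly; the flag and config key below are based on earlier
# CLI releases and may differ in the current build.
codex --model gpt-5.1-codex-max
# ...or persist the choice in ~/.codex/config.toml:
#   model = "gpt-5.1-codex-max"
```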
It is currently unconfirmed whether and how the model will be integrated into third-party IDEs, beyond integrations built on top of the CLI or a future API.
The model is capable of interacting with tools and live simulations. Examples shown in the release include:
- An interactive CartPole policy-gradient simulator, visualizing activations and reinforcement learning training.
- A Snell's Law optical explorer that supports dynamic ray tracing through media with different refractive indices.
These interfaces exemplify the model’s ability to reason in real time while maintaining an interactive development session, effectively uniting computation, visualization, and deployment within a single loop.
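To give a sense of the computation behind the Snell's Law demo, here is a minimal, hypothetical Python sketch of the underlying physics rather than OpenAI's demo code: given an incidence angle and two refractive indices, it returns the refraction angle or flags total internal reflection.

```python
# Minimal sketch of Snell's law (n1 * sin(theta1) = n2 * sin(theta2)); illustration only.
import math

def refract(theta_incident_deg: float, n1: float, n2: float) -> float | None:
    """Return the refraction angle in degrees, or None on total internal reflection."""
    s = (n1 / n2) * math.sin(math.radians(theta_incident_deg))
    if abs(s) > 1.0:
        return None  # total internal reflection: no transmitted ray
    return math.degrees(math.asin(s))

# Example: light passing from water (n ≈ 1.33) into air (n ≈ 1.00) at 30 degrees.
print(refract(30.0, 1.33, 1.00))  # ≈ 41.7 degrees
```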
Cybersecurity capabilities and safeguards
While GPT-5.1-Codex-Max does not meet OpenAI's "high" capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but runs in a strict sandbox with network access disabled by default.
OpenAI does not report any increase in malicious use at scale, but has introduced improved monitoring systems, including activity routing and kill mechanisms for suspicious behavior. Codex remains isolated in a local workspace unless developers opt for broader access, mitigating risks such as prompt injection from untrusted content.
Developer Implementation and Usage Context
GPT‑5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It also becomes the new default in Codex-based environments, replacing GPT-5.1-Codex, its predecessor as the default coding model.
OpenAI claims that 95% of its internal engineers use Codex weekly, and since its adoption, these engineers have submitted approximately 70% more pull requests on average, highlighting the tool’s impact on internal development speed.
Despite its autonomy and persistence, OpenAI emphasizes that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test results, and tool-call outputs to support transparent review of the generated code.
Perspective
GPT-5.1-Codex-Max represents a significant evolution in OpenAI's strategy toward agentic development tools, offering greater depth of reasoning, token efficiency, and interactive capabilities across a wide range of software engineering tasks. By expanding its compaction and context management strategies, the model is positioned to handle tasks at the scale of entire repositories, rather than individual files or fragments.
With a continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max lays the foundation for the next generation of AI-assisted programming environments, while underscoring the importance of monitoring in increasingly autonomous systems.