geeky NEWS: Navigating the New Age of Cutting-Edge Technology in AI, Robotics, Space, and the Latest Tech Gadgets
As a passionate tech blogger and vlogger, I specialize in four exciting areas: AI, robotics, space, and the latest gadgets. Drawing on my extensive experience working at tech giants like Google and Qualcomm, I bring a unique perspective to my coverage. My portfolio combines critical analysis and infectious enthusiasm to keep tech enthusiasts informed and excited about the future of technology innovation.
OpenAI Codex: Remote Software Agent That Can Run Many Coding Tasks In Parallel
AI Summary: OpenAI has introduced Codex, a cloud-based AI agent designed to autonomously handle multiple software engineering tasks concurrently. Unlike prior AI coding assistants that provided autocomplete, Codex can write features, debug, answer codebase questions, and propose pull requests within isolated cloud environments. Powered by codex-1, a version of OpenAI's o3 model optimized for software engineering and trained with reinforcement learning, it allows developers to delegate tasks through the ChatGPT sidebar, monitoring progress and reviewing verifiable evidence of the AI's actions.
May 20, 2025 17:33
OpenAI has unveiled Codex, a cloud-based software engineering agent capable of handling multiple coding tasks simultaneously. Unlike previous AI coding assistants that primarily offered autocomplete suggestions, Codex operates as an autonomous agent that can write features, fix bugs, answer questions about your codebase, and even propose complete pull requests—all while working in parallel across separate cloud environments.
The release marks a pivotal shift in how developers might approach their workflow in the coming years, potentially transforming the relationship between engineers and their code.
What Is Codex?
At its core, Codex is a sophisticated AI agent powered by codex-1, a version of OpenAI's o3 model specifically optimized for software engineering. The system was trained using reinforcement learning on real-world coding tasks across diverse environments, enabling it to generate code that closely resembles human style while adhering to specific instructions and project requirements.
What separates Codex from previous coding assistants is its ability to operate independently in isolated cloud environments. Each task assigned to Codex runs in a separate sandbox preloaded with your repository, allowing the agent to read and modify files, execute commands, run tests, and verify its own work—much like a remote teammate would.
"Today, we're going to take a step towards where we think software engineering is going," explained Greg Brockman in the announcement presentation. "We are releasing a new system, which is a remote software agent that can run many tasks in parallel."
How It Works
Using Codex is straightforward. Developers access it through the ChatGPT sidebar and can assign new tasks by typing a prompt and clicking "Code." For questions about the codebase, users click "Ask" instead. Each prompt launches a new Codex agent in a separate environment with full access to the repository.
What happens next is where things get interesting. Codex can navigate complex codebases, run commands including test harnesses and linters, make changes across multiple files, and commit its work—all while documenting each step it takes. Tasks typically take between 1 and 30 minutes depending on complexity, and users can monitor progress in real time.
Once complete, Codex provides verifiable evidence of its actions through citations of terminal logs and test outputs, allowing developers to trace and validate each step. Users can then review the results, request revisions, open a GitHub pull request, or integrate the changes directly into their local environment.
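For developers who prefer to land the change from their own machine, the review step looks much like reviewing any teammate's pull request. The commands below are a hypothetical illustration; the pull request number, branch name, and Python tooling are assumptions for the sake of the example, not part of Codex's actual output:

```
# Fetch the pull request branch Codex opened (PR number is hypothetical)
git fetch origin pull/1234/head:codex/fix-flaky-test
git switch codex/fix-flaky-test

# Re-run the checks Codex cited in its logs before merging
pytest -q && ruff check .
```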
Guiding Your AI Teammate
Like human developers, Codex performs best when given clear guidance. The system can be directed through AGENTS.md files placed within repositories—text files similar to README.md that inform Codex how to navigate the codebase, which commands to run for testing, and how to adhere to project standards.
During the demonstration, Josh from the Codex team showed how they implemented this feature: "We've introduced this concept of an agents.md file. We know for developers it's extremely important to provide steerability and instruction to the model."
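Because AGENTS.md is just a text file in the repository, teams can start small. The sketch below is purely illustrative; the directory names, commands, and conventions describe a hypothetical project rather than a format OpenAI prescribes:

```
# AGENTS.md (illustrative example)

## Project layout
- api/ — Python backend (FastAPI)
- web/ — TypeScript frontend (React)

## Checks to run before committing
- Backend changes: pytest -q
- Frontend changes: npm test --prefix web

## Conventions
- Match the existing formatting; do not reformat unrelated files.
- Keep pull request titles in the form "fix: ..." or "feat: ...".
```

In practice, the more precisely such a file spells out how to build, test, and lint the project, the less the agent has to guess when verifying its own work.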
Performance benchmarks suggest codex-1 demonstrates strong capabilities even without these custom instructions. On the SWE-Bench Verified evaluation, it achieved 75% accuracy, significantly outperforming previous models like o3-high (70%) and o1-high (11%).
Real-World Applications
Leading up to the release, OpenAI worked with external partners to understand how Codex performs across diverse codebases and development processes.
Cisco has been exploring how Codex can help their engineering teams accelerate development. As early design partners, they're evaluating it for real-world use cases across their product portfolio and providing feedback to shape future iterations.
Temporal found particular value in using Codex to accelerate feature development, debug issues, write and execute tests, and refactor large codebases. Their engineers noted that it helps them stay focused by running complex tasks in the background—maintaining flow while speeding up iteration.
Superhuman highlighted a different benefit: enabling product managers to contribute lightweight code changes without requiring an engineer's time, except for final code review. They've also used it to speed up repetitive tasks like improving test coverage and fixing integration failures.
Kodiak is using Codex to help develop their autonomous driving technology. Engineers there leverage it to write debugging tools, improve test coverage, and refactor code. They've also found it valuable as a reference tool that helps engineers understand unfamiliar parts of the stack by surfacing relevant context and past changes.
Developer Experience
What's it actually like to work with Codex? The OpenAI team shared insights from their own experience during development.
"In the leadup to this launch, I've ended up doing a lot of coordination work. I haven't had as much time for coding as I maybe used to," explained one team member. "Sometimes I'll have an idea of a code change I might like to make, and I'll just kick it off in Codex. It takes like 30 seconds. It's a very lightweight thing to do and then I'll go back to Slack or wherever I am."
The magic, they explained, comes when returning later to find the task completed—sometimes with surprising sophistication. "Sometimes it's a multi-hundred line diff and I open it and start reading through it and I'm like, 'Wow, this actually looks like it's correct.' Maybe the model ran some tests, maybe those tests failed and the model fixed the failure. I look on the left and all the little test results are green."
This leads to what the team described as a transformative moment: "I just landed a non-trivial change in our codebase and that branch never even hit my laptop."
Current Limitations
As with any research preview, Codex has limitations. It currently lacks features like image inputs for frontend work and the ability to course-correct the agent while it's working. Additionally, delegating to a remote agent takes longer than interactive editing, which requires an adjustment in workflow expectations.
Katie from the research team emphasized the importance of verifiability in these systems: "As we move towards this world where AI writes more and more code, this kind of verifiability is going to be really important."
The team recommends assigning well-scoped tasks to multiple agents simultaneously and experimenting with different types of tasks and prompts to explore the model's capabilities effectively.
Availability and Future Plans
Codex is rolling out initially to ChatGPT Pro, Enterprise, and Team users globally, with support for Plus and Edu users coming soon. Users will have generous access at no additional cost during the initial weeks, after which OpenAI will introduce rate-limited access and flexible pricing options.
For developers building with codex-mini-latest, the model is available on the Responses API and priced at $1.50 per 1M input tokens and $6 per 1M output tokens, with a 75% prompt caching discount.
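For orientation, here is a minimal sketch of calling codex-mini-latest through the Responses API with the OpenAI Python SDK; the prompt and the small deduplication snippet are invented for illustration, and the API key is read from the environment:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.responses.create(
    model="codex-mini-latest",
    instructions="You are a code-review assistant. Suggest a fix and explain it briefly.",
    input=(
        "This Python function should deduplicate a list while preserving order:\n\n"
        "def dedupe(items):\n"
        "    return list(set(items))\n"
    ),
)

print(response.output_text)  # the model's suggested fix and explanation
```

The 75% prompt caching discount matters most when many requests share a long common prefix, such as repeated runs over the same repository context.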
Looking ahead, OpenAI envisions a future where real-time pairing (as in tools like Codex CLI) and task delegation (the focus of today's Codex release) eventually converge. The company plans to introduce more interactive and flexible agent workflows where developers can provide guidance mid-task, collaborate on implementation strategies, and receive proactive progress updates.
They also plan deeper integrations across the development ecosystem: today Codex connects with GitHub, but soon users may be able to assign tasks from Codex CLI, ChatGPT Desktop, or even directly from issue trackers or CI systems.
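The Codex CLI mentioned above already exists as an open-source terminal agent. A minimal way to try it, assuming the npm package name and invocation current at the time of writing (check the project's repository for up-to-date instructions):

```
npm install -g @openai/codex

# ask the agent to work on the repository in the current directory
codex "add unit tests for the date-parsing helpers and run them"
```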
The Changing Landscape of Software Engineering
Codex represents a significant step toward a future where AI doesn't just assist with coding but actively participates in the development process. By handling parallel tasks in independent environments, it offers a glimpse into how software engineering might evolve—where developers focus on high-level direction and creative problem-solving while delegating implementation details to AI agents.
As one team member put it during the presentation:
"It's a co-worker, it's an intern that you can delegate to, it's a mentor, it's a pair programmer, and all of these at once."
While optimistic about these productivity gains, OpenAI acknowledges the importance of understanding how widespread agent adoption will affect developer workflows and skill development across experience levels and geographies.