Until recently, the practice of building AI agents has been a bit like training a long-distance runner with a thirty-second memory.
Yes, you could give your AI model tools and instructions, but after a few dozen interactions (several laps around the track, to extend the running analogy) it would inevitably lose context and start hallucinating.
With OpenAI's latest updates to its Responses API — the application programming interface that allows developers on OpenAI's platform to access multiple agentic tools like web search and file search with a single call — the company is signaling that the era of the limited agent is waning.
The updates announced today include Server-side Compaction, Hosted Shell Containers, and support for the new "Skills" standard for agents.
With these three major updates, OpenAI is effectively handing agents a permanent desk, a terminal, and a memory that doesn’t fade. Together, they should help agents evolve further into reliable, long-term digital workers.
Technology: overcoming 'context amnesia'
The most significant technical hurdle for autonomous agents has always been the "clutter" of long-running tasks. Every time an agent calls a tool or runs a script, the conversation history grows.
Eventually, the model hits its token limit, and the developer is forced to truncate the history—often deleting the very "reasoning" the agent needs to finish the job.
OpenAI’s answer is Server-side Compaction. Unlike simple truncation, which discards history wholesale, compaction allows agents to run for hours or even days.
Early data from e-commerce platform Triple Whale suggests this is a breakthrough in stability: their agent, Moby, successfully navigated a session involving 5 million tokens and 150 tool calls without a drop in accuracy.
In practical terms, this means the model can "summarize" its own past actions into a compressed state, keeping the essential context alive while clearing the noise. It transforms the model from a forgetful assistant into a persistent system process.
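The idea can be sketched in a few lines. This is a conceptual client-side analogue of compaction, not OpenAI's server-side implementation: older turns are folded into a single summary entry so the live context stays small while recent reasoning survives.

```python
# Conceptual sketch of compaction: all but the most recent turns are
# collapsed into one summary entry. In a real system the model itself
# would write the summary; here a placeholder stands in for it.
def compact_history(history, keep_recent=4, max_turns=8):
    """Collapse older turns into a single compressed summary entry."""
    if len(history) <= max_turns:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {
        "role": "system",
        "content": f"[compacted summary of {len(older)} earlier turns]",
    }
    return [summary] + recent

history = [{"role": "user", "content": f"step {i}"} for i in range(20)]
compacted = compact_history(history)
print(len(compacted))  # 5: one summary entry plus the 4 most recent turns
```

The payoff is that the context handed to the model is bounded regardless of how many tool calls the agent has made, which is what lets a session stretch into millions of cumulative tokens.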
Managed cloud sandboxes
The introduction of the Shell Tool moves OpenAI into the realm of managed compute. Developers can now opt for container_auto, which provisions an OpenAI-hosted Debian 12 environment.
This isn't just a code interpreter: it gives each agent its own full terminal environment pre-loaded with:
- Native execution environments including Python 3.11, Node.js 22, Java 17, Go 1.23, and Ruby 3.1.
- Persistent storage via /mnt/data, allowing agents to generate, save, and download artifacts.
- Networking capabilities that allow agents to reach out to the internet to install libraries or interact with third-party APIs.
The Hosted Shell and its persistent /mnt/data storage provide a managed environment where agents can perform complex data transformations using Python or Java without requiring the team to build and maintain custom ETL (Extract, Transform, Load) middleware for every AI project.
By leveraging these hosted containers, data engineers can run high-performance data processing tasks while shedding the responsibilities that come with bespoke infrastructure, from building sandboxes to securing them. OpenAI is essentially saying: “Give us the instructions; we’ll provide the computer.”
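In practice, opting in looks like adding a tool entry to a Responses API request. The exact field names below ("shell", "container") are assumptions based on the announcement's mention of container_auto, so treat this as a sketch of the request shape and check the official API reference before relying on it.

```python
import json

# Hypothetical Responses API payload requesting an OpenAI-hosted shell.
# "container": "container_auto" asks OpenAI to provision the Debian 12
# environment; the surrounding field names are illustrative assumptions.
payload = {
    "model": "gpt-5",
    "tools": [{"type": "shell", "container": "container_auto"}],
    "input": "Install pandas, load /mnt/data/sales.csv, and report the row count.",
}

print(json.dumps(payload, indent=2))
```

The key design point is that the developer never provisions the sandbox: the container, its language runtimes, and its /mnt/data storage come with the request.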
OpenAI's Skills vs. Anthropic's Skills
Both OpenAI and Anthropic now support "skills," instructions for agents to run specific operations, and have converged on the same open standard — a SKILL.md (markdown) manifest with YAML frontmatter.
A skill built for either can theoretically be moved to VS Code, Cursor, or any other platform that adopts the specification.
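To make the format concrete, here is a minimal, illustrative SKILL.md (the specific fields and instructions are invented for this example; the authoritative field list lives in the specification) along with a small parser showing how a host splits the YAML frontmatter from the markdown body.

```python
# An illustrative SKILL.md: YAML frontmatter between "---" markers,
# followed by markdown instructions the agent reads at runtime.
SKILL_MD = """\
---
name: invoice-audit
description: Check uploaded invoices for missing line items.
---

# Invoice audit

1. Load the invoice from /mnt/data.
2. Flag any line item without a unit price.
"""

def parse_frontmatter(text):
    """Split a SKILL.md file into a frontmatter dict and a markdown body."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

meta, body = parse_frontmatter(SKILL_MD)
print(meta["name"])  # invoice-audit
```

Because the manifest is plain markdown plus frontmatter, any platform that can read a text file can adopt it, which is exactly what makes the standard portable.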
Indeed, the hit new open source AI agent OpenClaw adopted this exact SKILL.md manifest and folder-based packaging, allowing it to inherit a wealth of specialized procedural knowledge originally designed for Claude.
This architectural compatibility has fueled a community-driven "skills boom" on platforms like ClawHub, which now hosts over 3,000 community-built extensions ranging from smart home integrations to complex enterprise workflow automations.
This cross-pollination demonstrates that the "Skill" has become a portable, versioned asset rather than a vendor-locked feature. Because OpenClaw supports multiple models — including OpenAI’s GPT-5 series and local Llama instances — developers can now write a skill once and deploy it across a heterogeneous landscape of agents.
But the underlying strategies of OpenAI and Anthropic reveal divergent visions for the future of work.
OpenAI’s approach prioritizes a "programmable substrate" optimized for developer velocity. By bundling the shell, the memory, and the skills into the Responses API, they offer a "turnkey" experience for building complex agents rapidly.
Already, enterprise AI search startup Glean reported a jump in tool accuracy from 73% to 85% by using OpenAI's Skills framework.
By pairing the open standard with its proprietary Responses API, the company provides a high-performance, turnkey substrate.
It isn’t just reading the skill; it is hosting it inside a managed Debian 12 shell, handling the networking policies, and applying server-side compaction to ensure the agent doesn't lose its way during a five-million-token session. This is the "high-performance" choice for engineers who need to deploy long-running, autonomous workers without the overhead of building a bespoke execution environment.
Anthropic, meanwhile, has focused on the "expertise marketplace." Their strength lies in a mature directory of pre-packaged partner playbooks from the likes of Atlassian, Figma, and Stripe.
Implications for enterprise technical decision-makers
For engineers focused on rapid deployment and fine-tuning, the combination of Server-side Compaction and Skills provides a massive productivity boost.
Instead of building custom state management for every agent run, engineers can leverage built-in compaction to handle multi-hour tasks.
Skills allow for "packaged IP," where specific fine-tuning or specialized procedural knowledge can be modularized and reused across different internal projects.
For those tasked with moving AI from a "chat box" into a production-grade workflow, OpenAI’s announcement marks the end of the "bespoke infrastructure" era.
Historically, orchestrating an agent required significant manual scaffolding: developers had to build custom state-management logic to handle long conversations and secure, ephemeral sandboxes to execute code.
The challenge is no longer "How do I give this agent a terminal?" but "Which skills are authorized for which users?" and "How do we audit the artifacts produced in the hosted filesystem?" OpenAI has provided the engine and the chassis; the orchestrator’s job is now to define the rules of the road.
For security operations (SecOps) managers, giving an AI model a shell and network access is a high-stakes evolution. OpenAI’s use of Domain Secrets and Org Allowlists provides a defense-in-depth strategy, ensuring that agents can call APIs without exposing raw credentials to the model's context.
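The allowlist half of that strategy is easy to picture. This is a conceptual illustration of an org-level outbound-domain check, not OpenAI's actual enforcement code, and the domains listed are placeholders.

```python
from urllib.parse import urlparse

# Conceptual sketch of an outbound-network allowlist of the kind the
# article describes. The domains below are illustrative placeholders.
ALLOWED_DOMAINS = {"api.stripe.com", "pypi.org", "files.pythonhosted.org"}

def is_allowed(url: str) -> bool:
    """Permit an agent's request only if the host is on the org allowlist."""
    return urlparse(url).hostname in ALLOWED_DOMAINS

print(is_allowed("https://pypi.org/simple/requests/"))  # True
print(is_allowed("https://evil.example.com/exfil"))     # False
```

Pairing a deny-by-default network policy like this with secrets that are injected server-side, never echoed into the model's context, is the defense-in-depth posture the article describes.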
But as agents become easier to deploy via "Skills," SecOps must be vigilant about "malicious skills" that could introduce prompt injection vulnerabilities or unauthorized data exfiltration paths.
How should enterprises decide?
OpenAI is no longer just selling a "brain" (the model); it is selling the "office" (the container), the "memory" (compaction), and the "training manual" (skills). For enterprise leaders, the choice is becoming clear:
Choose OpenAI's Responses API if your agents require heavy-duty, stateful execution. If you need a managed cloud container that can run for hours and handle 5M+ tokens without context degradation, OpenAI’s integrated stack is the "High-Performance OS" for the agentskills.io standard.
Choose Anthropic if your strategy relies on immediate partner connectivity. If your workflow centers on existing, pre-packaged integrations from a wide directory of third-party vendors, Anthropic’s mature ecosystem provides a more "plug-and-play" experience for the same open standard.
Ultimately, this convergence signals that AI has moved out of the "walled garden" era. By standardizing on agentskills.io, the industry is turning "prompt spaghetti" into a shared, versioned, and truly portable architecture for the future of digital work.
Update Feb. 10, 6:52 pm ET: this article has since been updated to correct errors in an earlier version regarding the portability of OpenAI's Skills compared to Anthropic's. We regret the errors.