OpenAI Agents SDK

Overview

The OpenAI Agents SDK (@openai/agents) is a high-level JavaScript library that simplifies the development of real-time voice applications using OpenAI's API. It abstracts away the complexity of WebRTC, audio handling, and session management while providing a clean interface for building voice-enabled AI agents.

!!! note "Recent Developments"
    As of October 2025, OpenAI has expanded its platform with new capabilities:

    - **Apps SDK**: Framework for embedding applications directly in ChatGPT
    - **AgentKit**: Platform for embedding agents in websites
    - **GPT Realtime Mini**: 70% lower pricing for real-time applications
    - **Enhanced MCP Integration**: Apps SDK built on Model Context Protocol

Key Features

Simplified Architecture: Reduces voice agent implementation from complex WebRTC setups to just a few lines of code
Automatic Audio Handling: Built-in microphone access and audio playback management
Session Management: Handles connection lifecycle, reconnection, and session state
Security Features: Support for ephemeral tokens for production deployments
Flexible Configuration: Both client-side and server-side instruction configuration

Installation

Full Package

bun add @openai/agents

Browser-Only Package

bun add @openai/agents-realtime

Basic Implementation

Simple Voice Agent Setup

'use client';
import { useCallback, useRef, useState } from 'react';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

// Create the agent once at module scope so it is not rebuilt on every render.
const agent = new RealtimeAgent({
  name: 'Assistant',
  instructions: 'You are a helpful assistant.',
});

// Placeholder credential for local testing only; use an ephemeral token in production.
const API_KEY = '<your-api-key-or-ephemeral-token>';

export default function VoiceAgent() {
  const [isConnected, setIsConnected] = useState(false);
  const [isConnecting, setIsConnecting] = useState(false);
  // Keep the session in a ref so it survives re-renders.
  const sessionRef = useRef<RealtimeSession | null>(null);

  const handleConnect = useCallback(async () => {
    setIsConnecting(true);
    try {
      const session = new RealtimeSession(agent);

      await session.connect({
        apiKey: API_KEY, // Use ephemeral token in production
      });

      sessionRef.current = session;
      console.log('Connected to voice agent!');
      setIsConnected(true);
    } catch (error) {
      console.error('Connection failed:', error);
    } finally {
      setIsConnecting(false);
    }
  }, []);

  return (
    <div>
      <button onClick={handleConnect} disabled={isConnecting || isConnected}>
        {isConnecting ? 'Connecting...' : isConnected ? 'Connected' : 'Connect'}
      </button>
    </div>
  );
}

Security Best Practices

Ephemeral Tokens for Production

Instead of using API keys directly in client code, generate ephemeral tokens from your backend:

curl -X POST https://api.openai.com/v1/realtime/client_secrets \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session": {
      "type": "realtime",
      "model": "gpt-realtime"
    }
  }'
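The same request can be issued from a Node backend. The following is a minimal TypeScript sketch rather than an official client: the helper names (`buildClientSecretRequest`, `extractClientSecret`, `createClientSecret`) and the `FetchLike` type are hypothetical, and the assumption that the response carries the token in a top-level `value` field should be checked against the current API reference.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ClientSecretRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the same request as the curl command above.
function buildClientSecretRequest(apiKey: string, model = 'gpt-realtime'): ClientSecretRequest {
  return {
    url: 'https://api.openai.com/v1/realtime/client_secrets',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ session: { type: 'realtime', model } }),
    },
  };
}

// Assumption: the response body exposes the ephemeral token as a string `value` field.
function extractClientSecret(payload: unknown): string {
  if (typeof payload === 'object' && payload !== null) {
    const value = (payload as Record<string, unknown>).value;
    if (typeof value === 'string' && value.length > 0) return value;
  }
  throw new Error('Response did not contain a client secret');
}

// Call this from a backend route; never ship the long-lived API key to the browser.
async function createClientSecret(apiKey: string, fetchImpl: FetchLike): Promise<string> {
  const { url, init } = buildClientSecretRequest(apiKey);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Client secret request failed with status ${response.status}`);
  }
  return extractClientSecret(await response.json());
}
```

Injecting `fetchImpl` keeps the helper testable; in a runtime with a global `fetch`, that function can be passed in directly.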

Secure Session Configuration

await session.connect({
  apiKey: ephemeralToken, // Ephemeral token from your secure backend, passed as the apiKey
});

Configuration

Instruction Priority

The SDK follows a server-over-client priority for instructions:

// Client-side instructions (lower priority)
const agent = new RealtimeAgent({
  name: 'Assistant',
  instructions: 'Always talk in a professional manner.',
});

// Server-side instructions (higher priority - will override client)
const serverConfig = {
  model: 'gpt-realtime',
  modalities: ['audio', 'text'],
  instructions: 'Always talk like a valley girl.',
};

This allows for centralized control over agent behavior while maintaining client flexibility.
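The precedence rule can be modelled in a few lines. This is an illustrative sketch of the server-over-client behaviour described above, not SDK code; `resolveInstructions` and `InstructionSource` are hypothetical names.

```typescript
interface InstructionSource {
  instructions?: string;
}

// Returns the instructions that take effect: server-side wins whenever it is set.
function resolveInstructions(
  client: InstructionSource,
  server: InstructionSource,
): string | undefined {
  const serverText = server.instructions?.trim();
  if (serverText) return serverText;
  const clientText = client.instructions?.trim();
  return clientText || undefined;
}

const effective = resolveInstructions(
  { instructions: 'Always talk in a professional manner.' },
  { instructions: 'Always talk like a valley girl.' },
);
console.log(effective); // Always talk like a valley girl.
```

When the server supplies no instructions, the client-side value is used as the fallback.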

Core Classes

RealtimeAgent

Represents the AI agent configuration including name and behavioral instructions.

const agent = new RealtimeAgent({
  name: 'MyAssistant',
  instructions: 'Custom behavior instructions here',
});

RealtimeSession

Manages the connection and communication session with OpenAI's realtime API.

const session = new RealtimeSession(agent);
await session.connect({ apiKey: 'your-token' });

Phone Integration (SIP Support)

Overview

OpenAI provides SIP (Session Initiation Protocol) integration enabling AI assistants to be accessed via traditional phone calls. This opens up new use cases for hands-free interaction and voice-only AI experiences.

SIP Configuration

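OpenAI's Realtime API supports direct phone integration through the SIP protocol. The request below shows a session configuration suited to a phone-facing agent; trunk and routing setup is covered in the SIP Integration Guide listed under Documentation & Resources.

# Example session configuration for a phone-facing agent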
curl -X POST https://api.openai.com/v1/realtime/sessions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-realtime",
    "modalities": ["audio"],
    "instructions": "You are a helpful assistant accessible via phone",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16"
  }'

Phone-Based Use Cases

Commute Assistant: Call AI assistant during drives for hands-free task management
Walking Companion: Voice-only AI interaction during walks or exercise
Accessibility Tool: Phone-based AI access for users who prefer voice-only interaction
Task Triggering: Voice activation of cloud-based workflows and automation
Status Reports: Check project status and receive updates via phone calls

Implementation Benefits

Universal Access: No app installation required, works with any phone
Hands-Free Operation: Perfect for situations where visual interfaces aren't practical
Natural Interaction: Leverages familiar phone call experience
Infrastructure Integration: Can integrate with existing phone systems and PBX setups

MCP Integration

Overview

OpenAI's Realtime API supports integration with Model Context Protocol (MCP) servers, enabling voice assistants to interact with external services and tools through standardized protocols.

Remote MCP Server Support

Supported Authentication:

- **Personal Access Tokens (PAT)**: Direct integration with services like GitHub MCP using API tokens
- **HTTP MCP Servers**: Deployment of MCP servers as HTTP endpoints for voice agent access
- **Network Isolation**: Private deployment within secure network boundaries (e.g., Railway private networks)

OAuth Limitations: OAuth-only MCP servers present integration challenges requiring custom refresh token management solutions.
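One way to picture the custom refresh handling mentioned above is a small token cache that refreshes shortly before expiry. This is a sketch under assumptions: `TokenCache`, `OAuthToken`, `needsRefresh`, and the injected `refresh` callback are hypothetical, and a real OAuth flow would call the provider's token endpoint inside that callback.

```typescript
interface OAuthToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

// True when the token is missing or will expire within the safety margin.
function needsRefresh(token: OAuthToken | null, nowMs: number, marginMs = 60000): boolean {
  return token === null || token.expiresAtMs - nowMs <= marginMs;
}

class TokenCache {
  private token: OAuthToken | null = null;
  private inFlight: Promise<OAuthToken> | null = null;
  private readonly refresh: () => Promise<OAuthToken>;
  private readonly now: () => number;

  constructor(refresh: () => Promise<OAuthToken>, now: () => number = () => Date.now()) {
    this.refresh = refresh;
    this.now = now;
  }

  // Returns a valid access token, sharing one refresh among concurrent callers.
  async getAccessToken(): Promise<string> {
    if (!needsRefresh(this.token, this.now())) {
      return this.token!.accessToken;
    }
    if (this.inFlight === null) {
      this.inFlight = this.refresh().finally(() => {
        this.inFlight = null;
      });
    }
    this.token = await this.inFlight;
    return this.token.accessToken;
  }
}
```

Sharing the in-flight promise avoids issuing several refresh requests when multiple tool calls arrive at once during a voice session.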

Voice-Driven Development Workflows

Demonstrated Capabilities:

- **GitHub Issue Management**: Create issues, add comments, and assign users through voice commands
- **Project Status Queries**: Check repository status and receive updates via phone calls
- **Code Review Triggers**: Initiate automated workflows through voice activation
- **Multi-Service Integration**: Access multiple MCP servers through unified voice interface

Implementation Example

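// Voice assistant with GitHub MCP integration.
// Assumption: `githubMcp` is a hypothetical client for a GitHub MCP server; it is not part of the SDK.
import { RealtimeAgent, RealtimeSession, tool } from '@openai/agents/realtime';
import { z } from 'zod';

// Each tool declares a name, description, and parameter schema so the model can call it.
const createGitHubIssue = tool({
  name: 'create_github_issue',
  description: 'Create an issue in a GitHub repository.',
  parameters: z.object({ title: z.string(), body: z.string(), repo: z.string() }),
  execute: async ({ title, body, repo }) => githubMcp.createIssue({ title, body, repo }),
});

const addComment = tool({
  name: 'add_issue_comment',
  description: 'Add a comment to an existing GitHub issue.',
  parameters: z.object({ issueId: z.string(), comment: z.string() }),
  execute: async ({ issueId, comment }) => githubMcp.addComment({ issueId, comment }),
});

const agent = new RealtimeAgent({
  name: 'Development Assistant',
  instructions: `You are a development assistant with access to GitHub.
                 You can create issues, add comments, and manage repositories
                 through voice commands.`,
  // Tools are attached to the agent rather than registered on the session.
  tools: [createGitHubIssue, addComment],
});

const session = new RealtimeSession(agent);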

Production Integration Patterns

Private Network Deployment:

- Deploy MCP servers as HTTP endpoints within private cloud networks
- Configure realtime API to access MCP servers without public authentication
- Maintain security through network isolation rather than per-request authentication

Multi-Modal Development:

- **FaceTime Integration**: Call assistants through native calling applications
- **Phone Number Access**: Direct phone integration for hands-free development workflows
- **Cross-Platform Access**: Same MCP capabilities across voice, web, and mobile interfaces

Use Cases with MCP Integration

Commute Development: Manage development tasks during drives using voice commands
Hands-Free Code Review: Trigger code reviews and check status while away from keyboard
Issue Triage: Create and assign GitHub issues through phone conversations
Status Updates: Receive project updates and metrics via voice interaction
Workflow Automation: Voice-triggered deployment, testing, and CI/CD operations

Apps SDK & AgentKit

ChatGPT Apps SDK

OpenAI's Apps SDK enables embedding full applications directly within ChatGPT, providing rich UI experiences beyond traditional chat interfaces.

Key Features:

- **Rich UI Integration**: Full support for custom views and interfaces within ChatGPT
- **Bidirectional Context**: Applications can pass context back and forth with ChatGPT
- **MCP Foundation**: Built on Model Context Protocol using resource endpoints to deliver UI
- **Publishing Platform**: Integrated app store experience for discovery and distribution

Implementation:

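// Apps SDK leverages MCP resource endpoints
// Returns HTML/UI components directly to ChatGPT
// Assumption: MCPServer and generateAppHTML are placeholders defined elsewhere in your project.
export function createChatGPTApp(mcpServer: MCPServer) {
  return {
    resource: async (uri: string) => {
      // MCP resource contents carry the resource URI, a MIME type, and the text payload
      return {
        contents: [
          {
            uri,
            mimeType: 'text/html',
            text: generateAppHTML(uri),
          },
        ],
      };
    },
  };
}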

AgentKit

AgentKit allows embedding OpenAI agents directly into external websites and applications.

Capabilities:

- **Website Integration**: Embed AI agents directly in web properties
- **Custom Branding**: Full control over agent appearance and behavior
- **Cross-Platform**: Works across web, mobile, and desktop environments
- **Real-time Integration**: Combines with realtime API for voice-enabled web agents

MCP Integration Benefits

The Apps SDK's foundation on MCP creates unique opportunities:

Existing MCP Applications: Applications already using MCP (like thumbnail editors) can be embedded in ChatGPT with minimal modifications

Development Workflow: MCP-based development tools can provide rich interfaces within ChatGPT for:

- Code review interfaces
- Project management dashboards
- Visual debugging tools
- Interactive documentation

Publishing and Distribution

Development Mode: Testing and development capabilities before public review/publish

App Store Integration: Native discovery and installation within ChatGPT ecosystem

Enterprise Deployment: Private/internal applications for organizational use

Use Cases

Voice Assistants: Interactive voice-based AI assistants
Customer Support: Real-time voice support systems
Educational Tools: Voice-interactive learning applications
Accessibility: Voice-controlled interfaces and applications
Gaming: Voice-driven game interactions and NPCs
Phone-Based AI: Direct phone call access to AI assistants via SIP integration
Development Workflows: Voice-activated GitHub issue management and code review triggering
Hands-Free Automation: Voice-controlled deployment, testing, and CI/CD operations through MCP integration
Embedded ChatGPT Apps: Rich UI applications within ChatGPT (thumbnail editors, project dashboards, visual tools)
Website AI Integration: Embedded agents for customer service, product assistance, and interactive experiences
Cross-Platform Agent Deployment: Single agent codebase deployed across ChatGPT, websites, and voice interfaces

Documentation & Resources

Official Documentation: https://openai.github.io/openai-agents-js/guides/quickstart/
Voice Agents Guide: https://openai.github.io/openai-agents-js/guides/voice-agents/quickstart/
SIP Integration Guide: https://platform.openai.com/docs/guides/realtime-sip
GitHub Repository: https://github.com/openai/openai-agents-js
OpenAI Realtime API: https://platform.openai.com/docs/guides/realtime

Advantages Over Lower-Level Implementation

Reduced Complexity: Eliminates need for manual WebRTC setup
Built-in Error Handling: Automatic connection management and error recovery
Audio Management: No need to manually handle microphone access and audio playback
Session Lifecycle: Automatic handling of connection states and reconnection
Type Safety: Full TypeScript support with proper type definitions

Known Limitations

!!! warning "SDK Quality Concerns"
    Community feedback has identified several limitations with the OpenAI Agents SDK:

Documentation Issues:

- Poor documentation quality making implementation challenging
- Limited examples for real-world use cases
- Insufficient guidance for production deployments

Platform Compatibility:

- Azure OpenAI API incompatibility due to different API specifications
- SDK doesn't work seamlessly across OpenAI and Azure endpoints
- Requires custom wrappers to standardize behavior

Control Limitations:

- Limited flexibility for advanced use cases
- Difficulty achieving fine-grained control over agent behavior
- May require abandoning SDK for complex implementations

Pricing Considerations:

- GPT Realtime API pricing can be expensive for development and testing
- ~$10 for 100K tokens during typical development usage
- GPT Realtime Mini offers 70% cost reduction but may have feature limitations
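To make the figures above concrete, the arithmetic can be sketched as follows. The rate of $10 per 100K tokens and the 70% reduction are the approximate numbers quoted in this section, not official pricing, and `estimateCostUsd` is a hypothetical helper.

```typescript
const USD_PER_100K_TOKENS = 10; // approximate figure quoted above
const MINI_DISCOUNT = 0.7; // the "70% cost reduction" quoted above

// Estimates spend in US dollars, rounded to whole cents.
function estimateCostUsd(tokens: number, useMini = false): number {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError('tokens must be a non-negative number');
  }
  const base = (tokens / 100000) * USD_PER_100K_TOKENS;
  const cost = useMini ? base * (1 - MINI_DISCOUNT) : base;
  return Math.round(cost * 100) / 100;
}

console.log(estimateCostUsd(250000)); // 25
console.log(estimateCostUsd(250000, true)); // 7.5
```

Under these assumed rates, a 250K-token development session costs about $25 on the standard model and about $7.50 on the Mini model.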

Alternative Approaches

For projects requiring more control or Azure compatibility:

- Custom WebRTC implementation with direct API calls
- Standardization layer over OpenAI and Azure APIs
- Hybrid approach using SDK for simple cases, custom implementation for complex requirements
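A standardization layer of the kind listed above could start from a small provider interface. The sketch below is illustrative only: `RealtimeProvider`, `openAiProvider`, `azureProvider`, and `describeRequest` are hypothetical names, the Azure session URL and request body are placeholders supplied by the caller rather than documented values, and only the OpenAI URL comes from this document.

```typescript
interface ProviderRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

interface RealtimeProvider {
  readonly name: string;
  // Builds the HTTP request that creates a session credential for this provider.
  buildSessionRequest(model: string): ProviderRequest;
}

function openAiProvider(apiKey: string): RealtimeProvider {
  return {
    name: 'openai',
    buildSessionRequest: (model) => ({
      url: 'https://api.openai.com/v1/realtime/client_secrets',
      headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ session: { type: 'realtime', model } }),
    }),
  };
}

// Assumption: `sessionUrl` is whatever session endpoint your Azure resource exposes,
// and the body shape is a placeholder. Azure OpenAI key authentication uses an
// `api-key` header instead of a bearer token.
function azureProvider(apiKey: string, sessionUrl: string): RealtimeProvider {
  return {
    name: 'azure',
    buildSessionRequest: (model) => ({
      url: sessionUrl,
      headers: { 'api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ model }),
    }),
  };
}

// Application code depends only on the interface, so providers can be swapped.
function describeRequest(provider: RealtimeProvider, model: string): string {
  const request = provider.buildSessionRequest(model);
  return `${provider.name} -> ${request.url}`;
}
```

Keeping provider differences behind one interface also suits the hybrid approach: the SDK can serve the simple path while a custom provider handles endpoints it does not cover.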

Related Tools

AI SDK v5 - For text-based AI applications
Claude Code - For AI-assisted development workflows
