OpenAI Agents SDK
Overview
The OpenAI Agents SDK (@openai/agents) is a high-level TypeScript/JavaScript library for building AI agents. Its realtime module simplifies the development of voice applications on OpenAI's Realtime API. It abstracts away the complexity of WebRTC, audio handling, and session management, and provides a clean interface for building voice-enabled AI agents.
!!! note "Recent Developments"
    As of October 2025, OpenAI has expanded its platform with new capabilities:

    - **Apps SDK**: Framework for embedding applications directly in ChatGPT
    - **AgentKit**: Toolkit for building and deploying agents, including ChatKit for embedding agent chat UIs in websites
    - **GPT Realtime Mini**: 70% lower pricing than gpt-realtime for real-time applications
    - **Enhanced MCP Integration**: Apps SDK built on Model Context Protocol
Key Features
- Simplified Architecture: Reduces voice agent implementation from complex WebRTC setups to just a few lines of code
- Automatic Audio Handling: Built-in microphone access and audio playback management
- Session Management: Handles connection lifecycle, reconnection, and session state
- Security Features: Support for ephemeral tokens for production deployments
- Flexible Configuration: Both client-side and server-side instruction configuration
Installation
Full Package
```bash
bun add @openai/agents
```
Browser-Only Package
```bash
bun add @openai/agents-realtime
```
Basic Implementation
Simple Voice Agent Setup
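```tsx
'use client';

import { useCallback, useRef, useState } from 'react';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

// Create the agent once at module scope so it is not rebuilt on every render.
const agent = new RealtimeAgent({
  name: 'Assistant',
  instructions: 'You are a helpful assistant.',
});

// Hypothetical helper: fetches an ephemeral client key from your own backend
// (see "Ephemeral Tokens for Production"). The route name and its { value }
// response shape are assumptions.
async function fetchEphemeralKey(): Promise<string> {
  const res = await fetch('/api/realtime-token');
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  const { value } = await res.json();
  return value;
}

export default function VoiceAgent() {
  // Keep the session across renders so it can be closed later.
  const sessionRef = useRef<RealtimeSession | null>(null);
  const [isConnected, setIsConnected] = useState(false);
  const [isConnecting, setIsConnecting] = useState(false);

  const handleConnect = useCallback(async () => {
    setIsConnecting(true);
    try {
      const session = new RealtimeSession(agent);
      // In the browser, connect() sets up WebRTC, the microphone, and audio playback.
      await session.connect({ apiKey: await fetchEphemeralKey() });
      sessionRef.current = session;
      console.log('Connected to voice agent!');
      setIsConnected(true);
    } catch (error) {
      console.error('Connection failed:', error);
    } finally {
      setIsConnecting(false);
    }
  }, []);

  const handleDisconnect = useCallback(() => {
    sessionRef.current?.close();
    sessionRef.current = null;
    setIsConnected(false);
  }, []);

  return (
    <div>
      <button onClick={isConnected ? handleDisconnect : handleConnect} disabled={isConnecting}>
        {isConnecting ? 'Connecting...' : isConnected ? 'Disconnect' : 'Connect'}
      </button>
    </div>
  );
}
```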
Security Best Practices
Ephemeral Tokens for Production
Instead of using API keys directly in client code, generate ephemeral tokens from your backend:
```bash
curl -X POST https://api.openai.com/v1/realtime/client_secrets \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session": {
      "type": "realtime",
      "model": "gpt-realtime"
    }
  }'
```
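The curl call above can also be made from a TypeScript backend. The following is a minimal sketch. It assumes Node 18+ for the global `fetch` and assumes the endpoint returns the ephemeral key in a `value` field. `buildClientSecretRequest` and `mintEphemeralKey` are hypothetical helper names. The optional `instructions` argument shows where server-side instructions would go.

```typescript
// Minimal backend sketch for minting a short-lived Realtime client key.
// Assumes Node 18+ (global fetch); helper names are hypothetical.

const CLIENT_SECRETS_URL = 'https://api.openai.com/v1/realtime/client_secrets';

interface ClientSecretRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the same request as the curl example, optionally adding
// server-side instructions to the session configuration.
function buildClientSecretRequest(
  apiKey: string,
  model = 'gpt-realtime',
  instructions?: string,
): ClientSecretRequest {
  const session: Record<string, unknown> = { type: 'realtime', model };
  if (instructions !== undefined) session.instructions = instructions;
  return {
    url: CLIENT_SECRETS_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ session }),
    },
  };
}

// Calls the endpoint and returns only the ephemeral key (assumed `value` field).
// The long-lived API key never leaves the server.
async function mintEphemeralKey(apiKey: string, instructions?: string): Promise<string> {
  const { url, init } = buildClientSecretRequest(apiKey, 'gpt-realtime', instructions);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`client_secrets request failed: ${res.status}`);
  }
  const data = (await res.json()) as { value: string };
  return data.value;
}
```

A route on your own server can call `mintEphemeralKey` with the real API key. The route then returns only the short-lived key to the browser, which passes it to `session.connect({ apiKey })`.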
Secure Session Configuration
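The ephemeral key is passed through the same `apiKey` option used for regular keys:
```typescript
await session.connect({
  apiKey: ephemeralToken, // the `value` minted by your secure backend
});
```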
Configuration
Instruction Priority
The SDK follows a server-over-client priority for instructions:
```typescript
// Client-side instructions (lower priority)
const agent = new RealtimeAgent({
  name: 'Assistant',
  instructions: 'Always talk in a professional manner.',
});

// Server-side session config, sent in the `session` object when minting
// the client secret (higher priority: will override client)
const serverConfig = {
  type: 'realtime',
  model: 'gpt-realtime',
  output_modalities: ['audio'],
  instructions: 'Always talk like a valley girl.',
};
```
This allows for centralized control over agent behavior while maintaining client flexibility.
Core Classes
RealtimeAgent
Represents the AI agent configuration including name and behavioral instructions.
```typescript
const agent = new RealtimeAgent({
  name: 'MyAssistant',
  instructions: 'Custom behavior instructions here',
});
```
RealtimeSession
Manages the connection and communication session with OpenAI's realtime API.
```typescript
const session = new RealtimeSession(agent);
await session.connect({ apiKey: 'your-token' });
```
Phone Integration (SIP Support)
Overview
OpenAI provides SIP (Session Initiation Protocol) integration enabling AI assistants to be accessed via traditional phone calls. This opens up new use cases for hands-free interaction and voice-only AI experiences.
SIP Configuration
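Phone calls reach the Realtime API through a SIP trunk that points at your project's OpenAI SIP endpoint (`sip:$PROJECT_ID@sip.api.openai.com;transport=tls`). For each incoming call, OpenAI sends a `realtime.call.incoming` webhook to your server. Your server then accepts the call and supplies the session configuration:
```bash
# Accept an incoming SIP call ($CALL_ID comes from the webhook payload)
curl -X POST "https://api.openai.com/v1/realtime/calls/$CALL_ID/accept" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "realtime",
    "model": "gpt-realtime",
    "instructions": "You are a helpful assistant accessible via phone"
  }'
```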
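With SIP, OpenAI notifies your server of each incoming call through a `realtime.call.incoming` webhook. Your server accepts the call by posting a session configuration to `/v1/realtime/calls/{call_id}/accept`. The following is a minimal sketch of that step. It assumes Node 18+ and a webhook body that has already been parsed as JSON. `buildAcceptRequest` and `handleWebhook` are hypothetical names. Webhook signature verification is deliberately left out.

```typescript
// Sketch: turn an incoming-call webhook into an accept request.
// Signature verification is omitted and should be added before production use.

interface IncomingCallEvent {
  type: string;
  data: { call_id: string };
}

interface AcceptRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Returns the accept request for an incoming call, or null for other event types.
function buildAcceptRequest(
  event: IncomingCallEvent,
  apiKey: string,
  instructions: string,
): AcceptRequest | null {
  if (event.type !== 'realtime.call.incoming') return null;
  const callId = encodeURIComponent(event.data.call_id);
  return {
    url: `https://api.openai.com/v1/realtime/calls/${callId}/accept`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ type: 'realtime', model: 'gpt-realtime', instructions }),
    },
  };
}

// Accepts the call so the caller is connected to the realtime model.
async function handleWebhook(event: IncomingCallEvent, apiKey: string): Promise<void> {
  const req = buildAcceptRequest(event, apiKey, 'You are a helpful assistant accessible via phone');
  if (req === null) return;
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error(`accept failed: ${res.status}`);
}
```

Keeping the request construction in a pure function makes the routing logic testable without placing a real call.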
Phone-Based Use Cases
- Commute Assistant: Call an AI assistant while driving for hands-free task management
- Walking Companion: Voice-only AI interaction during walks or exercise
- Accessibility Tool: Phone-based AI access for users who prefer voice-only interaction
- Task Triggering: Voice activation of cloud-based workflows and automation
- Status Reports: Check project status and receive updates via phone calls
Implementation Benefits
- Universal Access: No app installation required, works with any phone
- Hands-Free Operation: Perfect for situations where visual interfaces aren't practical
- Natural Interaction: Leverages familiar phone call experience
- Infrastructure Integration: Can integrate with existing phone systems and PBX setups
MCP Integration
Overview
OpenAI's Realtime API supports integration with Model Context Protocol (MCP) servers, enabling voice assistants to interact with external services and tools through standardized protocols.
Remote MCP Server Support
Supported Authentication and Deployment Options:
- Personal Access Tokens (PAT): Direct integration with services like GitHub MCP using API tokens
- HTTP MCP Servers: Deployment of MCP servers as HTTP endpoints for voice agent access
- Network Isolation: Private deployment within secure network boundaries (e.g., Railway private networks)
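The PAT option can be expressed as a remote MCP tool entry in the Realtime session configuration. This is a sketch only. The field names (`type: 'mcp'`, `server_label`, `server_url`, `authorization`, `require_approval`) follow OpenAI's published remote-MCP example for the Realtime API, and the GitHub server URL is an assumption. Verify both against current documentation. `githubMcpTool` and `sessionWithMcp` are hypothetical helpers.

```typescript
// Sketch of a remote MCP tool entry for a Realtime session configuration.
// Field names and the GitHub MCP URL are assumptions to verify against current docs.

interface RemoteMcpTool {
  type: 'mcp';
  server_label: string;
  server_url: string;
  authorization?: string;
  require_approval: 'always' | 'never';
}

function githubMcpTool(personalAccessToken: string): RemoteMcpTool {
  return {
    type: 'mcp',
    server_label: 'github',
    server_url: 'https://api.githubcopilot.com/mcp/',
    // PAT-based auth, assumed to be forwarded to the MCP server as a bearer token.
    authorization: personalAccessToken,
    // 'never' skips approval round-trips; consider 'always' for write actions.
    require_approval: 'never',
  };
}

// The tool goes in the session's `tools` array, e.g. in the body sent to
// /v1/realtime/client_secrets.
function sessionWithMcp(personalAccessToken: string) {
  return {
    session: {
      type: 'realtime',
      model: 'gpt-realtime',
      tools: [githubMcpTool(personalAccessToken)],
    },
  };
}
```

Because OpenAI's infrastructure calls `server_url` directly, a server configured this way must be reachable from the public internet.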
OAuth Limitations: OAuth-only MCP servers present integration challenges requiring custom refresh token management solutions.
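One way to manage refresh tokens is a small cache that refreshes the access token shortly before it expires. The cache also shares a single in-flight refresh between concurrent callers. The following is a sketch. It assumes a hypothetical `refresh` callback that exchanges your stored refresh token at the provider's token endpoint. The 60-second safety margin is an arbitrary choice.

```typescript
// Sketch of refresh-token management for an OAuth-only MCP server.
// `refresh` is a hypothetical callback that hits the provider's token endpoint.

interface AccessToken {
  value: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

// True when the token is missing or expires within the safety margin.
function needsRefresh(token: AccessToken | null, nowMs: number, skewMs = 60_000): boolean {
  return token === null || token.expiresAtMs - skewMs <= nowMs;
}

class TokenCache {
  private token: AccessToken | null = null;
  private inFlight: Promise<AccessToken> | null = null;
  private readonly refresh: () => Promise<AccessToken>;
  private readonly now: () => number;

  constructor(refresh: () => Promise<AccessToken>, now: () => number = Date.now) {
    this.refresh = refresh;
    this.now = now;
  }

  // Returns a valid access token; concurrent callers share one refresh.
  async get(): Promise<string> {
    if (!needsRefresh(this.token, this.now())) return this.token!.value;
    if (this.inFlight === null) {
      this.inFlight = this.refresh().finally(() => {
        this.inFlight = null;
      });
    }
    this.token = await this.inFlight;
    return this.token.value;
  }
}
```

Because `now` is injectable, the expiry logic can be tested without waiting on real time.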
Voice-Driven Development Workflows
Demonstrated Capabilities:
- GitHub Issue Management: Create issues, add comments, and assign users through voice commands
- Project Status Queries: Check repository status and receive updates via phone calls
- Code Review Triggers: Initiate automated workflows through voice activation
- Multi-Service Integration: Access multiple MCP servers through unified voice interface
Implementation Example
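```typescript
// Voice assistant with GitHub MCP integration via custom tool adapters
import { RealtimeAgent, tool } from '@openai/agents/realtime';
import { z } from 'zod';

// Hypothetical client for your GitHub MCP server; not part of the SDK.
declare const githubMcp: {
  createIssue(args: { title: string; body: string; repo: string }): Promise<unknown>;
  addComment(args: { issueId: string; comment: string }): Promise<unknown>;
};

// Each adapter is a function tool the model can call during the voice session.
const createGitHubIssue = tool({
  name: 'create_github_issue',
  description: 'Create a GitHub issue in the given repository.',
  parameters: z.object({ title: z.string(), body: z.string(), repo: z.string() }),
  execute: async ({ title, body, repo }) => githubMcp.createIssue({ title, body, repo }),
});

const addComment = tool({
  name: 'add_comment',
  description: 'Add a comment to an existing GitHub issue.',
  parameters: z.object({ issueId: z.string(), comment: z.string() }),
  execute: async ({ issueId, comment }) => githubMcp.addComment({ issueId, comment }),
});

// Tools are registered on the agent, not on the session.
const agent = new RealtimeAgent({
  name: 'Development Assistant',
  instructions: `You are a development assistant with access to GitHub.
You can create issues, add comments, and manage repositories
through voice commands.`,
  tools: [createGitHubIssue, addComment],
});
```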
Production Integration Patterns
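Private Network Deployment:
- Deploy MCP servers as HTTP endpoints within private cloud networks
- Have the server-side process that executes tool calls, running in the same network, reach those MCP servers without public authentication (servers on a private network are not reachable by MCP calls made from OpenAI's infrastructure)
- Maintain security through network isolation rather than per-request authentication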
Multi-Modal Development:
- FaceTime Integration: Call assistants through native calling applications
- Phone Number Access: Direct phone integration for hands-free development workflows
- Cross-Platform Access: Same MCP capabilities across voice, web, and mobile interfaces
Use Cases with MCP Integration
- Commute Development: Manage development tasks during drives using voice commands
- Hands-Free Code Review: Trigger code reviews and check status while away from keyboard
- Issue Triage: Create and assign GitHub issues through phone conversations
- Status Updates: Receive project updates and metrics via voice interaction
- Workflow Automation: Voice-triggered deployment, testing, and CI/CD operations
Apps SDK & AgentKit
ChatGPT Apps SDK
OpenAI's Apps SDK enables embedding full applications directly within ChatGPT, providing rich UI experiences beyond traditional chat interfaces.
Key Features:
- Rich UI Integration: Full support for custom views and interfaces within ChatGPT
- Bidirectional Context: Applications can pass context back and forth with ChatGPT
- MCP Foundation: Built on Model Context Protocol using resource endpoints to deliver UI
- Publishing Platform: Integrated app store experience for discovery and distribution
Implementation:
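```typescript
// Apps SDK leverages MCP resource endpoints:
// the resource returns HTML that ChatGPT renders as the app's UI.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

// Hypothetical helper that renders your app's HTML.
declare function generateAppHTML(uri: string): string;

export function createChatGPTApp(mcpServer: McpServer) {
  // Tools point at this template via _meta['openai/outputTemplate'].
  const templateUri = 'ui://widget/app.html';
  mcpServer.registerResource('app-widget', templateUri, {}, async (uri) => ({
    contents: [
      {
        uri: uri.href,
        // ChatGPT treats this MIME type as a renderable UI template.
        mimeType: 'text/html+skybridge',
        text: generateAppHTML(uri.href),
      },
    ],
  }));
  return mcpServer;
}
```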
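On the tool side, an Apps SDK tool usually points at its UI template through `_meta['openai/outputTemplate']`. It returns data in sibling fields that reach the model, the UI, or both. The following sketch shows only the shapes. The field split reflects the Apps SDK documentation as understood here and should be checked against it. The template URI, `list_thumbnails` tool, `Thumbnail` type, and `editorToken` value are hypothetical.

```typescript
// Sketch of an Apps SDK tool descriptor and result; all names are hypothetical.

const TEMPLATE_URI = 'ui://widget/app.html';

// Tool metadata: tells ChatGPT which UI resource renders this tool's output.
const listThumbnailsDescriptor = {
  name: 'list_thumbnails',
  description: 'List the thumbnails in the current project.',
  _meta: { 'openai/outputTemplate': TEMPLATE_URI },
};

interface Thumbnail {
  id: string;
  title: string;
}

// structuredContent: read by the model and the UI.
// content: text narration for the model.
// _meta: passed to the UI only, hidden from the model.
function buildToolResult(thumbnails: Thumbnail[], editorToken: string) {
  return {
    structuredContent: { thumbnails },
    content: [{ type: 'text' as const, text: `Found ${thumbnails.length} thumbnails.` }],
    _meta: { editorToken },
  };
}
```

Splitting data this way keeps UI-only details, such as the editor token here, out of the model's context.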
AgentKit
AgentKit, through its ChatKit component, allows embedding OpenAI agents directly into external websites and applications.
Capabilities:
- Website Integration: Embed AI agents directly in web properties
- Custom Branding: Full control over agent appearance and behavior
- Cross-Platform: Works across web, mobile, and desktop environments
- Real-time Integration: Combines with realtime API for voice-enabled web agents
MCP Integration Benefits
The Apps SDK's foundation on MCP creates unique opportunities:
Existing MCP Applications: Applications already using MCP (like thumbnail editors) can be embedded in ChatGPT with minimal modifications
Development Workflow: MCP-based development tools can provide rich interfaces within ChatGPT for:
- Code review interfaces
- Project management dashboards
- Visual debugging tools
- Interactive documentation
Publishing and Distribution
Development Mode: Testing and development capabilities before public review and publishing
App Store Integration: Native discovery and installation within ChatGPT ecosystem
Enterprise Deployment: Private/internal applications for organizational use
Use Cases
- Voice Assistants: Interactive voice-based AI assistants
- Customer Support: Real-time voice support systems
- Educational Tools: Voice-interactive learning applications
- Accessibility: Voice-controlled interfaces and applications
- Gaming: Voice-driven game interactions and NPCs
- Phone-Based AI: Direct phone call access to AI assistants via SIP integration
- Development Workflows: Voice-activated GitHub issue management and code review triggering
- Hands-Free Automation: Voice-controlled deployment, testing, and CI/CD operations through MCP integration
- Embedded ChatGPT Apps: Rich UI applications within ChatGPT (thumbnail editors, project dashboards, visual tools)
- Website AI Integration: Embedded agents for customer service, product assistance, and interactive experiences
- Cross-Platform Agent Deployment: Single agent codebase deployed across ChatGPT, websites, and voice interfaces
Documentation & Resources
- Official Documentation: https://openai.github.io/openai-agents-js/guides/quickstart/
- Voice Agents Guide: https://openai.github.io/openai-agents-js/guides/voice-agents/quickstart/
- SIP Integration Guide: https://platform.openai.com/docs/guides/realtime-sip
- GitHub Repository: https://github.com/openai/openai-agents-js
- OpenAI Realtime API: https://platform.openai.com/docs/guides/realtime
Advantages Over Lower-Level Implementation
- Reduced Complexity: Eliminates need for manual WebRTC setup
- Built-in Error Handling: Automatic connection management and error recovery
- Audio Management: No need to manually handle microphone access and audio playback
- Session Lifecycle: Automatic handling of connection states and reconnection
- Type Safety: Full TypeScript support with proper type definitions
Known Limitations
!!! warning "SDK Quality Concerns"
    Community feedback has identified several limitations with the OpenAI Agents SDK:
Documentation Issues:
- Poor documentation quality making implementation challenging
- Limited examples for real-world use cases
- Insufficient guidance for production deployments

Platform Compatibility:
- Azure OpenAI API incompatibility due to different API specifications
- SDK doesn't work seamlessly across OpenAI and Azure endpoints
- Requires custom wrappers to standardize behavior

Control Limitations:
- Limited flexibility for advanced use cases
- Difficulty achieving fine-grained control over agent behavior
- May require abandoning SDK for complex implementations

Pricing Considerations:
- GPT Realtime API pricing can be expensive for development and testing
- ~$10 for 100K tokens during typical development usage
- GPT Realtime Mini offers 70% cost reduction but may have feature limitations
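A small estimator makes the pricing concern concrete. This sketch counts audio tokens only. It uses the gpt-realtime rates OpenAI announced at general availability: $32 per 1M audio input tokens and $64 per 1M audio output tokens. Treat these as example inputs and check the current pricing page. As an assumption, the 70% reduction for GPT Realtime Mini is modeled by scaling both rates by 0.3.

```typescript
// Back-of-the-envelope audio cost estimate for a Realtime session.
// Rates are USD per 1M tokens and are parameters: check current pricing.

interface Rates {
  audioInPerM: number;
  audioOutPerM: number;
}

// gpt-realtime audio rates announced at general availability.
const GPT_REALTIME: Rates = { audioInPerM: 32, audioOutPerM: 64 };

function estimateCostUSD(audioInTokens: number, audioOutTokens: number, rates: Rates): number {
  return (audioInTokens * rates.audioInPerM + audioOutTokens * rates.audioOutPerM) / 1_000_000;
}

// Models a percentage price cut as a uniform multiplier on both rates (an assumption).
function scaled(rates: Rates, fraction: number): Rates {
  return { audioInPerM: rates.audioInPerM * fraction, audioOutPerM: rates.audioOutPerM * fraction };
}
```

Take 60K audio input tokens and 40K audio output tokens as an example. They cost $4.48 at these rates, or about $1.34 with the 0.3 multiplier. Real sessions usually bill more than a single pass over the audio, because each response also processes the accumulated conversation as input.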
Alternative Approaches
For projects requiring more control or Azure compatibility:
- Custom WebRTC implementation with direct API calls
- Standardization layer over OpenAI and Azure APIs
- Hybrid approach using SDK for simple cases, custom implementation for complex requirements
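A standardization layer can start as one function that maps a provider description onto a WebSocket URL and auth header. This sketch assumes two endpoint shapes. For OpenAI, it uses `wss://api.openai.com/v1/realtime?model=...` with a bearer token. For Azure, it uses a preview-style endpoint with `api-version` and `deployment` query parameters and an `api-key` header. The Azure shape differs between API versions, so confirm it against Azure's documentation. `realtimeConnection` and the `Provider` type are hypothetical.

```typescript
// Sketch of a thin layer that hides provider differences when opening a
// Realtime WebSocket. The Azure URL shape and header are assumptions.

type Provider =
  | { kind: 'openai'; apiKey: string; model: string }
  | { kind: 'azure'; apiKey: string; resource: string; deployment: string; apiVersion: string };

interface ConnectionSpec {
  url: string;
  headers: Record<string, string>;
}

function realtimeConnection(p: Provider): ConnectionSpec {
  if (p.kind === 'openai') {
    // OpenAI: model in the query string, bearer token in the Authorization header.
    return {
      url: `wss://api.openai.com/v1/realtime?model=${encodeURIComponent(p.model)}`,
      headers: { Authorization: `Bearer ${p.apiKey}` },
    };
  }
  // Azure: deployment and API version in the query string, key in an api-key header.
  const params = new URLSearchParams({ 'api-version': p.apiVersion, deployment: p.deployment });
  return {
    url: `wss://${p.resource}.openai.azure.com/openai/realtime?${params.toString()}`,
    headers: { 'api-key': p.apiKey },
  };
}
```

Browser `WebSocket` objects cannot set custom headers, so this shape suits a server-side client such as the `ws` package for Node. Browser clients would use ephemeral keys instead.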
Related Tools
- AI SDK v5 - For text-based AI applications
- Claude Code - For AI-assisted development workflows