OpenAI today updated its Realtime API, which is currently in beta. The update adds new voices to its platform for speech-to-speech applications and cuts prices through prompt caching.
Realtime API beta users now have five new voices with which to build their applications. OpenAI introduced three of the new voices, Ash, Verse and the British-sounding Ballad, in a post on X.
Two Realtime API updates:
– You can now build speech-to-speech experiences with five new voices, which are much more expressive and steerable.
– We're lowering prices with prompt caching: cached text input gets a 50% discount and cached audio input gets an 80% discount… pic.twitter.com/jLzZDBrR7l
— OpenAI developer (@OpenAIDevs) October 30, 2024
The company said in its API documentation that the native speech-to-speech feature “skips an intermediate text format, meaning low latency and more nuanced output,” while the voices are easier to steer and more expressive than previous voices.
However, OpenAI warns that it cannot currently offer client-side authentication for the API because it is still in beta. It also said there may be issues processing real-time audio.
“Network conditions greatly impact real-time audio, and reliably delivering audio from a client to a server at scale is difficult when network conditions are unpredictable,” the company said.
OpenAI's history with AI-powered speech and voices is controversial. In March, the company released Voice Engine, a voice-cloning platform rivaling ElevenLabs, but limited access to only a small number of researchers. After demonstrating GPT-4o and Voice Mode in May, the company paused use of one of the voices, Sky, after actress Scarlett Johansson commented on its similarity to her voice.
The company launched ChatGPT Advanced Voice Mode in the US in September for paid subscribers (those using ChatGPT Plus, Enterprise, Team and Edu).
Speech-to-speech AI would ideally allow companies to create more real-time voice responses. Suppose a customer calls a company's customer support platform: the speech-to-speech feature can capture the person's voice, understand what they are asking, and respond with an AI-generated voice at lower latency. Speech-to-speech also allows users to generate voice-overs, where a user speaks their lines but the voice heard isn't theirs. Platforms that offer this include Replica and, of course, ElevenLabs.
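To make the interaction concrete, here is a minimal sketch of the kind of event a client sends over the Realtime API's WebSocket connection to configure a speech-to-speech session with one of the new voices. The `build_session_update` helper is hypothetical, and the exact field names follow OpenAI's beta documentation as of this writing, so they may change; the voice list below mixes the three names mentioned in this article with OpenAI's earlier Realtime voices and is an assumption, not an exhaustive roster.

```python
import json

# Hypothetical helper: constructs the "session.update" event that a client
# would send over the Realtime API WebSocket to pick a voice and set
# instructions. Field names follow the beta docs and may change.
SUPPORTED_VOICES = {"alloy", "echo", "shimmer", "ash", "ballad", "verse"}

def build_session_update(voice: str, instructions: str) -> str:
    """Return the JSON-encoded session.update event for the given voice."""
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,                    # e.g. the new "ballad" voice
            "instructions": instructions,      # system-style steering text
            "modalities": ["audio", "text"],   # speech in, speech + text out
        },
    }
    return json.dumps(event)

payload = build_session_update("ballad", "You are a concise support agent.")
print(payload)
```

In a real client, this string would be sent as the first message after the WebSocket handshake, before streaming the caller's audio.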
OpenAI released the Realtime API during its Dev Day this month. The aim of the API is to speed up the development of voice assistants.
Reduce costs
However, using speech-to-speech features can be expensive.
When the Realtime API was introduced, the pricing was $0.06 per minute of audio input and $0.24 per minute of audio output, which is not cheap. The company now plans to reduce Realtime API prices through prompt caching.
Cached text input is discounted by 50% and cached audio input by 80%.
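A back-of-the-envelope estimator shows how the audio-input discount affects a session's bill. This is a sketch for illustration only: it applies the per-minute rates quoted above and the announced 80% cached-audio discount, whereas actual Realtime API billing is token-based, and the `session_cost` helper is hypothetical.

```python
# Assumed per-minute rates from the article's quoted launch pricing.
AUDIO_IN_PER_MIN = 0.06    # $ per minute of audio input
AUDIO_OUT_PER_MIN = 0.24   # $ per minute of audio output
CACHED_AUDIO_IN_DISCOUNT = 0.80  # announced 80% discount on cached audio input

def session_cost(input_min: float, output_min: float,
                 cached_input_min: float = 0.0) -> float:
    """Estimate a session's cost in dollars.

    cached_input_min is the portion of input_min served from the prompt cache.
    """
    fresh_input_min = input_min - cached_input_min
    cost = (
        fresh_input_min * AUDIO_IN_PER_MIN
        + cached_input_min * AUDIO_IN_PER_MIN * (1 - CACHED_AUDIO_IN_DISCOUNT)
        + output_min * AUDIO_OUT_PER_MIN
    )
    return round(cost, 4)

# 10 minutes of input (6 of them cache hits) plus 5 minutes of output:
print(session_cost(10, 5, cached_input_min=6))  # → 1.512
```

Without caching, the same session would cost 10 × $0.06 + 5 × $0.24 = $1.80, so the cached portion shaves off about 16% here.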
OpenAI also announced prompt caching during Dev Day; it keeps frequently used contexts and prompts in the model's memory, reducing the number of tokens that must be processed to generate responses. Lowering input prices could encourage more interested developers to build on the API.
OpenAI isn't the only company introducing prompt caching. Anthropic launched prompt caching for Claude 3.5 Sonnet in August.