OpenAI didn't release any new models at its Dev Day event, but the new API features will delight developers who want to use its models to build powerful apps.
OpenAI has had a difficult few weeks, with CTO Mira Murati and other senior researchers joining the ever-growing list of former employees. The company is also under increasing pressure from competing flagship models, including open-source ones, that give developers cheaper and increasingly capable options.
The new features OpenAI introduced include the Realtime API (in beta), vision fine-tuning, and efficiency-focused tools such as prompt caching and model distillation.
Realtime API
The Realtime API is probably the most exciting new feature, albeit in beta. It lets developers build low-latency speech-to-speech experiences into their apps without stitching together separate speech recognition and text-to-speech models.
With this API, developers can now build apps that hold real-time conversations with AI, such as voice assistants or language learning tools, all through a single API call. It's not quite the seamless experience of ChatGPT's Advanced Voice Mode, but it comes close.
However, at around $0.06 per minute of audio input and $0.24 per minute of audio output, it isn't cheap: a 10-minute conversation split evenly between user and model speech works out to roughly $1.50.
The new Realtime API from OpenAI is incredible…
Watch as you order 400 strawberries by actually calling the shop with Twilio. Everything with voice. 🍓🎤 pic.twitter.com/J2BBoL9yFv
– Ty (@FieroTy) October 1, 2024
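For developers who want to try it, the beta exposes the Realtime API over a WebSocket connection that exchanges JSON events. Below is a minimal Python sketch of a text-only exchange based on the beta documentation; the model name, header, and event fields are assumptions that may change while the API is in beta, and real speech apps would stream audio via input_audio_buffer events instead.

```python
# Minimal sketch of a text-in/text-out exchange over the Realtime API beta.
# Model name and event shapes are assumptions drawn from the beta docs.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # `extra_headers` is the argument name in websockets < 14;
    # newer releases of the library call it `additional_headers`.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for a response; audio input would instead be streamed
        # to the session as `input_audio_buffer.append` events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user in one short sentence.",
            },
        }))
        # Read server events until the response is complete.
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```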
Vision fine-tuning
With vision fine-tuning in the API, developers can improve their models' ability to understand and work with images. By fine-tuning GPT-4o on images, developers can build applications that excel at tasks like visual search or object detection.
The feature is already being used by companies such as Grab, which improved the accuracy of its mapping service by fine-tuning the model to recognize traffic signs in street-level imagery.
OpenAI also gave an example of how GPT-4o could generate additional content for a website after being customized to stylistically match the site's existing content.
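In practice, vision fine-tuning reuses the familiar JSONL chat format, with images supplied as image_url content parts alongside text. The sketch below shows what a single training example and job creation might look like with the OpenAI Python SDK; the file name, image URL, label, and model snapshot are illustrative assumptions rather than a prescribed recipe.

```python
# A sketch of a vision fine-tuning workflow with the OpenAI Python SDK.
# The example data (file name, image URL, label) is made up for illustration.
import json

from openai import OpenAI  # pip install openai

client = OpenAI()

# One training example: an image plus the answer we want the model to produce.
example = {
    "messages": [
        {"role": "system", "content": "You identify traffic signs in street-level images."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What traffic sign is shown?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/sign_0001.jpg"}},
            ],
        },
        {"role": "assistant", "content": "No left turn"},
    ]
}

# Write examples to a JSONL file, upload it, and start the fine-tuning job.
with open("traffic_signs.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

training_file = client.files.create(file=open("traffic_signs.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed snapshot supporting vision fine-tuning
)
print(job.id)
```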
Prompt caching
To improve cost efficiency, OpenAI introduced Prompt Caching, which cuts the cost and latency of frequently repeated API calls. By reusing recently processed input, developers can reduce costs by up to 50% and improve response times. The feature is especially useful for applications that rely on long conversations or repeated context, such as chatbots and customer support tools.
Cached input tokens are billed at half the price of regular input tokens, saving up to 50% of input token costs.
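Caching is applied automatically when a sufficiently long prompt prefix (roughly 1,024+ tokens) repeats across requests, and the response usage reports how much of the input was served from cache. Here is a rough sketch of how that might be observed with the Python SDK; the prompt contents are placeholders.

```python
# Sketch: repeated long prompt prefixes are cached automatically, and the
# usage block reports how many input tokens came from the cache.
from openai import OpenAI  # pip install openai

client = OpenAI()

# A long, stable system prompt (e.g. policies or documentation) acts as the
# cacheable prefix; only the short user question changes between calls.
long_system_prompt = "You are a support agent for ExampleCorp. " + "Policy details... " * 400

for question in ["How do I reset my password?", "How do I close my account?"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": long_system_prompt},
            {"role": "user", "content": question},
        ],
    )
    # On the second call the shared prefix should be billed at the cached rate.
    details = response.usage.prompt_tokens_details
    print(question, "cached input tokens:", details.cached_tokens)
```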
Model distillation
Model distillation lets developers optimize smaller, cheaper models by leveraging the outputs of larger, more powerful models. This matters because distillation previously required multiple independent steps and tools, making it a time-consuming and error-prone process.
Before OpenAI's built-in model distillation feature, developers had to manually orchestrate each part of the process: generating data with larger models, preparing fine-tuning datasets, and measuring performance with a patchwork of tools.
Developers can now automatically store input-output pairs from larger models like GPT-4o and use them to fine-tune smaller models like GPT-4o mini. The entire process of dataset creation, fine-tuning, and evaluation becomes more structured, automated, and efficient.
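As a rough sketch of the first half of that workflow, the snippet below stores GPT-4o completions along with metadata so they can later be filtered and reused as a distillation dataset for a smaller model. The prompts and metadata keys are made up for illustration; the subsequent fine-tuning and evaluation steps are driven from the stored completions.

```python
# Sketch: store GPT-4o completions (with metadata for filtering) so they can
# later serve as training data when distilling into a smaller model.
# Prompts and metadata keys here are illustrative assumptions.
from openai import OpenAI  # pip install openai

client = OpenAI()

prompts = [
    "Summarize the refund policy in two sentences.",
    "Draft a polite reply to a delayed-shipment complaint.",
]

for prompt in prompts:
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        store=True,  # persist the input/output pair as a stored completion
        metadata={"use_case": "support-distillation"},  # tag for later filtering
    )

# The stored completions can then be filtered by metadata, turned into a
# fine-tuning dataset for gpt-4o-mini, and evaluated against the GPT-4o teacher.
```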
The streamlined developer workflow, lower latency, and reduced costs make OpenAI's GPT-4o an attractive option for developers looking to ship high-performance apps quickly. It will be interesting to see what applications its multimodal capabilities enable.