OpenAI introduces AI model for cloning voices, but (for now) only for select partners

ChatGPT maker OpenAI is not content to simply disrupt text, image and video creation with its various AI models. It is also taking on the last major type of legacy digital media: audio, and voice cloning in particular.

The company today announced its latest AI model, “Voice Engine,” which has reportedly been in development since 2022 and currently powers OpenAI's Text-to-Speech API and the brand-new ChatGPT Voice and Read Aloud features introduced earlier this month.

As it turns out, the model can also perform voice cloning. Here's how it works: a human speaker records a 15-second clip of their voice through a phone or computer microphone, and Voice Engine generates “natural-sounding speech that closely resembles the original speaker,” which can then be used to read aloud any text a human user types.

Huge impact on the spoken audio market

The technology clearly stands to have a huge effect on those who regularly record themselves speaking, be they podcasters, voice actors, spoken word performers, audiobook and industrial narrators, gamers, streamers, customer support representatives, salespeople, or members of many other professions and disciplines.

It also puts pressure on other companies focused on this sort of technology, such as well-funded AI startup ElevenLabs, Captions, Meta, WellSaid Labs, MyShell and others.

OpenAI also highlights Voice Engine's ability to support non-verbal individuals, giving them unique, non-robotic voices, and to assist in therapeutic and educational programs for individuals with speech impairments or learning needs.

First use cases

OpenAI said in its blog post announcing Voice Engine today that it has made the technology available only to a “small group of trusted partners.” Those highlighted and named include:

  1. Age of Learning, an education technology company that uses Voice Engine and GPT-4 to generate pre-scripted and real-time personalized voice content, expanding reading support and interactivity for a diverse student audience.
  2. HeyGen, an AI visual storytelling platform that allows creators and businesses to translate their content into multiple languages. It leverages Voice Engine for video translation and creates custom human-like avatars with multilingual voices while preserving the original speaker's accent, in order to reach global audiences.
  3. Dimagi, a software company that makes tools for community health workers. It is using Voice Engine and GPT-4 to provide those workers with interactive feedback in multiple languages, improving the delivery of essential services in remote settings.
  4. Livox, an AI app for Augmentative and Alternative Communication (AAC) devices used by individuals with speech and hearing difficulties. It integrates Voice Engine to offer unique, non-robotic voices to nonverbal people in any language.
  5. The Norman Prince Neurosciences Institute at Lifespan, a nonprofit medical and teaching organization at Brown University dedicated to supporting individuals with neurological diseases and disorders. It is using Voice Engine to help individuals with speech impairments use an AI version of their voice. Two doctors there, Rohaid Ali and pediatric neurosurgeon Konstantina Svokos, have already successfully restored the speech of a brain tumor patient using an audio sample from one of the patient's school project videos.

The company posted several audio samples on its blog, and emailed them to VentureBeat under embargo, that show the technology's human-like speaking abilities. For example, here is the original patient “source voice” from Lifespan:

And here is the cloned voice using OpenAI Voice Engine:

Limited user base by design

But for now, the technology is restricted. As with Sora, its powerful and strikingly realistic video generation AI model, OpenAI is not currently allowing the general public to use Voice Engine. Instead, OpenAI today is simply sharing the tool's existence, along with “preliminary insights and results from a small preview” involving “a small group of trusted partners” who have been granted access.

As OpenAI notes in its blog post announcing the technology today:

OpenAI's cautious, slow and steady approach to releasing Voice Engine makes particular sense given U.S. President Joseph R. Biden's recent call to “ban AI voice impersonation.”

At the center of OpenAI's deployment strategy is strict adherence to safety and ethical guidelines. Partners involved in testing Voice Engine are bound by usage policies that prohibit unauthorized impersonation and require informed consent from voice donors.

Additionally, OpenAI has implemented safety measures such as watermarking and proactive monitoring to ensure responsible use of the technology.

