OpenAI says it has conducted a small-scale test of its new voice cloning product, Voice Engine, with a handful of select partners. The results show promising applications for the technology, but safety concerns could prevent a wider release.
OpenAI says Voice Engine can clone a person's voice from a single 15-second recording. The tool can then “generate natural-sounding speech that is very similar to the original speaker.”
Once cloned, Voice Engine can convert text input into audible speech using “emotional and realistic voices.” The power of the tool enables exciting applications, but it also raises serious safety concerns.
Promising use cases
OpenAI began testing Voice Engine late last year with a small group of select participants to see how they would use the technology.
Some examples of how Voice Engine test partners used the product include:
- Adaptive teaching – Age of Learning used Voice Engine to provide reading assistance to children, create voice-over content for learning materials, and deliver personalized verbal responses to engage with students.
- Translating content – HeyGen used Voice Engine for video translation so that product marketing and sales demos could reach a wider market. The translated audio retains the speaker's native accent, so when a native French speaker's speech is translated into English, you still hear their French accent.
- Providing more comprehensive social services – Dimagi trains health workers in remote settings. It used Voice Engine to train those workers in underserved languages and provide interactive feedback.
- Supporting non-verbal people – Livox enables non-verbal people to communicate via alternative communication devices. Voice Engine allows these people to choose a voice that best represents them, rather than something that sounds robotic.
- Helping patients recover their voice – Lifespan piloted a program offering Voice Engine to individuals with speech impairments due to cancer or neurological conditions.
Voice Engine isn’t the first AI voice cloning tool, but the examples in OpenAI's blog post suggest it represents the state of the art and may even be better than ElevenLabs.
Here is just one example of the natural tone and emotional qualities it can produce.
OpenAI just launched Voice Engine.
It uses text input and a single 15-second audio sample to produce natural-sounding speech that closely resembles the original speaker.
The reference and generated audio are very similar and difficult to tell apart.
More details in 🧵 pic.twitter.com/tJRrCO2WZP — AshutoshShrivastava (@ai_for_success) March 29, 2024
Safety concerns
OpenAI said it was impressed by the use cases developed by the test participants, but further safety measures would need to be put in place before the company decides “whether and how to deploy this technology at scale.”
According to OpenAI, technology that can accurately reproduce a person's voice poses “serious risks that are particularly prominent in an election year.” The fake Biden robocalls and the deepfake video of Senate candidate Kari Lake are cases in point.
In addition to the clear restrictions in the general usage guidelines, trial participants were required to obtain “explicit and informed consent from the original speaker” and weren't allowed to build a product that let people create their own voices.
OpenAI says it has implemented additional safety measures, including an audio watermark. It didn't explain exactly how the watermark works, but said it would perform “proactive monitoring” of Voice Engine usage.
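OpenAI has not disclosed how its watermark works, so the following is purely an illustration of the general idea behind audio watermarking, not OpenAI's method: a classic spread-spectrum scheme adds a low-amplitude pseudo-random chip sequence, derived from a secret key, to the waveform; anyone holding the key can later detect the mark by correlation. All names and parameters here are hypothetical.

```python
import math
import random

def _chips(key, n):
    """Deterministic ±1 chip sequence derived from a secret key."""
    rng = random.Random(key)
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]

def embed_watermark(samples, key, strength=0.05):
    """Add a low-amplitude keyed chip sequence to the audio samples."""
    chips = _chips(key, len(samples))
    return [s + strength * c for s, c in zip(samples, chips)]

def detect_watermark(samples, key, strength=0.05, threshold=0.5):
    """Correlate audio against the keyed chips.

    The score is ~1.0 for watermarked audio and ~0.0 for clean audio,
    because the chips average out against any uncorrelated signal.
    """
    chips = _chips(key, len(samples))
    score = sum(s * c for s, c in zip(samples, chips)) / (strength * len(samples))
    return score > threshold

# Demo on a synthetic one-second, 16 kHz tone standing in for speech.
audio = [0.5 * math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
marked = embed_watermark(audio, key="provider-secret")
print(detect_watermark(marked, key="provider-secret"))   # True
print(detect_watermark(audio, key="provider-secret"))    # False
```

Real systems are far more sophisticated, embedding the mark in perceptually masked frequency bands so it survives compression and re-recording, but the key-plus-correlation principle is the same.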
Other major players in the AI industry are also concerned about this kind of technology making its way into the wild.
Voice AI is by far the most dangerous modality.
We have minimal defenses against a superhuman, persuasive voice.
Figuring out what to do about it should be one of our top priorities.
(We had SOTA models but didn't release them because of this, e.g. https://t.co/vjY99uCdTl) https://t.co/fKIZrVQCml
– Emad acc/acc (@EMostaque) March 29, 2024
What's next?
Will the rest of us get to play around with Voice Engine? It's unlikely, and maybe that's a good thing. The potential for malicious use is huge.
OpenAI is already recommending that institutions such as banks phase out voice authentication as a security measure.
Voice Engine has an embedded audio watermark, but OpenAI says more work is needed on detecting when audiovisual content is AI-generated.
Even if OpenAI decides not to release Voice Engine, others will. The days when you could trust your eyes and ears are over.