WellSaid Labs, a leading artificial intelligence (AI) voice company, today introduced new technology that lets users control the performance of AI voices in a more natural and nuanced way. The technology, called HINTS (Highly Intuitive Naturally Tailored Speech), allows content creators to shape AI voices by adding contextual annotations such as tempo or volume adjustments, much as a film director guides a performance.

“We have long heard from our customers that they would like more influence over the speech output of our AI,” said Michael Petrochuk, co-founder and CTO of WellSaid Labs, in an exclusive interview with VentureBeat. “We wanted to develop a system that was intuitive and natural, allowing our model to predict a natural performance based on users’ production context, so creatives can better realize their artistic vision.”

Meeting creative needs with AI

Unlike current methods of controlling AI voices through rigid markup languages or prompts, HINTS enables fine-grained, interpolable adjustments. For example, users could make a chosen passage exactly 0.7 times slower or 5 dB louder, with the AI voice responding naturally. Context awareness also means that annotations can be nested and layered across long scripts.
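
WellSaid has not published how HINTS annotations are written or combined, so the Python snippet below is only a hypothetical sketch of the arithmetic behind layered adjustments. It assumes that each annotation covers a range of word positions, that tempo factors multiply where ranges overlap, and that decibel changes add; the `Hint` class and `effective_controls` function are illustrative names, not part of any WellSaid API. The one standard fact it relies on is that a gain of G dB scales signal amplitude by 10^(G/20), so 5 dB is roughly a 1.78x increase in amplitude.

```python
from dataclasses import dataclass


@dataclass
class Hint:
    # Hypothetical annotation covering word positions [start, end).
    start: int
    end: int
    tempo: float = 1.0    # assumed speed multiplier (0.7 = 70% of normal pace)
    gain_db: float = 0.0  # loudness change in decibels


def effective_controls(hints, position):
    """Combine every hint covering `position`: tempo factors multiply, decibels add."""
    tempo, gain_db = 1.0, 0.0
    for h in hints:
        if h.start <= position < h.end:
            tempo *= h.tempo
            gain_db += h.gain_db
    amplitude = 10 ** (gain_db / 20)  # convert dB to a linear amplitude ratio
    return tempo, gain_db, amplitude


# An outer hint slows a whole passage; a nested hint makes one sentence louder.
hints = [
    Hint(start=0, end=100, tempo=0.7),
    Hint(start=40, end=60, gain_db=5.0),
]

print(effective_controls(hints, 10))  # inside the outer hint only
print(effective_controls(hints, 50))  # inside both hints
```

In this sketch, word 10 falls only inside the outer hint (pace 0.7, no gain), while word 50 sits inside both, so it inherits the slower pace and the 5 dB boost (about 1.78 times the amplitude). Multiplying tempo factors and adding decibels is one simple way to let nested annotations stack without one overriding the other; a real system could use a different rule.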

“Because it uses actual (consensual) human data to create its final audio outputs, its annotated verbalizations are just as 'realistic' as its unannotated outputs,” Petrochuk told VentureBeat. “Interestingly, in this research we found that the model is not only capable of effectively modeling a single data set, but that it can be generalized even further, using multiple speakers' performances to influence its use of prosody. We were speechless when we first heard this, and it clearly illustrates what further research will bring.”

Expanding creative possibilities

HINTS fills a long-standing need for more customizable, director-focused AI voice tools. The new architecture could open up creative opportunities for voice-based content in audiobooks, training narratives, marketing videos and more. Early evaluations show improvements in accuracy and naturalness.

The research also emphasizes responsible and ethical AI practices. “We have been committed to ethical innovation from the start,” Petrochuk said. WellSaid obtains express consent from voice actors, protects privacy, and moderates content to prevent abuse or deception.

As voice AI becomes increasingly integrated into consumer technology and entertainment, HINTS shows how the technology can become an empathetic storytelling medium, not just a voice machine. While there are still limitations compared with working with human talent, tools like HINTS bring us one step closer to truly expressive synthetic voices.


