
Emotive voice AI startup Hume launches new EVI 3 model with rapid custom voice creation

New York-based AI startup Hume has presented its latest Empathic Voice Interface (EVI) AI model, EVI 3 (pronounced "Evee" three, like the Pokémon character), which aims to power everything from customer service systems to health coaching to immersive storytelling and virtual companionship.

With EVI 3, users can create their own voices by talking to the model (it is voice-to-voice/speech-to-speech), and Hume aims to set a new standard for naturalness, expressiveness, and "empathy."

Developed for enterprises, developers, and creators, EVI 3 expands on Hume's previous voice models with more sophisticated customization, faster responses, and improved emotional understanding.

Individual users can interact with it today through Hume's live demo on its website and iOS app, while developer access via Hume's proprietary application programming interface (API) is slated to arrive in the coming weeks, according to a blog post from the company.

At that point, developers will be able to embed EVI 3 in their own customer service systems, creative projects, or virtual assistants, at a price (see below).

My own use of the demo let me create a new, custom synthetic voice in a matter of seconds, based on the qualities I described: a mix of warmth and confidence with a male tone. Conversing with it felt more naturalistic and effortless than with other AI models, and certainly more so than the stock voices from legacy tech leaders like Apple's Siri and Amazon's Alexa.

What developers and enterprises should know about EVI 3

Hume's EVI 3 is designed for a wide range of uses, from customer service to in-app interactions to content creation for audiobooks and games.

It enables users to specify precise personality traits, vocal qualities, emotional tone, and conversation topics.

This means it can produce everything from a warm, sensitive guide to a quirky, mischievous narrator, responding to prompts such as "a squeaky mouse whispering in a French accent about its scheme to steal cheese from the kitchen."

The core strength of EVI 3 lies in its ability to integrate emotional intelligence directly into speech-based experiences.

Unlike standard chatbots or voice assistants, which rely heavily on scripted or text-based interactions, EVI 3 adapts to the way people naturally speak, adjusting pitch, prosody, pauses, and vocal inflection to deliver more engaging, human-like conversations.

One notable feature is currently missing from Hume's models, though rivals both open source and proprietary, such as ElevenLabs, offer it: voice cloning, the rapid replication of a user's voice or another person's, such as a company CEO's.

But Hume has said it will bring such a capability to its Octave text-to-speech model; the feature is listed as "coming soon" on the Hume website and will let users replicate a voice from just five seconds of audio.

Hume explained that safety and ethical considerations take priority before these functions become widely available. This cloning capability is not currently available in EVI itself; instead, Hume emphasizes flexible voice customization.

Internal benchmarks show users prefer EVI 3 over OpenAI's GPT-4o voice model

According to Hume's own tests with 1,720 users, EVI 3 was preferred over OpenAI's GPT-4o in every category measured: naturalness, expressiveness, empathy, interruption handling, response speed, audio quality, speech rate/style modulation on request, and expressing specific emotions on request (these last capabilities are referred to as "instruction following").

It also generally beat Google's Gemini model family and Sesame, the new open-source voice AI company from former Oculus co-founder Brendan Iribe.

It also boasts low latency (~300 milliseconds), robust multilingual support (English and Spanish, with more languages coming), and effectively unlimited custom voices. As Hume writes on its website:

The most important features include:

  • Prosody control and expressive text-to-speech with emotional modulation.
  • Interruptibility, enabling dynamic conversational flow.
  • In-conversation voice adaptability, so users can adjust speaking style in real time.
  • API-ready architecture (coming shortly), so developers can integrate EVI 3 directly into apps and services.
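As a rough illustration of what an integration might look like once API access opens, here is a minimal sketch that assembles a session-configuration payload for a custom voice. The field names (`voice_description`, `language`, `allow_interruptions`) are placeholders of my own, not Hume's published schema, which had not been released at the time of writing.

```python
import json


def build_session_config(voice_description, language="en", allow_interruptions=True):
    """Assemble a hypothetical EVI session payload.

    All field names here are illustrative assumptions, not Hume's actual
    API schema.
    """
    return json.dumps({
        "voice_description": voice_description,       # e.g. "warm, confident, male tone"
        "language": language,                         # English and Spanish at launch
        "allow_interruptions": allow_interruptions,   # dynamic conversational flow
    })


# Describe the voice in plain language, as in the live demo.
payload = build_session_config("a warm, confident voice with a male tone")
print(payload)
```

In practice a real client would send a payload like this over the API's transport (Hume's existing EVI versions use a streaming connection) rather than just printing it.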

Pricing and developer access

Hume offers flexible, usage-based pricing for its EVI, Octave TTS, and Expression Measurement APIs.

While EVI 3's exact API pricing has not yet been announced (it is marked as TBA), the pattern suggests it will be usage-based, with enterprise discounts available for large deployments.

As a reference, EVI 2 costs $0.072 per minute, 30% less than its predecessor EVI 1 ($0.102/minute).

For creators and developers working on text-to-speech projects, Hume's Octave TTS plans range from a free tier (10,000 characters of speech, ~10 minutes of audio) to enterprise-level plans. Here is the breakdown:

  • Free: 10,000 characters, unlimited custom voices, $0/month
  • Starter: 30,000 characters (~30 minutes), 20 projects, $3/month
  • Creator: 100,000 characters (~100 minutes), 1,000 projects, usage-based overage ($0.20/1,000 characters), $10/month
  • Pro: 500,000 characters (~500 minutes), 3,000 projects, $0.15/1,000 additional, $50/month
  • Scale: 2,000,000 characters (~2,000 minutes), 10,000 projects, $0.13/1,000 additional, $150/month
  • Business: 10,000,000 characters (~10,000 minutes), 20,000 projects, $0.10/1,000 additional, $900/month
  • Enterprise: custom pricing and unlimited usage
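For budgeting purposes, the monthly cost under these tiers is simply the base fee plus any per-1,000-character overage. A quick sketch (tier figures copied from the table above; function and variable names are my own):

```python
# Octave TTS monthly cost estimator, using the published tiers above.
# Tier data: (monthly fee in USD, included characters, overage USD per 1,000 chars)
TIERS = {
    "creator":  (10.0, 100_000, 0.20),
    "pro":      (50.0, 500_000, 0.15),
    "scale":    (150.0, 2_000_000, 0.13),
    "business": (900.0, 10_000_000, 0.10),
}


def monthly_cost(tier, characters):
    """Base fee plus overage for characters beyond the included allowance."""
    fee, included, per_thousand = TIERS[tier]
    overage_chars = max(0, characters - included)
    return fee + (overage_chars / 1000) * per_thousand


# 150,000 characters on the Creator plan: $10 base + 50 x $0.20 overage
print(round(monthly_cost("creator", 150_000), 2))  # 20.0
```

At higher volumes the cheaper per-character overage rates of the larger tiers quickly dominate, which is the usual reason to upgrade before hitting heavy overage on a small plan.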

For developers working on real-time voice interactions or emotion analysis, Hume also offers pay-as-you-go pricing, starting with $20 in free credits and no upfront commitment. High-volume customers can opt for a dedicated enterprise plan with dataset licenses, on-premises deployment, custom integrations, and extended support.

Hume's history of emotionally intelligent voice AI models

Hume was founded in 2021 by Alan Cowen, a former researcher at Google DeepMind, and aims to close the gap between human emotional nuance and AI interaction.

The company trained its models on an expansive dataset drawn from hundreds of thousands of participants worldwide.

"Emotional intelligence includes the ability to infer intentions and preferences from behavior. That's the core of what AI interfaces are trying to achieve," Cowen told VentureBeat. Hume's mission is to make AI interfaces responsive, human, and ultimately more useful, whether that means helping a customer navigate an app or telling a story with exactly the right mix of drama and humor.

In 2024, the company launched EVI 2, which offered lower latency and 30% reduced prices compared to EVI 1, along with new features such as dynamic voice customization and in-conversation style prompts.

February 2025 saw the debut of Octave, a text-to-speech engine for content creators that can adjust emotion at the sentence level via text prompts.

With EVI 3 now available for hands-on exploration and full API access around the corner, Hume hopes developers and creators will rethink what is possible with voice AI.
