AI-as-a-Service provider Assembly AI has a new speech recognition model called Universal-1. Trained on more than 12.5 million hours of multilingual audio, the model delivers strong speech-to-text accuracy in English, Spanish, French and German, the company says. It claims that Universal-1 reduces hallucinations by 30% on speech data and by 90% on ambient noise compared with OpenAI's Whisper Large-v3 model.

In a blog post, the company describes Universal-1 as another milestone in its mission to provide accurate, faithful and robust speech-to-text capabilities across multiple languages and to help its customers and developers worldwide build speech AI applications. In addition to a better understanding of the four major languages, the model can handle code-switching, transcribing multiple languages within a single audio file.

Universal-1 also supports improved timestamp estimation, which is vital for audio and video editing and for conversation analytics. Assembly AI claims the new model's timestamps are 13 percent better than those of its predecessor, Conformer-2. This leads to better speaker diarization: a 14% improvement in concatenated minimum-permutation word error rate (cpWER) and a 71% improvement in speaker count estimation accuracy.
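
For readers unfamiliar with the metric, the following is a minimal sketch of how cpWER is commonly computed: each speaker's utterances are concatenated, every assignment of predicted speakers to reference speakers is tried, and the assignment with the fewest word errors is scored. The function names `edit_distance` and `cpwer` and the sample transcripts are illustrative assumptions, not Assembly AI's evaluation code.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference word missing from hypothesis
                curr[j - 1] + 1,         # extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cpwer(reference, hypothesis):
    """Sketch of cpWER. Both arguments map a speaker label to a list of
    that speaker's utterances (strings). Speaker labels need not match."""
    # Concatenate each speaker's utterances into one token stream.
    refs = [" ".join(utts).split() for utts in reference.values()]
    hyps = [" ".join(utts).split() for utts in hypothesis.values()]

    # Pad with empty streams so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Try every speaker assignment and keep the one with the fewest errors.
    # This is factorial in the speaker count, so only suitable for small n.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


# Hypothetical two-speaker example with one misrecognized word.
reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hypothesis = {"spk0": ["fine thank"], "spk1": ["hello there", "how are you"]}
score = cpwer(reference, hypothesis)  # 1 error over 7 reference words
```

Because the score depends on both the words and which speaker they are attributed to, a lower cpWER reflects better transcription and better diarization together.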

Finally, parallel inference has been made more efficient, reducing processing time for long audio files. Universal-1 is said to complete this task five times faster than Whisper Large-v3. Assembly AI compared the processing speed of the two models on Nvidia Tesla T4 machines with 16GB of VRAM. With a batch size of 64, Universal-1 took 21 seconds to transcribe an hour of audio. Whisper Large-v3, using a much smaller batch size of 24, took 107 seconds to finish the same task.
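
The "five times faster" figure follows directly from the reported timings. A short sketch of the arithmetic, using only the numbers quoted above (the variable and function names are illustrative):

```python
AUDIO_SECONDS = 60 * 60        # one hour of audio

universal_1_seconds = 21       # reported, batch size 64
whisper_seconds = 107          # reported, batch size 24


def real_time_factor(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """Processing time divided by audio duration; lower is faster."""
    return processing_seconds / audio_seconds


speedup = whisper_seconds / universal_1_seconds        # about 5.1x
universal_1_rtf = real_time_factor(universal_1_seconds)
whisper_rtf = real_time_factor(whisper_seconds)

print(f"Speedup: {speedup:.1f}x")
print(f"Universal-1 real-time factor: {universal_1_rtf:.4f}")
print(f"Whisper Large-v3 real-time factor: {whisper_rtf:.4f}")
```

Note that the two runs used different batch sizes, so the ratio describes the throughput each model achieved on the same hardware rather than a like-for-like per-batch comparison.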

Improved speech-to-text models allow AI note-takers to create more accurate, hallucination-free notes, identify action items, and sort out metadata such as proper nouns, who is speaking, and timing information. They also support creator tools with AI-powered video editing workflows, telehealth platforms that automate clinical note entry and claims submission, where accuracy is vital, and more.

