Can you hear me now? AI-coustics to combat noisy audio with generative AI

March 26, 2024

Noisy recordings of interviews and speeches are the bane of a sound engineer's existence. But a German startup hopes to solve this problem with a unique technical approach that uses generative AI to enhance the clarity of voices in videos.

Today, AI-coustics came out of stealth with 1.9 million euros in funding. According to co-founder and CEO Fabian Seipel, AI-coustics' technology goes beyond standard noise cancellation and works with any device and speaker.

“Our core mission is to make every digital interaction, be it a conference call, a consumer device or a casual social media video, as clear as a broadcast from a professional studio,” Seipel said in an interview with TechCrunch.

Seipel, a trained audio engineer, founded AI-coustics in 2021 along with Corvin Jaedicke, a lecturer in machine learning at the Technical University of Berlin. Seipel and Jaedicke met while studying audio engineering at TU Berlin, where they often encountered poor audio quality in the online courses and tutorials they were required to complete.

“Our personal mission is to overcome the pervasive problem of poor audio quality in digital communications,” said Seipel. “While my hearing is slightly impaired as a result of music production in my early twenties, I have always struggled with online content and lectures, which led us to look at the problem of voice quality and speech intelligibility in the first place.”

The market for AI-powered noise cancellation and speech enhancement software is already very robust. AI-coustics' competitors include Insoundz, which uses generative AI to enhance streamed and pre-recorded voice clips, and Veed.io, a video editing suite with tools to remove background noise from clips.

However, Seipel says AI-coustics takes a unique approach to developing the AI mechanisms that do the actual noise reduction work.

The startup uses a model trained on voice samples recorded in the startup's studio in Berlin, AI-coustics' hometown. People are paid to record samples (Seipel wouldn't say how much) that are then added to a data set used to train AI-coustics' noise-reducing model.

“We have developed a unique approach to handle audio artifacts and issues, such as noise, reverb, compression, band-limited microphones, distortion, clipping, etc., through the training process,” said Seipel.

I bet some will take issue with AI-coustics' one-time compensation scheme for contributors, because the model the startup is building could prove to be quite lucrative in the long term. (There is a healthy debate about whether creators of training data for AI models deserve residuals for their contributions.) But perhaps the larger and more immediate concern is bias.

It is well known that speech recognition algorithms can develop biases, and that those biases ultimately harm users. A study published in the Proceedings of the National Academy of Sciences showed that speech recognition systems from leading companies are twice as likely to incorrectly transcribe audio from Black speakers as from white speakers.

To counteract this, AI-coustics is focusing on recruiting “diverse” contributors of voice samples, according to Seipel. He added: “Size and variety are key to eliminating bias and making the technology work across languages, speaker identities, ages, accents and genders.”

It wasn't the most scientific test, but I uploaded three video clips (an interview with an 18th-century farmer, a car driving demo and a protest over the Israeli-Palestinian conflict) to the AI-coustics platform to see how well it works with each. AI-coustics did deliver on its promise of improving clarity; in my opinion, the processed clips had far less background noise drowning out the speakers.

Here's the 18th-century farmer clip before:

And after:

Seipel expects AI-coustics' technology to be used for both real-time and recorded speech enhancement, and perhaps even to be embedded into devices such as soundbars, smartphones and headphones to automatically improve speech intelligibility. At the moment, AI-coustics offers a web app and API for post-production of audio and video recordings, as well as an SDK that integrates the AI-coustics platform into existing workflows, apps and hardware.

Seipel says AI-coustics, which makes money through a mix of subscriptions, on-demand pricing and licensing, currently has five enterprise customers and 20,000 users (though not all paying). The roadmap for the next few months includes expanding the company's four-person team and improving the underlying speech enhancement model.

“Prior to our initial investment, AI-coustics ran a fairly lean operation with a low burn rate to weather the difficulties of the VC investment market,” Seipel said. “AI-coustics now has an extensive network of investors and mentors in Germany and Great Britain for advice. A strong technology base and the ability to address different markets with the same database and core technology give the company flexibility and the ability to make smaller pivots.”

Asked whether audio mastering technologies like AI-coustics could steal jobs, as some experts fear, Seipel noted the potential for AI-coustics to speed up time-consuming tasks that are currently the responsibility of human audio engineers.

“A content creation studio or broadcast manager can save time and money by automating parts of the audio production process with AI-coustics while maintaining the highest voice quality,” he said. “Voice quality and intelligibility continue to be a vexing problem in almost every consumer or professional device, as well as in the production or consumption of content. Any application that involves recording, processing or transmitting voice can potentially benefit from our technology.”

The financing came in the form of an equity and debt tranche from Connect Ventures, Inovia Capital, FOV Ventures and Ableton CFO Jan Bohl.
