
AI headphones let users concentrate on a single voice in noisy environments

Researchers at the University of Washington have developed an AI system that enables noise-canceling headphones to isolate and amplify a single voice in a crowded, noisy environment.

The technology, called Target Speech Hearing (TSH), lets users select a particular person to listen to simply by looking at them for a few seconds.

The TSH system addresses a common limitation of noise-canceling headphones: while they effectively reduce ambient noise, they do so indiscriminately, making it difficult for users to hear the specific sounds they actually want to focus on.

As Shyam Gollakota, a professor at the University of Washington and the project's lead researcher, explains, "Listening to specific people is such a fundamental aspect of how we communicate and how we interact with other humans. But it can get really difficult, even if you don't have any hearing loss issues, to focus on specific people in noisy situations."

How it works

The study combines noise-canceling headphones and AI to home in on individual voices in loud, crowded settings.

  1. During the "enrollment" phase, the user looks at the target speaker for a few seconds, allowing the binaural microphones on the headphones to capture an audio sample containing the speaker's vocal characteristics, even in the presence of other speakers and noise.
  2. The captured binaural signal is processed by a neural network that learns the characteristics of the target speaker, separating their voice from interfering speakers using directional information.
  3. The learned characteristics of the target speaker, represented as an embedding vector, are then fed into a separate neural network designed to extract the target speech from a cacophony of speakers.
  4. Once the target speaker's characteristics have been learned during the enrollment phase, the user can look in any direction, move their head, or walk around while still hearing the target speaker.
  5. The TSH system continuously processes the incoming audio, using the learned speaker embedding to isolate and amplify the target speaker's voice while suppressing other voices and background noise.
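The two-stage pipeline described above can be sketched in a few lines of code. This is a toy illustration, not the researchers' implementation: the stand-in "networks" here just pool spectral statistics with NumPy, whereas the real system uses trained neural networks operating on binaural audio. All function names are hypothetical.

```python
import numpy as np

def enrollment_embedding(binaural_clip, dim=64):
    """Toy stand-in for the enrollment network: summarize a short
    binaural clip's spectral statistics as a fixed-size 'speaker
    embedding'. The real TSH system learns this with a neural net."""
    left, right = binaural_clip
    spectrum = np.abs(np.fft.rfft(left)) + np.abs(np.fft.rfft(right))
    # Pool frequency bins down to a fixed-dimension vector.
    return np.array([b.mean() for b in np.array_split(spectrum, dim)])

def extract_target(frame, embedding):
    """Toy stand-in for the extraction network: re-weight each
    frequency band of an incoming audio frame by the enrolled
    embedding, crudely emphasizing the target's spectral profile."""
    spectrum = np.fft.rfft(frame)
    bands = np.array_split(np.arange(len(spectrum)), len(embedding))
    gains = np.empty(len(spectrum))
    weights = embedding / (embedding.max() + 1e-9)  # normalize to [0, 1]
    for w, idx in zip(weights, bands):
        gains[idx] = w
    return np.fft.irfft(spectrum * gains, n=len(frame))

# Usage: enroll once on a few seconds of audio, then process each
# streaming frame with the frozen embedding (synthetic signals here).
rng = np.random.default_rng(0)
clip = (rng.standard_normal(16000), rng.standard_normal(16000))
embedding = enrollment_embedding(clip)
frame = rng.standard_normal(512)
cleaned = extract_target(frame, embedding)
```

The key design point the sketch mirrors is that enrollment runs once, while extraction must run continuously on every incoming frame with low enough latency for live listening.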

The current prototype can only enroll a target speaker whose voice is the loudest coming from a particular direction, but the team is working on improving the system to handle more complex scenarios with varied audio sources.

Samuele Cornell, a researcher at Carnegie Mellon University's Language Technologies Institute, praises the research for its clear real-world applications, stating, "I think it's a step in the right direction. It's a breath of fresh air."

While the TSH system is currently a proof of concept, the researchers are in talks to embed the technology in popular brands of noise-canceling earbuds and make it available for hearing aids. 

Combined with improved audio and speech analysis, which leaped forward with GPT-4o, people with visual and auditory impairments may be able to better connect with the sensory world around them.
