New Google AI System To Soon Identify Individual Voices From A Crowd

April 13, 2018, 12:57 IST

Humans are good at separating a specific voice from ambient voices and sounds: we can mentally tune out the background and focus on a single speaker. Imagine if a microphone could do the same.

It may soon be possible to pick out individual voices in a crowd while suppressing all other sounds, as Google researchers are working on a new Artificial Intelligence (AI) system for the task.

"In this work, we are able to computationally produce videos in which speech of specific people is enhanced while all other sounds are suppressed," said Inbar Mosseri and Oran Lang, software engineers at Google Research.

The human ability behind this is known as the cocktail party effect: the brain's capacity to focus auditory attention on a particular stimulus while filtering out a range of other stimuli, as when a partygoer follows a single conversation in a noisy room.

The method works on ordinary videos with a single audio track. All that is required from the user is to select the face of the person in the video they want to hear, or to have that person selected algorithmically based on context.

The researchers believe this capability can have a wide range of applications, from speech enhancement and recognition in videos, to video conferencing, to improved hearing aids, especially in situations where there are multiple people speaking.

"A unique aspect of our technique is in combining both the auditory and visual signals of an input video to separate the speech," the researchers added.

"Intuitively, movements of a person's mouth, for example, should correlate with the sounds produced as that person is speaking, which in turn can help identify which parts of the audio correspond to that person," they explained.
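The intuition the researchers describe can be illustrated with a toy sketch. This is not the researchers' system, whose internals the article does not detail. It only assumes that each face in a video yields a per-frame "mouth openness" number and that the audio yields a per-frame loudness number, then picks the face whose mouth movement best tracks the loudness. The function names, the signals, and the sample values are all hypothetical and chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def best_matching_face(audio_energy, mouth_openness_by_face):
    """Return (face_id, scores) for the face whose mouth track best follows the audio."""
    scores = {
        face: pearson(audio_energy, track)
        for face, track in mouth_openness_by_face.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical per-frame values: loudness of the single audio track,
# and how open each visible person's mouth is in the same frames.
audio_energy = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.8, 0.1]
mouth_openness_by_face = {
    "face_a": [0.0, 0.7, 0.8, 0.1, 0.0, 0.6, 0.9, 0.1],  # moves with the sound
    "face_b": [0.5, 0.1, 0.0, 0.6, 0.7, 0.1, 0.0, 0.6],  # moves against it
}

speaker, scores = best_matching_face(audio_energy, mouth_openness_by_face)
print(speaker)
```

In this made-up example the face labelled "face_a" is selected because its mouth opens when the audio gets louder. A real system would have to go further and separate overlapping voices, which a simple correlation like this cannot do.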