In the late stages of the Cold War, Keith McElveen, an electrical engineer by training, participated in a war crimes investigation. Among the evidence were audio recordings of alleged orders to massacre civilians, but numerous overlapping voices rendered them unintelligible. A technology capable of extracting the desired voice could have turned these recordings into additional, influential evidence. Yet despite years of effort and millions of dollars spent on every conceivable technological approach, the solution remained elusive.
Keith, by then a widely recognized audio forensics expert, founded Wave Sciences in 2009 to continue his mission to solve the Cocktail Party Problem.
Keith and Leonid, another of the researchers at Wave Sciences, realized that a mathematical technique used in SONAR to locate enemy submarines in 3D provided some clues about how the Cocktail Party Problem might be solved.
This revolutionary approach involved using noisy audio to estimate the solution to what is technically called The Acoustic Wave Equation for Waveguides with Boundary and Initial Conditions. In simpler terms, they used principles of physics and math devised by Isaac Newton and George Green, respectively, to figure out how sound travels from a point in 3D space to each of multiple receivers, be they ears, microphones, or other sensors. The resulting solutions, known as Green's functions, capture not only how the sound travels directly, but also how it travels indirectly via reflections from surfaces (i.e., the boundaries) in the acoustic space (i.e., the waveguide), despite other sounds already being in the space and potentially interfering (i.e., the initial conditions).
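To make the idea of a Green's function concrete, here is a minimal textbook sketch in Python. It is not the GLIMPSE method, which the text above describes only in outline. It assumes a single point source, two microphones, and one reflecting floor modeled with the classic image-source trick. The geometry, the 1 kHz test frequency, the reflection coefficient of 0.8, and the function names are all hypothetical choices made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second; assumed value for room-temperature air


def free_space_green(src, rcv, freq_hz):
    """Free-space Green's function exp(i*k*r) / (4*pi*r) from src to rcv."""
    r = math.dist(src, rcv)                          # straight-line path length
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND     # acoustic wavenumber
    return cmath.exp(1j * k * r) / (4.0 * math.pi * r)


def green_with_floor(src, rcv, freq_hz, reflection=0.8):
    """Direct path plus one reflection off a floor at z = 0.

    The reflection is modeled as a mirror-image source below the floor,
    scaled by a hypothetical reflection coefficient.
    """
    image = (src[0], src[1], -src[2])
    direct = free_space_green(src, rcv, freq_hz)
    bounced = reflection * free_space_green(image, rcv, freq_hz)
    return direct + bounced


# Hypothetical layout: one talker and two microphones 20 cm apart.
source = (1.0, 2.0, 1.5)
mics = [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0)]

# One complex transfer value per microphone at 1 kHz.
transfer = [green_with_floor(source, mic, 1000.0) for mic in mics]
for mic, g in zip(mics, transfer):
    print(mic, abs(g), cmath.phase(g))
```

Each microphone receives a slightly different magnitude and phase because the direct and reflected path lengths differ. A real room adds many more reflections and competing sources, which is what makes estimating these functions from noisy audio so difficult.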
The real question was whether this sophisticated approach could bridge the chasm between a theoretical solution and one that works in the real world, and it took nearly a decade to answer. The team faced moments of doubt and nearly abandoned the approach altogether. But they persevered, drawing inspiration from the remarkable ability of human hearing: after all, even a child with normal hearing can solve this problem effortlessly.
The team went into this research fully expecting that a truly general solution to the Cocktail Party Problem would require out-of-the-box thinking, as none of the many approaches tried over the decades had worked. To mimic human hearing, such a solution had to be robust enough to function in a noisy, crowded, dynamic, and reverberant cocktail party even blindfolded (i.e., without a camera), simply by "paying attention" to one sound at a time. It had to do so without knowing the direction or distance to the source, or whether other masking sources were behind it or between the source and the receivers, and even when the source was quieter than the background sound level most of the time.
It became apparent early on that if this approach could be made to work, it would accomplish amazing things with even a relatively small number of microphones. The ultimate goal, obviously, would be to use only two, just like human hearing. Using more than two, on the other hand, would produce superhuman results.
Despite many technical challenges and setbacks, the team made slow but steady progress, succeeding first with fully synthesized audio using ideal simulated sensors, then with electronically mixed audio recordings, and finally with fully real sensors, sources, and environments without prior information. And yes, the team eventually demonstrated human levels of performance with only two microphones.
In late 2019, Wave Sciences filed its first patent for what would become known as the GLIMPSE™ engine, named after the glimpsing phenomenon in human hearing. This breakthrough immediately caught the attention of the United States' FBI, which then conducted extensive testing, both independently and in coordination with other agencies in North America and Europe, to validate this first-of-its-kind solution.
In 2022, Keith experienced déjà vu when he was asked to extract a conversation about an alleged crime from recordings with overlapping voices, noise, and reverberation. Unlike in his earlier war crimes investigation and subsequent experiences, this time there was a tool that could solve the Cocktail Party Problem. The GLIMPSE engine has since been credited with playing a decisive role in a widely publicized murder-for-hire trial by clarifying previously unintelligible audio recordings collected by the FBI.
Wave Sciences launched Acoustical Focus™, a user-friendly software application that uses GLIMPSE technology to meet the rigorous demands of audio forensic examiners and acoustic analysts in the law enforcement, national security, and military communities.