How do assistants like Alexa discern sound? The answer lies in two Amazon research papers scheduled to be presented at this year’s International Conference on Acoustics, Speech, and Signal Processing in Aachen, Germany. Ming Sun, a senior speech scientist in the Alexa Speech group, detailed them this morning.
“We develop[ed] a way to better characterize media audio by examining longer-duration audio streams versus merely classifying short audio snippets,” he says, “[and] we used semisupervised learning to train a system developed from an external dataset to do audio event detection.”
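The idea of characterizing longer-duration audio rather than classifying isolated snippets can be illustrated with a minimal sketch. Everything here is hypothetical and not Amazon's actual method: we assume a short-snippet classifier has already produced one media-vs-speech probability per snippet, and we simply smooth those scores over a sliding window before thresholding, so that a sustained run of media-like audio flags the stream while a single noisy snippet does not.

```python
import numpy as np

# Hypothetical illustration, not the method from the papers: assume an
# upstream short-snippet classifier emits one media-vs-speech probability
# per one-second snippet of the audio stream.

def smooth_scores(snippet_scores, window=5):
    """Average per-snippet media probabilities over a sliding window."""
    scores = np.asarray(snippet_scores, dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps one smoothed score per original snippet
    return np.convolve(scores, kernel, mode="same")

def classify_stream(snippet_scores, window=5, threshold=0.5):
    """Flag the stream as media if the smoothed score reaches the threshold."""
    smoothed = smooth_scores(snippet_scores, window)
    return bool(smoothed.max() >= threshold)

# Noisy per-snippet scores with a sustained media-like burst mid-stream
scores = [0.1, 0.2, 0.9, 0.8, 0.9, 0.7, 0.2, 0.1]
print(classify_stream(scores))  # → True
```

The sliding window is the simplest possible stand-in for the longer-duration modeling the paper describes; the point is only that aggregating evidence over time is more robust than trusting any single short snippet.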
The first paper addresses the problem of media detection — that is, recognizing when voices captured by an assistant originate from a TV or radio rather than a human speaker. To tackle it,