Earlier this year, Google teased a neural network-powered algorithm that can do speech-to-speech translation in real time. As impressive as that may be, the app was largely inaccessible to people with speech impediments. That's why the company's AI division has been working on new software aimed specifically at those with verbal impairments.
In a new blog post, the Big G announced a speech conversion technology designed specifically for people with verbal impairments and atypical speech patterns. Dubbed Parrotron, the software runs on a deep neural network trained to convert atypical speech patterns into fluent synthesized speech.
What's particularly interesting is that the technology doesn't rely on visual cues like lip movement.
To accomplish this, Google fed the neural network a "corpus of [nearly] 30,000 hours that consists of millions of anonymized utterance pairs."
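At its core, that setup is supervised learning on paired examples: an input utterance with atypical speech and a target rendering of the same words in fluent speech. As a loose illustration only (the real Parrotron is a deep sequence-to-sequence model over spectrograms; the linear "model," the frame size, and the synthetic data below are placeholder assumptions, not Google's implementation), a converter can be fit to such pairs like this:

```python
import numpy as np

# Toy sketch of training on "utterance pairs": input frames of atypical
# speech mapped to target frames of fluent speech. A single linear layer
# over fixed-size frame vectors stands in for Parrotron's deep network.
rng = np.random.default_rng(0)

FRAME_DIM = 8  # pretend spectrogram frame size (hypothetical)

# Synthetic paired data for illustration: fluent "targets" are a fixed
# transformation of the atypical-speech "inputs".
true_map = rng.normal(size=(FRAME_DIM, FRAME_DIM))
inputs = rng.normal(size=(500, FRAME_DIM))
targets = inputs @ true_map

# Fit a linear frame-to-frame converter by gradient descent on MSE.
W = np.zeros((FRAME_DIM, FRAME_DIM))
lr = 0.01
for _ in range(2000):
    pred = inputs @ W
    grad = inputs.T @ (pred - targets) / len(inputs)
    W -= lr * grad

mse = float(np.mean((inputs @ W - targets) ** 2))
print(f"final MSE: {mse:.6f}")
```

The point of the sketch is only the data shape: the model never sees a transcript, just matched input/output speech, which is why the corpus of anonymized utterance pairs matters.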
The technology essentially reduces "the word error rate for a deaf speaker from 89 percent to 25 percent," but Google hopes ongoing research will improve the results even further.
Indeed, the researchers have also shared a couple of demo videos to show the progress they've made. You can check them out below:
For a more detailed breakdown of the technology behind Parrotron, head to Google's blog here. The full research has been posted on arXiv, while you can find more examples of Parrotron in action in this GitHub repository.