Siri Now Smoother, Perkier (Thanks, Deep Learning!)
Apple has been working hard on Siri's voice, using on-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis.
Siri is a personal assistant that communicates using speech synthesis. Starting in iOS 10 and continuing with new features in iOS 11, we base Siri voices on deep learning. The resulting voices are more natural and smoother, and they allow Siri’s personality to shine through.
There are essentially two speech synthesis techniques used in the industry: unit selection and parametric synthesis. Unit selection synthesis provides the highest quality given a sufficient amount of high-quality speech recordings, and thus it is the most widely used speech synthesis technique in commercial products. On the other hand, parametric synthesis provides highly intelligible and fluent speech, but suffers from lower overall quality.
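For the curious, here is a rough sketch of the idea behind unit selection: for each sound you want to produce, you have several recorded candidates, and you pick the sequence that best matches what you want while also joining smoothly. This is only a toy illustration, not Apple's implementation. The `Unit` class, the pitch-based costs, and the `select_units` function are all made up for this example, and a real system uses far richer features. The search itself is a standard dynamic-programming (Viterbi-style) pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A hypothetical recorded snippet, described only by its pitch (Hz)."""
    name: str
    start_pitch: float
    end_pitch: float


def target_cost(unit, wanted_pitch):
    # How far the unit's average pitch is from the pitch we asked for.
    mid = (unit.start_pitch + unit.end_pitch) / 2
    return abs(mid - wanted_pitch)


def join_cost(left, right):
    # How audible the seam would be: pitch jump at the boundary.
    return abs(left.end_pitch - right.start_pitch)


def select_units(candidates, wanted_pitches):
    """Pick one unit per position, minimizing target cost plus join cost.

    candidates[i] is the list of recorded options for position i.
    Returns (chosen_units, total_cost).
    """
    # best[i][j] = (cheapest total cost of a path ending in candidates[i][j],
    #               index of the previous unit on that path)
    best = [[(target_cost(u, wanted_pitches[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(u, wanted_pitches[i]), prev_j))
        best.append(row)

    # Walk the back-pointers from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this toy the two costs are hand-written pitch differences. As I understand the headline, the "hybrid" part of Apple's approach is that deep mixture density networks take over the job of guiding which units get picked, rather than hand-tuned rules like these.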
In order to provide the best possible quality for Siri’s voices across all platforms, Apple is now taking a step forward to utilize deep learning in an on-device hybrid unit selection system.
I thought it would be fun to compare Siri's changing voice with this video, which imagines the voice of the HAL 9000 progressing in quality. It is from the film 2010 (which came out in 1984). Start at about 1 minute in if you wish. In this scene, Dr. Chandra is reactivating the HAL computer. He starts with the most basic voice "module" and progresses to higher levels of speech.