Holly+ Processes Your Singing In Holly Herndon's Voice
Holly+ uses a machine learning model of a voice to achieve timbre transfer. In other words, any singing can be modified to sound as perfect as a professional artist's.
The first tool, a custom voice instrument and website by Never Before Heard Sounds, allows for anyone to upload polyphonic audio and receive a download of that music sung back in my distinctive processed voice...
A Voice Model is a deep neural network that can generate raw audio of an individual voice. The network is trained on recorded speech and singing from the target voice, and can be interacted with in various ways, from text-to-speech applications to more complex interactions such as audio style transfer, where audio from one voice can be converted to resemble the target voice, a kind of vocal puppetry 🤖
The recent introduction of projects like DeepMind’s Wavenet, Google’s Tacotron and others have advanced the field of voice generation sufficient to make me confident that generating convincing spoken and sung voices will soon become standard practice for artists and other creatives, as presaged by the popularity of celebrity vocal deep fakes already found all over YouTube.
The general idea of perfecting a voice using electronic techniques is at least as old as the short story "Prima Donna 1980" by Bernard Brown, published in Amazing Stories in 1931:
“The perfect voice is the voice of science. There is no human variable. All voices are equally perfect. Put your guards in there, Schonberg; let them sing! Yes, even you, Schonberg, can be a Caruso!”