Sound demos for "WaveFlow: A Compact Flow-based Model for Raw Audio"

Audio synthesis conditioned on mel spectrograms

WaveFlow (64-layer, res. channels = 64)	WaveGlow (96-layer, res. channels = 64)	ClariNet (60-layer, res. channels = 64)

WaveFlow (64-layer, res. channels = 128)	WaveGlow (96-layer, res. channels = 128)	WaveNet (30-layer, res. channels = 128)

WaveFlow (64-layer, res. channels = 256)	WaveGlow (96-layer, res. channels = 256)	Ground-truth (recorded speech)

Text-to-speech synthesis

The Rainbow Passage: When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow. The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it.

Deep Voice 3 + WaveFlow	Deep Voice 3 + WaveGlow	Deep Voice 3 + WaveNet	Recorded human speech (reference only)