Audio samples generated by the code in the syang1993/gst-tacotron repo, a TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis" and "Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron".

1. All the horses would come to him, but I think we were his favorites.

With GST (100K steps) | With GST (200K steps) | Without GST (150K steps) | Reference audio





2. The oldest of the colts raised his head, pricked his ears, and said, There are the hounds.

With GST (100K steps) | With GST (200K steps) | Without GST (150K steps) | Reference audio


3. He was gone again, glad to get away even from Fanny.

With GST (100K steps) | With GST (200K steps) | Without GST (150K steps) | Reference audio