TL;DR

piper

A fast, local neural text-to-speech system. Try out and download speech models from https://rhasspy.github.io/piper-samples. More information: https://github.com/rhasspy/piper.

Output a WAV file using a text-to-speech model (assuming its config file is at the model path with .json appended, e.g. path/to/model.onnx.json):

echo 'Thing to say' | piper -m path/to/model.onnx -f outputfile.wav
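
The default config location described above can be checked before running piper. The following is a minimal sketch, not part of piper itself: config_for_model is a hypothetical helper name, and it only tests that a file named like the model with .json appended exists.

```shell
# Hypothetical helper (not part of piper): print the default config path
# for a model, or fail if no such file exists next to the model.
config_for_model() {
    model=$1
    config="${model}.json"   # default location: model path + .json
    if [ -f "$config" ]; then
        printf '%s\n' "$config"
    else
        printf 'missing config: %s\n' "$config" >&2
        return 1
    fi
}
```

If the helper fails, the config file is probably stored elsewhere and can be passed explicitly with -c.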

Output a WAV file using a model and specifying its JSON config file:

echo 'Thing to say' | piper -m path/to/model.onnx -c path/to/model.onnx.json -f outputfile.wav

Select a particular speaker in a voice with multiple speakers by specifying the speaker’s ID number:

echo 'Warum?' | piper -m de_DE-thorsten_emotional-medium.onnx --speaker 1 -f angry.wav

Stream the output to the mpv media player:

echo 'Hello world' | piper -m en_GB-northern_english_male-medium.onnx --output-raw -f - | mpv -

Speak twice as fast, with two-second pauses between sentences:

echo 'Speaking twice the speed. With added drama!' | piper -m foo.onnx --length_scale 0.5 --sentence_silence 2 -f drama.wav
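
In the example above, doubling the speed corresponds to a --length_scale of 0.5. Assuming the scale is simply the reciprocal of the desired speed factor (an assumption that matches this one example, not a documented guarantee), the value can be computed with a small sketch like the one below. speed_to_length_scale is a hypothetical helper name.

```shell
# Hypothetical helper: convert a speed factor (2 = twice as fast) into a
# --length_scale value, assuming length_scale = 1 / speed.
speed_to_length_scale() {
    awk -v s="$1" 'BEGIN {
        if (s + 0 <= 0) exit 1      # reject zero, negative, or non-numeric input
        printf "%.2f\n", 1 / s
    }'
}
```

For instance, the result of speed_to_length_scale 2 could be passed as the argument to --length_scale in the command above.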

This document was created using the contents of the tldr project.