by naggie on 7/3/25, 12:31 PM with 1 comment
by magicalhippo on 7/3/25, 1:01 PM
The author uses Chatterbox TTS's zero-shot voice cloning to generate synthetic training data from a single phrase, runs Whisper STT over each generated sample to catch generation errors, and then uses the resulting synthetic dataset to fine-tune Piper TTS the standard way.
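
For anyone wondering what the verification step might look like, here is a rough sketch, not the author's actual code. It assumes the check is a word error rate comparison between the prompt text and the transcript. `synthesize` and `transcribe` are hypothetical callables standing in for the Chatterbox and Whisper calls, and `max_wer` is a made-up threshold.

```python
import re
from typing import Callable, Iterable, List, Tuple, TypeVar

Audio = TypeVar("Audio")  # whatever the TTS step returns (path, tensor, bytes...)


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial differences are not counted."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def build_dataset(
    sentences: Iterable[str],
    synthesize: Callable[[str], Audio],   # hypothetical: wraps the TTS call
    transcribe: Callable[[Audio], str],   # hypothetical: wraps the STT call
    max_wer: float = 0.0,                 # assumed threshold, tune to taste
) -> List[Tuple[str, Audio]]:
    """Keep only samples whose transcript matches the prompt closely enough."""
    kept = []
    for sentence in sentences:
        audio = synthesize(sentence)
        if word_error_rate(sentence, transcribe(audio)) <= max_wer:
            kept.append((sentence, audio))
    return kept
```

The surviving (text, audio) pairs would then be written out in whatever layout the Piper training scripts expect. Passing the TTS and STT steps in as callables keeps the filter testable without loading either model.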