by hn17 on 3/25/18, 10:43 AM with 20 comments
by kastnerkyle on 3/25/18, 3:28 PM
Large company APIs will usually be better at generic speaker, generic language recognition - but if you can do speaker adaptation and customize the language model, there are some insane gains possible since you prune out a lot of uncertainty and complexity.
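As a rough illustration of why a customized language model helps, here is a minimal sketch that rescores a recognizer's n-best list with a tiny in-domain bigram model. Everything in it is assumed for the example: the corpus, the hypotheses, the acoustic scores, and the helper names (`train_bigram_lm`, `log_prob`, `rescore`) are hypothetical, and the add-one smoothing is a stand-in for whatever a real toolkit would use.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigram contexts and bigrams over a small in-domain corpus."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, lm):
    """Add-one smoothed bigram log-probability (toy smoothing, not tuned)."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


def rescore(nbest, lm, lm_weight=1.0):
    """nbest is a list of (text, acoustic_log_score); return the best text."""
    return max(nbest, key=lambda h: h[1] + lm_weight * log_prob(h[0], lm))[0]


# Hypothetical in-domain corpus: home-automation commands.
domain_corpus = [
    "turn on the kitchen light",
    "turn off the kitchen light",
    "turn on the bedroom light",
    "dim the bedroom light",
]
lm = train_bigram_lm(domain_corpus)

# Made-up n-best list; the acoustic scores alone prefer the wrong spelling.
nbest = [
    ("turn on the kitchen lite", -10.0),
    ("turn on the kitchen light", -10.5),
    ("turn on the chicken light", -10.2),
]

acoustic_only = max(nbest, key=lambda h: h[1])[0]
best = rescore(nbest, lm)
print(acoustic_only)
print(best)
```

With the domain model in the loop, the hypotheses containing out-of-domain words are penalized enough that the in-domain sentence wins even though its acoustic score is slightly worse.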
If you are more interested in recognition and alignment to a script, "gentle" is great [2][3]. The guts also have raw Kaldi recognition, which is pretty good for a generic speech recognizer but you would need to do some coding to pull out that part on its own.
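For intuition about aligning recognizer output to a known script, here is a toy sketch that transfers word timestamps from a recognition result onto the script using a plain text diff from Python's `difflib`. This is an assumption-laden simplification, not gentle's actual pipeline: the `align_to_script` helper, the words, and the timings are all invented for the example.

```python
from difflib import SequenceMatcher


def align_to_script(script_words, recognized):
    """Copy timings from recognized words onto matching script words.

    recognized is a list of (word, start_sec, end_sec) tuples.
    Script words with no match keep None timings.
    """
    hyp_words = [word for word, _, _ in recognized]
    matcher = SequenceMatcher(a=script_words, b=hyp_words, autojunk=False)
    aligned = [(word, None, None) for word in script_words]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            aligned[block.a + k] = (script_words[block.a + k], start, end)
    return aligned


script = "the quick brown fox jumps".split()

# Made-up recognizer output with one misrecognized word ("crown").
recognized = [
    ("the", 0.0, 0.2),
    ("quick", 0.2, 0.5),
    ("crown", 0.5, 0.9),
    ("fox", 0.9, 1.2),
    ("jumps", 1.2, 1.6),
]

aligned = align_to_script(script, recognized)
for word, start, end in aligned:
    print(word, start, end)
```

Words the recognizer got wrong are left without timings here; a real aligner would fill those gaps using the audio itself rather than leaving them empty.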
For a decent-performing deep model, look into Mozilla's version of Baidu's DeepSpeech [4].
If doing full-on development, my colleague has been using a bridge between PyTorch (for training) and Kaldi (to use their decoders) to good success [5].
[0] how I use pocketsphinx to get phonemes, https://github.com/kastnerkyle/ez-phones
[1] https://github.com/cmusphinx/pocketsphinx-python
[2] https://github.com/lowerquality/gentle
[3] how I use gentle for forced alignment, https://github.com/kastnerkyle/raw_voice_cleanup/tree/master...
by capo64 on 3/25/18, 1:32 PM
These models are way easier to train, have surprisingly good accuracy, and are robust to noise.
by nshm on 3/25/18, 9:41 PM
by vram22 on 3/25/18, 6:24 PM
Speech recognition with the Python "speech" module:
https://jugad2.blogspot.in/2014/03/speech-recognition-with-p...
Speech synthesis in Python with pyttsx:
https://jugad2.blogspot.in/2014/03/speech-synthesis-in-pytho...
Check out the synthetic voice announcing an arriving train in Sweden (near top of 2nd post above).
by payne92 on 3/25/18, 2:42 PM
This is not correct. Most modern speech recognition systems are based on deep neural nets (DNN).