For these tutorials, we assume you have a basic working knowledge of Unix. If you are new to Unix, a useful tutorial can be found by searching for "Unix for Poets," although countless other resources are available on the web. Several Unix-like operating systems are available, and rather than dual-booting, you can run one in a virtual machine on your current OS. For these tutorials, we used VirtualBox to create our virtual machine and installed Ubuntu as its operating system. Finally, we installed the speech recognizer, HTK, on this virtual machine.
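HTK (the Hidden Markov Model Toolkit) is a toolkit for building speech recognition systems, and its source code is freely available. Before beginning any of the tutorials, you need to register with HTK and then download the software from the HTK website.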
You should also download the HTK Book, which contains installation instructions, software documentation, and additional resources for running a speech recognition experiment.
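Once HTK is installed, a quick sanity check is to confirm that its command-line tools can be found on your PATH. The sketch below is a hypothetical helper, not part of HTK. It assumes the tools were installed under their usual names (HCopy, HCompV, HERest, HHEd, HLEd, HParse, HVite, HResults) and uses only Python's standard `shutil.which` to look them up.

```python
import shutil

# HTK command-line tools commonly used in a recognition experiment.
# This list is an assumption for illustration; adjust it to the tools
# your own setup actually calls.
HTK_TOOLS = ["HCopy", "HCompV", "HERest", "HHEd", "HLEd", "HParse", "HVite", "HResults"]


def find_htk_tools(tools=HTK_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    found = find_htk_tools()
    for tool, path in found.items():
        print(f"{tool:10s} {path or 'NOT FOUND'}")
    missing = [tool for tool, path in found.items() if path is None]
    if missing:
        print("Some tools are missing; check that HTK's bin directory is on your PATH.")
```

If any tool reports NOT FOUND, the usual cause is that the directory HTK was installed into has not been added to PATH in your shell configuration.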
TI Digits Short: Beginner
The TI Digits short data set is a subset of the full TI Digits data set. Short sentences consisting of the words "oh", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", and "nine" are spoken by both men and women and were recorded in a relatively noise-free environment. Because the vocabulary is so small and the probability of each word being spoken is independent of the words spoken before it, we can largely ignore the language model. For this reason, and because the data set is small, this is a good tutorial to begin with if you are new to speech recognition.
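To see why the language model matters so little here, consider a uniform, context-independent model in which every digit has probability 1/11 no matter what came before it. Under that assumption, any two hypotheses with the same number of words receive exactly the same language-model score, so the choice between them rests on the acoustic models. The sketch below illustrates this and also builds a simple digit-loop grammar in HTK's HParse notation (`|` for alternatives, `< >` for one or more repetitions). The function names are hypothetical, and the upper-case words and the `SENT-START`/`SENT-END` markers follow the conventions of the HTK Book's examples rather than anything defined in this tutorial.

```python
import math

# The eleven words in the TI Digits vocabulary, as listed in the text.
DIGITS = ["oh", "zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def uniform_log_prob(words, vocab=DIGITS):
    """Log probability of a word sequence under a uniform, context-independent
    model: each word has probability 1/len(vocab) regardless of history."""
    for w in words:
        if w not in vocab:
            raise ValueError(f"out-of-vocabulary word: {w}")
    return len(words) * math.log(1.0 / len(vocab))


def digit_loop_grammar(vocab=DIGITS):
    """Build an HParse-style grammar: one or more digits between
    sentence-start and sentence-end markers (placeholder names)."""
    alternatives = " | ".join(w.upper() for w in vocab)
    return f"$digit = {alternatives};\n( SENT-START < $digit > SENT-END )\n"


if __name__ == "__main__":
    # Two different three-word hypotheses get identical language-model scores.
    print(uniform_log_prob(["one", "two", "three"]))
    print(uniform_log_prob(["nine", "nine", "oh"]))
    print(digit_loop_grammar())
```

Because the score depends only on the number of words, the language model in this setting acts at most as a per-word penalty rather than a preference for particular digit sequences.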
In this tutorial we'll cover the basic steps to prepare the data, train
monophone and word-internal triphone models, and finally how to decode