EE 8993: Speech Recognition
Homework Assignment #6
Hidden Markov Model

July 30, 1998

submitted to:
Dr. Joseph Picone
Department of Electrical and Computer Engineering
413 Simrall, Hardy Rd.
Mississippi State University
Box 9571
MS State, MS 39762

submitted by:
Julie Ngan
Department of Electrical and Computer Engineering
Mississippi State University
Box 9571
Mississippi State, Mississippi 39762
Tel: 601-325-8335
Fax: 601-325-3149
email: ngan@isip.msstate.edu

I. Problem Definition

A sequence of coin tosses is observed, and the following is given as the data:

HHTTTHHHHTTTTTHHHHHHTTTTTTTHHHHHHHHTTTTTTTTTHHHHHHHHTTTTTTTH
HHHHHTTTTTHHHHTTTHHTHHTTHHHHHHHHHHTHHHHHTHHHHHHHHHHTHHHTHHHH
HHHHHHHHHTHHHHHHHHHTTHHHHHHTHHHTHHHHHTHHHHHHHHTHHHHHHTHTHHHH
HHHHHHHTTHHHHHHHHHHHHHHHTHHHHHTTHHHHHHHHHHTTHHHHHHHHHHHHHTTH
HHHHHHHHHHHTTHHHTHHHHHHHHHHHTHHHHHHTTHHHHHTHHHTHHHHHTHTHHHHH
THTHHHHHHHHHTHTHHHHHHTHTHHHHHTHTTHHHHHHHHHHHHHTHTHHHHHHHHHHH
HTHHHHHHHHHHHHHHHTHHHHHHHHHHHHHHHTHHHHHHHHHHTTHHHHHHHTHHHHHH
HHHHHTHHHHTHHHHHHHHHHTHHHHHHHHHHHHTTHHHHTTHHHHHHHHHHHHHHTTHH

The objective of this experiment is to use the discrete Hidden Markov Model tool to find the best model for the training data. With this model, we are to compute the probabilities of observing the following sequences:

HTHTHTHTHTHTHTHTHTHT
HHHHHHHHHHHHHHHHHHHH
TTTTTTTTTTTTTTTTTTTT

Note that the HMM tool allows the user to specify the number of states in the model, the number of symbols in the model, and the number of passes over the training data. Further, the tool allows the user to select either the Baum-Welch or the Viterbi algorithm for training and for testing. The training data contains only two distinct symbols, H (head) and T (tail); however, we do not know how many coins were tossed to generate it. Therefore, to find the best training model, we need to perform experiments with different numbers of states, numbers of passes, and algorithms.

II. Hidden Markov Models

In this experiment, we are given an observed sequence of data. Using this data, we want to predict the outcome of the model and determine the probability of observing a particular sequence of outcomes. Markov models are stochastic models used to estimate the probabilities of observed events: a probability is assigned to each output based on the preceding input sequence, and the number of preceding observations considered is referred to as the order of the Markov model. A Hidden Markov Model is essentially the same as a simple Markov model. It is characterized by the number of states, the number of distinct observation symbols, the state transition probability distribution, the output probability distribution, and the initial state distribution [1]. However, instead of relying on the previous observations to determine the output probabilities, the model generates each output from one of its internal states, and which state produced a given output is not immediately obvious from the observed sequence; the states are therefore "hidden".
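To make these quantities concrete, the following is a minimal sketch, in Python, of a discrete HMM and of scoring an observation sequence with the scaled forward algorithm. The state count and all parameter values below are illustrative assumptions only; they are not the models trained in this experiment.

    import numpy as np

    # Illustrative 2-state, 2-symbol model (0 = H, 1 = T).
    A  = np.array([[0.7, 0.3],    # state transition probabilities a_ij
                   [0.4, 0.6]])
    B  = np.array([[0.9, 0.1],    # output probabilities b_j(k)
                   [0.5, 0.5]])
    pi = np.array([0.6, 0.4])     # initial state distribution

    def forward_log_likelihood(obs, A, B, pi):
        """Log P(obs | model) via the forward algorithm, rescaling at
        each step to avoid numerical underflow on long sequences."""
        alpha = pi * B[:, obs[0]]               # initialization: alpha_1(i)
        c = alpha.sum()
        log_like, alpha = np.log(c), alpha / c
        for t in range(1, len(obs)):
            alpha = (alpha @ A) * B[:, obs[t]]  # induction: alpha_t(j)
            c = alpha.sum()
            log_like, alpha = log_like + np.log(c), alpha / c
        return log_like

    seq = [0 if s == 'H' else 1 for s in "HTHTHTHTHTHTHTHTHTHT"]
    print(forward_log_likelihood(seq, A, B, pi))

The training and testing probabilities reported in Tables 1 through 4 below are log probabilities of this general kind.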
This experiment makes use of the discrete HMM tool to generate models of coin toss sequences, and these models are evaluated on a set of testing data. The tool assumes the HMM is fully ergodic, i.e., every state is connected to every other state. Further, the initial values of each state are randomly assigned.

III. Problem Discussion

In our experiments, the alphabet is predetermined to contain two symbols (H and T). Therefore, in order to find the best model for the testing data, we need to perform an exhaustive search over the different algorithms and numbers of states. The number of states used in the experiment ranges from 2 to 64. We picked this range because each training sequence contains only 61 observations, so 61 states are enough to cover an entire training sequence.

Another parameter that can be varied is the number of passes; in our experiment, however, we kept the number of passes fixed at 10. This can be explained intuitively by the nature of the HMM tool. When the tool is run, it randomly assigns initial values to the states. The number of times the training data is processed is determined by the number of passes, and with each iteration the probabilities associated with each state are adjusted to give a better overall probability of the training sequence. As such, after a certain number of passes, the overall probability of the training sequence converges to a final value. In our experiment, we found that, independent of the number of states and the algorithm, the final value converges mostly within 3 to 5 passes, so training with more passes will not produce any better results. We therefore fixed the number of passes at 10 to ensure that all final values had converged.

One drawback encountered during our experiment is that the HMM tool's training using the Baum-Welch algorithm is not completely debugged yet; the trained models almost always produced invalid values. Therefore, the plan to train and test with both the Baum-Welch and the Viterbi algorithms had to be limited to training with Viterbi, and testing with both Viterbi and Baum-Welch.

IV. Experimental Results

We have evaluated the training data using different combinations of the number of states and the algorithm. To further investigate the best models for the testing data, we performed the exhaustive search described below (a rough sketch of the search loop follows the list):

1. The data is trained and tested using the Viterbi algorithm to find the number of states that gives the best probabilities on the training data. The models are then evaluated on the testing data. The results are shown in Table 1.

2. The data is trained using the Viterbi algorithm on 2 to 64 states, and the models that are best for the testing data under the Viterbi algorithm are found. Note that the model that is best for the testing data might not yield good results for the training data. The results are shown in Table 2.

3. The data is trained using the Viterbi algorithm to find the number of states that gives the best probabilities on the training data. The models are then evaluated using the Baum-Welch algorithm on the testing data. The results are shown in Table 3.

4. The data is trained using the Viterbi algorithm on 2 to 64 states, and the models that are best for the testing data under the Baum-Welch algorithm are found. Note that the model that is best for the testing data might not yield good results for the training data. The results are shown in Table 4.
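The sketch below illustrates the shape of this search loop. It is a pure-Python stand-in, not the discrete HMM tool used in the experiment, and the viterbi_log_score and random_model helpers are hypothetical illustrations. In particular, the re-estimation the tool performs over its 10 training passes is omitted; only the random initialization, Viterbi scoring, and ranking by state count are shown.

    import numpy as np

    def viterbi_log_score(obs, A, B, pi):
        """Log probability of the single best state path through obs."""
        logA, logB = np.log(A), np.log(B)
        delta = np.log(pi) + logB[:, obs[0]]
        for t in range(1, len(obs)):
            # delta_t(j) = max_i [delta_{t-1}(i) + log a_ij] + log b_j(o_t)
            delta = (delta[:, None] + logA).max(axis=0) + logB[:, obs[t]]
        return delta.max()

    def random_model(n_states, n_symbols, rng):
        """Row-stochastic random initialization, as the tool does."""
        A = rng.random((n_states, n_states)); A /= A.sum(1, keepdims=True)
        B = rng.random((n_states, n_symbols)); B /= B.sum(1, keepdims=True)
        pi = rng.random(n_states); pi /= pi.sum()
        return A, B, pi

    # First line of the training data above; all eight lines would be
    # used in practice.
    train = [0 if s == 'H' else 1 for s in
             "HHTTTHHHHTTTTTHHHHHHTTTTTTTHHHHHHHHTTTTTTTTTHHHHHHHHTTTTTTTH"]

    rng = np.random.default_rng(0)
    results = []
    for n_states in range(2, 65):
        A, B, pi = random_model(n_states, 2, rng)
        # The tool would now re-estimate A, B, pi over 10 training
        # passes; that step is omitted from this sketch.
        results.append((viterbi_log_score(train, A, B, pi), n_states))
    for score, n in sorted(results, reverse=True)[:5]:
        print(f"{n:3d} states: training log probability {score:.6f}")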
V. Conclusion

As the experiments show, the model that is best for the training data does not necessarily give the best probabilities on the testing data, and vice versa. The nature of the training and testing data provides some insight into the reason. First, the training data consists mainly of heads, with occasional tails, so it is quite probable that the coin used to generate the training data was biased to some degree. Second, two of the three testing sequences consist purely of heads or purely of tails, which are the results of a completely biased coin toss, while the third testing sequence alternates heads and tails, which is the result of a perfectly fair coin toss. It is therefore quite obvious why the training model does not represent the testing data well. (A small single-coin sanity check following the result tables estimates the training bias directly.)

VI. References

[1] J. Picone, "ECE 8993: Introduction to Speech Processing Lecture Notes," Mississippi State University, Starkville, MS, 1998.

Table 1. The number of states used and the training and testing log probabilities for the best training models, using the Viterbi algorithm for both training and testing.

Rank   States used   Training log probability   Testing log probability
1      3             -97.957870                 -20.189884
2      12            -98.083191                 -22.700104
3      14            -98.386215                 -19.659321
4      26            -98.636002                 -19.685520
5      40            -98.926872                 -19.721899

Table 2. The number of states used and the training and testing log probabilities for the best testing models, using the Viterbi algorithm for both training and testing.

Rank   States used   Training log probability   Testing log probability
1      2             -99.996658                 -14.600151
2      33            -100.267494                -14.650858
3      42            -100.828857                -14.704206
4      61            -101.222977                -14.753456
5      24            -99.128639                 -14.927289

Table 3. The number of states used and the training and testing log probabilities for the best training models, using the Viterbi algorithm for training and the Baum-Welch algorithm for testing.

Rank   States used   Training log probability   Testing log probability
1      3             -97.957870                 -28.774155
2      12            -98.083191                 -22.914505
3      14            -98.386215                 -17.931221
4      26            -98.636002                 -13.012259
5      40            -98.926872                 -12.663105

Table 4. The number of states used and the training and testing log probabilities for the best testing models, using the Viterbi algorithm for training and the Baum-Welch algorithm for testing.

Rank   States used   Training log probability   Testing log probability
1      55            -99.772392                 -9.103511
2      61            -101.222977                -9.226512
3      42            -100.828857                -9.319737
4      58            -102.787437                -9.987990
5      24            -99.128639                 -9.662504
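As a quick sanity check on the biased-coin observation in the conclusion, the sketch below estimates P(H) from the training data and scores each testing sequence under a single biased coin. This is a hypothetical one-coin model used only for illustration, not one of the trained HMMs.

    import math

    train = ("HHTTTHHHHTTTTTHHHHHHTTTTTTTHHHHHHHHTTTTTTTTTHHHHHHHHTTTTTTTH"
             "HHHHHTTTTTHHHHTTTHHTHHTTHHHHHHHHHHTHHHHHTHHHHHHHHHHTHHHTHHHH"
             "HHHHHHHHHTHHHHHHHHHTTHHHHHHTHHHTHHHHHTHHHHHHHHTHHHHHHTHTHHHH"
             "HHHHHHHTTHHHHHHHHHHHHHHHTHHHHHTTHHHHHHHHHHTTHHHHHHHHHHHHHTTH"
             "HHHHHHHHHHHTTHHHTHHHHHHHHHHHTHHHHHHTTHHHHHTHHHTHHHHHTHTHHHHH"
             "THTHHHHHHHHHTHTHHHHHHTHTHHHHHTHTTHHHHHHHHHHHHHTHTHHHHHHHHHHH"
             "HTHHHHHHHHHHHHHHHTHHHHHHHHHHHHHHHTHHHHHHHHHHTTHHHHHHHTHHHHHH"
             "HHHHHTHHHHTHHHHHHHHHHTHHHHHHHHHHHHTTHHHHTTHHHHHHHHHHHHHHTTHH")

    p_head = train.count('H') / len(train)   # empirical bias of the coin
    print(f"estimated P(H) = {p_head:.3f}")

    for test in ("HTHTHTHTHTHTHTHTHTHT",
                 "HHHHHHHHHHHHHHHHHHHH",
                 "TTTTTTTTTTTTTTTTTTTT"):
        log_p = sum(math.log(p_head if s == 'H' else 1.0 - p_head)
                    for s in test)
        print(f"{test}: log probability {log_p:.6f}")

Under this single-coin model, the all-heads sequence scores highest and the all-tails sequence lowest, consistent with the head-heavy training data described in the conclusion.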