IFLIR: IMPROVED MOTION DETECTION USING FORWARD-LOOKING INFRARED
Speech recognition can be viewed as a pattern recognition problem in which each unique sound should be distinguishable from all other sounds. Traditionally, statistical models such as Gaussian mixture models have been built to "represent" the various units of speech. However, the lack of knowledge about the true underlying distribution has forced us to look at alternative techniques that focus on "discrimination" instead of "representation".

Hidden Markov Models (HMMs) have been the most successful classification paradigm for speech recognition. Traditionally, the model parameters are estimated using the Maximum Likelihood (ML) criterion. Alternatively, estimation techniques such as Maximum Mutual Information (MMI) and Minimum Classification Error (MCE) have been developed for discriminative estimation of the model parameters. The effort required to estimate parameters with these discriminative techniques is significantly greater than for ML estimation. Other classifiers, such as neural networks, have parameters that are estimated discriminatively. However, these systems cannot easily be used directly to model the dynamic nature of speech. In such cases hybrid systems are used.
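
The contrast between the ML and MMI criteria can be made concrete with their standard textbook objectives. The notation below is an assumption introduced for illustration and does not come from the proposal: O_r is the r-th training utterance, w_r its reference transcription, P(w) a language-model prior, and theta the HMM parameters.

```latex
\hat{\theta}_{\mathrm{ML}}
  = \arg\max_{\theta} \sum_{r=1}^{R} \log p_{\theta}(O_r \mid w_r)

\hat{\theta}_{\mathrm{MMI}}
  = \arg\max_{\theta} \sum_{r=1}^{R}
    \log \frac{p_{\theta}(O_r \mid w_r)\, P(w_r)}
              {\sum_{w} p_{\theta}(O_r \mid w)\, P(w)}
```

The denominator of the MMI objective sums over competing hypotheses w, which is one way to see why discriminative estimation requires more effort than ML estimation: every update must account for the incorrect models as well as the correct one.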

Support Vector Machines (SVMs) are a new class of machine learning techniques that learn to classify discriminatively. This paradigm has gained significance in the past few years with the development of efficient training algorithms. SVMs are based on the idea that data can be transformed into a very high-dimensional feature space where it can be classified using a simple linear hyperplane. Though this task seems daunting (especially when the dimension of the feature space is a few thousand), the theory of kernels gives an elegant solution to this problem and makes it computationally feasible even for large tasks. Like neural networks, SVMs are inherently static classifiers. One would need to handle the dynamic nature of the data using a hybrid method built on a dynamic model such as an HMM.
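
The role of kernels can be illustrated with a small sketch. It assumes a homogeneous quadratic kernel on 2-D inputs, chosen only because its feature map is easy to write out; the proposal does not say which kernels will be used, and the function names are hypothetical. The point is that the kernel returns the feature-space inner product without ever constructing the feature vectors.

```python
import math


def explicit_features(x):
    """Map a 2-D point to the 3-D feature space of the kernel (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def quadratic_kernel(x, y):
    """Evaluate the same inner product directly in the input space."""
    return dot(x, y) ** 2


# Illustrative points only; they are not taken from any speech data.
x = (1.0, 2.0)
y = (3.0, -1.0)

via_features = dot(explicit_features(x), explicit_features(y))
via_kernel = quadratic_kernel(x, y)

print(via_features, via_kernel)
```

Both routes give the same value up to floating-point rounding. For higher-degree or radial kernels the explicit feature space grows very large or becomes infinite-dimensional, while the cost of evaluating the kernel stays tied to the input dimension.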

Preliminary experiments on classifying speech data at the frame and phone level have been very encouraging. In these experiments, SVMs outperform most other non-linear classifiers, including neural networks and Bayes classifiers. Another interesting by-product of this work is the ability of SVMs to identify mislabeled training data. This is an important feature since it provides a convenient way of handling inaccurate training data.

In the proposed research, I will develop a hybrid HMM/SVM system to recognize conversational speech. HMMs give us an elegant method to handle the dynamics of speech, and SVMs provide us with powerful classifiers of static data. SVMs will be used to generate the probability of the data given the model, which will then be processed using a dynamic programming approach commonly employed for HMMs. Another approach that will be pursued is the use of Fisher kernels.
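
A minimal sketch of the hybrid idea follows, under several assumptions that are not part of the proposal: per-frame SVM decision values are squashed through a sigmoid with made-up parameters `a` and `b`, the resulting scores are used directly as emission scores (a simplification of a true likelihood), and the two-state model, transition probabilities, and decision values are invented for illustration. The dynamic programming step is ordinary Viterbi decoding.

```python
import math


def sigmoid_posterior(distance, a=-1.0, b=0.0):
    """Map an SVM decision value to a score in (0, 1); a and b are placeholders."""
    return 1.0 / (1.0 + math.exp(a * distance + b))


def viterbi(log_emit, log_trans, log_init):
    """Return the best state sequence for per-frame log scores.

    log_emit[t][s]  : log score of frame t under state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_init[s]     : log probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical decision values: one SVM per state, five frames.
distances = [[2.0, -2.0], [1.5, -1.0], [-0.2, 0.1], [-2.0, 2.0], [-1.5, 1.8]]
log_emit = [[math.log(sigmoid_posterior(d)) for d in frame] for frame in distances]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi(log_emit, log_trans, log_init)
print(path)
```

The static classifier scores each frame independently, and the transition structure of the HMM supplies the temporal constraints. A fuller system would also need to convert the classifier outputs into properly scaled likelihoods and fit the sigmoid parameters on held-out data, which this sketch leaves out.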