Title: Hybrid HMM/SVM Architectures for Speech Recognition
Authors: A. Ganapathiraju, J. Hamaker and J. Picone

Speech recognition can be viewed as a pattern recognition problem in which we desire each unique sound to be distinguishable from all other sounds. Unfortunately, the measurements we use to classify the signal exhibit extreme amounts of overlap in the feature space. Traditionally, statistical models such as Gaussian mixture models have been used to "represent" the various modalities of a given speech sound. To improve the separability of these representations in the feature space, acoustic units such as context-dependent phones, which use information about the preceding and following sounds, have been employed. Such detailed statistical models often fall prey to overtraining, and provide only modest gains in recognition performance while significantly increasing the complexity and parameter count of a system. For example, even after training on 100 hours of conversational speech data, it is often observed that mixture components model rarely occurring artifacts corresponding to a single speaker or an isolated event. In Hidden Markov Models (HMMs), the most successful classification paradigm for speech recognition to date, parameters are traditionally estimated using a Maximum Likelihood (ML) criterion. Extensions of the HMM paradigm employ discriminative training criteria such as Maximum Mutual Information (MMI) and Minimum Classification Error (MCE). Many of these techniques fall under the general principle of risk minimization; empirical risk minimization is one of the most commonly used optimization procedures in machine learning. A Support Vector Machine (SVM) is a new approach to machine learning that learns to classify discriminatively.
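The empirical risk minimization principle mentioned above can be made concrete with a minimal sketch: the empirical risk of a classifier is simply its average loss over the training sample, and training selects the classifier in some family that minimizes this average. The toy 1-D data, labels, and threshold family below are purely illustrative assumptions, not material from the paper.

```python
# Illustrative sketch of empirical risk minimization (ERM) with 0/1 loss.
# The data and the family of threshold classifiers are hypothetical.

samples = [(0.2, -1), (0.8, -1), (1.5, +1), (2.1, +1), (0.9, +1)]

def empirical_risk(threshold, data):
    # 0/1 loss: count 1 whenever sign(x - threshold) disagrees with the label,
    # then average over the sample.
    errors = sum(1 for x, y in data if (1 if x > threshold else -1) != y)
    return errors / len(data)

# ERM: pick, from a small grid of candidate thresholds, the one that
# minimizes the empirical risk on the training sample.
best = min((t / 10 for t in range(0, 25)),
           key=lambda t: empirical_risk(t, samples))
```

The discriminative criteria cited in the text (MMI, MCE) replace the 0/1 loss with smoother, likelihood-based objectives, but the underlying pattern of minimizing an averaged training-set criterion is the same.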
SVMs are based on the fact that data can be transformed into a very high-dimensional feature space in which simple linear hyperplanes can be constructed for classification. Though this task seems daunting (especially when the dimension of the feature space is in the thousands), the theory of kernels gives an elegant solution to this problem and makes it computationally feasible even for large tasks. What is interesting about this approach is that, in the process of developing these decision surfaces, one gains great insight into the nature of the overlap between classes, and can identify data points that are likely to be outliers rather than points at the edges of the region of discrimination. SVMs are a fundamentally new approach to acoustic modeling, and preliminary experiments have been promising. For example, on a phone classification experiment involving the six most confused phone pairs from the OGI Alphadigits corpus, SVMs provided a significant improvement over HMMs. In this paper, we present results on a recently developed hybrid system involving SVMs and HMMs, and demonstrate that it appears to be a promising approach for acoustic modeling.
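The kernel idea that makes the high-dimensional mapping computationally feasible can be sketched in a few lines. For a homogeneous polynomial kernel of degree 2 on 2-dimensional inputs, the explicit feature map is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), and the kernel k(x, z) = (x . z)^2 computes the inner product phi(x) . phi(z) without ever forming the mapped vectors. The specific kernel and inputs here are a minimal illustrative assumption, not the kernel used in the paper's experiments.

```python
import math

def phi(x):
    # Explicit degree-2 polynomial feature map for 2-D input.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    # Same inner product, computed in the original 2-D space.
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 0.5)
explicit = dot(phi(x), phi(z))   # inner product in the mapped space
implicit = poly_kernel(x, z)     # identical value via the kernel function
```

For an RBF kernel the implied feature space is infinite-dimensional, so the kernel evaluation is not merely cheaper but the only practical option; the SVM optimization and decision function depend on the data solely through such kernel values.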