- discriminative techniques
- maximum likelihood (ML) is not always the most appropriate criterion for HMM estimation
- discriminative criteria incorporate negative (competing) examples into the optimization
- maximum mutual information
- motivated by the definitions of mutual information and conditional entropy
- incorporates negative examples through the likelihoods of competing hypotheses
- difficult to extend beyond applications with constrained grammars
- the N-best paradigm is an alternative
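The MMI criterion can be sketched in its standard form (a generic formulation, assuming R training utterances O_r with reference transcriptions W_r; the notation is not taken from these notes):

```latex
F_{\mathrm{MMI}}(\lambda)
  = \sum_{r=1}^{R}
    \log \frac{p_{\lambda}(O_r \mid W_r)\, P(W_r)}
              {\sum_{W} p_{\lambda}(O_r \mid W)\, P(W)}
```

The numerator scores the correct transcription (the positive example); the denominator sums over competing hypotheses W, which is how negative examples enter the objective. The denominator is what makes unconstrained tasks hard: summing over all word sequences is intractable, so the N-best paradigm approximates it with only the N best recognition hypotheses.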
- minimum classification error
- motivated by the theory of generalized probabilistic descent (GPD)
- optimization via a set of loss functions
- loss defined in terms of the expected misclassification rate
- can be applied at the string and phone levels
- computationally expensive for large-vocabulary applications
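A common form of the MCE loss from the GPD literature can be sketched as follows (the discriminants g_k, the misclassification measure d_j, and the smoothing parameters eta and gamma are the usual symbols, assumed here rather than taken from these notes):

```latex
d_j(O;\lambda)
  = -\,g_j(O;\lambda)
    + \log\!\left[ \frac{1}{M-1} \sum_{k \ne j} e^{\eta\, g_k(O;\lambda)} \right]^{1/\eta},
\qquad
\ell_j(O;\lambda) = \frac{1}{1 + e^{-\gamma\, d_j(O;\lambda)}}
```

Here g_j is the (log-likelihood) discriminant of the correct class j among M classes, so d_j > 0 indicates a misclassification; the sigmoid ell_j smooths the 0/1 error count into a differentiable loss, which is what permits gradient-based (GPD) optimization at the string or phone level.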