• discriminative techniques
    • maximum likelihood (ML) is not always the most appropriate criterion for HMM estimation
    • need to incorporate negative examples (competing hypotheses) into the optimization
  • maximum mutual information
    • motivated by the definitions of mutual information and conditional entropy
    • incorporates negative examples via a ratio of likelihoods: the correct transcription against all competing hypotheses
    • difficult to extend beyond applications with constrained grammars
    • the N-best paradigm, which rescores a short list of competing hypotheses, is an alternative
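The MMI criterion above can be sketched numerically. This is a minimal illustration, not an implementation of any particular toolkit: the function names (`mmi_objective`, `logsumexp`) and the `acoustic_scale` parameter are illustrative, and the competing-hypothesis log-likelihoods are assumed to come from an N-best list or lattice.

```python
import math

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a list of log-values.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mmi_objective(num_loglik, den_logliks, acoustic_scale=1.0):
    """Per-utterance MMI criterion: scaled log-likelihood of the
    reference transcription (numerator) minus the log-sum over the
    competing hypotheses (denominator, which should include the
    reference). Maximizing this pushes probability mass toward the
    correct transcription and away from the competitors."""
    scaled = [acoustic_scale * ll for ll in den_logliks]
    return acoustic_scale * num_loglik - logsumexp(scaled)

# If the reference is the only hypothesis, the criterion is 0 (its
# maximum); adding competitors with nonzero mass drives it negative.
print(mmi_objective(-10.0, [-10.0]))
print(mmi_objective(-10.0, [-10.0, -12.0]))
```

The criterion approaches its upper bound of 0 only when the competitors' likelihoods are negligible relative to the reference, which is what makes it a discriminative objective.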
  • minimum classification error
    • motivated by the theory of generalized probabilistic descent (GPD)
    • optimization via a set of smoothed loss functions
    • loss defined in terms of the expected misclassification rate
    • can be applied at the string and phone levels
    • computationally expensive for large vocabulary applications