• optimize for minimum classification error on training data
    • classification can be performed at various levels
    • loss functions based on sentence-, word-, or phone-level classification
    • parameters updated by gradient descent on the loss function
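The bullets above can be sketched in code. This is a minimal illustration, not the full MCE recipe: it assumes the standard smoothed formulation in which a misclassification measure (correct-class score versus a soft max over competitors) is passed through a sigmoid so that the 0/1 error count becomes differentiable; the names and the smoothing constants `eta` and `gamma` are illustrative choices.

```python
import numpy as np

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed minimum-classification-error loss for one sample.

    scores  : discriminant scores g_j(x), one per class
    correct : index of the correct class
    eta     : smoothing exponent for the competing-class term
    gamma   : slope of the sigmoid approximating the 0/1 error
    """
    competitors = np.delete(scores, correct)
    # smoothed max over the competing classes
    anti = np.log(np.mean(np.exp(eta * competitors))) / eta
    # misclassification measure: positive when a competitor wins
    d = -scores[correct] + anti
    # sigmoid turns d into a differentiable error count in (0, 1)
    return 1.0 / (1.0 + np.exp(-gamma * d))
```

Because the loss is smooth in the scores, its gradient with respect to the model parameters exists wherever the scores themselves are differentiable, which is what makes gradient descent applicable.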

  • Generalized Probabilistic Descent
    • core of the MCE algorithm
    • guarantees convergence when certain forms of loss function are used
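A minimal sketch of the GPD-style update, assuming the usual stochastic-approximation step sizes (sum of eps_t diverges, sum of eps_t squared converges), which is the kind of condition under which convergence is guaranteed; the function names and the 1/(1+t) schedule are illustrative.

```python
def gpd_train(params, grad_fn, samples, eps0=0.5, epochs=5):
    """Sample-by-sample GPD updates with a decaying step size.

    grad_fn(params, x) returns the gradient of the smoothed loss
    for one sample x. eps_t = eps0 / (1 + t) satisfies
    sum(eps_t) = inf and sum(eps_t**2) < inf.
    """
    t = 0
    for _ in range(epochs):
        for x in samples:
            eps_t = eps0 / (1 + t)
            params = params - eps_t * grad_fn(params, x)
            t += 1
    return params
```

With a convex toy loss such as (params - x)**2 the iterates settle at the minimizer, mirroring the convergence guarantee stated above.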

  • computationally expensive
    • suffers from problems similar to MMI estimation on large data sets
    • simplified by using Viterbi alignments and N-best lists
    • parameters can be updated in online or batch mode
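The N-best simplification above can be sketched as follows. This is a hypothetical illustration: instead of summing the competing-hypothesis term over every possible sentence, only the scores of the N best recognizer hypotheses (excluding the reference) enter the smoothed max, which is what makes the computation tractable.

```python
import numpy as np

def misclassification_from_nbest(correct_score, nbest_scores, eta=1.0):
    """Misclassification measure with the competitor term
    approximated by an N-best list rather than the full
    hypothesis space.
    """
    nbest = np.asarray(nbest_scores, dtype=float)
    # smoothed max over the N-best competing hypotheses only
    anti = np.log(np.mean(np.exp(eta * nbest))) / eta
    # negative when the reference outscores all competitors
    return -correct_score + anti
```

The same quantity could instead be computed from Viterbi alignments of the reference and competing transcriptions, trading the full forward sum for a single best path per hypothesis.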