- optimize for minimum classification error on training data
- classification could be at various levels
- sentence-, word-, and phone-level loss functions
- gradient descent on the loss function
- Generalized Probabilistic Descent
- core of the MCE algorithm
- guarantees convergence when certain forms of loss function are used
- computationally expensive
- suffers from problems similar to MMI estimation with large data sets
- simplified by using Viterbi alignments and N-best lists
- online and batch modes available for updating parameters
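The notes above can be sketched in code. This is a minimal, hypothetical illustration, not a speech recognizer: linear discriminant scores stand in for the model, the misclassification measure uses only the single best competitor (analogous to the 1-best / Viterbi simplification mentioned above), the loss is a sigmoid of that measure, and parameters are updated online by gradient descent as in GPD. All function names, the toy data, and the constants `gamma` and `eps` are assumptions for the sketch.

```python
import math
import random

def score(w, b, x):
    """Linear discriminant g_k(x) = w_k . x + b_k (stand-in for a real model score)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mce_step(W, B, x, k, gamma=2.0, eps=0.1):
    """One online GPD-style update on a single sample (x, true class k).

    Misclassification measure: d = g_best_competitor - g_correct
    (1-best simplification).  Smoothed loss: sigmoid(gamma * d),
    a differentiable proxy for the 0/1 classification error.
    """
    g = [score(w, b, x) for w, b in zip(W, B)]
    j = max((i for i in range(len(g)) if i != k), key=lambda i: g[i])
    d = g[j] - g[k]
    s = 1.0 / (1.0 + math.exp(-gamma * d))   # loss value in (0, 1)
    coef = eps * gamma * s * (1.0 - s)       # step size * d(loss)/d(d)
    # d d / d w_k = -x  and  d d / d w_j = +x, so descend accordingly
    W[k] = [wk + coef * xi for wk, xi in zip(W[k], x)]
    B[k] += coef
    W[j] = [wj - coef * xi for wj, xi in zip(W[j], x)]
    B[j] -= coef
    return s

def error_rate(W, B, data):
    wrong = sum(1 for x, k in data
                if max(range(len(W)), key=lambda i: score(W[i], B[i], x)) != k)
    return wrong / len(data)

random.seed(0)
# toy 2-class data: class 0 clustered near (-1,-1), class 1 near (+1,+1)
data = [([random.gauss(m, 0.3), random.gauss(m, 0.3)], k)
        for k, m in [(0, -1.0), (1, 1.0)] for _ in range(50)]
W = [[0.0, 0.0], [0.0, 0.0]]
B = [0.0, 0.0]
before = error_rate(W, B, data)
for epoch in range(20):          # online mode: shuffle and update per sample
    random.shuffle(data)
    for x, k in data:
        mce_step(W, B, x, k)
after = error_rate(W, B, data)
print(before, after)
```

Batch mode would instead accumulate the per-sample gradients over the whole training set and apply one update per pass; the loss and gradient computation are unchanged.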