- apply chain rule for partial derivatives
- derivative of loss wrt misclassification measure
- derivative of misclassification measure wrt the discriminant function
- derivative of discriminant function wrt observation probability
- note that the discriminant is the probability summed over all possible
state sequences
- the path with the highest probability can be made to dominate this
sum -- Viterbi approximation
- derivative of observation probability wrt mean
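
The chain of derivatives listed above can be written out as one product. The following is only a sketch, and every symbol in it is an assumption rather than notation taken from these notes: \ell is the loss, d the misclassification measure, g the discriminant function, b_j(x_t) the observation probability of state j for frame x_t, and \mu_j the mean of that state. The specific forms (a sigmoid loss with slope \gamma, a log-likelihood discriminant evaluated on the best path q^*, and a single Gaussian per state with covariance \Sigma_j) are one common choice and may differ from the intended setup.

```latex
\begin{align}
% chain rule, one factor per bullet above
\frac{\partial \ell}{\partial \mu_j}
  &= \frac{\partial \ell}{\partial d}\,
     \frac{\partial d}{\partial g}\,
     \sum_{t} \frac{\partial g}{\partial b_j(x_t)}\,
              \frac{\partial b_j(x_t)}{\partial \mu_j} \\[4pt]
% assumed sigmoid loss: \ell(d) = 1 / (1 + e^{-\gamma d})
\frac{\partial \ell}{\partial d}
  &= \gamma\,\ell(d)\,\bigl(1 - \ell(d)\bigr) \\[4pt]
% assumed Viterbi discriminant: g = \log P(X, q^* \mid \lambda),
% so only frames aligned to state j on the best path contribute
\frac{\partial g}{\partial b_j(x_t)}
  &= \frac{\delta(q^*_t = j)}{b_j(x_t)} \\[4pt]
% assumed single Gaussian: b_j(x) = \mathcal{N}(x;\, \mu_j, \Sigma_j)
\frac{\partial b_j(x_t)}{\partial \mu_j}
  &= b_j(x_t)\,\Sigma_j^{-1}\,(x_t - \mu_j) \\[4pt]
% the two b_j(x_t) factors cancel
\sum_{t} \frac{\partial g}{\partial b_j(x_t)}\,
         \frac{\partial b_j(x_t)}{\partial \mu_j}
  &= \sum_{t} \delta(q^*_t = j)\,\Sigma_j^{-1}\,(x_t - \mu_j)
\end{align}
```

The factor \partial d / \partial g is left unexpanded because it depends on how the misclassification measure combines the correct-class and competing-class discriminants, which these notes do not specify.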