• apply chain rule for partial derivatives

  • derivative of loss wrt misclassification measure

  • derivative of misclassification measure wrt the discriminant function

  • derivative of discriminant function wrt observation probability

    • note that the discriminant sums the probability over all possible state sequences
    • the highest-probability path can be made to dominate this sum -- Viterbi approximation


  • derivative of observation probability wrt mean
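
The four factors above can be written out as one worked chain. This is only a sketch under assumed definitions that the bullets do not fix: a sigmoid loss ℓ(d) with slope γ, a misclassification measure of the form d = -g + (competitor term) for the correct-class discriminant g, a log-likelihood discriminant taken along the Viterbi path q̄, and a diagonal-covariance Gaussian observation density b_j with mean μ_j and variance σ_j². All symbols are illustrative choices, not taken from the source.

```latex
\begin{align*}
\frac{\partial \ell}{\partial \mu_{jk}}
  &= \frac{\partial \ell}{\partial d}\,
     \frac{\partial d}{\partial g}\,
     \sum_{t=1}^{T}
     \frac{\partial g}{\partial b_j(x_t)}\,
     \frac{\partial b_j(x_t)}{\partial \mu_{jk}}
  \\[4pt]
% loss w.r.t. misclassification measure (assumed sigmoid loss)
\frac{\partial \ell}{\partial d}
  &= \gamma\,\ell(d)\,\bigl(1-\ell(d)\bigr),
  \qquad \ell(d) = \frac{1}{1+e^{-\gamma d}}
  \\[4pt]
% misclassification measure w.r.t. correct-class discriminant
\frac{\partial d}{\partial g}
  &= -1
  \\[4pt]
% discriminant w.r.t. observation probability (Viterbi path \bar q)
\frac{\partial g}{\partial b_j(x_t)}
  &= \frac{\delta(\bar q_t, j)}{b_j(x_t)},
  \qquad g = \sum_{t=1}^{T} \log b_{\bar q_t}(x_t)
       + (\text{terms independent of } \mu)
  \\[4pt]
% observation probability w.r.t. mean (assumed diagonal Gaussian)
\frac{\partial b_j(x_t)}{\partial \mu_{jk}}
  &= b_j(x_t)\,\frac{x_{tk}-\mu_{jk}}{\sigma_{jk}^{2}}
  \\[4pt]
% product of the four factors
\frac{\partial \ell}{\partial \mu_{jk}}
  &= -\,\gamma\,\ell(d)\,\bigl(1-\ell(d)\bigr)
     \sum_{t=1}^{T} \delta(\bar q_t, j)\,
     \frac{x_{tk}-\mu_{jk}}{\sigma_{jk}^{2}}
\end{align*}
```

Under these assumptions the b_j(x_t) factors cancel, so only frames that the Viterbi path assigns to state j contribute to the gradient for that state's mean. The sigmoid factor is largest near d = 0, so tokens close to the decision boundary drive most of the update.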