• steepest descent
    • parameter values updated in the direction of steepest descent (the negative gradient)
    • line search used to choose the step size (sketch below)
    • each step requires several function evaluations, so impractical for complex problems
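A minimal sketch of steepest descent with a backtracking (Armijo) line search; the quadratic loss, the matrix A, the vector b, and all constants are illustrative assumptions, not part of the notes:

```python
import numpy as np

# Assumed toy problem: quadratic loss f(w) = 0.5 w^T A w - b^T w,
# whose exact minimizer is w* = A^{-1} b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])   # assumed positive-definite
b = np.array([1.0, -2.0])

def f(w):
    return 0.5 * w @ A @ w - b @ w

def grad(w):
    return A @ w - b

w = np.zeros(2)
for _ in range(50):
    g = grad(w)
    d = -g                                # direction of steepest descent
    step = 1.0
    # backtracking line search: halve the step until sufficient decrease
    while f(w + step * d) > f(w) + 1e-4 * step * (g @ d):
        step *= 0.5
    w = w + step * d

print(w, np.linalg.solve(A, b))           # converges toward the analytic minimizer
```

The inner `while` loop is the cost the notes flag: every update may evaluate the loss several times before a step is accepted.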
  • momentum
    • addresses the slow convergence of steepest descent
    • a momentum coefficient controls the relative contribution of past updates and the current gradient (sketch below)
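A minimal sketch of (heavy-ball) momentum on the same assumed quadratic; the learning rate and momentum coefficient `beta` are illustrative hyperparameters:

```python
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # assumed positive-definite
b = np.array([1.0, -2.0])
grad = lambda w: A @ w - b

w = np.zeros(2)
v = np.zeros(2)          # velocity: accumulated past updates
lr, beta = 0.1, 0.9      # assumed step size and momentum coefficient
for _ in range(200):
    # beta weights the past contribution, lr weights the current gradient
    v = beta * v - lr * grad(w)
    w = w + v

print(w)                 # approaches np.linalg.solve(A, b)
```

With `beta = 0` this reduces to plain gradient descent; larger `beta` lets past steps carry the iterate through shallow, consistent directions faster.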
  • local optimization (per-parameter learning rates)
    • an individual learning rate maintained for each parameter
    • learning rates adapted over time as training proceeds
    • with per-parameter scaling the update direction no longer follows the true gradient (sketch below)
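A minimal sketch of one such scheme, in the spirit of delta-bar-delta (an assumed concrete instance, not necessarily the one the notes mean): each weight keeps its own rate, grown additively while the gradient sign is stable and shrunk multiplicatively when it flips. The constants `kappa`, `phi`, and the rate cap are illustrative:

```python
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # assumed positive-definite
b = np.array([1.0, -2.0])
grad = lambda w: A @ w - b

w = np.zeros(2)
rates = np.full(2, 0.05)          # individual learning rate per parameter
prev_g = np.zeros(2)
kappa, phi, rate_max = 0.01, 0.5, 0.5   # assumed adaptation constants
for _ in range(200):
    g = grad(w)
    same = g * prev_g             # >0: sign stable, <0: sign flipped
    rates = np.where(same > 0, np.minimum(rates + kappa, rate_max), rates)
    rates = np.where(same < 0, rates * phi, rates)
    w = w - rates * g             # element-wise scaling: not a true gradient step
    prev_g = g

print(w)
```

Because `rates` scales each component differently, the update `rates * g` is generally not parallel to `g`, which is exactly the "not true gradient descent anymore" point above.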
  • RProp and QuickProp
    • individual learning rates (step sizes) maintained for each parameter
    • RProp updates parameters using the sign of the partial derivative alone (sketch below)
    • step sizes grown and shrunk multiplicatively, i.e. exponentially over iterations
    • QuickProp instead approximates the error surface with a parabola from successive derivatives
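A minimal sketch of RProp on the same assumed quadratic: the weight change uses only the sign of each partial derivative, and the per-parameter step sizes are multiplied by `eta_plus` or `eta_minus` (the commonly cited 1.2 / 0.5 factors), so they grow and shrink exponentially. This follows the iRprop- variant (weight backtracking omitted); all constants are illustrative:

```python
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # assumed positive-definite
b = np.array([1.0, -2.0])
grad = lambda w: A @ w - b

w = np.zeros(2)
step = np.full(2, 0.1)                   # individual step size per weight
prev_g = np.zeros(2)
eta_plus, eta_minus = 1.2, 0.5           # multiplicative grow / shrink factors
step_min, step_max = 1e-6, 1.0
for _ in range(100):
    g = grad(w)
    same = g * prev_g                    # >0: sign stable, <0: sign flipped
    step = np.where(same > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same < 0, np.maximum(step * eta_minus, step_min), step)
    w = w - np.sign(g) * step            # sign of the derivative alone
    prev_g = np.where(same < 0, 0.0, g)  # iRprop-: skip adaptation after a flip

print(w)
```

Ignoring the gradient magnitude makes the method insensitive to its scale, while the multiplicative adaptation lets step sizes span many orders of magnitude in few iterations.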