- steepest descent
- parameter values updated in the direction of steepest descent, i.e., the negative gradient
- step size chosen by a line search along that direction
- impractical for complex problems, since each line search is expensive (a sketch with a cheap backtracking search follows below)
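A minimal sketch of steepest descent, assuming a small quadratic test objective; a backtracking (Armijo) line search stands in for an exact one, and all names and constants here are illustrative:

```python
import numpy as np

def backtracking_line_search(f, g, x, d, alpha=1.0, beta=0.5, c=1e-4):
    # shrink the step until the Armijo sufficient-decrease condition holds
    while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
        alpha *= beta
    return alpha

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=1000):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                                  # direction of steepest descent
        x = x + backtracking_line_search(f, g, x, d) * d
    return x

# assumed test problem: f(x) = 0.5 x^T A x - b^T x, minimum at A^{-1} b
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b
print(steepest_descent(f, grad_f, np.array([5.0, 5.0])))  # ~ [1/3, 2]
```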
- momentum
- alleviates the slow convergence of steepest descent, e.g., its oscillation across narrow valleys
- momentum coefficient controls the relative contribution of the past update and the current gradient
- still only a local optimization method (momentum update sketched below)
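A sketch of the momentum update on the same assumed quadratic; `lr` and `mu` are illustrative values, with `mu` weighting the past update against the current gradient:

```python
import numpy as np

def momentum_step(x, g, v, lr=0.1, mu=0.9):
    # the running velocity smooths oscillations across steep directions
    # and accumulates speed along shallow, consistent ones
    v = mu * v - lr * g
    return x + v, v

A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])
x, v = np.array([5.0, 5.0]), np.zeros(2)
for _ in range(200):
    x, v = momentum_step(x, A @ x - b, v)   # grad f(x) = A x - b
print(x)  # ~ [1/3, 2]
```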
- adaptive learning rates
- individual learning rate maintained for each parameter
- learning rates also varied over time
- update direction no longer follows the gradient, so not true gradient descent anymore (one such scheme sketched below)
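One classic per-parameter scheme is Jacobs' delta-bar-delta rule, sketched here; the constants `kappa`, `phi`, `theta` are illustrative. The elementwise product `lr * g` is what makes the update direction deviate from the gradient:

```python
import numpy as np

def delta_bar_delta_step(x, g, lr, bar, kappa=0.05, phi=0.2, theta=0.7):
    agree = bar * g                               # does the gradient keep its sign?
    lr = np.where(agree > 0, lr + kappa, lr)      # additive increase per parameter
    lr = np.where(agree < 0, lr * (1 - phi), lr)  # multiplicative decrease
    bar = (1 - theta) * g + theta * bar           # smoothed gradient trace
    return x - lr * g, lr, bar                    # step no longer along the gradient
```

It plugs into the same kind of update loop as the momentum sketch above, with `lr` and `bar` carried between iterations as arrays.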
- RProp and QuickProp
- individual learning rates likewise employed for each parameter
- parameter updates use the sign of the derivative alone (RProp; QuickProp instead fits a quadratic approximation to the error surface)
- learning rates incremented and decremented exponentially, depending on whether the derivative's sign persists or flips (sketched below)
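A sketch of the basic RProp update (the simple variant without weight backtracking), using the commonly cited defaults `eta_plus=1.2` and `eta_minus=0.5`; the step bounds and test loop are illustrative:

```python
import numpy as np

def rprop_step(x, g, g_prev, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    sign_change = g * g_prev
    # grow the step exponentially while the derivative keeps its sign,
    # shrink it exponentially when the sign flips (a minimum was overstepped)
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    return x - np.sign(g) * step, step, g   # only the sign of the derivative is used

A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])
x, step, g_prev = np.array([5.0, 5.0]), np.full(2, 0.1), np.zeros(2)
for _ in range(100):
    x, step, g_prev = rprop_step(x, A @ x - b, g_prev, step)
print(x)  # hovers near the minimum [1/3, 2]
```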