basics:
- maximum likelihood is a popular optimization criterion
- assume an underlying probability model
- estimate the model parameters by choosing the distribution (within the assumed family) under which the observed data are most likely
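
To make the basics concrete, here is a minimal sketch assuming the underlying model is a single 1-D Gaussian with i.i.d. samples. The sample values and the function names (`gaussian_log_likelihood`, `gaussian_mle`) are made up for illustration. For this model the maximum likelihood estimates have a closed form: the sample mean, and the sample variance with divisor n.

```python
import math

def gaussian_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. data under a Gaussian N(mu, sigma2)."""
    n = len(data)
    sq_err = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sq_err / (2 * sigma2)

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of mean and variance."""
    n = len(data)
    mu = sum(data) / n
    # The ML estimate of the variance divides by n, not n - 1.
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    return mu, sigma2

data = [2.1, 1.9, 2.4, 2.0, 1.6]  # made-up sample, for illustration only
mu_hat, sigma2_hat = gaussian_mle(data)
ll_hat = gaussian_log_likelihood(data, mu_hat, sigma2_hat)
```

Any other choice of `mu` or `sigma2` passed to `gaussian_log_likelihood` on the same data should give a value no larger than `ll_hat`, which is what "highest likelihood of having produced the data" means in practice.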

pros:
- efficient when estimation is entirely data-driven
- efficient iterative estimation algorithms are available (e.g., EM, expectation-maximization)
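
As an illustration of iterative ML estimation with EM, the sketch below fits a two-component 1-D Gaussian mixture. The mixture model, the synthetic data, the initialisation at the data extremes, and the function names are all assumptions chosen for the example, not something specified in these notes. Each iteration alternates an E-step (posterior responsibilities) with an M-step (weighted ML updates), and the recorded log-likelihood should not decrease from one iteration to the next.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Crude initialisation (an assumption): means at the data extremes.
    w = [0.5, 0.5]
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        ll = 0.0
        for x in data:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            ll += math.log(total)
            resp.append([p[0] / total, p[1] / total])
        log_liks.append(ll)  # log-likelihood under the current parameters
        # M-step: responsibility-weighted ML updates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against a collapsing component
    return w, mu, var, log_liks

# Synthetic data: two well-separated clusters (made up for the example).
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(200)]
data += [random.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances, log_liks = em_two_gaussians(data)
```

EM only guarantees convergence to a local optimum, so results on less cleanly separated data can depend on the initialisation.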

cons:
- how do we choose the form of the models?
- Gaussians are typically used since the mean and variance are easily estimated -- what about higher-order moments?
- cannot explicitly incorporate additional knowledge about the data set
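
The question about higher-order moments can be illustrated with a small sketch, assuming a deliberately skewed synthetic data set (exponentially distributed samples, chosen only for the example). A Gaussian fitted by ML reproduces the sample mean and variance, but every Gaussian has zero skewness, so the third moment of the data is simply ignored by the fit.

```python
import random

def sample_moments(data):
    """Sample mean, variance (divisor n), and skewness."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    skew = sum((x - mean) ** 3 for x in data) / n / var ** 1.5
    return mean, var, skew

# Skewed synthetic data (an assumption made for illustration).
random.seed(0)
data = [random.expovariate(1.0) for _ in range(5000)]

# The ML Gaussian fit uses only the first two of these.
mean_hat, var_hat, skew_hat = sample_moments(data)
gaussian_skew = 0.0  # skewness of any Gaussian, whatever its mean and variance
```

Comparing `skew_hat` with `gaussian_skew` shows the mismatch: the fitted Gaussian matches the first two moments while the asymmetry of the data goes unmodelled.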