- Why one would use the Maximum Entropy formulation:
  - based on the principle of not assuming anything above and beyond what is provided
  - solutions result in exponential models, which are typically tractable to implement
  - constraints/knowledge can be added explicitly into the estimate of the probability distribution
  - iterative solutions exist
  - successful applications in LMs, parsing, spectral estimation
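The points above can be illustrated with a minimal sketch: a toy maximum-entropy problem with an indicator-function constraint, solved by Generalized Iterative Scaling (GIS), one of the classic iterative solutions. The problem (a biased die with P(face >= 4) = 0.7) and all names in the code are hypothetical, chosen only for illustration; the solution is an exponential model, as the notes state.

```python
import math

# Hypothetical toy problem: maximum-entropy distribution over a six-sided
# die, constrained so that P(face >= 4) = 0.7. The maxent solution is
# uniform within each group: 0.7/3 on {4,5,6} and 0.3/3 on {1,2,3}.
outcomes = range(1, 7)

# Indicator features. GIS requires the feature values to sum to a constant
# C for every outcome, so we include the complementary "slack" feature:
# here f0(x) + f1(x) = C = 1 for all x.
features = [
    lambda x: 1.0 if x >= 4 else 0.0,
    lambda x: 1.0 if x < 4 else 0.0,
]
targets = [0.7, 0.3]   # desired expectations E_p[f_i]
C = 1.0                # constant total feature count per outcome

lam = [0.0, 0.0]       # one weight per feature (exponential model)

def model():
    """Current exponential model p(x) proportional to exp(sum_i lam_i * f_i(x))."""
    w = [math.exp(sum(l * f(x) for l, f in zip(lam, features))) for x in outcomes]
    z = sum(w)
    return [wi / z for wi in w]

# Generalized Iterative Scaling: nudge each weight by the log-ratio of the
# target expectation to the model's current expectation under p.
for _ in range(100):
    p = model()
    for i, f in enumerate(features):
        expect = sum(pi * f(x) for pi, x in zip(p, outcomes))
        lam[i] += (1.0 / C) * math.log(targets[i] / expect)

p = model()
print([round(pi, 4) for pi in p])  # → [0.1, 0.1, 0.1, 0.2333, 0.2333, 0.2333]
```

Because the two indicator features partition the outcome space and C = 1, GIS converges here in a single update; with overlapping features and larger C the per-iteration steps shrink, which is one source of the slow convergence noted below.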
- Why they are not all-pervasive:
  - not all knowledge can be represented as constraints based on indicator functions
  - iterative solutions converge slowly on large data sets
  - efficient estimation algorithms already exist for maximum likelihood
  - conjecture that constraints can be learned directly from data