- Why one would use the Maximum Entropy formulation:
  - based on the principle of not assuming anything above and beyond what is provided
  - solutions result in exponential models, which are typically tractable to implement
  - constraints/knowledge can be added explicitly into the estimate of the probability distribution
  - iterative solutions exist
  - successful applications in LMs, parsing, spectral estimation
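The points above can be illustrated with a minimal sketch: a toy maximum-entropy problem with an indicator-function constraint, solved by Generalized Iterative Scaling (GIS), one of the classic iterative solutions. The problem (a biased die with P(face >= 4) = 0.7) and all names in the code are hypothetical, chosen only for illustration; the solution is an exponential model, as the notes state.

```python
import math

# Hypothetical toy problem: maximum-entropy distribution over a six-sided
# die, constrained so that P(face >= 4) = 0.7. The maxent solution is
# uniform within each group: 0.7/3 on {4,5,6} and 0.3/3 on {1,2,3}.
outcomes = range(1, 7)

# Indicator features. GIS requires the feature values to sum to a constant
# C for every outcome, so we include the complementary "slack" feature:
# here f0(x) + f1(x) = C = 1 for all x.
features = [
    lambda x: 1.0 if x >= 4 else 0.0,
    lambda x: 1.0 if x < 4 else 0.0,
]
targets = [0.7, 0.3]   # desired expectations E_p[f_i]
C = 1.0                # constant total feature count per outcome

lam = [0.0, 0.0]       # one weight per feature (exponential model)

def model():
    """Current exponential model p(x) proportional to exp(sum_i lam_i * f_i(x))."""
    w = [math.exp(sum(l * f(x) for l, f in zip(lam, features))) for x in outcomes]
    z = sum(w)
    return [wi / z for wi in w]

# Generalized Iterative Scaling: nudge each weight by the log-ratio of the
# target expectation to the model's current expectation under p.
for _ in range(100):
    p = model()
    for i, f in enumerate(features):
        expect = sum(pi * f(x) for pi, x in zip(p, outcomes))
        lam[i] += (1.0 / C) * math.log(targets[i] / expect)

p = model()
print([round(pi, 4) for pi in p])  # → [0.1, 0.1, 0.1, 0.2333, 0.2333, 0.2333]
```

Because the two indicator features partition the outcome space and C = 1, GIS converges here in a single update; with overlapping features and larger C the per-iteration steps shrink, which is one source of the slow convergence noted below.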
- Why they are not all-pervasive:
  - not all knowledge can be represented as constraints based on indicator functions
  - iterative solutions converge slowly on large data sets
  - efficient estimation algorithms already exist for maximum likelihood
  - conjecture that constraints can be learned directly from data