A Simple Markov Model For Weather Prediction
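What is a first-order Markov chain?

$$P[q_t = j \mid q_{t-1} = i,\ q_{t-2} = k,\ \ldots] = P[q_t = j \mid q_{t-1} = i]$$

where $q_t$ denotes the state at time $t$. We consider only those processes for which the right-hand side is independent of time:

$$a_{ij} = P[q_t = j \mid q_{t-1} = i], \qquad 1 \le i, j \le N$$

with the following properties:

$$a_{ij} \ge 0 \quad \forall\, i, j \qquad\qquad \sum_{j=1}^{N} a_{ij} = 1 \quad \forall\, i$$
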
The above process is called an observable Markov model because its output at each instant of time is the state itself, and each state corresponds to an observable event.

Later, we will relax this constraint and relate the output to the states through a second random process.

Example: A three-state model of the weather

State 1: precipitation (rain, snow, hail, etc.)
State 2: cloudy
State 3: sunny
Basic Calculations
Example:	What is the probability that the weather for eight consecutive days is "sun-sun-sun-rain-rain-sun-cloudy-sun"?

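Solution:
	O =	sun	sun	sun	rain	rain	sun	cloudy	sun
		3	3	3	1	1	3	2	3

$$P(O \mid \text{Model}) = P[3, 3, 3, 1, 1, 3, 2, 3] = \pi_3 \, a_{33}^2 \, a_{31} \, a_{11} \, a_{13} \, a_{32} \, a_{23}$$

where $\pi_3$ is the probability that the first day is sunny and $a_{ij}$ is the probability of moving from state $i$ to state $j$.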
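The probability of this sequence is a product of one-step transition probabilities. A minimal sketch of that calculation follows. The transition matrix is not shown in this section, so the values below are an assumption: they are the ones commonly used with this three-state weather example, and their diagonal entries (0.6 for cloudy, 0.8 for sunny) agree with the expected durations quoted later. The function name `sequence_probability` is hypothetical, and the first day is assumed to be known to be sunny (initial-state factor of 1).

```python
# Assumed transition matrix: rows are the current state, columns the next state.
# Index 0 = precipitation, 1 = cloudy, 2 = sunny (states 1, 2, 3 in the text).
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]


def sequence_probability(states, A, initial_prob=1.0):
    """Probability of a state sequence under a first-order Markov chain."""
    prob = initial_prob
    # Multiply the transition probability for each consecutive pair of days.
    for prev, curr in zip(states, states[1:]):
        prob *= A[prev][curr]
    return prob


# sun-sun-sun-rain-rain-sun-cloudy-sun, written with the text's 1-based states.
observed = [s - 1 for s in [3, 3, 3, 1, 1, 3, 2, 3]]
p = sequence_probability(observed, A)  # assumes day 1 is known to be sunny
print(f"{p:.4e}")
```

Under these assumptions the product is 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2, or about 1.536 × 10^-4. A different initial-state probability simply scales the result.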


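Example:	Given that the system is in a known state, what is the probability that it stays in that state for exactly d days?
	O =	i	i	i	...	i	j
		1	2	3	...	d	d+1

with $j \ne i$. Writing $a_{ii}$ for the probability of remaining in state $i$:

$$p_i(d) = P(O \mid \text{Model},\ q_1 = i) = a_{ii}^{\,d-1} (1 - a_{ii})$$

Note the exponential (geometric) decay of this distribution with d.
We can compute the expected number of observations in a state given that we started in that state:

$$\bar{d}_i = \sum_{d=1}^{\infty} d \, p_i(d) = \sum_{d=1}^{\infty} d \, a_{ii}^{\,d-1} (1 - a_{ii}) = \frac{1}{1 - a_{ii}}$$

Thus, the expected number of consecutive sunny days is 1/(1 - 0.8) = 5; the expected number of consecutive cloudy days is 1/(1 - 0.6) = 2.5, etc.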
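The expected durations can be checked numerically. The sketch below compares the closed form 1/(1 - a) with a truncated sum over the duration distribution. The function names are hypothetical; the self-transition values 0.8 (sunny) and 0.6 (cloudy) are the ones implied by the figures quoted above.

```python
def duration_pmf(a_self, d):
    """P(stay exactly d days) = a^(d-1) * (1 - a) for self-transition probability a."""
    return a_self ** (d - 1) * (1.0 - a_self)


def expected_duration(a_self):
    """Closed-form mean of the duration distribution."""
    return 1.0 / (1.0 - a_self)


def expected_duration_numeric(a_self, max_d=1000):
    """Truncated sum of d * p(d); the tail beyond max_d is negligible here."""
    return sum(d * duration_pmf(a_self, d) for d in range(1, max_d + 1))


sunny = expected_duration(0.8)    # about 5 days
cloudy = expected_duration(0.6)   # about 2.5 days
sunny_check = expected_duration_numeric(0.8)
cloudy_check = expected_duration_numeric(0.6)
```

The two estimates agree to within floating-point error, which is a quick sanity check on the geometric form of the distribution.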

What have we learned from this example?
Why Are They Called Doubly Stochastic Systems?
The Urn-and-Ball Model
Elements of a Hidden Markov Model (HMM)
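·	N - the number of states

·	M - the number of distinct observation symbols per state

·	The state-transition probability distribution $A = \{a_{ij}\}$, where $a_{ij} = P[q_t = j \mid q_{t-1} = i]$ and $q_t$ is the state at time $t$

·	The output probability distribution $B = \{b_j(k)\}$, where $b_j(k)$ is the probability of observing symbol $k$ while in state $j$

·	The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i = P[q_1 = i]$

We can write this succinctly as: $\lambda = (A, B, \pi)$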
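As an illustration, the elements listed above can be collected in a small container that also checks that every distribution is valid. This is only a sketch: the class name `HMM`, its field names, and the numeric values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    """Discrete HMM parameters (hypothetical container for illustration)."""
    A: List[List[float]]   # N x N state-transition probabilities
    B: List[List[float]]   # N x M output probabilities, one row per state
    pi: List[float]        # length-N initial state distribution

    @property
    def n_states(self) -> int:
        return len(self.A)

    @property
    def n_symbols(self) -> int:
        return len(self.B[0])

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if shapes disagree or a row is not a distribution."""
        n = self.n_states
        if len(self.B) != n or len(self.pi) != n:
            raise ValueError("A, B and pi must agree on the number of states")
        if any(len(row) != n for row in self.A):
            raise ValueError("A must be square")
        if any(len(row) != self.n_symbols for row in self.B):
            raise ValueError("every row of B must have M entries")
        rows = [("pi", self.pi)]
        rows += [("A", r) for r in self.A]
        rows += [("B", r) for r in self.B]
        for name, row in rows:
            if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError(f"{name} contains an invalid distribution")


# Illustrative two-state, two-symbol model (values are made up).
model = HMM(
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.9, 0.1], [0.2, 0.8]],
    pi=[0.5, 0.5],
)
model.validate()
```

Keeping B as one row per state mirrors the per-state output distributions in the list above.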

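Note that the probability of being in any state at any time is completely determined by knowing the initial state distribution $\pi$ and the transition probabilities $A$:

$$P[q_t = j] = \sum_{i=1}^{N} \pi_i \, \bigl(A^{t-1}\bigr)_{ij}$$

where $q_t$ is the state at time $t$ and $t = 1$ is the first time step.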
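This can be sketched by repeatedly multiplying the state distribution (as a row vector) by the transition matrix. The matrix values below are an assumption (the three-state weather values commonly used with this example; they are not shown in this section), time is counted from t = 1, and the function names are hypothetical.

```python
# Assumed weather transition matrix; index 0 = precipitation, 1 = cloudy, 2 = sunny.
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]


def step(p, A):
    """One time step: new_p[j] = sum_i p[i] * A[i][j]."""
    n = len(A)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]


def state_distribution(pi, A, t):
    """Distribution over states at time t (t = 1 returns pi itself)."""
    p = list(pi)
    for _ in range(t - 1):
        p = step(p, A)
    return p


pi = [0.0, 0.0, 1.0]                  # assume day 1 is known to be sunny
day3 = state_distribution(pi, A, 3)   # distribution two steps later
```

Under these assumed values, the day-3 distribution is approximately [0.14, 0.17, 0.69].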


Two basic problems:

(1)	How do we train the system?

(2)	How do we estimate the probability of a given sequence (recognition)?
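For problem (2), one standard approach is the forward algorithm, which sums over all hidden state paths without enumerating them. The slides have not introduced it at this point, so the following is only a sketch under assumptions: the function name, the two-state model, and all numeric values are hypothetical.

```python
def forward_probability(obs, A, B, pi):
    """P(observation sequence | model) for a discrete HMM via the forward recursion.

    obs: list of observation symbol indices
    A:   N x N transition probabilities, A[i][j] = P(next state j | state i)
    B:   N x M output probabilities, B[j][k] = P(symbol k | state j)
    pi:  length-N initial state distribution
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(A)
    # Initialization: alpha[i] = P(first symbol, first state = i).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: propagate one step, then weight by the output probability.
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over the final state.
    return sum(alpha)


# Hypothetical two-coin model: symbol 0 = heads, 1 = tails.
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
p_hht = forward_probability([0, 0, 1], A, B, pi)
```

With uniform transitions the tosses are independent, so this value equals 0.55 · 0.55 · 0.45 = 0.136125, which gives a simple hand check. The cost grows linearly with the sequence length rather than exponentially.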

This gives rise to a third problem:

If the states are hidden, how do we know which states were used to generate a given output?

How do we represent observations with continuous distributions (such as feature vectors)?
Why Are They Called "Hidden" Markov Models?
Consider the problem of predicting the outcome of a coin toss experiment. You observe the following sequence:

O	=	H	H	T	T	H	T	H	H	T	T	H ...

What is a reasonable model of the system?

1-Coin Model
(Observable Markov Model)
O	=	H	H	T	T	H	T	H	H	T	T	H ...
S	=	1	1	2	2	1	2	1	1	2	2	1 ...

2-Coins Model
(Hidden Markov Model)
O	=	H	H	T	T	H	T	H	H	T	T	H ...
S	=	1	1	2	2	1	2	1	1	2	2	1 ...
	State 1	State 2
P(H):	P1	P2
P(T):	1-P1	1-P2

3-Coins Model
(Hidden Markov Model)
O	=	H	H	T	T	H	T	H	H	T	T	H ...
S	=	3	1	2	3	3	1	1	2	3	1	3 ...
	State 1	State 2	State 3
P(H):	P1	P2	P3
P(T):	1-P1	1-P2	1-P3

Urn-and-ball model: output probabilities for each urn
	Urn 1	Urn 2	Urn 3
P(red)	b1(1)	b2(1)	b3(1)
P(green)	b1(2)	b2(2)	b3(2)
P(blue)	b1(3)	b2(3)	b3(3)
P(yellow)	b1(4)	b2(4)	b3(4)
...	...	...	...

O = {green, blue, green, yellow, red, ..., blue}
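A sequence of this kind can be produced by simulating the two random processes: one chooses the urn (the hidden state), the other draws a ball color from that urn. The sketch below is illustrative only. The function name `sample_hmm` and every probability value are hypothetical, since the slides give the output probabilities only symbolically.

```python
import random

COLORS = ["red", "green", "blue", "yellow"]

# Hypothetical parameters for three urns and four colors.
A = [[0.6, 0.2, 0.2],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
B = [[0.7, 0.1, 0.1, 0.1],   # urn 1: mostly red
     [0.1, 0.6, 0.2, 0.1],   # urn 2: mostly green
     [0.1, 0.1, 0.3, 0.5]]   # urn 3: mostly yellow
pi = [1 / 3, 1 / 3, 1 / 3]


def sample_hmm(A, B, pi, length, rng):
    """Draw (states, observations) of the given length from a discrete HMM."""
    def draw(probs):
        # Pick an index according to the given probability weights.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    states, observations = [], []
    state = draw(pi)                          # first process: choose an urn
    for _ in range(length):
        states.append(state)
        observations.append(draw(B[state]))   # second process: draw a ball
        state = draw(A[state])                # move to the next urn
    return states, observations


rng = random.Random(0)                        # fixed seed for repeatability
urns, balls = sample_hmm(A, B, pi, 10, rng)
print([COLORS[k] for k in balls])
```

Only the color sequence is visible to an observer; the list of urns is the hidden part of the model.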

How can we determine the appropriate model for the observation sequence given the system above?