# Markov Chains

When successive observations are not independent of each other, we need a way of describing the dependence. We start with the simplest case, in which there are two possible states; if we were studying the weather, these could be Wet and Dry, for instance. We can model the weather with a two-state Markov chain, meaning that knowing the weather today tells us the probabilities of the different states tomorrow, and knowing yesterday's weather would add nothing to that knowledge. We call the conditional probability \(P(\mbox{Wet tomorrow} \mid \mbox{Wet today})\) a transition probability of the Markov chain, and we put it in the \((W,W)\) entry of the transition matrix. The entry in the first row and first column is the probability that tomorrow is Dry given that today is Dry; here this is \(3/5\).
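As a concrete sketch in R: the \((D,D)\) entry \(3/5\) is stated above, and the remaining entries below are the ones consistent with the numerical output shown later in these notes.

```r
# Two-state weather chain; rows and columns ordered (Dry, Wet).
# The (D,D) entry 3/5 is from the text; the other entries are
# those consistent with the numerical output later in the notes.
P <- matrix(c(0.6, 0.4,
              0.2, 0.8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("D", "W"), c("D", "W")))
P["D", "D"]  # P(Dry tomorrow | Dry today) = 3/5
rowSums(P)   # each row sums to 1, as transition probabilities must
```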

The product of two Markov transition matrices:

\({\bf P}{\bf P} = \begin{pmatrix} P_{DD} & P_{DW}\\ P_{WD} & P_{WW}\end{pmatrix}\begin{pmatrix} P_{DD} & P_{DW}\\ P_{WD} & P_{WW}\end{pmatrix} = \begin{pmatrix} P_{DD}P_{DD}+P_{DW}P_{WD} & P_{DD}P_{DW}+P_{DW}P_{WW}\\ P_{WD}P_{DD}+P_{WW}P_{WD} & P_{WD}P_{DW}+P_{WW}P_{WW}\end{pmatrix}\)

Notice that the probability \(P(X_2=W\mid X_0=W)\) is exactly the formula for the \(({\it W},{\it W})\) entry in \({\bf P}{\bf P}\).

In more general cases we increase the number of possible states, but as long as that number stays finite, we call it a finite state space Markov chain. We will see that in molecular biology it is common to have 4 states, the four nucleotides, or 20 states, the amino acids.

A Markov chain is a special case of what mathematicians call a random process in discrete time: we consider day 1, day 2, day 3, … as the times and look at a sequence of dependent random variables \(X_0, X_1, X_2, \ldots, X_n\) at these times; each random variable can take on any of the possible states with some probability.

Example: \(X_0=1\); take \(X_{t+1}=X_{t}+Y_t\), where the \(Y_t\) are independent binomial random variables with distribution \(B(10, p=0.5)\).
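This chain is easy to simulate in R; the seed and number of steps below are illustrative choices, not part of the example.

```r
# Simulate X_{t+1} = X_t + Y_t with Y_t i.i.d. Binomial(10, 0.5),
# starting from X_0 = 1.
set.seed(1)   # illustrative seed
n <- 20
Y <- rbinom(n, size = 10, prob = 0.5)  # the independent increments
X <- cumsum(c(1, Y))                   # X[1] = X_0 = 1, then add increments
X
```

Each \(X_{t+1}\) depends on the past only through \(X_t\), so the sequence is a Markov chain, though here on an infinite state space.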

Thus we can summarize several steps of the Markov chain by using powers of the transition matrix. In general we write \(P_{ij} = P(X_{n+1}=j \mid X_n=i)\). Then

\(P(X_{n+2}=k\mid X_n=i) \;=\; \sum_j P(X_{n+1}=j\mid X_n=i)\,P(X_{n+2}=k\mid X_{n+1}=j) \;=\; \sum_j P_{ij}P_{jk}\,.\)

Questions:

1. Prove that if knowing both *U* and *V* doesn't change the conditional probability, then knowing *U* alone doesn't change the conditional probability.
2. Conclude that \(P(Y=y \mid X=x,U=u,V=v)=P(Y=y\mid X=x)\) for any *x*, *y*, *u*, *v*.

Answer: Proof of the claim. Let \(A=\{X=x,U=u\}\). Then

\(P(Y=y \mid A) \;=\; \sum_v P(Y=y \mid A, V=v)\,P(V=v \mid A) \;=\; \sum_v P(Y=y \mid X=x)\,P(V=v \mid A) \;=\; P(Y=y \mid X=x)\,,\)

where the middle equality uses the assumption that knowing both *U* and *V* does not change the conditional probability.

This shows that \(P(X_{n+2}=k\mid X_n=i) = ({\bf P}^2)_{i,k}\), where
\({\bf P}^2\) means the matrix product \({\bf P}{\bf P}\). Notice
both that the quantity does not depend on *n* and that we can compute it
by taking a power of \({\bf P}\). The more general version is
\(P(X_{n+m}=k\mid X_n=j) = ({\bf P}^m)_{j,k}\). Since
\({\bf P}^n{\bf P}^m={\bf P}^{n+m}\) we get the Chapman-Kolmogorov
equations:

\(({\bf P}^{n+m})_{j,k} \;=\; \sum_i ({\bf P}^n)_{j,i}\,({\bf P}^m)_{i,k}\,.\)
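For the weather chain these identities can be checked numerically; the helper `matpow` below is a convenience function written for this sketch, not part of the notes.

```r
# Numerical check of Chapman-Kolmogorov: P^(n+m) = P^n P^m
# (here with n = 2, m = 3) for the weather chain.
P <- matrix(c(0.6, 0.4, 0.2, 0.8), nrow = 2, byrow = TRUE)
matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify = FALSE))
all.equal(matpow(P, 5), matpow(P, 2) %*% matpow(P, 3))  # TRUE
```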

**Summary**

A Markov Chain has *stationary* *n*-step transition probabilities,
which are the *n*th power of the 1-step transition probabilities.

Here is R output for the 1, 2, 4, 8 and 16 step transition matrices for our weather example:

```
> P = matrix(c(0.6, 0.4, 0.2, 0.8), nrow=2, byrow=TRUE)
> P
     [,1] [,2]
[1,]  0.6  0.4
[2,]  0.2  0.8
> P%*%P
     [,1] [,2]
[1,] 0.44 0.56
[2,] 0.28 0.72
> P2= P%*%P
> P4= P2%*%P2
> P8= P4%*%P4
> P16= P8%*%P8
> P16
          [,1]      [,2]
[1,] 0.3333336 0.6666664
[2,] 0.3333332 0.6666668
```

This computes the powers of P.

Fact: as \(n\) grows, \({\bf P}^n\) converges to a matrix whose rows are all equal to \((1/3, 2/3)\), as the output for \({\bf P}^{16}\) above suggests.

Suppose we toss a coin with a biased probability of heads \(P(H)=\alpha_D\) and start the chain with Dry if we get heads and Wet if we get tails. Then

\(P(X_0=D)=\alpha_D, \qquad P(X_0=W)=\alpha_W=1-\alpha_D,\)

and

\(\big(P(X_1=D),\, P(X_1=W)\big) \;=\; (\alpha_D,\ \alpha_W)\,{\bf P} \;=\; \alpha\,{\bf P}\,.\)

Notice the last line is a matrix multiplication of the row vector \(\alpha\) by the matrix \({\bf P}\). A special \(\alpha\): if we put \(\alpha_D=1/3\) and \(\alpha_W=2/3\) then

we have \(P(X_0=D)=1/3\) and also \(P(X_1=D)=1/3\), and analogously for *W*.
This means that \({\it X}_{0}\) and \({\it X}_{1}\) have the
same distribution.
A probability vector \(\alpha\) is the initial distribution for the
chain if \(P(X_0=i) = \alpha_i\).

A Markov Chain is **stationary** if \(P(X_1=i) = P(X_0=i)\) for
all *i* .

An initial distribution is called *stationary* if the chain is
stationary. We find that \(\alpha\) is a stationary initial
distribution if \(\alpha {\bf P}=\alpha \, .\)
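A quick numerical check of the stationarity condition \(\alpha {\bf P}=\alpha\) for the weather chain:

```r
# Verify that alpha = (1/3, 2/3) satisfies alpha P = alpha.
P <- matrix(c(0.6, 0.4, 0.2, 0.8), nrow = 2, byrow = TRUE)
alpha <- c(1/3, 2/3)
as.vector(alpha %*% P)   # gives (1/3, 2/3) back again
```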

Suppose \({\bf P}^n\) converges to some matrix \({\bf P}^\infty\). Notice that \(\lim_{n\to\infty} {\bf P}^{n-1} = {\bf P}^\infty\) as well, and

\({\bf P}^\infty \;=\; \lim_{n\to\infty} {\bf P}^{n} \;=\; \lim_{n\to\infty} {\bf P}^{n-1}{\bf P} \;=\; {\bf P}^\infty {\bf P}\,.\)

This proves that each row \(\alpha\) of \({\bf P}^\infty\) satisfies \(\alpha = \alpha {\bf P}\, .\)

Definition: A row vector *x* is a left eigenvector of *A* with
eigenvalue \(\lambda\) if \(xA=\lambda x \, .\)
So each row of \({\bf P}^\infty\) is a left eigenvector of
\({\bf P}\) with eigenvalue 1.
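One way to compute such a row in R is to take an eigenvector of the transpose (left eigenvectors of \({\bf P}\) are ordinary eigenvectors of \({\bf P}^T\)) and rescale it into a probability vector; this is a sketch, not the only numerical approach.

```r
# Stationary distribution as the left eigenvector of P with eigenvalue 1.
P <- matrix(c(0.6, 0.4, 0.2, 0.8), nrow = 2, byrow = TRUE)
e <- eigen(t(P))                               # eigen-decomposition of the transpose
v <- Re(e$vectors[, which.max(Re(e$values))])  # eigenvector for eigenvalue 1
alpha <- v / sum(v)                            # normalize so entries sum to 1
alpha                                          # (1/3, 2/3) for the weather chain
```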