Markov Models

compling
Author: Josef Fruehwald

Published: March 7, 2024

Modified: April 8, 2024

I started the notes on ngram models saying that we can think of them like finite-state automata, but with probabilities on each transition between states. Another name for this is a “Markov Model.”

Markov Chains

Any system with a set of states that you can transition between probabilistically could be called a Markov Chain. Markov Chains have applications well beyond ngram models, and even beyond linguistics.

Working out ngram probabilities

Let’s say we had a Markov Chain with “just” three states:

stateDiagram
  direction LR
  [*] --> A: 0.5
  [*] --> B: 0.25
  [*] --> C: 0.25
  
  A --> [*]: 0.1
  A --> A: 0.2
  A --> B: 0.3
  A --> C: 0.4
  
  B --> [*]: 0.9
  B --> C: 0.1
  
  C --> [*]: 0.2
  C --> C: 0.7
  C --> B: 0.1

It’s going to be easier to represent all of the probabilities of transitioning from one state to the next as a “transition matrix”.

from \ to   start    A      B      C     end
start         0     0.5    0.25   0.25   0.0
A             0     0.2    0.30   0.40   0.1
B             0     0.0    0.00   0.10   0.9
C             0     0.0    0.10   0.70   0.2
end           0     0.0    0.00   0.00   1.0
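To get a feel for what this chain does, we can simulate random walks through it. This is a sketch of my own, not from the notes: the `states` list and the `sample_path` function are assumptions, and the matrix is just the transition table above as a numpy array.

```python
import numpy as np

# The transition table above, as a numpy array
# (rows and columns ordered start, A, B, C, end).
states = ["start", "A", "B", "C", "end"]
trans_mat = np.array([
    [0. , 0.5 , 0.25, 0.25, 0. ],
    [0. , 0.2 , 0.3 , 0.4 , 0.1],
    [0. , 0.  , 0.  , 0.1 , 0.9],
    [0. , 0.  , 0.1 , 0.7 , 0.2],
    [0. , 0.  , 0.  , 0.  , 1. ],
])

def sample_path(rng):
    """Walk from start to end, picking each next state
    according to the current state's row of probabilities."""
    state = 0                            # index of "start"
    path = []
    while states[state] != "end":
        state = rng.choice(len(states), p=trans_mat[state])
        path.append(states[state])
    return path[:-1]                     # drop the final "end"

rng = np.random.default_rng()
sample_path(rng)
```

If these states were words in an ngram model, each walk would be one generated “sentence”, and the walk’s length would be the sentence length.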

👻 Matrix Multiplication 🤖

  • What is the probability that we’ll wind up in the “end state” in 3 steps?

Or, if we asked this question about sentences generated by an ngram model:

  • What is the probability that a sentence will be 3 words long?

We can get at this with “matrix multiplication”.

python
import numpy as np

trans_mat = np.array(
    [[0.  , 0.5 , 0.25, 0.25, 0.  ],
     [0.  , 0.2 , 0.3 , 0.4 , 0.1 ],
     [0.  , 0.  , 0.  , 0.1 , 0.9 ],
     [0.  , 0.  , 0.1 , 0.7 , 0.2 ],
     [0.  , 0.  , 0.  , 0.  , 1.  ]])
trans_mat
array([[0.  , 0.5 , 0.25, 0.25, 0.  ],
       [0.  , 0.2 , 0.3 , 0.4 , 0.1 ],
       [0.  , 0.  , 0.  , 0.1 , 0.9 ],
       [0.  , 0.  , 0.1 , 0.7 , 0.2 ],
       [0.  , 0.  , 0.  , 0.  , 1.  ]])

To get the probability of our state after the first step, we need to get that top row of probabilities. One way to do this is to represent our current state (the start state) like this

python
start_state = np.array([
  1, # start 
  0, # A 
  0, # B
  0, # C
  0, # end
  ])
  
start_state
array([1, 0, 0, 0, 0])

The 1 represents the probability we’re in the starting position, and the 0s represent that we’re in no other position. If we treat this like a “row vector”, we can get the probability of the next state with matrix multiplication.

\[ [1, 0, 0, 0, 0] \left[ \begin{array}{ccccc} 0 & 0.5 & 0.25 & 0.25 & 0\\ 0 & 0.2 & 0.3 & 0.4 & 0.1 \\ 0 & 0 & 0 & 0.1 & 0.9\\ 0 & 0 & 0.1 & 0.7 & 0.2 \\ 0 & 0 & 0 & 0 & 1 \end{array} \right] \]

In matrix multiplication, we multiply each row of the left hand side by each column of the right hand side, and sum the results together. In numpy, the @ operator does this.

python
first_step = start_state @ trans_mat
first_step
array([0.  , 0.5 , 0.25, 0.25, 0.  ])

Now, our first_step row vector represents the probability that we’re in each state after the first step. How do we get the probabilities of our location after the second step? Just multiply again!

python
second_step = first_step @ trans_mat
second_step
array([0.   , 0.1  , 0.175, 0.4  , 0.325])

And for the third step:

python
third_step = second_step @ trans_mat
third_step
array([0.    , 0.02  , 0.07  , 0.3375, 0.5725])
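The last entry of third_step is the answer to the question we started with. Indexing it out directly (a small addition of mine; the matrix and start vector are the same ones as above):

```python
import numpy as np

trans_mat = np.array(
    [[0.  , 0.5 , 0.25, 0.25, 0.  ],
     [0.  , 0.2 , 0.3 , 0.4 , 0.1 ],
     [0.  , 0.  , 0.  , 0.1 , 0.9 ],
     [0.  , 0.  , 0.1 , 0.7 , 0.2 ],
     [0.  , 0.  , 0.  , 0.  , 1.  ]])
start_state = np.array([1, 0, 0, 0, 0])

third_step = start_state @ trans_mat @ trans_mat @ trans_mat
# The last entry is the probability of being in "end" after 3 steps
third_step[-1]
```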
We’d get the same thing by multiplying the transition matrix by itself three times; each row of the result gives the three-step probabilities starting from that row’s state.

python
trans_mat @ trans_mat @ trans_mat
array([[0.    , 0.02  , 0.07  , 0.3375, 0.5725],
       [0.    , 0.008 , 0.051 , 0.299 , 0.642 ],
       [0.    , 0.    , 0.007 , 0.05  , 0.943 ],
       [0.    , 0.    , 0.05  , 0.357 , 0.593 ],
       [0.    , 0.    , 0.    , 0.    , 1.    ]])
The same trick works for chains with no end state. Take this little two-state matrix: as we multiply it by itself over and over, its rows converge on the chain’s long-run distribution over the two states.

python
net_mat = np.array([
  [0.9, 0.1],
  [0.3, 0.7]
])

net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat
array([[0.75004232, 0.24995768],
       [0.74987305, 0.25012695]])
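Rather than chaining @ by hand seventeen times, numpy can raise a matrix to a power for us with np.linalg.matrix_power:

```python
import numpy as np

net_mat = np.array([
    [0.9, 0.1],
    [0.3, 0.7]
])

# net_mat multiplied by itself 17 times, in one call
np.linalg.matrix_power(net_mat, 17)
```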

Hidden Markov Models

“Hidden” Markov Models, or HMMs, have had a lot of applications in acoustics, specifically in Automatic Speech Recognition and Forced Alignment. The core idea is that you can’t directly observe the state, but you can observe something about the state. A common analogy uses the weather.

Let’s say we are, for whatever reason, unable to directly observe the weather outside. But we are able to observe whether or not the sidewalk is wet. And we do know the following:

      Wet Sidewalk   Dry Sidewalk
🌧️        0.9            0.1
🌞        0.2            0.8

And we also know that perhaps the best predictor of weather today is what it was yesterday, so

yesterday \ today    🌧️     🌞
🌧️                   0.8    0.2
🌞                   0.2    0.8

So we have our general Markov Model:

flowchart TB
  subgraph hidden
    direction LR
    sunny --> rainy
    sunny --> sunny
    rainy --> rainy
    rainy --> sunny
  end
  
  subgraph observable
    sunny --> wet
    sunny --> dry
    rainy --> wet
    rainy --> dry
  end

If we get the sequence

wet wet wet dry wet dry dry dry

We can actually work out the probability of the weather state at each point in time.
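One standard way to do this is the “forward” algorithm: at each time step, push the previous step’s state probabilities through the transition table, then reweight by how likely that day’s observation is under each state. Here’s a minimal sketch using the two tables above; the `forward` function, the variable names, and the 50/50 starting distribution are my own assumptions, not from the notes.

```python
import numpy as np

# Transition probabilities, rows/columns ordered [rainy, sunny]
trans = np.array([
    [0.8, 0.2],   # P(today | yesterday was rainy)
    [0.2, 0.8],   # P(today | yesterday was sunny)
])

# Emission probabilities, columns ordered [wet, dry]
emit = np.array([
    [0.9, 0.1],   # P(sidewalk | rainy)
    [0.2, 0.8],   # P(sidewalk | sunny)
])

obs = ["wet", "wet", "wet", "dry", "wet", "dry", "dry", "dry"]
obs_idx = {"wet": 0, "dry": 1}

def forward(obs, prior=np.array([0.5, 0.5])):
    """Return [P(rainy), P(sunny)] at each time step,
    given the observations up to that point."""
    beliefs = []
    belief = prior
    for o in obs:
        belief = belief @ trans                 # predict today from yesterday
        belief = belief * emit[:, obs_idx[o]]   # weight by the observation
        belief = belief / belief.sum()          # renormalize to probabilities
        beliefs.append(belief)
    return np.array(beliefs)

forward(obs)
```

Each row is the model’s belief about the weather on that day: after the first wet sidewalk it already leans heavily toward 🌧️, and by the final run of dry sidewalks it leans toward 🌞.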

Application to ASR


Reuse

CC-BY-SA 4.0

Citation

BibTeX citation:
@online{fruehwald2024,
  author = {Fruehwald, Josef},
  title = {Markov {Models}},
  date = {2024-03-07},
  url = {https://lin511-2024.github.io/notes/meetings/09_markov_models.html},
  langid = {en}
}
For attribution, please cite this work as:
Fruehwald, Josef. 2024. “Markov Models.” March 7, 2024. https://lin511-2024.github.io/notes/meetings/09_markov_models.html.