Markov Models
I started the notes on ngram models by saying that we can think of them as finite-state automata, but with probabilities on each transition between states. Another name for this is a “Markov Model.”
Markov Chains
Any system that has a set of states that you can transition between probabilistically could be called a Markov Chain. Markov Chains have applications well beyond ngram models, and even beyond linguistics.
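To make that concrete, here's a small simulation of a toy two-state “weather” chain (the states and probabilities are made up for illustration): at each step we sample the next state from the current state's transition probabilities.

```python
import random

# Hypothetical transition probabilities: P(next state | current state)
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def simulate(start, n_steps, seed=10):
    """Walk the chain for n_steps, sampling each transition."""
    rng = random.Random(seed)
    state = start
    history = [state]
    for _ in range(n_steps):
        state = rng.choices(
            list(transitions[state]),
            weights=list(transitions[state].values()),
        )[0]
        history.append(state)
    return history

simulate("sunny", 5)
```

The chain has no memory beyond the current state: which state comes next depends only on where we are now, not on how we got there.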
Working out ngram probabilities
Let’s say we had a Markov Chain with “just” three states, A, B, and C, plus a start state and an end state.
It’s going to be easier to represent all of the probabilities of transitioning from one state to the next as a “transition matrix”:
| from \ to | start | A | B | C | end |
|---|---|---|---|---|---|
| start | 0 | 0.5 | 0.25 | 0.25 | 0.0 |
| A | 0 | 0.2 | 0.30 | 0.40 | 0.1 |
| B | 0 | 0.0 | 0.00 | 0.10 | 0.9 |
| C | 0 | 0.0 | 0.10 | 0.70 | 0.2 |
| end | 0 | 0.0 | 0.00 | 0.00 | 1.0 |
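One property worth checking: each row of a transition matrix is a probability distribution over next states, so every row has to sum to 1. A quick numpy sketch, re-typing the table above as an array:

```python
import numpy as np

trans_mat = np.array([
    [0.0, 0.5, 0.25, 0.25, 0.0],  # from start
    [0.0, 0.2, 0.30, 0.40, 0.1],  # from A
    [0.0, 0.0, 0.00, 0.10, 0.9],  # from B
    [0.0, 0.0, 0.10, 0.70, 0.2],  # from C
    [0.0, 0.0, 0.00, 0.00, 1.0],  # from end
])

# every row should sum to 1
trans_mat.sum(axis=1)
```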
👻 Matrix Multiplication 🤖
- What is the probability that we’ll wind up in the “end state” in 3 steps?
Or, if we asked this question about sentences generated by an ngram model:
- What is the probability that a sentence will be 3 words long?
We can get at this with “matrix multiplication”.
```python
import numpy as np

# the transition matrix from the table above
trans_mat = np.array([
    [0. , 0.5 , 0.25, 0.25, 0. ],
    [0. , 0.2 , 0.3 , 0.4 , 0.1 ],
    [0. , 0. , 0. , 0.1 , 0.9 ],
    [0. , 0. , 0.1 , 0.7 , 0.2 ],
    [0. , 0. , 0. , 0. , 1. ],
])
trans_mat
```

```
array([[0.  , 0.5 , 0.25, 0.25, 0.  ],
       [0.  , 0.2 , 0.3 , 0.4 , 0.1 ],
       [0.  , 0.  , 0.  , 0.1 , 0.9 ],
       [0.  , 0.  , 0.1 , 0.7 , 0.2 ],
       [0.  , 0.  , 0.  , 0.  , 1.  ]])
```
To get the probability of our state after the first step, we need to get that top row of probabilities. One way to do this is to represent our current state (the start state) like this:
```python
start_state = np.array([
    1, # start
    0, # A
    0, # B
    0, # C
    0, # end
])
start_state
```

```
array([1, 0, 0, 0, 0])
```
The 1 represents the probability we’re in the starting position, and the 0s represent that we’re in no other position. If we treat this like a “row vector”, we can get the probability of the next state with matrix multiplication.
\[ [ 1, 0, 0, 0, 0] \left[ \begin{array}{ccccc} 0. & 0.5 & 0.25& 0.25& 0\\ 0. & 0.2 & 0.3 & 0.4 & 0.1 \\ 0. & 0. & 0. & 0.1 & 0.9\\ 0. & 0. & 0.1 & 0.7 & 0.2 \\ 0. & 0. & 0. & 0. & 1. \end{array} \right] \]
In matrix multiplication, we multiply each row of the left-hand side by each column of the right-hand side, and sum the results together. In numpy, there’s an operator `@` for this.
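To make the row-times-column rule concrete, here's the second entry of the one-step result (the probability of being in state A) computed by hand as a dot product:

```python
import numpy as np

trans_mat = np.array([
    [0. , 0.5 , 0.25, 0.25, 0. ],
    [0. , 0.2 , 0.3 , 0.4 , 0.1 ],
    [0. , 0. , 0. , 0.1 , 0.9 ],
    [0. , 0. , 0.1 , 0.7 , 0.2 ],
    [0. , 0. , 0. , 0. , 1. ],
])
start_state = np.array([1, 0, 0, 0, 0])

# entry j of (start_state @ trans_mat) is the dot product of the row
# vector with column j: sum over i of start_state[i] * trans_mat[i, j]
prob_A = sum(start_state[i] * trans_mat[i, 1] for i in range(5))
prob_A  # 0.5, the probability of being in state A after one step
```

Since `start_state` is 1 at the start position and 0 everywhere else, this picks out exactly the “from start” row of the matrix.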
```python
first_step = start_state @ trans_mat
first_step
```

```
array([0.  , 0.5 , 0.25, 0.25, 0.  ])
```
Now, our `first_step` row vector represents the probability that we’re in each state. How do we get the probabilities of our location after the second step? Just multiply again!
```python
second_step = first_step @ trans_mat
second_step
```

```
array([0.   , 0.1  , 0.175, 0.4  , 0.325])
```
And for the third step:
```python
third_step = second_step @ trans_mat
third_step
```

```
array([0.    , 0.02  , 0.07  , 0.3375, 0.5725])
```
We could also multiply the transition matrix by itself three times, which gives the three-step probabilities for every possible starting state at once. Our answer is the first row:

```python
trans_mat @ trans_mat @ trans_mat
```

```
array([[0.    , 0.02  , 0.07  , 0.3375, 0.5725],
       [0.    , 0.008 , 0.051 , 0.299 , 0.642 ],
       [0.    , 0.    , 0.007 , 0.05  , 0.943 ],
       [0.    , 0.    , 0.05  , 0.357 , 0.593 ],
       [0.    , 0.    , 0.    , 0.    , 1.    ]])
```
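Chaining `@` by hand gets tedious for larger numbers of steps. numpy's `np.linalg.matrix_power` does the repeated multiplication for us; a quick sketch showing it matches the step-by-step result:

```python
import numpy as np

trans_mat = np.array([
    [0. , 0.5 , 0.25, 0.25, 0. ],
    [0. , 0.2 , 0.3 , 0.4 , 0.1 ],
    [0. , 0. , 0. , 0.1 , 0.9 ],
    [0. , 0. , 0.1 , 0.7 , 0.2 ],
    [0. , 0. , 0. , 0. , 1. ],
])
start_state = np.array([1, 0, 0, 0, 0])

# same as trans_mat @ trans_mat @ trans_mat
three_steps = np.linalg.matrix_power(trans_mat, 3)

# probability of being in each state after three steps from the start
start_state @ three_steps
```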
If we keep multiplying, the rows of the result converge. For example, with a small two-state matrix, after many multiplications every row approaches the same distribution:

```python
net_mat = np.array([
    [0.9, 0.1],
    [0.3, 0.7]
])
net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat @ net_mat
```

```
array([[0.75004232, 0.24995768],
       [0.74987305, 0.25012695]])
```
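The rows are converging on the same distribution, the chain's “stationary distribution.” Rather than multiplying many times, we can compute it exactly: it's the left eigenvector of the matrix with eigenvalue 1, normalized to sum to 1. A sketch using numpy:

```python
import numpy as np

net_mat = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
])

# left eigenvectors of net_mat are right eigenvectors of net_mat.T
eigvals, eigvecs = np.linalg.eig(net_mat.T)

# pick the eigenvector whose eigenvalue is (closest to) 1,
# and normalize it into a probability distribution
stationary = eigvecs[:, np.argmin(abs(eigvals - 1))].real
stationary = stationary / stationary.sum()
stationary  # [0.75, 0.25]
```

This matches what the repeated multiplication above is converging toward.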
Citation
@online{fruehwald2024,
author = {Fruehwald, Josef},
title = {Markov {Models}},
date = {2024-03-07},
url = {https://lin511-2024.github.io/notes/meetings/09_markov_models.html},
langid = {en}
}