Markov model
- sequence of random variables that are not independent
- weather report
- text
Properties
- limited horizon
- $P(X_{t+1}=s_k \mid X_1, \dots, X_t) = P(X_{t+1}=s_k \mid X_t)$ (first order)
- time invariant (stationary)
- $P(X_{t+1}=s_k \mid X_t=s_j) = P(X_2=s_k \mid X_1=s_j)$ for all $t$ (the transition probabilities do not change over time)
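Together these two properties let the joint probability of a state sequence factor into local terms, first by the chain rule and then by the limited-horizon assumption:

$$P(X_1, \dots, X_T) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_1, \dots, X_{t-1}) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})$$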
Visible MM (the state sequence itself is observed)
Hidden MM
- Motivation
- observing a sequence of symbols
- the sequence of states that led to the generation of the symbols is hidden
- Definition
- Q = sequence of states
- O = sequence of observations, drawn from a vocabulary
- $q_0, q_F$ = special (start, final) states
- A = state transition probabilities
- B = symbol emission probabilities
- Π = initial state probabilities
- $\mu = (A, B, \Pi)$ = the complete probabilistic model
- used to model state sequences and observation sequences
Generative algorithm
- pick the start state from Π
- For t = 1…T
- move to another state based on A
- emit an observation based on B
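A minimal Python sketch of this procedure (here Π is read as the distribution over the first emitting state, i.e. the transitions out of the special start state $q_0$; the numbers mirror the example below, except the emission values, which are made-up placeholders):

```python
import random

# Illustrative parameters: PI and A mirror the example below; B is hypothetical.
PI = {"A": 1.0, "B": 0.0}                       # initial state probabilities
A = {"A": {"A": 0.8, "B": 0.2},                 # state transition probabilities
     "B": {"A": 0.6, "B": 0.4}}
B = {"A": {"y": 0.7, "z": 0.3},                 # symbol emission probabilities
     "B": {"y": 0.1, "z": 0.9}}                 # (made-up values)

def sample(pdist):
    """Draw one outcome from a {outcome: probability} dict."""
    r, acc = random.random(), 0.0
    for outcome, p in pdist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome   # guard against floating-point rounding

def generate(T):
    """Pick a start state from Pi, then alternately emit (B) and move (A)."""
    state = sample(PI)
    states, observations = [], []
    for _ in range(T):
        states.append(state)
        observations.append(sample(B[state]))   # emit an observation based on B
        state = sample(A[state])                # move to another state based on A
    return states, observations

print(generate(5))
```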
Example
State (transition) probabilities
- Initial: P(A|start) = 1.0, P(B|start) = 0.0
- Transition: P(A|A) = 0.8, P(A|B) = 0.6, P(B|A) = 0.2, P(B|B) = 0.4
Emission probabilities
- see the previous table
Suppose we observe the sequence “yz”.
- Possible sequences of states:
- AA
- AB
- BA
- BB
Summing the joint probability $P(\text{“yz”}, Q \mid \mu)$ over all four state sequences $Q$ gives $P(\text{“yz”} \mid \mu)$; in the same way we can compute the probability of any observation sequence.
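A brute-force sketch of that sum, reusing the PI, A, B dictionaries from the generative sketch above (the emission values remain hypothetical):

```python
from itertools import product

def brute_force_probability(obs):
    """P(obs | mu) = sum over every state sequence Q of P(obs, Q | mu)."""
    total = 0.0
    for states in product(A, repeat=len(obs)):        # AA, AB, BA, BB for "yz"
        p = PI[states[0]] * B[states[0]][obs[0]]      # start in s1, emit o1
        for prev, cur, o in zip(states, states[1:], obs[1:]):
            p *= A[prev][cur] * B[cur][o]             # transition, then emit
        total += p
    return total

# With PI(B) = 0, the sequences BA and BB contribute nothing.
print(brute_force_probability("yz"))   # 0.294 with the illustrative numbers
```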
States and transitions
- the states encode the most recent history
- the transitions encode likely sequences of states
- use MLE (relative frequencies from labeled data) to estimate the transition probabilities
Emissions
- the emission probabilities are estimated the same way (see the sketch below)
- standard smoothing and heuristic methods can be applied to handle sparse counts
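A minimal sketch of that MLE step, assuming the training data comes as sentences of (word, tag) pairs; counts are normalized into relative frequencies, and smoothing is omitted for brevity:

```python
from collections import Counter, defaultdict

def mle_estimate(tagged_sentences):
    """Estimate transition and emission probabilities by relative frequency.

    tagged_sentences: list of [(word, tag), ...] lists.
    Returns (transitions, emissions) as nested {context: {outcome: prob}} dicts.
    """
    trans_counts = defaultdict(Counter)   # count(tag_i -> tag_{i+1})
    emit_counts = defaultdict(Counter)    # count(tag emits word)
    for sentence in tagged_sentences:
        tags = ["<s>"] + [t for _, t in sentence]   # <s> = start pseudo-tag
        for prev, cur in zip(tags, tags[1:]):
            trans_counts[prev][cur] += 1
        for word, tag in sentence:
            emit_counts[tag][word] += 1

    def normalize(counts):
        return {ctx: {out: n / sum(c.values()) for out, n in c.items()}
                for ctx, c in counts.items()}

    return normalize(trans_counts), normalize(emit_counts)

# Toy usage with a two-sentence hypothetical corpus:
corpus = [[("the", "DET"), ("dog", "NOUN")],
          [("a", "DET"), ("dog", "NOUN"), ("barks", "VERB")]]
A_hat, B_hat = mle_estimate(corpus)
print(A_hat["DET"])   # {'NOUN': 1.0}
```

Unseen words and transitions get zero probability here, which is exactly why the notes mention smoothing.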
Sequence of observations
- observers can only see the emitted symbols, not the underlying states
- observation likelihood
- given the observation sequence $O$ and the model $\mu$: what is the probability $P(O \mid \mu)$ that the sequence was generated by that model?
- used this way, the HMM acts as a language model
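Enumerating all state sequences costs $O(N^T)$; the forward algorithm computes the same $P(O \mid \mu)$ in $O(TN^2)$ time by dynamic programming. A sketch, reusing the PI, A, B dictionaries from the examples above (emission values still hypothetical):

```python
def forward_probability(obs):
    """P(obs | mu) via the forward algorithm.
    alpha[s] = P(obs up to time t, state at time t is s)."""
    alpha = {s: PI[s] * B[s][obs[0]] for s in PI}              # initialization
    for o in obs[1:]:
        alpha = {s: B[s][o] * sum(alpha[p] * A[p][s] for p in alpha)
                 for s in A}                                   # recursion
    return sum(alpha.values())                                 # termination

print(forward_probability("yz"))   # matches the brute-force sum (0.294)
```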
Tasks with HMM
- tasks
- Given $\mu = (A, B, \Pi)$, find the probability $P(O \mid \mu)$ of an observation sequence (likelihood)
- Given $O$ and $\mu$, find the most likely state sequence $(X_1, X_2, \dots, X_T)$ (decoding)
- Given $O$ and a space of all possible models, find the best $\mu$ (learning)
- decoding
- tag each token with a label
Inference
- find the most likely tag, given the word
- $t^* = \arg\max_t p(t \mid w)$
- given the model $\mu$, find the best sequence of tags $\{t_i\}_{i=1}^{n}$ for the sequence of words $\{w_i\}_{i=1}^{n}$
- enumerating every combination is infeasible: the number of possible tag sequences grows exponentially with the sentence length
Viterbi algorithm
Find the best path up to observation $i$ ending in state $s$ (the partial best path); because of the Markov property, extending these partial best paths one observation at a time yields the best path for the whole sentence.
- dynamic programming
- memoization
- backpointers
initialization
- for the first observation, compute $\delta_1(s) = \pi_s \, b_s(o_1)$ for every state $s$
recursion
- to calculate, say, $P(B, t=2)$: take the best predecessor, $\delta_2(B) = \max_{s'} \delta_1(s') \, a_{s'B} \, b_B(o_2)$, and store a backpointer to it (calculating $\delta_2(A)$ is analogous)
termination
- the highest $\delta_T(s)$ gives the probability of the best path, and following the backpointers from that state recovers the best sequence of states for all observations
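Putting the walkthrough above into code: a minimal Viterbi sketch with partial best-path scores and backpointers, reusing the PI, A, B dictionaries from the earlier examples (emission values still hypothetical):

```python
def viterbi(obs):
    """Most likely state sequence for obs, with its probability."""
    # delta[s]: probability of the best path ending in state s at time t
    # paths[s]: the state sequence achieving delta[s] (implicit backpointers)
    delta = {s: PI[s] * B[s][obs[0]] for s in PI}
    paths = {s: [s] for s in PI}
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in A:
            # best predecessor for state s at this time step
            prev = max(delta, key=lambda p: delta[p] * A[p][s])
            new_delta[s] = delta[prev] * A[prev][s] * B[s][o]
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(delta, key=delta.get)        # termination: best final state
    return paths[best], delta[best]

print(viterbi("yz"))   # (['A', 'A'], 0.168) with the illustrative numbers
```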