- This post implements the Viterbi algorithm in PyTorch to solve the decoding (prediction) problem for hidden Markov models. Since the focus is on understanding the algorithm and expressing it in code, the transition matrix A, the emission matrix B, and the initial state distribution Pi are not estimated here. In practice, estimating A, B, and Pi is much easier than the Viterbi algorithm itself: given labeled training data, a moment-estimation-style frequency count yields A, B, and Pi directly.
- The code below is commented in detail. This is my first blog post, and I hope it helps beginners like me understand Viterbi: whether in an HMM or a CRF, Viterbi is the soul of the model, and working through it is also a good way to take your dynamic-programming skills up a level (DP masters excepted). The code follows:
```python
import torch

def viterbi(word_list, word2id, state2id, A, B, Pi):
    """Viterbi decoding for an HMM.

    A: (N, N) transition matrix, B: (N, V) emission matrix,
    Pi: (N,) initial state distribution. Returns the best state
    sequence and its log-probability.
    """
    # Work in log space so that long products of small probabilities
    # become sums and do not underflow.
    A, B, Pi = torch.log(A), torch.log(B), torch.log(Pi)
    N, seq_len = len(state2id), len(word_list)
    # viterbi[s, t]: log-prob of the best path that ends in state s at step t.
    viterbi = torch.zeros(N, seq_len)
    # backpointer[s, t]: predecessor state on that best path.
    # Long dtype so its entries can be used directly as tensor indices.
    backpointer = torch.zeros(N, seq_len, dtype=torch.long)
    B_t = B.t()  # B_t[w]: emission log-probs of word w under every state

    # Initialization. Out-of-vocabulary words get a uniform emission
    # distribution over all states.
    start_word_id = word2id.get(word_list[0], None)
    if start_word_id is None:
        b_t = torch.log(torch.ones(N) / N)
    else:
        b_t = B_t[start_word_id]
    viterbi[:, 0] = Pi + b_t
    backpointer[:, 0] = -1  # the first step has no predecessor

    # Recursion: for each step and each state, pick the best predecessor.
    for step in range(1, seq_len):
        word_id = word2id.get(word_list[step], None)
        if word_id is None:
            b_t = torch.log(torch.ones(N) / N)
        else:
            b_t = B_t[word_id]
        for state_id in range(N):
            max_prob, best_state_id = torch.max(viterbi[:, step - 1] + A[:, state_id], dim=0)
            viterbi[state_id, step] = max_prob + b_t[state_id]
            backpointer[state_id, step] = best_state_id

    # Termination: best final state, then backtrack through the pointers.
    max_prob, best_state_id = torch.max(viterbi[:, seq_len - 1], dim=0)
    best_state_id = best_state_id.item()
    best_path = [best_state_id]
    for step in range(seq_len - 1, 0, -1):
        best_state_id = backpointer[best_state_id, step].item()
        best_path.append(best_state_id)
    assert len(best_path) == len(word_list)

    # Map state ids back to state names, in forward (left-to-right) order.
    id2state = {i: state for state, i in state2id.items()}
    best_state_path = [id2state[i] for i in reversed(best_path)]
    return best_state_path, max_prob.item()
```
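As an aside, the frequency-counting estimation of A, B, and Pi mentioned at the top can be sketched as follows. This is a minimal illustration, not part of the original post; `estimate_hmm`, its `tagged_corpus` format (a list of sentences, each a list of `(word, state)` pairs), and the `smoothing` parameter are all names I made up for this sketch:

```python
from collections import defaultdict

def estimate_hmm(tagged_corpus, word2id, state2id, smoothing=1e-8):
    """Estimate A, B, Pi by counting frequencies over labeled sequences."""
    N, V = len(state2id), len(word2id)
    A = [[0.0] * N for _ in range(N)]   # A[i][j]: count of transitions i -> j
    B = [[0.0] * V for _ in range(N)]   # B[i][w]: count of state i emitting word w
    Pi = [0.0] * N                      # Pi[i]: count of sentences starting in state i
    for sentence in tagged_corpus:
        prev = None
        for word, state in sentence:
            s, w = state2id[state], word2id[word]
            B[s][w] += 1
            if prev is None:
                Pi[s] += 1
            else:
                A[prev][s] += 1
            prev = s

    # Normalize counts into probabilities; the tiny smoothing term keeps
    # every entry nonzero so torch.log never sees an exact 0.
    def normalize(row):
        total = sum(row) + smoothing * len(row)
        return [(c + smoothing) / total for c in row]

    A = [normalize(row) for row in A]
    B = [normalize(row) for row in B]
    Pi = normalize(Pi)
    return A, B, Pi
```

Wrapping the results with `torch.tensor(A)`, `torch.tensor(B)`, and `torch.tensor(Pi)` produces the tensors the decoder above expects.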