Vitberi Algorithem

Vitberi Algorithem

the code is in link below

https://github.com/US579/DataMining-Project

How to run

Dataset in this case :

State_File ='./toy_example/State_File'
Symbol_File='./toy_example/Symbol_File'
Query_File ='./toy_example/Query_File'

simple run:

python3 submission.py

HMM and Viterbi Algorithem decription

In this case below, the output of the implicit sequence is

[3, 2, 1, 2, 4, -9.397116279119517]

with log probility -9.397116279119517 which is largest probility(this is log probility)
在这里插入图片描述

Parameter breakdown
0:S1       1:S2      2:S3     3:BEGIN      4:END

1.Initial Probilities

The blue line represent the initial probability (Pi) which can be deemed as equivalent to transition probabilities from the BEGIN state to all the implicit state

So, we caculate as

for s in states:
    transition_probability[len(states)-2][states.index(s)])

2.Emission Probilities

The red line represent its emission probability from state after smoothing is

If the symbol is an unknown symbol, its emission probability from state after smoothing is

for i in range(1,len(obs)):
    for cur in range(len(states)):
                #if there is no emission from states `cur` to observation `sym.index(obs[i])`(this is the index in the symbol list),we us add one smoothing (in case it is 0)
                    emission_rate = emission_probability[str(cur)][str(sym.index(obs[i]))]

2.Transition Probilities

The black line represent the transition probilities transfer the states from on to another

for i in range(n1):
    for j in range(n1):
        transition_probability[i][j] = (float(distance[i][j])+1) / (sum(distance[i])+n1-1)

the number of state_i transfer to state_j divide by the total number of transfering state_j to any states , i also use add-1 smoothing here

3.Viterbi Algorithm

for HMM, the most useful function is to find the most likely implicit sequence according to its observation,In general, the HMM problem can be described by the following five elements:

observations :we observed phenomenon sequence
states :all the possible implicit states
start_probability :the initial probilities of each implicit states
transition_probability :the probility of transfering from one implicit states to another
emission_probability :the probility of some implicit states emit some observed phenomenon 

If you use the brute-force method to exhaust all possible state sequences and compare their probability values, the time complexity is O(n^m), obviously , this is unacceptable when we want to find a long sequnce with large dataset, however, we can decrease its time complexity by using Viterbi Algorithem,

we can consider this probelm as dynamic programming , the last_state is the probability of each implicit state corresponding to the previous observed phenomenon, and curr_pro is the probability of each implicit state corresponding to the current observed phenomenon. Solving cur_pro actually depends only on last_state, this is core thinking of Vitberi Algorithem.

MAIN ALGORITHEM
def Viterbi_Algorithm(states,obs,emission_probability,transition_probability):
        '''
        caculate maximum probility of the path , and return as a dict,
        :argument
            :type dict
            states : the states 
            emission_probability : the two-dimension array cotains the emission probility(I use dict here) 
            transition_probability : the two-dimension array cotains the transition probility
            obs : observation sequence
        :return
            A dic
        '''
        for s in states:
            # caculate the initial state probility also count the first emission probility from observation
            curr_pro[s] = math.log(transition_probability[len(states)-2][states.index(s)])+\
                          math.log(emission_probability[states.index(s))],[sym.index(obs[0]))])
        #caculate the rest obervation sequence
        for i in range(1,len(obs)):
            last_pro = curr_pro
            curr_pro = {}
            for cur in range(len(states)):
                    (max_pr,last_state) = max([(last_pro[k]+math.log(transition_probability[states.index(k)][cur])+
                                                math.log(emission_rate), k) for k in states])
                curr_pro[states[cur]] = max_pr
                path[states[cur]].append(last_state)
        return path
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值