Word Segmentation Model and the Viterbi Algorithm

张一爻

Article link: https://blog.csdn.net/weixin_43069769/article/details/107432506

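The Viterbi algorithm is a dynamic programming algorithm, and it can also be written recursively.
In essence, it searches for the best path (the shortest one, or equivalently the most probable one) through a sequence of steps.
In NLP it is often used for word segmentation, keeping the split that best preserves the meaning of the sentence.
The version here is modified so that loops replace the recursive parts wherever possible.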
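For comparison, here is a minimal sketch of how the dynamic programming idea is commonly applied to segmentation. It is not the code from this post: the names `segment`, `word_probs`, `unknown_prob` and the toy dictionary are assumptions made for illustration, probabilities are assumed to be positive, and log-probabilities are summed so that the best path is the most probable split.

```python
import math

def segment(text, word_probs, unknown_prob=1e-8):
    """Return (log_prob, words) for the most probable split of `text`."""
    n = len(text)
    max_len = max((len(w) for w in word_probs), default=1)
    # best[i] is the best log-probability of any split of text[:i].
    best = [0.0] + [-math.inf] * n
    # back[i] is the start index of the last word in that best split.
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in word_probs:
                p = word_probs[word]
            elif len(word) == 1:
                p = unknown_prob  # floor for an unknown single character
            else:
                continue  # longer unknown spans are not considered
            score = best[j] + math.log(p)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return best[n], words[::-1]

# Hypothetical toy dictionary, for illustration only.
toy_probs = {"研究": 0.3, "研究生": 0.2, "生命": 0.3,
             "命": 0.05, "的": 0.1, "起源": 0.2}
log_prob, words = segment("研究生命的起源", toy_probs)
print(words)  # ['研究', '生命', '的', '起源']
```

Because every prefix keeps only its best score, the search considers all splits without enumerating them one by one.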

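# `word_dict` maps each known word to its probability; it must be defined before this point.
word_vector = set(word_dict)  # vocabulary as a set, so membership tests are O(1)

def search_prob(word, word_dict=word_dict):
    # Probability of `word`, with a floor of 1e-8 for words missing from the dictionary.
    return word_dict.get(word, 10**(-8))

def check_dict(word):
    # True if `word` is in the vocabulary.
    return word in word_vector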

def slide_word(strings, start, end):
    # Return (start, i) for the longest dictionary word strings[start:i] with i <= end,
    # or False when no dictionary word starts at `start`.
    result_index = ()
    for i in range(start + 1, end + 1):
        if check_dict(strings[start:i]):
            result_index = (start, i)
    return result_index or False

# Example call; `example` is the input string and must be defined beforehand.
slide_word(example, 3, 7)

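def Viterbi(string, word_dict, opcode=None, word_split=None, prob=0):
    # Create fresh lists on each top-level call; mutable default arguments
    # would carry state over from one call to the next.
    opcode = [10**-8] if opcode is None else opcode
    word_split = [] if word_split is None else word_split
    t, m = 0, len(string)
    R, result = [], []
    disposal_data = string
    for i in range(m + 1):
        test = slide_word(string, i, m)
        if test:
            t += 1
            l, r = test
            R.append(r)
            # Accept the first match, or any match that starts where an earlier match ended.
            if t == 1 or l in R:
                word = string[l:r]
                prob += search_prob(word, word_dict)
                # Mark the consumed word with "/" so the next pass only sees the leftovers.
                disposal_data = disposal_data.replace(word, "/")
                result.append(word)

    # Stop once a pass no longer changes the set of remaining symbols.
    check_symbols = len(set(disposal_data))
    opcode.append(check_symbols)
    word_split = word_split + result
    if opcode[-1] == opcode[-2]:
        # Score: sum of the accepted word probabilities plus 1e-8 per leftover character.
        expr = prob + 10**(-8) * len(disposal_data.replace('/', ''))
        leftovers = [piece for piece in disposal_data.split('/') if piece]
        return expr, word_split + leftovers
    return Viterbi(disposal_data, word_dict, opcode, word_split, prob)

Viterbi(example, word_dict)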
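The longest-match lookup that `slide_word` performs can also be tried in isolation. The sketch below is a simplified, self-contained illustration rather than the code above: `longest_match`, `greedy_segment` and the toy vocabulary are hypothetical names and data, and unknown characters are assumed to become single-character tokens.

```python
def longest_match(text, start, vocab):
    """Return the end index of the longest vocab word starting at `start`, or None."""
    best_end = None
    for end in range(start + 1, len(text) + 1):
        if text[start:end] in vocab:
            best_end = end
    return best_end

def greedy_segment(text, vocab):
    """Split `text` left to right, always taking the longest known word."""
    words, i = [], 0
    while i < len(text):
        # Fall back to a single character when no known word starts here.
        end = longest_match(text, i, vocab) or i + 1
        words.append(text[i:end])
        i = end
    return words

# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"研究", "研究生", "生命", "命", "的", "起源"}
greedy_words = greedy_segment("研究生命的起源", toy_vocab)
print(greedy_words)  # ['研究生', '命', '的', '起源']
```

Always taking the longest word commits to 研究生 here even though 研究 / 生命 reads more naturally, which is why the word probabilities matter when choosing between competing splits.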
