fuzzywuzzy: a Python edit-distance library

1. Environment

1.1 Python 3

1.2 CentOS 7

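2. Explanation

The Levenshtein distance, also called edit distance, quantifies how different
two strings (for example, two English words) are: it is the minimum number of
single-character edits (insertions, deletions, or substitutions) needed to turn
one string into the other. For example, turning "kitten" into "sitting" takes
three edits, so their distance is 3. Edit distance is used in natural language
processing: a spell checker can rank candidate corrections by their edit
distance from the misspelled word and pick the most likely one (or several).
DNA can be treated as a string over the letters A, C, G, and T, so edit distance
is also used in bioinformatics to measure how similar two DNA sequences are.
The Unix diff and patch tools are a related example: diff compares two texts by
finding a small set of insertions and deletions that turns one into the other,
and patch applies that set of edits.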
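To make the definition concrete, here is a minimal pure-Python sketch of the classic dynamic-programming computation of edit distance. The function name `levenshtein` is chosen for illustration only; it is not fuzzywuzzy's API, and real libraries use faster implementations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at any time, so memory use grows with the length of the shorter string rather than with the product of both lengths.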

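3. Installation

pip3 install fuzzywuzzy

pip3 install python-Levenshtein

python-Levenshtein is optional. Without it, fuzzywuzzy falls back to the slower pure-Python difflib.SequenceMatcher and prints a warning.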
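After installing, a quick way to confirm that both packages are importable is to look them up with the standard library. This is only a sketch: the helper name `is_installed` is made up for illustration, and note that the python-Levenshtein package is imported under the module name `Levenshtein`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# "Levenshtein" is the import name of the python-Levenshtein package.
for name in ("fuzzywuzzy", "Levenshtein"):
    print(name, "installed" if is_installed(name) else "missing")
```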

4. Using the library functions

# -*- coding: utf-8 -*-
from fuzzywuzzy import fuzz
from fuzzywuzzy import process


if __name__ == "__main__":
    # Simple Ratio: similarity of the two full strings
    result = fuzz.ratio("this is a test", "this is a test!")
    print("Simple Ratio=",result)

    # Partial Ratio: best match of the shorter string against part of the longer one
    result = fuzz.partial_ratio("this is a test", "this is a test!")
    print("Partial Ratio=",result)

    # Token Sort Ratio: ignores word order
    result = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    print("Token Sort Ratio2=",result)

    # Token Set Ratio: ignores duplicate words
    result = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    print("Token Set Ratio=",result)

    # Process: pick the best matches from a list of choices
    choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    result = process.extract("new york jets", choices, limit=3)  # the 3 closest matches, best first
    print("Process-extract",result)

    result = process.extractOne("new york", choices)  # only the single closest match
    print("Process-extractOne=",result)

    songs = [
        "/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3",
        "/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3",
    ]
    result = process.extractOne("System of a down - Hypnotize - Heroin", songs)  # default scorer (fuzz.WRatio)
    print("Process-extractOne=",result)
    result = process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)  # scorer that ignores word order
    print("Process-extractOne=",result)

5. Output

[sy@iZwz9d0wcbzzl41m47ou4yZ words]$ python3 fuzzywuzzy_sim.py 
Simple Ratio= 97
Partial Ratio= 100
Token Sort Ratio2= 100
Token Set Ratio= 100
Process-extract [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29)]
Process-extractOne= ('New York Jets', 90)
Process-extractOne= ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
Process-extractOne= ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
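The first four scores above can be approximated with nothing but the standard library, which may help in seeing what each scorer does. The sketch below is a simplification and not fuzzywuzzy's actual code: the `*_sketch` names are made up, the preprocessing only lowercases and splits on whitespace, and the partial match uses a plain sliding window. Scores may differ slightly from the library on other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two full strings as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best simple_ratio of the shorter string against any
    equally long window of the longer string."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    width = len(shorter)
    return max(
        simple_ratio(shorter, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Sort the words of each string before comparing, so word order is ignored."""
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(normalize(a), normalize(b))


def token_set_ratio_sketch(a: str, b: str) -> int:
    """Compare on sets of words, so repeated words are ignored."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()
    return max(
        simple_ratio(common, combined_a),
        simple_ratio(common, combined_b),
        simple_ratio(combined_a, combined_b),
    )


print(simple_ratio("this is a test", "this is a test!"))          # 97
print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))         # 100
```

The 97 for the simple ratio comes from 2 * 14 / (14 + 15): 14 matching characters out of a combined length of 29, rounded to the nearest integer.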

 
