1. Environment
1.1 Python 3
1.2 CentOS 7
2. Background
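Levenshtein distance is the most widely used form of edit distance. Edit distance
quantifies how different two strings (for example, two English words) are by counting
the minimum number of operations needed to turn one string into the other. For
Levenshtein distance, the operations are single-character insertions, deletions, and
substitutions. Edit distance is used in natural language processing: a spell checker,
for example, can compare a misspelled word with correctly spelled words and suggest the
candidate (or candidates) with the smallest edit distance. DNA can be viewed as a string
over the letters A, C, G, and T, so edit distance is also used in bioinformatics to
measure how similar two DNA sequences are. The Unix diff and patch tools apply a related
idea to text comparison: diff computes a shortest script of line insertions and deletions.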
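To make the definition concrete, here is a minimal sketch of Levenshtein distance using the standard dynamic-programming recurrence. The function name `levenshtein` is chosen for illustration; this is not the code that fuzzywuzzy or python-Levenshtein runs internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # The distance is symmetric, so put the shorter string in the
    # inner loop to keep each row as small as possible.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                   # 3
    print(levenshtein("this is a test", "this is a test!"))   # 1
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.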
3. Installation
pip3 install fuzzywuzzy
pip3 install python-Levenshtein
4. Calling the library functions
# -*- coding: utf-8 -*-
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

if __name__ == "__main__":
    # Simple Ratio: plain similarity of the two full strings
    result = fuzz.ratio("this is a test", "this is a test!")
    print("Simple Ratio=",result)

    # Partial Ratio: best match of the shorter string against a substring of the longer one
    result = fuzz.partial_ratio("this is a test", "this is a test!")
    print("Partial Ratio=",result)

    # Token Sort Ratio: ignores word order
    result = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    print("Token Sort Ratio2=",result)

    # Token Set Ratio: ignores repeated words
    result = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    print("Token Set Ratio=",result)

    # Process: pick the best matches from a list of choices
    choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    result = process.extract("new york jets", choices, limit=3)  # the 3 closest matches, best first
    print("Process-extract",result)
    result = process.extractOne("new york", choices)  # the single closest match
    print("Process-extractOne=",result)

    songs = [
        "/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3",
        "/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3",
    ]
    # Default scorer (fuzz.WRatio): fuzzy matching, no wildcard syntax involved
    result = process.extractOne("System of a down - Hypnotize - Heroin", songs)
    print("Process-extractOne=",result)
    # Custom scorer that ignores word order
    result = process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    print("Process-extractOne=",result)
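To show roughly what the first and third scorers compute, here is a standard-library sketch built on `difflib.SequenceMatcher`. The helper names `simple_ratio` and `token_sort_ratio` are illustrative stand-ins, not the library's own functions, and the normalization is an assumption that is simpler than what fuzzywuzzy applies (which also strips punctuation).

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # SequenceMatcher.ratio() is 2 * M / T, where M is the number of
    # matched characters and T is the combined length of both strings.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_ratio(a: str, b: str) -> int:
    # Assumed normalization: lowercase, split on whitespace,
    # sort the tokens, and join them back with single spaces.
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))

    return simple_ratio(normalize(a), normalize(b))


if __name__ == "__main__":
    # 14 matched characters out of 14 + 15: 2 * 14 / 29 rounds to 97
    print(simple_ratio("this is a test", "this is a test!"))
    # Both strings normalize to "a bear fuzzy was wuzzy"
    print(token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
```

When python-Levenshtein is installed, fuzzywuzzy uses a faster matcher in place of difflib, so scores from the library may differ slightly from this sketch on some inputs.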
5. Output
[sy@iZwz9d0wcbzzl41m47ou4yZ words]$ python3 fuzzywuzzy_sim.py
Simple Ratio= 97
Partial Ratio= 100
Token Sort Ratio2= 100
Token Set Ratio= 100
Process-extract [('New York Jets', 100), ('New York Giants', 79), ('Atlanta Falcons', 29)]
Process-extractOne= ('New York Jets', 90)
Process-extractOne= ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
Process-extractOne= ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
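The Partial Ratio and Token Set Ratio scores of 100 above can be understood with another standard-library sketch. It is a simplified approximation under stated assumptions, not the library's implementation: `partial_ratio` here slides a window over every position of the longer string, and `token_set_ratio` splits on whitespace only. All function names are illustrative.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Similarity of two strings on a 0..100 scale
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def partial_ratio(a: str, b: str) -> int:
    # Compare the shorter string with every same-length window
    # of the longer string and keep the best score.
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # assumption: an empty string matches nothing
    width = len(shorter)
    return max(
        ratio(shorter, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


def token_set_ratio(a: str, b: str) -> int:
    # Reduce each string to its set of distinct tokens, so repeats vanish
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()
    # The best of the three pairwise comparisons is the score
    return max(
        ratio(common, combined_a),
        ratio(common, combined_b),
        ratio(combined_a, combined_b),
    )


if __name__ == "__main__":
    # "this is a test" appears verbatim inside "this is a test!"
    print(partial_ratio("this is a test", "this is a test!"))
    # The repeated "fuzzy" disappears once tokens are treated as a set
    print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))
```

Both calls print 100: in the first case the shorter string is an exact substring of the longer one, and in the second case the two token sets are identical.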