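Compute the similarity of numeric values and of strings, including similarity based on edit distance and similarity based on difflib.SequenceMatcher.
The code is short and reads straightforwardly.
For further reading, see: https://www.cnblogs.com/chenpeng9/articles/4605577.html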
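As a quick illustration of the two string measures mentioned above, here is a minimal sketch. The function names `edit_distance`, `edit_sim`, and `seq_sim` are made up for this example and are not taken from the original code. Dividing the distance by the longer string's length is one common normalization, assumed here, and not necessarily the one the author uses.

```python
from difflib import SequenceMatcher


def edit_distance(s, t):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into an empty string costs i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def edit_sim(s, t):
    """Similarity in [0, 1]: 1.0 for identical strings (assumed normalization)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(s, t) / float(longest)


def seq_sim(s, t):
    """Similarity from difflib: 2 * matches / (len(s) + len(t))."""
    return SequenceMatcher(None, s, t).ratio()


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_sim("kitten", "sitting"))
    print(seq_sim("abcd", "bcde"))
```

The two measures can disagree: edit distance counts single-character edits, while `SequenceMatcher.ratio()` is driven by the total length of matching blocks, so it may be worth trying both on your own data before picking one.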
#encoding:utf-8
__author__ = 'zgd'
from collections import Counter
import time
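# Compute the similarity of two numeric values
def numSim(a, b):
    a = float(a)
    b = float(b)  # convert b too, so numeric strings do not raise TypeError
    # One value is zero and the other is within 3 of it: fairly similar
    if a * b == 0 and abs(a - b) < 3:
        return 0.8
    try:
        # Relative difference, scaled by the larger value
        return 1.0 - abs(a - b) / max(a, b)
    except ZeroDivisionError:
        # max(a, b) == 0: fall back to 1.0
        return 1.0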
# count the longest common subsequence
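def longSim(L1, L2):
    m = len(L1)
    n = len(L2)
    if m == 0 and n == 0:
        # Two empty inputs are treated as identical
        return 1.0
    elif m * n == 0:
        # Exactly one input is empty: partial credit only if the other is short (< 10)
        if m < 10 and n < 10:
            return 0.3
        else:
            return 0.0
    else:
        # Count how often each element occurs in each input
        c1 = Counter(L1)
        c2 = Counter(L2)
        d1 = dict(c1)
        d2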