1. Hamming Distance
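The Hamming distance between two equal-length strings X and Y is the minimum number of substitutions needed to turn X into Y, i.e. the number of positions at which they differ.
Python implementation (both functions below return the fraction of differing positions; multiply by the length to get the count):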
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist


def hamming_distance(x, y):
    # Fraction of positions where x and y differ (normalized Hamming distance).
    return np.mean(x != y)


def hamming_distance_2(x, y):
    # Same normalized value, computed with SciPy's pairwise-distance routine.
    return pdist(np.vstack([x, y]), 'hamming')[0]


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randint(0, 50, size=(100, 2)), columns=['x', 'y'])
    print(hamming_distance(df.x, df.y))
    print(hamming_distance_2(df.x, df.y))
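Since np.mean(x != y) and SciPy's 'hamming' metric both yield a fraction, it can help to see the raw count from the definition as well. The following is a minimal sketch for plain strings; the function name hamming_count is a hypothetical helper introduced here for illustration, not part of any library.

```python
def hamming_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # The Hamming distance is only defined for equal-length inputs.
        raise ValueError("Hamming distance requires equal-length inputs")
    # Each mismatched pair contributes True (counted as 1) to the sum.
    return sum(ca != cb for ca, cb in zip(a, b))


if __name__ == '__main__':
    print(hamming_count("karolin", "kathrin"))          # 3
    print(hamming_count("hello world", "HELLO world"))  # 5
```

Dividing the result by len(a) gives the normalized value that the NumPy and SciPy versions report.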
2. Edit Distance
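Also known as the Levenshtein distance, the edit distance is the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into another. Usage is as follows:
First install the dependency: pip install python-Levenshtein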
import Levenshtein

if __name__ == '__main__':
    print(Levenshtein.distance(  # edit distance
        "hello world",
        "HELLO world",
    ))
    print(Levenshtein.hamming(  # Hamming distance
        "hello world",
        "HELLO world",
    ))
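To make the definition concrete without the extra dependency, the edit distance can also be computed with the standard dynamic-programming recurrence. This is a minimal pure-Python sketch, not the library's implementation; the function name levenshtein is a hypothetical helper chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic-programming table."""
    # prev[j] is the distance between the previous prefix of a and b[:j];
    # for the empty prefix of a, that is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == '__main__':
    print(levenshtein("kitten", "sitting"))          # 3
    print(levenshtein("hello world", "HELLO world"))  # 5
```

Keeping only two rows uses memory proportional to len(b) rather than len(a) * len(b). Unlike the Hamming distance, this also works for strings of different lengths.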