Judging the similarity of two words in Python


dongfuguo


Tags: algorithms, python, java, regular expressions, machine learning

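Copyright notice: this is an original article by the blogger, published under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.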

Article link: https://blog.csdn.net/dongfuguo/article/details/118704376


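The focus of this article is the design of the algorithm: two words are considered equivalent if (1) few enough letters occur in only one of the two words, and (2) several randomly chosen letters appear in the same relative order in both words.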

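Known issue: because the letters to compare are sampled at random, the same pair of words can be judged differently from run to run, and misjudgments are possible.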
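To make the misjudgment concrete, the sketch below re-creates the order check that testPositions performs in the code that follows, but with hand-picked positions instead of a random sample. The helper name order_preserved is hypothetical. 'btuaeiflu' contains exactly the letters of 'beautiful', so only the order check can reject it. With the positions [0, 2, 5, 6] (a 4-letter sample, which randint(minLength//2, minLength-1) allows for a 9-letter word) the sampled letters happen to stay in order, so in such a run main would report the scrambled word as similar.

```python
def order_preserved(one, another, positions):
    """Hypothetical helper: the same order check as testPositions, for fixed positions.

    Returns the matched positions in `another` and whether they are in ascending order.
    """
    positions_in_another = []
    for p in positions:
        ch = one[p]
        rest = another[p:]               # the search starts at the same index p
        if ch in rest:                   # letters not found are skipped, as in testPositions
            positions_in_another.append(rest.index(ch) + p)
    return positions_in_another, positions_in_another == sorted(positions_in_another)


# 'btuaeiflu' uses exactly the letters of 'beautiful', so only the order check can tell them apart.
print(order_preserved('beautiful', 'btuaeiflu', [0, 2, 5, 6]))        # ([0, 3, 5, 6], True): false positive
print(order_preserved('beautiful', 'btuaeiflu', [0, 1, 2, 3, 4, 6]))  # ([0, 4, 3, 8, 6], False)
```

The second call uses the positions behind the third result in the sample run output ([0, 4, 3, 8, 6]), where the scrambled word is correctly rejected. Which outcome appears depends only on which positions sample() happens to draw.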

from random import sample, randint

def oneInAnother(one, another):
    '''Count how many letters of word one do not occur in word another.'''
    return sum(1 for ch in one if ch not in another)

def testPositions(one, another, positions):
    '''Test whether the letters of word one at the given positions
    appear in word another in the same relative order.'''
    # Letters of word one at the specified positions
    lettersInOne = [one[p] for p in positions]
    print(lettersInOne)
    # Positions of these letters in word another; the search for each letter
    # starts at its own index p, and letters not found from there are skipped
    positionsInAnother = [another[p:].index(ch) + p
                          for p, ch in zip(positions, lettersInOne)
                          if ch in another[p:]]
    print(positionsInAnother)
    # True if these letters keep the same relative order in word another
    return sorted(positionsInAnother) == positionsInAnother

def main(one, another, rateNumber=1.0):
    c1 = oneInAnother(one, another)
    c2 = oneInAnother(another, one)
    # Ratio of letters that occur in only one of the two words
    # (summing c1 and c2: a difference would let two completely different
    # words of the same length score 0)
    r = (c1 + c2) / len(one + another)
    # Check whether letters at random positions of word one keep
    # the same relative order in word another
    minLength = min(len(one), len(another))
    positions = sample(range(minLength), randint(minLength//2, minLength-1))
    positions.sort()
    flag = testPositions(one, another, positions)
    # The two words are considered highly similar
    return flag and r < rateNumber

# Try it out
print(main('beautiful', 'beaut', 0.2))
print(main('beautiful', 'beautiful', 0.2))
print(main('beautiful', 'btuaeiflu', 0.2))

Output of one run (the positions are sampled at random, so the output differs between runs):

['a', 'u']
[2, 3]
False

['a', 'u', 'f', 'u']
[2, 3, 6, 7]
True

['b', 'e', 'a', 'u', 't', 'f']
[0, 4, 3, 8, 6]
False
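The first False above does not depend on which positions were sampled: the letter-difference ratio alone already exceeds the 0.2 threshold. A minimal check of that arithmetic follows; letters_missing is a hypothetical stand-in for oneInAnother, and because c2 is 0 for this pair the ratio is simply 3/14.

```python
def letters_missing(one, another):
    """Hypothetical stand-in for oneInAnother: count letters of `one` absent from `another`."""
    return sum(1 for ch in one if ch not in another)


one, another = 'beautiful', 'beaut'
c1 = letters_missing(one, another)   # 'i', 'f' and 'l' do not occur in 'beaut'
c2 = letters_missing(another, one)   # every letter of 'beaut' occurs in 'beautiful'
r = (c1 + c2) / len(one + another)   # c2 is 0, so this is 3 / 14
print(c1, c2, round(r, 3), r < 0.2)  # 3 0 0.214 False
```

By contrast, 'btuaeiflu' shares all of its letters with 'beautiful' (ratio 0), so the third False comes entirely from the order check: the sampled letters b, e, a, u, t, f map to [0, 4, 3, 8, 6], which is not ascending ('t' is skipped because it does not occur in 'btuaeiflu' at or after index 4).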
