Python fuzzy string matching in Excel: how to do fuzzy matching on an Excel file with Pandas?

Latest recommended article published 2024-08-18 03:31:57

weixin_39586265


Tags: FuzzyWuzzy, data cleaning, matching algorithms, Python, DataFrame


I don't think you need to do this in pandas. Here is my rough solution; it gets the output you want by way of a dictionary.

    import pandas as pd
    from fuzzywuzzy import process

    df = pd.DataFrame([
        ['0016F00001c7GDZQA2', 'Daniela Abriani'],
        ['0016F00001c7GPnQAM', 'Daniel Abriani'],
        ['0016F00001c7JRrQAM', 'Nisha Well'],
        ['0016F00001c7Jv8QAE', 'Katherine'],
        ['0016F00001c7cXiQAI', 'Katerine'],
        ['0016F00001c7dA3QAI', 'Katherin'],
        ['0016F00001c7kHyQAI', 'Nursing and Midwifery Council Research Office'],
        ['0016F00001c8G8OQAU', 'Nisa Well']],
        columns=['ID', 'NAME'])

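Collect the unique IDs and their names in a dictionary.

    # maps each ID to its NAME
    hashdict = dict(zip(df['ID'], df['NAME']))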

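Define a function checkpair. You need it to remove reciprocal ID pairs: the matching step adds both (hash1, hash2) and (hash2, hash1), but I assume you only want to keep one of each pair:

    def checkpair(a, b, l):
        # remove the mirrored tuple (b, ..., a, ...) from l, if present
        for x in l:
            if (a, b) == (x[2], x[0]):
                l.remove(x)
                break  # at most one mirror exists; stop before iterating the mutated list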
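If removing items from a list in place feels fragile, the same reciprocal-pair cleanup can be sketched with an order-insensitive key. This is only an illustration, not part of the original solution: the function name drop_reciprocal_pairs and the short IDs 'A' to 'D' are made up, and the tuples follow the (id1, name1, id2, name2, score) layout used for the final DataFrame.

```python
def drop_reciprocal_pairs(matches):
    """Keep the first tuple seen for each unordered (id1, id2) pair."""
    seen = set()
    unique = []
    for m in matches:
        pair = frozenset((m[0], m[2]))  # (A, B) and (B, A) give the same key
        if pair not in seen:
            seen.add(pair)
            unique.append(m)
    return unique


# hypothetical sample: the second tuple mirrors the first
sample = [
    ('A', 'Nisha Well', 'B', 'Nisa Well', 95),
    ('B', 'Nisa Well', 'A', 'Nisha Well', 95),
    ('C', 'Katherine', 'D', 'Katerine', 94),
]
deduped = drop_reciprocal_pairs(sample)
print(deduped)
```

Unlike checkpair, this builds a new list and leaves the input untouched, so it can be called once after all matches have been collected.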

Now iterate over hashdict.items() to find the top 3 matches for each name. The fuzzywuzzy docs describe the process methods in detail.

    matches = []
    for k, v in hashdict.items():
        # ask for 4 results because each name also matches itself
        top3 = process.extract(v, hashdict, limit=4)
        # drop the entry where the ID was compared to itself;
        # with a dict of choices each result is (name, score, ID)
        top3 = [h for h in top3 if h[2] != k]
        # keep matches above the score threshold (change the score?)
        matches.extend((k, v, x[2], x[0], x[1]) for x in top3 if x[1] > 60)

    # remove reciprocal pairs; the mirrored tuple always sits later in the list
    for m in matches:
        checkpair(m[0], m[2], matches)

    df = pd.DataFrame(matches, columns=['id1', 'name1', 'id2', 'name2', 'score'])

    # write to file (writer.save() was removed in pandas 2.0, so use a context manager)
    with pd.ExcelWriter('/path/to/your/file.xlsx') as writer:
        df.to_excel(writer, sheet_name='Sheet1')

Output (row order, and which ID of a pair comes first, depend on the dictionary's iteration order):

                      id1           name1                 id2            name2  score
    0  0016F00001c7JRrQAM      Nisha Well  0016F00001c8G8OQAU        Nisa Well     95
    1  0016F00001c7GPnQAM  Daniel Abriani  0016F00001c7GDZQA2  Daniela Abriani     97
    2  0016F00001c7Jv8QAE       Katherine  0016F00001c7dA3QAI         Katherin     94
    3  0016F00001c7Jv8QAE       Katherine  0016F00001c7cXiQAI         Katerine     94
    4  0016F00001c7dA3QAI        Katherin  0016F00001c7cXiQAI         Katerine     88
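For comparison, here is a rough standard-library sketch that sidesteps reciprocal pairs entirely by scoring each unordered pair once. It is a different technique from the fuzzywuzzy approach above: it uses difflib.SequenceMatcher, whose ratio runs from 0 to 1 and is not comparable to fuzzywuzzy's scores, and it compares every pair of names instead of taking the top 3 per name. The 0.8 cutoff is an assumption picked for this sample data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# ID -> name, copied from the sample rows in this post
names = {
    '0016F00001c7GDZQA2': 'Daniela Abriani',
    '0016F00001c7GPnQAM': 'Daniel Abriani',
    '0016F00001c7JRrQAM': 'Nisha Well',
    '0016F00001c7Jv8QAE': 'Katherine',
    '0016F00001c7cXiQAI': 'Katerine',
    '0016F00001c7dA3QAI': 'Katherin',
    '0016F00001c7kHyQAI': 'Nursing and Midwifery Council Research Office',
    '0016F00001c8G8OQAU': 'Nisa Well',
}

THRESHOLD = 0.8  # assumed cutoff on difflib's 0-1 scale

pairs = []
# combinations() yields each unordered pair exactly once, so there are
# no self-comparisons and no (A, B)/(B, A) duplicates to clean up
for (id1, name1), (id2, name2) in combinations(names.items(), 2):
    score = SequenceMatcher(None, name1, name2).ratio()
    if score >= THRESHOLD:
        pairs.append((id1, name1, id2, name2, round(score, 2)))

for row in pairs:
    print(row)
```

The trade-off is that the number of comparisons grows quadratically with the number of names, so this suits small sheets better than large ones.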
