Python fuzzy string matching in Excel: how do you do fuzzy matching on an Excel file with Pandas?

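I don't think you need to do this in pandas. Here is my sloppy solution, but it gets the output you want by way of a dictionary.

import pandas as pd
from fuzzywuzzy import process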

df = pd.DataFrame([
    ['0016F00001c7GDZQA2', 'Daniela Abriani'],
    ['0016F00001c7GPnQAM', 'Daniel Abriani'],
    ['0016F00001c7JRrQAM', 'Nisha Well'],
    ['0016F00001c7Jv8QAE', 'Katherine'],
    ['0016F00001c7cXiQAI', 'Katerine'],
    ['0016F00001c7dA3QAI', 'Katherin'],
    ['0016F00001c7kHyQAI', 'Nursing and Midwifery Council Research Office'],
    ['0016F00001c8G8OQAU', 'Nisa Well']],
    columns=['ID', 'NAME'])

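Get the unique hashes into a dictionary. The loop further down expects hashdict to map each ID to its name, so something like this should work:

hashdict = dict(zip(df['ID'], df['NAME']))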

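Define the function checkpair. You need it to remove reciprocal hash pairs. The matching step will add both (hash1, hash2) and (hash2, hash1), but I assume you only want to keep one of them:

def checkpair(a, b, l):
    # iterate over a copy so that removing from l does not skip elements
    for x in list(l):
        # x is (id1, name1, id2, name2, score); drop the mirrored pair (b, a)
        if (a, b) == (x[2], x[0]):
            l.remove(x)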
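As an alternative sketch (not the approach used in this answer), reciprocal pairs can also be dropped in a single pass by remembering each unordered ID pair already seen. The name drop_reciprocal_pairs and the short IDs in the sample are made up for illustration; the tuple layout (id1, name1, id2, name2, score) is the one used for the matches list in this answer.

```python
def drop_reciprocal_pairs(matches):
    """Keep the first of each (id1, id2) / (id2, id1) pair, preserving order."""
    seen = set()
    kept = []
    for m in matches:
        pair = frozenset((m[0], m[2]))  # unordered, so (a, b) equals (b, a)
        if pair not in seen:
            seen.add(pair)
            kept.append(m)
    return kept


# Hypothetical sample in the (id1, name1, id2, name2, score) layout
sample = [
    ("A1", "Nisha Well", "B2", "Nisa Well", 95),
    ("B2", "Nisa Well", "A1", "Nisha Well", 95),
    ("C3", "Katherine", "D4", "Katherin", 94),
]
deduped = drop_reciprocal_pairs(sample)
print(deduped)
```

This returns a new list instead of mutating the input, which avoids having to reason about removing items from a list while looping over it.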

Now iterate over hashdict.items() to find the top 3 matches for each name. The fuzzywuzzy docs describe the process methods in detail.

matches = []

for k, v in hashdict.items():
    # limit=4, not 3: each name also matches itself (see the docs for extract)
    top3 = process.extract(v, hashdict, limit=4)
    # drop the hash ID compared to itself; each hit is (name, score, id)
    top3 = [h for h in top3 if h[2] != k]
    # append tuples to the list "matches" if they meet the score criterion
    matches.extend((k, v, x[2], x[0], x[1]) for x in top3 if x[1] > 60)  # change score?

# remove reciprocal pairs; each call only removes entries later in the list
for m in matches:
    checkpair(m[0], m[2], matches)

df = pd.DataFrame(matches, columns=['id1', 'name1', 'id2', 'name2', 'score'])

# write to file; the context manager saves on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('/path/to/your/file.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')

Output:

   id1                 name1           id2                 name2            score
0  0016F00001c7JRrQAM  Nisha Well      0016F00001c8G8OQAU  Nisa Well        95
1  0016F00001c7GPnQAM  Daniel Abriani  0016F00001c7GDZQA2  Daniela Abriani  97
2  0016F00001c7Jv8QAE  Katherine       0016F00001c7dA3QAI  Katherin         94
3  0016F00001c7Jv8QAE  Katherine       0016F00001c7cXiQAI  Katerine         94
4  0016F00001c7dA3QAI  Katherin        0016F00001c7cXiQAI  Katerine         88
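
For a feel of what the top-matches step is doing, here is a rough stand-in that uses only the standard library. It swaps fuzzywuzzy's scorer for difflib.SequenceMatcher, so the scores will not necessarily agree with the ones above. top_matches is a hypothetical helper, not part of fuzzywuzzy, and the 0-100 scaling is an assumption made for this sketch.

```python
from difflib import SequenceMatcher


def top_matches(query_id, names_by_id, limit=3, min_score=60):
    """Return up to `limit` (other_id, name, score) tuples, best first.

    Scores are SequenceMatcher ratios scaled to 0-100 (an assumption for
    this sketch); fuzzywuzzy's default scorer is more elaborate.
    """
    query = names_by_id[query_id].lower()
    scored = []
    for other_id, name in names_by_id.items():
        if other_id == query_id:
            continue  # never compare a record to itself
        ratio = SequenceMatcher(None, query, name.lower()).ratio()
        score = round(100 * ratio)
        if score > min_score:
            scored.append((other_id, name, score))
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:limit]


# A few of the ID/name pairs from the DataFrame in this answer
names_by_id = {
    "0016F00001c7JRrQAM": "Nisha Well",
    "0016F00001c8G8OQAU": "Nisa Well",
    "0016F00001c7Jv8QAE": "Katherine",
}
result = top_matches("0016F00001c7JRrQAM", names_by_id)
print(result)
```

Skipping the query's own ID up front means there is no need to request one extra result and filter it out afterwards.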
