Python fuzzy match merge: is it possible to do a fuzzy match merge with Python pandas?

Latest recommended article published on 2024-03-19 17:47:50

weixin_39803022


Article tags: python fuzzy match merge

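This article shows how to use Python's difflib library to merge DataFrames whose keys do not match exactly. The get_close_matches function maps names that differ because of spelling errors or formatting onto their closest counterpart, so that df1 and df2 can be joined on similar index labels or merged on a similar key column.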


I have two DataFrames which I want to merge based on a column. However, due to alternate spellings, different number of spaces, absence/presence of diacritical marks, I would like to be able to merge as long as they are similar to one another.
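Differences in spacing, case, and diacritical marks can often be removed before any fuzzy comparison takes place. The following is a minimal sketch using only the standard library; `normalize_key` is a hypothetical helper name, and the sample strings are illustrative, not data from the question.

```python
import unicodedata


def normalize_key(text):
    """Return a simplified form of ``text`` for comparison purposes."""
    # Decompose accented characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (the diacritics themselves).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace and ignore case.
    return " ".join(stripped.split()).casefold()


print(normalize_key("  São   Paulo "))
print(normalize_key("Sao Paulo"))
```

Both calls produce the same key, so keys that differ only in these ways would merge exactly after normalization and fuzzy matching is left for genuine spelling variants.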

Any similarity algorithm will do (soundex, Levenshtein, difflib's).
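To make two of these measures concrete, here is a small sketch that computes a Levenshtein distance in pure Python next to difflib's similarity ratio for a few of the labels used in this question. The `levenshtein` function is an illustrative implementation written for this example, not part of any library.

```python
import difflib


def levenshtein(a, b):
    """Number of single-character edits needed to turn ``a`` into ``b``."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


for left, right in [("two", "too"), ("four", "fours"), ("one", "five")]:
    ratio = difflib.SequenceMatcher(None, left, right).ratio()
    print(left, right, levenshtein(left, right), round(ratio, 2))
```

A lower distance and a higher ratio both indicate more similar strings; the two measures do not always rank candidates identically, so the choice depends on the kind of variation in the data.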

Say one DataFrame has the following data:

from pandas import DataFrame

df1 = DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])

       number
one         1
two         2
three       3
four        4
five        5

df2 = DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])

      letter
one        a
too        b
three      c
fours      d
five       e

Then I want to get the resulting DataFrame

       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e

Solution

Similar to @locojay's suggestion, you can apply difflib's get_close_matches to df2's index and then apply a join:

In [23]: import difflib

In [24]: difflib.get_close_matches
Out[24]: <function difflib.get_close_matches>

In [25]: df2.index = df2.index.map(lambda x: difflib.get_close_matches(x, df1.index)[0])

In [26]: df2
Out[26]:
      letter
one        a
two        b
three      c
four       d
five       e

In [31]: df1.join(df2)
Out[31]:
       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e
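Note that the `[0]` in the mapping above assumes every label has at least one close match. get_close_matches returns an empty list when nothing reaches the cutoff (0.6 by default), in which case indexing raises an IndexError. A minimal sketch of a more defensive lookup follows; `closest_label` is a hypothetical helper, and the label `'six'` is added here only to show the no-match case.

```python
import difflib


def closest_label(label, candidates, cutoff=0.6):
    """Return the best match for ``label`` in ``candidates``, or None."""
    # n=1 asks for only the single best candidate; cutoff is the minimum
    # similarity ratio (0.6 is difflib's default).
    matches = difflib.get_close_matches(label, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


left = ["one", "two", "three", "four", "five"]
right = ["one", "too", "three", "fours", "six"]

# Map each label on the right to its closest label on the left, if any.
mapping = {label: closest_label(label, left) for label in right}
print(mapping)
```

Labels that map to None can then be inspected or dropped before the join, and raising `cutoff` makes the matching stricter if unrelated labels are being paired.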

If these were columns, in the same vein you could apply to the column then merge:

df1 = DataFrame([[1,'one'],[2,'two'],[3,'three'],[4,'four'],[5,'five']], columns=['number', 'name'])

df2 = DataFrame([['a','one'],['b','too'],['c','three'],['d','fours'],['e','five']], columns=['letter', 'name'])

df2['name'] = df2['name'].apply(lambda x: difflib.get_close_matches(x, df1['name'])[0])

df1.merge(df2)
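A self-contained variant of the column approach might look like the sketch below. It keeps the original spelling in a separate column and uses a left merge so that rows without a counterpart stay visible. The helper `closest_name`, the column name `name_raw`, and the unmatched value `'six'` are assumptions made for this illustration.

```python
import difflib

import pandas as pd

df1 = pd.DataFrame({"number": [1, 2, 3, 4, 5],
                    "name": ["one", "two", "three", "four", "five"]})
df2 = pd.DataFrame({"letter": ["a", "b", "c", "d", "e"],
                    "name": ["one", "too", "three", "fours", "six"]})


def closest_name(value, candidates):
    """Return the closest candidate to ``value``, or None if none is close."""
    matches = difflib.get_close_matches(value, candidates, n=1)
    return matches[0] if matches else None


candidates = df1["name"].tolist()
# Keep the original spelling and store the matched key in a new column.
df2 = df2.rename(columns={"name": "name_raw"})
df2["name"] = [closest_name(value, candidates) for value in df2["name_raw"]]

# A left merge keeps every row of df1, with NaN where df2 had no match.
merged = df1.merge(df2, on="name", how="left")
print(merged)
```

Keeping `name_raw` makes it easy to review which spellings were mapped to which key, which is useful because a fuzzy match can silently pair the wrong rows.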
