pandas merge when the key column holds identical values / the merge key is not unique (pitfall)

In both input DataFrames, the column used as the merge key contains the same value in every row.

import pandas as pd

df6 = pd.read_csv('haha1.csv', encoding='gbk')
df7 = pd.read_csv('haha2.csv', encoding='gbk')
print(df6.head())
print(df7.head())

df8 = pd.merge(df6, df7, on='业务流名称', how='left')
print(df8.head())


The output is:

                       业务流名称   情况_STEP0
0  helf-esn-g-1->bjbj-yz-g-2   Critical
1  helf-esn-g-1->bjbj-yz-g-2  Criticals
                       业务流名称   情况_STEP1
0  helf-esn-g-1->bjbj-yz-g-2   Critical
1  helf-esn-g-1->bjbj-yz-g-2  Criticala
                       业务流名称   情况_STEP0   情况_STEP1
0  helf-esn-g-1->bjbj-yz-g-2   Critical   Critical
1  helf-esn-g-1->bjbj-yz-g-2   Critical  Criticala
2  helf-esn-g-1->bjbj-yz-g-2  Criticals   Critical
3  helf-esn-g-1->bjbj-yz-g-2  Criticals  Criticala
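The four rows come from how merge treats duplicated keys: every left row with a given key is paired with every right row that has the same key, so 2 x 2 rows give 4. The sketch below is a minimal illustration, using small in-memory frames as stand-ins for haha1.csv and haha2.csv (the data is assumed from the printed output). It reproduces the row count and shows how the validate parameter of pd.merge can turn the silent blow-up into an error:

```python
import pandas as pd
from pandas.errors import MergeError

key = 'helf-esn-g-1->bjbj-yz-g-2'
# Stand-ins for haha1.csv / haha2.csv, rebuilt from the printed output (assumed data)
left = pd.DataFrame({'业务流名称': [key, key], '情况_STEP0': ['Critical', 'Criticals']})
right = pd.DataFrame({'业务流名称': [key, key], '情况_STEP1': ['Critical', 'Criticala']})

merged = pd.merge(left, right, on='业务流名称', how='left')
n_rows = len(merged)  # 2 left rows x 2 matching right rows

# validate='one_to_one' asks merge to check that the key is unique on both sides
try:
    pd.merge(left, right, on='业务流名称', how='left', validate='one_to_one')
    duplicate_keys_detected = False
except MergeError:
    duplicate_keys_detected = True

print(n_rows, duplicate_keys_detected)
```

Running a check like this before the real merge is one way to notice a non-unique key early, rather than discovering the extra rows afterwards.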

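Use the concat function instead. The join_axes parameter has been removed in newer pandas versions, so set the key column as the index on both frames and then call concat, which gives the desired result: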

df6 = pd.read_csv('haha1.csv', encoding='gbk', index_col=None)
df7 = pd.read_csv('haha2.csv', encoding='gbk', index_col=None)

df6.set_index('业务流名称', inplace=True)
df7.set_index('业务流名称', inplace=True)
print(df6.head())
print(df7.head())

df8 = pd.concat([df6, df7], axis=1, join='outer')
print(df8.head())

The output is:

                            情况_STEP0
业务流名称                               
helf-esn-g-1->bjbj-yz-g-2   Critical
helf-esn-g-1->bjbj-yz-g-2  Criticals
                            情况_STEP1
业务流名称                               
helf-esn-g-1->bjbj-yz-g-2   Critical
helf-esn-g-1->bjbj-yz-g-2  Criticala
                            情况_STEP0   情况_STEP1
业务流名称                                          
helf-esn-g-1->bjbj-yz-g-2   Critical   Critical
helf-esn-g-1->bjbj-yz-g-2  Criticals  Criticala

Another user ran into the same problem: https://blog.csdn.net/weixin_43064185/article/details/90665301
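The concat approach pairs rows by position and, as far as I can tell, relies on both frames having identical indexes in the same order. An alternative sketch, not from the original post, numbers each repeated key with groupby().cumcount() and merges on the key plus that counter. The helper column name occ is hypothetical, and the frames are rebuilt from the printed output:

```python
import pandas as pd

key = 'helf-esn-g-1->bjbj-yz-g-2'
# Assumed data, rebuilt from the printed output of haha1.csv / haha2.csv
left = pd.DataFrame({'业务流名称': [key, key], '情况_STEP0': ['Critical', 'Criticals']})
right = pd.DataFrame({'业务流名称': [key, key], '情况_STEP1': ['Critical', 'Criticala']})

# 'occ' (hypothetical name) numbers repeated keys 0, 1, 2, ... in order of appearance
left['occ'] = left.groupby('业务流名称').cumcount()
right['occ'] = right.groupby('业务流名称').cumcount()

# Merging on (key, occ) pairs the n-th occurrence on the left with the n-th on the right
paired = pd.merge(left, right, on=['业务流名称', 'occ'], how='left').drop(columns='occ')
print(paired)
```

This keeps the key as an ordinary column and still returns two rows instead of four.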


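A pitfall with verify_integrity in concat, and with how the manual describes it. In this run both CSV files hold a single row and the same column name, 情况_STEP0. First, with verify_integrity=False: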

df6 = pd.read_csv('haha1.csv', encoding='gbk', index_col=None)
df7 = pd.read_csv('haha2.csv', encoding='gbk', index_col=None)

df6.set_index('业务流名称', inplace=True)
df7.set_index('业务流名称', inplace=True)
print(df6.head())
print(df7.head())

df8 = pd.concat([df6, df7], axis=1, join='outer', verify_integrity=False)
print(df8.head())


                           情况_STEP0
业务流名称                              
helf-esn-g-1->bjbj-yz-g-2  Critical
                           情况_STEP0
业务流名称                              
helf-esn-g-1->bjbj-yz-g-2  Critical
                           情况_STEP0  情况_STEP0
业务流名称                                        
helf-esn-g-1->bjbj-yz-g-2  Critical  Critical

With verify_integrity=True:

df6 = pd.read_csv('haha1.csv', encoding='gbk', index_col=None)
df7 = pd.read_csv('haha2.csv', encoding='gbk', index_col=None)

df6.set_index('业务流名称', inplace=True)
df7.set_index('业务流名称', inplace=True)
print(df6.head())
print(df7.head())

df8 = pd.concat([df6, df7], axis=1, join='outer', verify_integrity=True)
print(df8.head())

                           情况_STEP0
业务流名称                              
helf-esn-g-1->bjbj-yz-g-2  Critical
Traceback (most recent call last):
  File "D:/02 Python/tool_2/test40.py", line 43, in <module>
    df8 = pd.concat([df6, df7], axis=1, join='outer', verify_integrity=True)
  File "C:\Users\z00527459\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\reshape\concat.py", line 271, in concat
    op = _Concatenator(
  File "C:\Users\z00527459\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\reshape\concat.py", line 452, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\Users\z00527459\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\reshape\concat.py", line 515, in _get_new_axes
    return [
  File "C:\Users\z00527459\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\reshape\concat.py", line 516, in <listcomp>
    self._get_concat_axis() if i == self.axis else self._get_comb_axis(i)
  File "C:\Users\z00527459\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\reshape\concat.py", line 572, in _get_concat_axis
    self._maybe_check_integrity(concat_axis)
  File "C:\Users\z00527459\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\reshape\concat.py", line 580, in _maybe_check_integrity
    raise ValueError(
ValueError: Indexes have overlapping values: Index(['情况_STEP0'], dtype='object')
                           情况_STEP0
业务流名称                              
helf-esn-g-1->bjbj-yz-g-2  Critical
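The two runs suggest that verify_integrity inspects the labels along the concatenation axis. With axis=1 that means the column names, so the duplicated 情况_STEP0 column triggers the error, while duplicated row labels go unchecked. Below is a minimal sketch with in-memory frames (the data is assumed, not read from the CSV files) that shows both behaviours and adds an explicit check of the row index:

```python
import pandas as pd

key = 'helf-esn-g-1->bjbj-yz-g-2'
idx = pd.Index([key, key], name='业务流名称')  # duplicated row labels
a = pd.DataFrame({'情况_STEP0': ['Critical', 'Criticals']}, index=idx)
b = pd.DataFrame({'情况_STEP1': ['Critical', 'Criticala']}, index=idx)

# Distinct column names: passes even though the row index has duplicates
ok = pd.concat([a, b], axis=1, verify_integrity=True)

# Same column name on both sides: the concatenated axis now has overlapping labels
same_col = b.rename(columns={'情况_STEP1': '情况_STEP0'})
try:
    pd.concat([a, same_col], axis=1, verify_integrity=True)
    column_clash = False
except ValueError:
    column_clash = True

# Duplicated row labels have to be checked separately
rows_unique = a.index.is_unique
print(ok.shape, column_clash, rows_unique)
```

So with axis=1, verify_integrity=True guards against duplicated column names only. A check such as index.is_unique is needed if duplicated row keys are the concern.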

