Merging Multi-Level Object Arrays: Combining DataFrames

weixin_39997089

Tags: merging multi-level object arrays

Article link: https://blog.csdn.net/weixin_39997089/article/details/112714076

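This article describes three ways to combine DataFrames in Python: join, merge and concat. join aligns rows on the index and supports left, right and outer joins; merge matches rows on column values, supports inner, left, right and outer joins, and lets you name the join keys; concat stacks DataFrames vertically or places them side by side, and can also create a hierarchical index.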

Summary generated automatically by CSDN.

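There are three ways to combine DataFrames:

1. join (aligns rows on the index)

Left join: df3.join(df4, how='left')
A plain join. how='left' is the default and keeps the index of the left frame, df3.
Right join: df3.join(df4, how='right')
Keeps the index of the right frame, df4.
Outer join: df3.join(df4, how='outer')
Keeps the union of both indexes.

In every join type, cells with no matching value are filled with NaN.

Joining several DataFrames at once: df3.join([df4, df5])
Both df4 and df5 are joined onto the index of df3.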
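The join variants above can be checked with a short sketch. It reuses the df3, df4 and df5 frames from this article; the result names (left, right, outer, multi) are only illustrative.

```python
import pandas as pd

df3 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abc'))
df4 = pd.DataFrame({'Blue': [1, 9, 8], 'Yellow': [6, 6, 7]}, index=list('cde'))
df5 = pd.DataFrame({'Brown': [3, 4, 5], 'White': [1, 1, 2]}, index=list('aed'))

# Left join (the default): keeps the index of df3, i.e. a, b, c.
left = df3.join(df4)

# Right join: keeps the index of df4, i.e. c, d, e.
right = df3.join(df4, how='right')

# Outer join: keeps every label found in either index.
outer = df3.join(df4, how='outer')

# Joining a list of frames: df4 and df5 are both aligned to df3's index.
multi = df3.join([df4, df5])

for name, frame in [('left', left), ('right', right),
                    ('outer', outer), ('multi', multi)]:
    print(name)
    print(frame, end='\n\n')
```

Only label c is shared between df3 and df4, so in the left join the Blue and Yellow columns are NaN for rows a and b.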

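2. merge (matches rows on column values)

Inner join (the default, how='inner'): pd.merge(df1, df2)
By default, the columns that have the same name in both frames are used as the join keys.
Merging on a named column: pd.merge(df1, df2, on='名字', suffixes=['_1', '_2'])
Joins on the '名字' (name) column; suffixes sets the suffixes appended to any other column name that appears in both frames, so the two versions can be told apart.
Left join: pd.merge(df1, df2, how='left')
Keeps every row of the left frame.
Right join: pd.merge(df1, df2, how='right')
Keeps every row of the right frame.
Outer join: pd.merge(df1, df2, how='outer')
Keeps all rows of both df1 and df2.
Merging on several keys (column names): pd.merge(df1, df2, on=['职称', '名字'])
Joins on both '职称' (job title) and '名字' (name).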
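As a rough check of the merge variants, the sketch below builds the article's df1 and df2 and compares the row counts each join type produces. The result names (inner, left, right, outer, by_name) are only illustrative.

```python
import pandas as pd

# Columns: 名字 = name, 性别 = gender, 职称 = job title, 课程 = course.
df1 = pd.DataFrame({'名字': list('ABCDE'),
                    '性别': ['男', '女', '男', '男', '女'],
                    '职称': ['副教授', '讲师', '助教', '教授', '助教']},
                   index=range(1001, 1006))
df2 = pd.DataFrame({'名字': list('ABDAX'),
                    '课程': ['C++', '计算机导论', '汇编', '数据结构', '马克思原理'],
                    '职称': ['副教授', '讲师', '教授', '副教授', '讲师']},
                   index=[1001, 1002, 1004, 1001, 3001])

# With no `on`, the shared columns 名字 and 职称 are both used as keys.
inner = pd.merge(df1, df2)               # only rows matched in both frames
left = pd.merge(df1, df2, how='left')    # every df1 row, NaN where unmatched
right = pd.merge(df1, df2, how='right')  # every df2 row, NaN where unmatched
outer = pd.merge(df1, df2, how='outer')  # every row of both frames

# Joining on 名字 alone leaves 职称 in both frames, so it gets suffixes.
by_name = pd.merge(df1, df2, on='名字', suffixes=['_1', '_2'])

print(len(inner), len(left), len(right), len(outer))
print(list(by_name.columns))
```

Teacher A appears twice in df2, so every join that keeps A returns two rows for that teacher. Note also that merge discards the original row indexes and numbers the result from 0.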

df1, df2, df3, df4 and df5 are defined as follows:

import pandas as pd

# Columns: 名字 = name, 性别 = gender, 职称 = job title, 课程 = course
df1 = pd.DataFrame({'名字': list('ABCDE'),
                    '性别': ['男', '女', '男', '男', '女'],
                    '职称': ['副教授', '讲师', '助教', '教授', '助教']},
                   index=range(1001, 1006))
df2 = pd.DataFrame({'名字': list('ABDAX'),
                    '课程': ['C++', '计算机导论', '汇编', '数据结构', '马克思原理'],
                    '职称': ['副教授', '讲师', '教授', '副教授', '讲师']},
                   index=[1001, 1002, 1004, 1001, 3001])
df3 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abc'))
df4 = pd.DataFrame({'Blue': [1, 9, 8], 'Yellow': [6, 6, 7]}, index=list('cde'))
df5 = pd.DataFrame({'Brown': [3, 4, 5], 'White': [1, 1, 2]}, index=list('aed'))

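3. concat

objs: the objects to concatenate
axis: the axis to concatenate along; 0 stacks the objects vertically (adds rows), 1 places them side by side (adds columns)
ignore_index: whether to discard the existing index and renumber the result

Side by side: pd.concat([df1, df2], axis=1)
Rows are aligned on the index. With the default outer join the result has one row for each label in the union of the indexes, so for default integer indexes it has as many rows as the longest input, and the missing cells are NaN.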

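import numpy as np
import pandas as pd

# Note: df1 and df2 are redefined here with default integer indexes.
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['four', 'five'])
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['one', 'two', 'three'])
result = pd.concat([df1, df2], axis=1)
print(result)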

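Stacking vertically:

pd.concat([df1, df2], axis=0, ignore_index=True)

Stacking never merges rows: the rows of each input are appended one after another even when their index labels coincide, and a column missing from one input is filled with NaN. ignore_index=True renumbers the result from 0.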

result=pd.concat([df1,df2],axis=0,ignore_index=True)
print(result)
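The difference between the two axes is easiest to see in the shape and index of the result. A minimal sketch using the same two small frames; the names side_by_side, stacked and renumbered are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['four', 'five'])
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['one', 'two', 'three'])

# axis=1: frames sit side by side, rows aligned on the index (0, 1, 2).
side_by_side = pd.concat([df1, df2], axis=1)

# axis=0: frames are stacked; the original index labels are kept as they are.
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True throws the old labels away and renumbers from 0.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

print(side_by_side.shape, list(side_by_side.index))
print(stacked.shape, list(stacked.index))
print(renumbered.shape, list(renumbered.index))
```

In side_by_side, row 2 has NaN in the columns that come from df2, because df2 has only two rows. In stacked, the labels 0 and 1 each occur twice, which is why ignore_index=True is often used when the original labels carry no meaning.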

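Inner join to take the intersection (concat accepts join='inner' or join='outer', the default): pd.concat([df1, df2], axis=1, join='inner')
Keeps only the rows whose index labels appear in both inputs.
Selecting specific index labels: pd.concat([s1, s2], axis=1).reindex(list('abc'))
Keeps the labels a, b and c. The join_axes argument that older code used for this was removed in pandas 1.0; reindex on the result does the same job.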

import pandas as pd

s1 = pd.Series([1, 2], index=list('ab'))
s2 = pd.Series([3, 4, 5], index=list('bde'))
print(s1)
print(s2)
# Keep only the index labels a, b and c (join_axes was removed in pandas 1.0)
print(pd.concat([s1, s2], axis=1).reindex(list('abc')))
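A short sketch contrasting an inner join with picking labels explicitly. It assumes a pandas version where join_axes is no longer available (it was removed in pandas 1.0) and uses reindex on the concatenated result instead; the names inner and selected are only illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=list('ab'))
s2 = pd.Series([3, 4, 5], index=list('bde'))

# Inner join: only labels present in both Series survive (here just 'b').
inner = pd.concat([s1, s2], axis=1, join='inner')

# Outer concat followed by reindex: keep exactly the labels a, b, c.
# 'c' occurs in neither Series, so its row is all NaN.
selected = pd.concat([s1, s2], axis=1).reindex(list('abc'))

print(inner)
print(selected)
```

Unlike the inner join, reindex can return labels that exist in none of the inputs, which is useful when the result must follow a fixed layout.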

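Creating a hierarchical index: pd.concat([s1, s2], keys=['A', 'B'])
The pieces that came from s1 and s2 are labelled A and B, and these labels become the outer level of the index.
When concatenating along axis=1, the keys become the column names: pd.concat([s1, s2], keys=['A', 'D'], axis=1)
A dict can be passed instead, in which case its keys are used as the labels: pd.concat({'A': s1, 'B': s2}, axis=1)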
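The effect of keys can be seen in the sketch below, which uses the s1 and s2 Series from this article. The names stacked, wide and from_dict are only illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=list('ab'))
s2 = pd.Series([3, 4, 5], index=list('bde'))

# Stacked along axis=0: the keys form the outer level of a MultiIndex.
stacked = pd.concat([s1, s2], keys=['A', 'B'])

# Along axis=1: the keys are used as column names.
wide = pd.concat([s1, s2], keys=['A', 'D'], axis=1)

# A dict supplies both the labels (its keys) and the objects (its values).
from_dict = pd.concat({'A': s1, 'B': s2}, axis=1)

print(stacked)
print(stacked['A'])          # the piece that came from s1
print(stacked.loc[('B', 'd')])
print(list(wide.columns), list(from_dict.columns))
```

Selecting stacked['A'] returns the original s1, so the outer level acts as a record of where each row came from.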