pandas: pd.concat([df1, df3]) concatenates vertically by default (axis=0); concat is most often used for vertical concatenation and defaults to an outer join

Latest recommended article published on 2024-04-04 14:15:37

ycyrym

Original link: http://liao.cpython.org/pandas26

http://liao.cpython.org/pandas26/
http://liao.cpython.org/pandas25/
https://blog.csdn.net/weixin_37226516/article/details/64134643

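When two Series are concatenated, the default is axis = 0: the second Series is stacked below the first, along the index. To concatenate horizontally (to the right), use axis = 1.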

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)
Note: this is the signature from older pandas releases; join_axes was removed in pandas 1.0.

import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10,13))
s2 = pd.Series(np.arange(100,103))

pd.concat([s1,s2])
Out[13]: 
0     10
1     11
2     12
0    100
1    101
2    102
dtype: int32

pd.concat([s1,s2], keys = [1,2])
Out[14]: 
1  0     10
   1     11
   2     12
2  0    100
   1    101
   2    102
dtype: int32

pd.concat([s1,s2], keys = [1,2],names = ['from','ID'])
Out[16]: 
from  ID
1     0      10
      1      11
      2      12
2     0     100
      1     101
      2     102
dtype: int32
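
The extra index level created by keys is mainly useful because it can be selected on afterwards. The following is a minimal sketch that reuses the same s1 and s2; the variable names combined and from_s2 are only illustrative and do not come from the original article.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 13))
s2 = pd.Series(np.arange(100, 103))

# keys adds an outer index level; names labels the two levels
combined = pd.concat([s1, s2], keys=[1, 2], names=["from", "ID"])

# Select everything that came from the second Series via its outer key
from_s2 = combined.loc[2]
print(from_s2.tolist())      # [100, 101, 102]

# Select a single element with an (outer, inner) tuple
print(combined.loc[(1, 0)])  # 10
```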

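Horizontal concatenation: axis = 1
To record which source object each piece of data came from, pass the keys parameter. It adds one more level of labels to the result; with axis = 1 and Series inputs, the keys become the column names.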

s1 = pd.Series(np.arange(10,15))
s2 = pd.Series(np.arange(100,103))
pd.concat([s1,s2], axis = 1,keys = ['s1','s2'],names = ['from','ID'])
Out[21]: 
   s1     s2
0  10  100.0
1  11  101.0
2  12  102.0
3  13    NaN
4  14    NaN
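
The output above shows two side effects of the default outer join when the Series have different lengths: missing positions are filled with NaN, and the affected column is converted to floating point. A small sketch that checks both, assuming the same s1 and s2 (the name wide is illustrative):

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 15))    # index 0..4
s2 = pd.Series(np.arange(100, 103))  # index 0..2

wide = pd.concat([s1, s2], axis=1, keys=["s1", "s2"])

print(wide.shape)               # (5, 2)
print(wide["s2"].isna().sum())  # 2  (rows 3 and 4 have no value in s2)
print(wide["s2"].dtype)         # float64, because NaN cannot live in an integer column
```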

Combining two `DataFrame` objects with identical columns.

Create some DataFrames to practice with:

idx = 'this is a fake data'.split()
df1 = pd.DataFrame({'Country':['China','Japan','Germany','USA','UK'],'Team':['A','B','A','C','D']},index = idx)

col = 'Country Team'.split()
idx_2 = ['fake','world']
values = [['KLR',100],['abc',200]]
df2 = pd.DataFrame(values,index = idx_2, columns = col)

df1
Out[43]: 
      Country Team
this    China    A
is      Japan    B
a     Germany    A
fake      USA    C
data       UK    D

df2
Out[44]: 
      Country  Team
fake      KLR   100
world     abc   200

Default (vertical) concatenation:

pd.concat([df1,df2])
Out[45]: 
       Country Team
this     China    A
is       Japan    B
a      Germany    A
fake       USA    C
data        UK    D
fake       KLR  100
world      abc  200

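With axis = 1 the frames are concatenated horizontally. Rows that share an index label are aligned on that label; labels present in only one frame get NaN in the other frame's columns.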

pd.concat([df1,df2],axis = 1)
Out[46]: 
       Country  Team Country   Team
a      Germany     A     NaN    NaN
data        UK     D     NaN    NaN
fake       USA     C     KLR  100.0
is       Japan     B     NaN    NaN
this     China     A     NaN    NaN
world      NaN   NaN     abc  200.0
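
The horizontal result above has two columns named Country and two named Team, which makes later selection ambiguous. One possible way to handle this, sketched below under the assumption that the same df1 and df2 are used, is to pass keys so that each source gets its own column group; join='inner' is also shown to keep only the shared index label. The names side, inner and keyed are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)

# Outer join on the index: every label from either frame appears once
side = pd.concat([df1, df2], axis=1)
print(sorted(side.index))  # ['a', 'data', 'fake', 'is', 'this', 'world']

# Inner join on the index: only labels present in both frames survive
inner = pd.concat([df1, df2], axis=1, join="inner")
print(inner.index.tolist())  # ['fake']

# keys gives each source its own top-level column label
keyed = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])
print(keyed.loc["fake", ("df2", "Country")])  # KLR
```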

Concatenating frames with different columns:

Create df3, whose columns differ from the others:

col = ['Team','SBF']
idx_3= ['true','world']
values3 = [['red','pm'],['orange','pl']]
df3 = pd.DataFrame(values3,index = idx_3, columns = col)
df3
Out[51]: 
         Team SBF
true      red  pm
world  orange  pl

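Concatenation aligns on column names. The default direction is still vertical, and columns with the same name are combined into a single column: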

pd.concat([df1,df3])
       Country  SBF    Team
this     China  NaN       A
is       Japan  NaN       B
a      Germany  NaN       A
fake       USA  NaN       C
data        UK  NaN       D
true       NaN   pm     red
world      NaN   pl  orange
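
In the output above the columns come out in alphabetical order (Country, SBF, Team). The column order of the result depends on the sort parameter, whose default has changed between pandas versions, so passing it explicitly is one way to get a predictable layout. A minimal sketch, assuming the same df1 and df3 (the names sorted_cols and unsorted_cols are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

# sort=True sorts the non-concatenation axis (here: the columns)
sorted_cols = pd.concat([df1, df3], sort=True)
print(sorted_cols.columns.tolist())  # ['Country', 'SBF', 'Team']

# sort=False leaves the columns unsorted
unsorted_cols = pd.concat([df1, df3], sort=False)
print(unsorted_cols.columns.tolist())
```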

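Again the frames are aligned on column names and stacked vertically, so columns with the same name are combined. Rows that share an index label, however, are not merged; they stay as separate rows: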

pd.concat([df2,df3])
Out[59]: 
      Country  SBF    Team
fake      KLR  NaN     100
world     abc  NaN     200
true      NaN   pm     red
world     NaN   pl  orange
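
Because the label world now appears twice, the result has a non-unique index. If that is not acceptable, the verify_integrity parameter from the signature can be used to make concat fail instead. A short sketch, assuming the same df2 and df3 (the names stacked and raised are illustrative):

```python
import pandas as pd

df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

# Default behaviour: duplicate index labels are silently kept
stacked = pd.concat([df2, df3])
print(stacked.index.is_unique)     # False
print(len(stacked.loc["world"]))   # 2  (one row from each frame)

# verify_integrity=True raises ValueError when index labels overlap
raised = False
try:
    pd.concat([df2, df3], verify_integrity=True)
except ValueError as exc:
    raised = True
    print("overlapping index:", exc)
```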

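With axis = 1, rows with the same index label are aligned, but columns with the same name are not combined; the left and right frames' columns are simply placed side by side: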

pd.concat([df2,df3],axis = 1)
Out[62]: 
      Country   Team    Team  SBF
fake      KLR  100.0     NaN  NaN
true      NaN    NaN     red   pm
world     abc  200.0  orange   pl

Concatenating a single column taken from each frame:

pd.concat([df1.Team,df2.Team,df3.Team])
Out[64]: 
this          A
is            B
a             A
fake          C
data          D
fake        100
world       200
true        red
world    orange
Name: Team, dtype: object

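Writing it like this raises an error, because the objects must be passed together as one list (or other iterable), not as separate arguments: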

pd.concat(df1['Team'],df2['Team'],df3['Team'])
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"

pd.concat(df1[['Team']],df2[['Team']],df3[['Team']])
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
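
The sketch below reproduces this kind of mistake on two small Series and shows the working form next to it. The Series a and b are made-up stand-ins for the Team columns, not data from the article.

```python
import pandas as pd

a = pd.Series(["A", "B"], name="Team")
b = pd.Series([100, 200], name="Team")

# Wrong: the first argument is a bare Series rather than a list of objects
failed = False
try:
    pd.concat(a)
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Right: wrap the objects in a list (a tuple works as well)
ok = pd.concat([a, b])
print(ok.tolist())  # ['A', 'B', 100, 200]
print(ok.name)      # Team
```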

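When concatenating, pandas offers operations similar to a database's inner and outer joins. The default is an outer join, which keeps the union of the labels on the other axis; pass join = 'inner' to keep only the intersection.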

pd.concat([df2,df3],join = 'inner')
Out[73]: 
         Team
fake      100
world     200
true      red
world  orange

The default is join = 'outer', so the two calls below are equivalent:

pd.concat([df2,df3],join = 'outer')
pd.concat([df2,df3])
Out[74]: 
      Country  SBF    Team
fake      KLR  NaN     100
world     abc  NaN     200
true      NaN   pm     red
world     NaN   pl  orange
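
The union and intersection behaviour can be checked directly against the column sets of the inputs. A minimal sketch, assuming the same df2 and df3 (the names outer and inner are illustrative):

```python
import pandas as pd

df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

outer = pd.concat([df2, df3], join="outer")
inner = pd.concat([df2, df3], join="inner")

# Outer join keeps the union of the column names
print(set(outer.columns) == set(df2.columns) | set(df3.columns))  # True

# Inner join keeps only the columns present in every input
print(inner.columns.tolist())  # ['Team']

# The number of rows is the same either way; only the columns differ
print(outer.shape, inner.shape)  # (4, 3) (4, 1)
```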

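Concatenating while ignoring the index: if the index of neither table carries any real meaning, set ignore_index = True. The two tables are then aligned on their column names and combined, and the result is given a fresh integer index.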

pd.concat([df2,df3], ignore_index = True)
Out[77]: 
  Country  SBF    Team
0     KLR  NaN     100
1     abc  NaN     200
2     NaN   pm     red
3     NaN   pl  orange
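
As a rough cross-check, the same result can be obtained by concatenating normally and then discarding the old index with reset_index(drop=True). The sketch below assumes the same df2 and df3; the names result and manual are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

# ignore_index=True throws away the original labels and renumbers from 0
result = pd.concat([df2, df3], ignore_index=True)
print(result.index.tolist())  # [0, 1, 2, 3]

# Equivalent two-step form: concatenate, then drop the old index
manual = pd.concat([df2, df3]).reset_index(drop=True)
print(manual.index.tolist())  # [0, 1, 2, 3]
```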