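0 Introduction
pandas can combine several tables (DataFrames) into one. There are two main cases: vertical concatenation, which stacks rows, and horizontal concatenation, which places tables side by side.
1 Vertical concatenation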
import pandas as pd
import numpy as np
Create several DataFrames:
df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['a','b','c','d'])
df2 = pd.DataFrame(np.arange(12,24).reshape((3,4)),columns=['a','b','c','d'])
df3 = pd.DataFrame(np.arange(24,36).reshape((3,4)),columns=['a','b','c','d'])
print(df1)
print(df2)
print(df3)
   a  b   c   d
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
    a   b   c   d
0  12  13  14  15
1  16  17  18  19
2  20  21  22  23
    a   b   c   d
0  24  25  26  27
1  28  29  30  31
2  32  33  34  35
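Call pd.concat with axis=0 to concatenate along the row axis (the index), which stacks the tables vertically. axis=0 is also the default.
# Vertical concatenation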
df4 = pd.concat([df1,df2,df3],axis=0)
df4
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |
| 0 | 12 | 13 | 14 | 15 |
| 1 | 16 | 17 | 18 | 19 |
| 2 | 20 | 21 | 22 | 23 |
| 0 | 24 | 25 | 26 | 27 |
| 1 | 28 | 29 | 30 | 31 |
| 2 | 32 | 33 | 34 | 35 |
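pd.concat takes an ignore_index parameter. With ignore_index=True the original row labels are discarded and the result is renumbered from 0.
df4 = pd.concat([df1,df2,df3],axis=0,ignore_index=True) # discard the original index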
df4
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |
| 3 | 12 | 13 | 14 | 15 |
| 4 | 16 | 17 | 18 | 19 |
| 5 | 20 | 21 | 22 | 23 |
| 6 | 24 | 25 | 26 | 27 |
| 7 | 28 | 29 | 30 | 31 |
| 8 | 32 | 33 | 34 | 35 |
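The practical difference between the two results shows up when rows are selected by label. The sketch below is a minimal illustration using two of the tables from this section; the variable names are only for the example.

```python
import numpy as np
import pandas as pd

cols = ['a', 'b', 'c', 'd']
df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=cols)
df2 = pd.DataFrame(np.arange(12, 24).reshape((3, 4)), columns=cols)

# Default: the original row labels are kept, so label 0 appears twice.
stacked = pd.concat([df1, df2], axis=0)
rows_with_label_0 = stacked.loc[0]   # a DataFrame holding two rows

# ignore_index=True: rows are renumbered, so every label is unique.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
row_0 = renumbered.loc[0]            # a Series holding one row

print(stacked.index.tolist())     # [0, 1, 2, 0, 1, 2]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(rows_with_label_0.shape)    # (2, 4)
```

If the original labels carry no meaning, as with the default 0..n-1 numbering here, ignore_index=True usually gives the more convenient result.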
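2 Horizontal concatenation
Call pd.concat with axis=1 to concatenate along the column axis, which places the tables side by side. Rows are matched by index label.
# Horizontal concatenation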
df5 = pd.concat([df1,df2,df3],axis=1)
df5
| | a | b | c | d | a | b | c | d | a | b | c | d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 12 | 13 | 14 | 15 | 24 | 25 | 26 | 27 |
| 1 | 4 | 5 | 6 | 7 | 16 | 17 | 18 | 19 | 28 | 29 | 30 | 31 |
| 2 | 8 | 9 | 10 | 11 | 20 | 21 | 22 | 23 | 32 | 33 | 34 | 35 |
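A side effect of the horizontal result above is that column names repeat. The following sketch, which goes slightly beyond the original text, shows what selecting a repeated name returns and how the keys parameter of pd.concat can label each source table. The labels 'left' and 'right' are arbitrary names chosen for this illustration.

```python
import numpy as np
import pandas as pd

cols = ['a', 'b', 'c', 'd']
df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=cols)
df2 = pd.DataFrame(np.arange(12, 24).reshape((3, 4)), columns=cols)

# Plain horizontal concatenation keeps the duplicate column names.
wide = pd.concat([df1, df2], axis=1)
both_a = wide['a']   # a DataFrame with two columns, both named 'a'

# keys adds an outer column level that records which table a column came from.
labelled = pd.concat([df1, df2], axis=1, keys=['left', 'right'])
right_a = labelled[('right', 'a')]   # a single Series

print(both_a.shape)      # (3, 2)
print(right_a.tolist())  # [12, 16, 20]
```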
df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['a','b','c','f'])
df2 = pd.DataFrame(np.arange(12,24).reshape((3,4)),columns=['a','c','d','e'])
print(df1)
print(df2)
   a  b   c   f
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
    a   c   d   e
0  12  13  14  15
1  16  17  18  19
2  20  21  22  23
This example omits axis, so the tables are stacked vertically (axis=0) even though their columns differ. With join='outer', which is the default, the result keeps the union of the columns and fills the cells a table lacks with NaN. ignore_index=True only renumbers the rows. sort=True orders the columns alphabetically, which matches the output shown here.
df6 = pd.concat([df1,df2],join='outer',ignore_index=True,sort=True) # union of columns; missing cells become NaN
df6
Note: without an explicit sort argument, the pandas version used for the original run sorted the columns and printed a FutureWarning ("Sorting because non-concatenation axis is not aligned"). Since pandas 1.0 the default is sort=False.
| | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| 0 | 0 | 1.0 | 2 | NaN | NaN | 3.0 |
| 1 | 4 | 5.0 | 6 | NaN | NaN | 7.0 |
| 2 | 8 | 9.0 | 10 | NaN | NaN | 11.0 |
| 3 | 12 | NaN | 13 | 14.0 | 15.0 | NaN |
| 4 | 16 | NaN | 17 | 18.0 | 19.0 | NaN |
| 5 | 20 | NaN | 21 | 22.0 | 23.0 | NaN |
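The column order of an outer concatenation depends on the sort argument. The sketch below compares both settings on the same two tables, assuming pandas 1.0 or later. It also checks the dtype change that NaN filling causes.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=['a', 'b', 'c', 'f'])
df2 = pd.DataFrame(np.arange(12, 24).reshape((3, 4)), columns=['a', 'c', 'd', 'e'])

# sort=False keeps columns in order of first appearance.
unsorted = pd.concat([df1, df2], join='outer', ignore_index=True, sort=False)
# sort=True orders the column labels.
sorted_cols = pd.concat([df1, df2], join='outer', ignore_index=True, sort=True)

print(unsorted.columns.tolist())      # ['a', 'b', 'c', 'f', 'd', 'e']
print(sorted_cols.columns.tolist())   # ['a', 'b', 'c', 'd', 'e', 'f']

# Column 'b' exists only in df1, so the three rows from df2 are NaN
# and the column is stored as float; column 'a' stays integer.
print(sorted_cols['b'].isna().sum())                   # 3
print(pd.api.types.is_float_dtype(sorted_cols['b']))   # True
print(pd.api.types.is_integer_dtype(sorted_cols['a'])) # True
```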
With join='inner', pd.concat keeps only the columns that appear in every table (here a and c) and drops the rest.
df6 = pd.concat([df1,df2],join='inner',ignore_index=True) # keep only the shared columns; drop the others
df6
| | a | c |
|---|---|---|
| 0 | 0 | 2 |
| 1 | 4 | 6 |
| 2 | 8 | 10 |
| 3 | 12 | 13 |
| 4 | 16 | 17 |
| 5 | 20 | 21 |
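One way to read the inner join is as "select the shared columns, then stack". The sketch below checks that reading by building the same result by hand. The names shared and manual are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=['a', 'b', 'c', 'f'])
df2 = pd.DataFrame(np.arange(12, 24).reshape((3, 4)), columns=['a', 'c', 'd', 'e'])

# Columns present in both tables.
shared = df1.columns.intersection(df2.columns)

inner = pd.concat([df1, df2], join='inner', ignore_index=True)
# Same result built by hand: select the shared columns, then stack.
manual = pd.concat([df1[shared], df2[shared]], ignore_index=True)

print(shared.tolist())       # ['a', 'c']
print(inner.equals(manual))  # True
```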
df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['a','b','c','f'])
df2 = pd.DataFrame(np.arange(12,24).reshape((4,3)),columns=['a','c','d'])
print(df1)
print(df2)
   a  b   c   f
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
    a   c   d
0  12  13  14
1  15  16  17
2  18  19  20
3  21  22  23
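Horizontal concatenation can also be restricted to the index of one of the tables. Older pandas versions did this with the join_axes parameter, which was removed in pandas 1.0. Reindexing the other table to df1.index before concatenating gives the same result.
df8 = pd.concat([df1,df2.reindex(df1.index)],axis=1) # horizontal concatenation using df1's index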
df8
| | a | b | c | f | a | c | d |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 12 | 13 | 14 |
| 1 | 4 | 5 | 6 | 7 | 15 | 16 | 17 |
| 2 | 8 | 9 | 10 | 11 | 18 | 19 | 20 |
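On pandas 1.0 and later, where join_axes is not available, restricting the result to df1's index is usually written with reindex. The sketch below compares two spellings. They select the same rows but can differ in dtype, because concatenating first temporarily introduces NaN values.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=['a', 'b', 'c', 'f'])
df2 = pd.DataFrame(np.arange(12, 24).reshape((4, 3)), columns=['a', 'c', 'd'])

# Reindex df2 first: its extra row 3 is dropped before concatenation,
# so no NaN is ever created and the integer dtypes survive.
before = pd.concat([df1, df2.reindex(df1.index)], axis=1)

# Concatenate first: row 3 exists for a moment with NaN in df1's columns,
# which turns those columns into floats; reindex then drops the row.
after = pd.concat([df1, df2], axis=1).reindex(df1.index)

print(before.shape, after.shape)                   # (3, 7) (3, 7)
print(pd.api.types.is_integer_dtype(before['b']))  # True
print(pd.api.types.is_float_dtype(after['b']))     # True
```

Column 'b' is used for the dtype check because it is unique; 'a' and 'c' appear twice in the result.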
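Without that restriction, the result keeps every row label from both tables, and df1's columns are filled with NaN for the row that only df2 has.
df8 = pd.concat([df1,df2],axis=1) # horizontal concatenation; rows are aligned on the index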
df8
| | a | b | c | f | a | c | d |
|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 2.0 | 3.0 | 12 | 13 | 14 |
| 1 | 4.0 | 5.0 | 6.0 | 7.0 | 15 | 16 | 17 |
| 2 | 8.0 | 9.0 | 10.0 | 11.0 | 18 | 19 | 20 |
| 3 | NaN | NaN | NaN | NaN | 21 | 22 | 23 |
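The join parameter applies to horizontal concatenation as well, where it acts on the row labels instead of the columns. As a minimal sketch on the same two tables, join='inner' keeps only the row labels present in both, so no NaN filling is needed.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=['a', 'b', 'c', 'f'])
df2 = pd.DataFrame(np.arange(12, 24).reshape((4, 3)), columns=['a', 'c', 'd'])

# Outer (default): union of row labels; df1 has no row 3, so it gets NaN there.
outer_rows = pd.concat([df1, df2], axis=1)
# Inner: intersection of row labels; row 3 is dropped.
inner_rows = pd.concat([df1, df2], axis=1, join='inner')

print(outer_rows.shape)                # (4, 7)
print(inner_rows.shape)                # (3, 7)
print(inner_rows.index.tolist())       # [0, 1, 2]
print(outer_rows['b'].isna().sum())    # 1
```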