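Summary:
pd.concat stacks vertically by default (axis=0). pd.merge is not a stacking operation: it joins rows on key columns (by default, every column name the two frames share), so an outer merge only looks like a vertical stack when no key values match.
pd.concat defaults to a union of the columns (join='outer'),
and it keeps the original index labels, even when that produces duplicates.
With join='inner' only the columns common to both frames are kept; with join='outer' the non-shared columns are kept too, and cells a frame has no data for are filled with NaN.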
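pd.merge defaults to an intersection (how='inner').
With an inner merge, only rows whose values agree in every key column are kept.
When merging on columns, the result gets a new integer index starting at 0; the original index labels are discarded, whether or not they were duplicated.
Pass left_index=True and/or right_index=True to use a DataFrame's index as its join key. Each side needs a key, so left_index=True on its own must be paired with right_on or right_index=True (and vice versa). Setting both to True joins index on index and keeps the original index labels in the result; with how='outer' and disjoint indexes this again resembles a vertical stack, except that shared column names get _x/_y suffixes.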
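The points in this summary can be checked with a small sketch. The frames `left` and `right`, their column names `k`, `v`, `w`, and the string index labels are made up for illustration and do not appear in the session below.

```python
import pandas as pd

# Hypothetical frames: they share the column "k" and the index labels a, b, c.
left = pd.DataFrame({"k": [0, 2, 4], "v": [1, 3, 5]}, index=list("abc"))
right = pd.DataFrame({"k": [4, 6, 8], "w": [5, 7, 9]}, index=list("abc"))

# concat: vertical stack, union of columns, original index labels kept.
stacked = pd.concat([left, right])
# join="inner" keeps only the columns present in both frames.
inner_cols = pd.concat([left, right], join="inner")
# ignore_index=True asks concat for a fresh 0..n-1 index instead.
renumbered = pd.concat([left, right], ignore_index=True)

# merge: inner join on the shared column "k"; the index is rebuilt from 0.
merged = pd.merge(left, right)
# how="outer" keeps the unmatched rows from both sides as well.
merged_outer = pd.merge(left, right, how="outer")
# Joining index on index keeps the labels and suffixes the shared column.
by_index = pd.merge(left, right, left_index=True, right_index=True)

print(stacked)
print(merged)
print(by_index)
```

Only the row with k == 4 exists in both frames, so the default merge returns a single row, while the default concat returns all six rows with NaN where a frame has no such column.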
import numpy as np
import pandas as pd
a=pd.DataFrame(np.arange(20).reshape(5,4))
a
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
b=pd.DataFrame(np.arange(20,40).reshape(5,4))
b
    0   1   2   3
0  20  21  22  23
1  24  25  26  27
2  28  29  30  31
3  32  33  34  35
4  36  37  38  39
pd.merge(a,b)
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
pd.merge(a,b,how='outer')
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
5  20  21  22  23
6  24  25  26  27
7  28  29  30  31
8  32  33  34  35
9  36  37  38  39
pd.merge(a,b,how='inner')
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
pd.concat([a,b],how='inner')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: concat() got an unexpected keyword argument 'how'
pd.concat([a,b],join='outer')
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
0  20  21  22  23
1  24  25  26  27
2  28  29  30  31
3  32  33  34  35
4  36  37  38  39
pd.concat([a,b],join='inner')
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
0  20  21  22  23
1  24  25  26  27
2  28  29  30  31
3  32  33  34  35
4  36  37  38  39
b.columns=[2,3,4,5]
pd.concat([a,b])
      0     1   2   3     4     5
0   0.0   1.0   2   3   NaN   NaN
1   4.0   5.0   6   7   NaN   NaN
2   8.0   9.0  10  11   NaN   NaN
3  12.0  13.0  14  15   NaN   NaN
4  16.0  17.0  18  19   NaN   NaN
0   NaN   NaN  20  21  22.0  23.0
1   NaN   NaN  24  25  26.0  27.0
2   NaN   NaN  28  29  30.0  31.0
3   NaN   NaN  32  33  34.0  35.0
4   NaN   NaN  36  37  38.0  39.0
pd.concat([a,b],join='inner')
    2   3
0   2   3
1   6   7
2  10  11
3  14  15
4  18  19
0  20  21
1  24  25
2  28  29
3  32  33
4  36  37
pd.merge(a,b)
Empty DataFrame
Columns: [0, 1, 2, 3, 4, 5]
Index: []
pd.merge(a,b,how='outer')
      0     1   2   3     4     5
0   0.0   1.0   2   3   NaN   NaN
1   4.0   5.0   6   7   NaN   NaN
2   8.0   9.0  10  11   NaN   NaN
3  12.0  13.0  14  15   NaN   NaN
4  16.0  17.0  18  19   NaN   NaN
5   NaN   NaN  20  21  22.0  23.0
6   NaN   NaN  24  25  26.0  27.0
7   NaN   NaN  28  29  30.0  31.0
8   NaN   NaN  32  33  34.0  35.0
9   NaN   NaN  36  37  38.0  39.0
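The outer merge just shown looks like a vertical stack only because no row of a matches any row of b on the shared columns 2 and 3. As a rough check, the sketch below rebuilds the two frames and uses merge's `indicator=True` option to label where each row came from; the copy `b2` with one edited row is a hypothetical addition to show what happens once a key does match.

```python
import numpy as np
import pandas as pd

# Same frames as in the session at this point (b already has columns 2..5).
a = pd.DataFrame(np.arange(20).reshape(5, 4))
b = pd.DataFrame(np.arange(20, 40).reshape(5, 4), columns=[2, 3, 4, 5])

# indicator=True adds a "_merge" column: left_only, right_only or both.
tagged = pd.merge(a, b, how="outer", indicator=True)
print(tagged["_merge"].value_counts())

# Hypothetical variant: make the first row of b agree with a on columns 2 and 3.
b2 = b.copy()
b2.loc[0, [2, 3]] = [2, 3]

# The default inner merge now finds one match and places the columns
# of both frames side by side in a single row.
matched = pd.merge(a, b2)
print(matched)
```

Every row of `tagged` is either left_only or right_only, which is why the result resembles concat. With a matching key, as in `matched`, merge behaves like the horizontal join it is.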
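a.index=[0,1,2,3,4]
b.index=['q','w','e','r','t']  # step missing from the original session; inferred from the index labels in the index-based merge output further down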
a
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
pd.merge(a,b,how='outer')
      0     1   2   3     4     5
0   0.0   1.0   2   3   NaN   NaN
1   4.0   5.0   6   7   NaN   NaN
2   8.0   9.0  10  11   NaN   NaN
3  12.0  13.0  14  15   NaN   NaN
4  16.0  17.0  18  19   NaN   NaN
5   NaN   NaN  20  21  22.0  23.0
6   NaN   NaN  24  25  26.0  27.0
7   NaN   NaN  28  29  30.0  31.0
8   NaN   NaN  32  33  34.0  35.0
9   NaN   NaN  36  37  38.0  39.0
pd.merge(a,b,how='outer',left_index=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/data1/fengjie/software/miniconda3/envs/RGI_new/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 86, in merge
    validate=validate,
  File "/share/data1/fengjie/software/miniconda3/envs/RGI_new/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 620, in __init__
    self._validate_specification()
  File "/share/data1/fengjie/software/miniconda3/envs/RGI_new/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 1183, in _validate_specification
    raise MergeError("Must pass right_on or right_index=True")
pandas.errors.MergeError: Must pass right_on or right_index=True
pd.merge(a,b,how='outer',right_index=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/data1/fengjie/software/miniconda3/envs/RGI_new/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 86, in merge
    validate=validate,
  File "/share/data1/fengjie/software/miniconda3/envs/RGI_new/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 620, in __init__
    self._validate_specification()
  File "/share/data1/fengjie/software/miniconda3/envs/RGI_new/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 1186, in _validate_specification
    raise MergeError("Must pass left_on or left_index=True")
pandas.errors.MergeError: Must pass left_on or left_index=True
pd.merge(a,b,how='outer',left_index=True,right_index=True)
      0     1   2_x   3_x   2_y   3_y     4     5
0   0.0   1.0   2.0   3.0   NaN   NaN   NaN   NaN
1   4.0   5.0   6.0   7.0   NaN   NaN   NaN   NaN
2   8.0   9.0  10.0  11.0   NaN   NaN   NaN   NaN
3  12.0  13.0  14.0  15.0   NaN   NaN   NaN   NaN
4  16.0  17.0  18.0  19.0   NaN   NaN   NaN   NaN
e   NaN   NaN   NaN   NaN  28.0  29.0  30.0  31.0
q   NaN   NaN   NaN   NaN  20.0  21.0  22.0  23.0
r   NaN   NaN   NaN   NaN  32.0  33.0  34.0  35.0
t   NaN   NaN   NaN   NaN  36.0  37.0  38.0  39.0
w   NaN   NaN   NaN   NaN  24.0  25.0  26.0  27.0
pd.merge(a,b,how='inner',left_index=True,right_index=True)
Empty DataFrame
Columns: [0, 1, 2_x, 3_x, 2_y, 3_y, 4, 5]
Index: []
pd.merge(a,a)
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
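The index-on-index merges above return nothing useful for how='inner' because the two indexes share no labels. As a sketch of the opposite case, the snippet below assumes both frames keep their default 0..4 index (unlike the session, where b had been relabelled), and compares the merge with `DataFrame.join` and with `pd.concat(axis=1)`.

```python
import numpy as np
import pandas as pd

# Assumption: both frames use the default integer index 0..4.
a = pd.DataFrame(np.arange(20).reshape(5, 4))
b = pd.DataFrame(np.arange(20, 40).reshape(5, 4), columns=[2, 3, 4, 5])

# Join on the index: rows pair up by label, shared column names get suffixes.
on_index = pd.merge(a, b, left_index=True, right_index=True)

# DataFrame.join aligns on the index too; the suffixes must be given explicitly.
joined = a.join(b, lsuffix="_x", rsuffix="_y")

# concat along axis=1 also aligns on the index, but keeps duplicate column names.
side_by_side = pd.concat([a, b], axis=1)

print(on_index)
print(side_by_side.columns.tolist())
```

All three results hold the same 5 x 8 block of values here. They differ in how the repeated column names 2 and 3 are handled, which is worth checking before relying on one of them.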