一、concat
concat基本作用就是可以将header相同的sheet表进行合并
默认合并方式是竖直方向合并,axis参数为1的话是水平方向进行合并。
合并的时候注意是根据header合并还是相同的index合并
import pandas as pd
n=pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},index=['a','b'])
s=pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},index=['c','d'])
out = pd.concat([n, s])
print(out)
n=pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},index=['a','b'])
s=pd.DataFrame({'C':['A0','A1'],'D':['B0','B1']},index=['a','b'])
out = pd.concat([n, s], axis=1)
print(out)
默认竖直方向(axis=0)
A B
a A0 B0
b A1 B1
c A0 B0
d A1 B1
axis=1为水平方向
A B C D
a A0 B0 A0 B0
b A1 B1 A1 B1
join可以选择inner或outer来决定取交集还是并集。
n=pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},index=['a','b'])
s=pd.DataFrame({'C':['A0','A1'],'D':['B0','B1']},index=['b','d'])
out = pd.concat([n, s], join='outer', axis=1)
print(out)
out = pd.concat([n, s], join='inner', axis=1)
print(out)
A B C D
a A0 B0 NaN NaN
b A1 B1 A0 B0
d NaN NaN A1 B1
A B C D
b A1 B1 A0 B0
二、append
该函数和conca差不多。算是简化版的concat。默认竖直方向。
import pandas as pd
n=pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']}, index=['a','b'])
s=pd.DataFrame({'A':['C0','C1'],'B':['D0','D1']}, index=['c','d'])
out = n.append(s)
print(out)
A B
a A0 B0
b A1 B1
c C0 D0
d C1 D1
三、merge
这里比较重要的两个参数是how和on(这里是以header为链接键)。
若以index为链接键,需要同时设置left_index= True 和 right_index= True,或者left_index设置的同时,right_on指定某个Key。总的来说就是需要指定left、right链接的键,可以同时是key、index或者混合使用。
on、how
import pandas as pd
n = pd.DataFrame({'key': list('abcdef'), 'data1': list('qwerty')})
s = pd.DataFrame({'key': ['a', 'b', 'c'], 'data2': ['s', 'ss', 'sss']})
# how 的参数可以为 left right outer inner
out = pd.merge(n, s, on='key',how='left')
print(out)
key data1 data2
0 a q s
1 b w ss
2 c e sss
3 d r NaN
4 e t NaN
5 f y NaN
left_index、right_index
import pandas as pd
n = pd.DataFrame({'key': list('abcdef'), 'data1': list('qwerty')},index=['1','2','3','4','5','6'])
s = pd.DataFrame({'key': ['a', 'b', 'c'], 'data2': ['s', 'ss', 'sss']}, index=['1','3','5'])
# how 的参数可以为 left right outer inner
out = pd.merge(n, s, how='inner',left_index=True,right_index=True)
print(out)
out = pd.merge(n, s, how='outer',left_index=True,right_index=True)
print(out)
key_x data1 key_y data2
1 a q a s
3 c e b ss
5 e t c sss
key_x data1 key_y data2
1 a q a s
2 b w NaN NaN
3 c e b ss
4 d r NaN NaN
5 e t c sss
6 f y NaN NaN
index、on
import pandas as pd
n = pd.DataFrame({'key1': list('123456'), 'data1': list('qwerty')})
s = pd.DataFrame({'key2': ['a', 'b', 'c'], 'data2': ['s', 'ss', 'sss']}, index=['1','3','5'])
# how 的参数可以为 left right outer inner
out = pd.merge(n, s, how='outer',left_on='key1',right_index=True)
print(out)
key1 data1 key2 data2
0 1 q a s
1 2 w NaN NaN
2 3 e b ss
3 4 r NaN NaN
4 5 t c sss
5 6 y NaN NaN
四、join
分为 left、right、inner、outer。
import pandas as pd
n=pd.DataFrame({'A':['A0','A1'], 'B':['B0','B1'], 'E': ['C0','C1'] }, index=['a','c'])
s=pd.DataFrame({'C':['C0','C1'],'D':['D0','D1']},index=['c','d'])
# how 的参数可以为 left right outer inner
out = n.join(s, how='right')
print(out)
out = n.join(s, how='left')
print(out)
out = n.join(s, how='inner')
print(out)
out = n.join(s, how='outer')
print(out)
A B E C D
c A1 B1 C1 C0 D0
d NaN NaN NaN C1 D1
A B E C D
a A0 B0 C0 NaN NaN
c A1 B1 C1 C0 D0
A B E C D
c A1 B1 C1 C0 D0
A B E C D
a A0 B0 C0 NaN NaN
c A1 B1 C1 C0 D0
d NaN NaN NaN C1 D1