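Preface
A record of my learning journey, shared in the hope that it draws out better ideas. If you know a better approach or spot a mistake, please point it out.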
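Summary: join
Consolidating matching items across Excel sheets relies on these concepts (that's just what I've heard, mind you), so I'd count them as key points.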
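Left join (default)
Takes df1 on the left as the base and attaches the rows of df2 on the right whose index labels match; rows of df1 with no match in df2 are filled with NaN.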
import pandas as pd
df1=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abc'))
df2=pd.DataFrame({'Blue':[1,9,8],'Yellow':[6,6,7]},index=list('cde'))
print(df1)
print(df2)
df3=df1.join(df2,how='left')
print(df3)
"""
   Red  Green
a    1      5
b    3      0
c    5      3
   Blue  Yellow
c     1       6
d     9       6
e     8       7
   Red  Green  Blue  Yellow
a    1      5   NaN     NaN
b    3      0   NaN     NaN
c    5      3   1.0     6.0
"""
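To check that left really is the default, and to make "df1 as the base" concrete, here is a small sketch of my own (not taken from the pandas docs). Calling join without how should give the same frame as how='left', and for these unique indexes the result should also match reindexing df2 onto df1's index and placing the two side by side with pd.concat. The names left_default and manual are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abc'))
df2 = pd.DataFrame({'Blue': [1, 9, 8], 'Yellow': [6, 6, 7]}, index=list('cde'))

# join() without how= should behave exactly like how='left'
left_default = df1.join(df2)
pd.testing.assert_frame_equal(left_default, df1.join(df2, how='left'))

# Sketch of the idea: keep df1's index, pull the matching rows of df2
# (labels missing from df2 become NaN), then put the columns side by side.
manual = pd.concat([df1, df2.reindex(df1.index)], axis=1)
pd.testing.assert_frame_equal(left_default, manual)

print(manual.index.tolist())  # ['a', 'b', 'c']
```

The reindex version only works when df2's index has no duplicate labels (reindex raises on duplicates), which is why it is shown here as an illustration rather than a replacement for join.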
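Right join
Takes df2 on the right as the base and attaches the rows of df1 on the left whose index labels match; rows of df2 with no match in df1 are filled with NaN.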
import pandas as pd
df1=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abc'))
df2=pd.DataFrame({'Blue':[1,9,8],'Yellow':[6,6,7]},index=list('cde'))
print(df1)
print(df2)
df4 = df1.join(df2,how='right')
print(df4)
"""
   Red  Green
a    1      5
b    3      0
c    5      3
   Blue  Yellow
c     1       6
d     9       6
e     8       7
   Red  Green  Blue  Yellow
c  5.0    3.0     1       6
d  NaN    NaN     9       6
e  NaN    NaN     8       7
"""
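A right join is the mirror image of a left join. As a sketch (my own check, not an official recipe), swapping the operands and using a left join should give the same rows and values with the columns in a different order, so the example reorders them before comparing. It also shows why Red and Green print as 5.0 and NaN above: an integer column that gains a NaN is stored as float64, while Blue and Yellow keep their integer dtype because every row of df2 is kept and has values. The names right and mirrored are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abc'))
df2 = pd.DataFrame({'Blue': [1, 9, 8], 'Yellow': [6, 6, 7]}, index=list('cde'))

right = df1.join(df2, how='right')

# Mirror image: df2 as the base with a left join, columns put back in df1-then-df2 order
mirrored = df2.join(df1, how='left')[right.columns]
pd.testing.assert_frame_equal(right, mirrored)

print(right.index.tolist())  # ['c', 'd', 'e']
print(right.dtypes)          # Red/Green are float64 (they contain NaN), Blue/Yellow stay int64
```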
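Outer join
Keeps every index label from both sides, i.e. the union of the indexes of df1 and df2; cells with no match on the other side are filled with NaN.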
import pandas as pd
df1=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abc'))
df2=pd.DataFrame({'Blue':[1,9,8],'Yellow':[6,6,7]},index=list('cde'))
print(df1)
print(df2)
df5 = df1.join(df2,how='outer')
print(df5)
"""
   Red  Green
a    1      5
b    3      0
c    5      3
   Blue  Yellow
c     1       6
d     9       6
e     8       7
   Red  Green  Blue  Yellow
a  1.0    5.0   NaN     NaN
b  3.0    0.0   NaN     NaN
c  5.0    3.0   1.0     6.0
d  NaN    NaN   9.0     6.0
e  NaN    NaN   8.0     7.0
"""
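The "union" description can be checked directly. The sketch below (my own check, using only standard pandas calls) compares the outer join's index with df1.index.union(df2.index) and confirms that every column ends up as float64, since each one has NaN for the labels that exist only on the other side. The name outer is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abc'))
df2 = pd.DataFrame({'Blue': [1, 9, 8], 'Yellow': [6, 6, 7]}, index=list('cde'))

outer = df1.join(df2, how='outer')

# Row labels are the union of both indexes (returned sorted here)
assert outer.index.equals(df1.index.union(df2.index))
print(outer.index.tolist())  # ['a', 'b', 'c', 'd', 'e']

# Each column gained NaN for the labels missing on its side, so all are float64
print(outer.isna().sum().tolist())  # [2, 2, 2, 2]
print(bool((outer.dtypes == 'float64').all()))  # True
```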
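Composite join
Joining multiple DataFrame objects at once by passing a list
(No how argument is given, so the default left join applies: df2 on the left is the base, and the rows of df1 and df6 on the right whose index labels match are attached.)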
import pandas as pd
df1=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abc'))
df2=pd.DataFrame({'Blue':[1,9,8],'Yellow':[6,6,7]},index=list('cde'))
print(df1)
print(df2)
df6 = pd.DataFrame({'Brown':[3,4,5],'White':[1,1,2]},index=list('aed'))
print(df6)
df7 = df2.join([df1,df6])
print(df7)
"""
   Red  Green
a    1      5
b    3      0
c    5      3
   Blue  Yellow
c     1       6
d     9       6
e     8       7
   Brown  White
a      3      1
e      4      1
d      5      2
   Blue  Yellow  Red  Green  Brown  White
c   1.0     6.0  5.0    3.0    NaN    NaN
d   9.0     6.0  NaN    NaN    5.0    2.0
e   8.0     7.0  NaN    NaN    4.0    1.0
"""
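For a left join on unique indexes, passing a list should amount to joining the frames onto df2 one at a time. The sketch below (an illustration, not how pandas implements it internally) chains two single joins and compares the result with the list form. The comparison uses check_dtype=False because, as the output above shows, the list form can return Blue and Yellow as floats, whereas chained left joins keep df2's own columns as integers. The names from_list and chained are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abc'))
df2 = pd.DataFrame({'Blue': [1, 9, 8], 'Yellow': [6, 6, 7]}, index=list('cde'))
df6 = pd.DataFrame({'Brown': [3, 4, 5], 'White': [1, 1, 2]}, index=list('aed'))

from_list = df2.join([df1, df6])   # list form, default how='left'
chained = df2.join(df1).join(df6)  # the same frames joined one at a time

# Same labels, columns and values; dtypes may differ (int vs float)
pd.testing.assert_frame_equal(from_list, chained, check_dtype=False)
print(chained.columns.tolist())  # ['Blue', 'Yellow', 'Red', 'Green', 'Brown', 'White']
```

Note that with a list, join only supports joining on the index; passing on= together with a list raises a ValueError.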