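Merging datasets
Database-style merge and join operations link rows together on one or more keys.
- By default, merge performs an "inner" join, so the keys in the result are the intersection of the keys in both tables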
import pandas as pd
from pandas import DataFrame
df1 = DataFrame({'key' : ['b','b','a','c','a','a','b'],'data1' : range(7)})
df2 = DataFrame({'key' : ['a','b','d'],'data2' : range(3)})
pd.merge(df1,df2)
df2 = DataFrame({'key' : ['a','b','c'],'data2' : range(3)})
pd.merge(df1,df2)
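
As a rough check of the intersection claim, the sketch below rebuilds the first pair of frames (the df2 that contains 'd') and compares the keys in the merged result with a plain set intersection. The variable names `inner`, `result_keys` and `expected_keys` are illustrative only.

```python
import pandas as pd

# Same data as above: 'c' appears only in df1, 'd' only in df2.
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

inner = pd.merge(df1, df2)  # how='inner' is the default

# Keys that survive the merge versus the intersection of both key columns.
result_keys = set(inner['key'])
expected_keys = set(df1['key']) & set(df2['key'])

print(result_keys == expected_keys)  # True
print(len(inner))                    # 6: three 'a' rows plus three 'b' rows
```

Rows keyed 'c' and 'd' are dropped because each exists in only one of the two frames.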
- If you do not specify which column to join on, merge uses the overlapping column names as the keys
df1 = DataFrame({'key' : ['b','b','a','c','a','a','b'],'data' : range(7)})
df2 = DataFrame({'key' : ['a','b','c'],'data' : range(3)})
pd.merge(df1,df2,on = 'data')
pd.merge(df1,df2)
df1 = DataFrame({'key' : ['b','b','a','c','a','a','b'],'data1' : range(7)})
df2 = DataFrame({'key' : ['a','b','c'],'data2' : range(3)})
pd.merge(df1,df2,left_on = 'data1',right_on = 'data2')
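
To make the effect of overlapping column names more concrete, here is a small sketch that inspects the resulting columns. It assumes the two frames that share both 'key' and 'data'; the names `both`, `by_data` and `named` are illustrative. `suffixes` is a standard `pd.merge` parameter that controls how non-key columns with the same name are renamed.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data': range(3)})

# With no `on`, both shared columns ('key' and 'data') together form the join key.
both = pd.merge(df1, df2)

# Joining on 'data' alone leaves two 'key' columns, which get suffixes.
by_data = pd.merge(df1, df2, on='data')
named = pd.merge(df1, df2, on='data', suffixes=('_left', '_right'))

print(list(both.columns))     # ['key', 'data']
print(list(by_data.columns))  # ['key_x', 'data', 'key_y']
print(list(named.columns))    # ['key_left', 'data', 'key_right']
```

Only the pair ('b', 1) occurs in both frames, so `both` has a single row, whereas `by_data` has one row for each of the shared 'data' values 0, 1 and 2.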
- merge also supports other join types, such as 'left', 'right', and 'outer'
df2 = DataFrame({'key' : ['a','b','d'],'data2' : range(3)})
pd.merge(df1,df2,how = 'outer')
pd.merge(df1,df2,how = 'left')
pd.merge(df1,df2,how = 'right')
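
One way to compare the join types side by side is to count the rows each one produces. The sketch below does that for the frames above; `sizes` and `outer` are illustrative names, and `indicator=True` is a standard `pd.merge` option that adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# Number of result rows for each join type.
sizes = {
    how: len(pd.merge(df1, df2, on='key', how=how))
    for how in ['inner', 'left', 'right', 'outer']
}
print(sizes)  # {'inner': 6, 'left': 7, 'right': 7, 'outer': 8}

# '_merge' is 'both', 'left_only' or 'right_only' for each row.
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer['_merge'].value_counts())
```

The 'left' join keeps the unmatched 'c' row from df1, the 'right' join keeps the unmatched 'd' row from df2, and 'outer' keeps both, filling the missing side with NaN.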
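- The examples so far were many-to-one: each key appears only once in df2. In a many-to-many merge, the result is the Cartesian product of the matching rows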
df1 = DataFrame({'key' : ['b','b','a','c','a','b'],'data1':range(6)})
df2 = DataFrame({'key':['a','b','a','b','d'],'data2':range(5)})
pd.merge(df1,df2,on='key',how = 'left')
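
The Cartesian-product behaviour can be checked by hand: for each key, the number of matched rows should equal its count in df1 multiplied by its count in df2. The following is a minimal sketch of that check, with `predicted` and `actual` as illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})
df2 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'd'], 'data2': range(5)})

inner = pd.merge(df1, df2, on='key', how='inner')

# Predicted rows per key: (occurrences in df1) * (occurrences in df2).
# Keys missing from either frame become NaN and are dropped.
left_counts = df1['key'].value_counts()
right_counts = df2['key'].value_counts()
predicted = (left_counts * right_counts).dropna().astype(int)

actual = inner['key'].value_counts()

print(predicted.sort_index().to_dict())  # {'a': 4, 'b': 6}
print(actual.sort_index().to_dict())     # {'a': 4, 'b': 6}
```

With how='left' the unmatched 'c' row is kept as well, so that merge returns 11 rows instead of 10.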