Pandas怎样实现DataFrame的Merge
Pandas的Merge,相当于Sql的Join,将不同的表按key关联到一个表
merge的语法:
pd.merge(left,right, how='nner , on=None,left_on=None, right_on=None,lef_index=False,right_index=False,sort=True, sufises=(‘_X’ ,“_Y” ), opy=Tue,indicator=False, validate=None)
left,right:要merge的dataframe或者有name的Series.
how: join类型,'left', 'right', 'outer', 'inner'
on: join的key,left和right都需要有这个key.
left_on: left的df或者series的key
right_on: right的df或者seires的key
left_index,right_index:使用index而不是普通的column做join
suffixes:两个元素的后缀,如果列有重名,自动添加后缀,默认是('_X','_y')
本次讲解提纲:
1.电影数据集的join实例
2.理解merge时一对一、一对多、多对多的数量对齐关系
3.理解left join、right join、inner join、outer join的区别
4.如果出现非Key的字段重名怎么办
import pandas as pd
df_ratings=pd.read_csv(
'./datas/movielens-1m/ratings.dat',
sep="::",#设置其分隔服为::
engine='python',#因为pandas语法中sep为两个字符的时候,系统默认其为正则表达式,但是分隔符就是::不是正则表达式,所以加engine=python
names='UserID::MovieID::Rating::Timestamp'.split('::')
)
df_ratings.head()
df_users=pd.read_csv(
'./datas/movielens-1m/users.dat',
sep="::",
engine='python',
names='UserID::Gender::Age::Occupation::Zip-code'.split('::')
)
df_users.head()
df_movies=pd.read_csv(
'./datas/movielens-1m/movies.dat',
sep='::',
engine='python',
names='MovieID::Tile::Genres'.split("::")
)
df_movies.head()
df_ratings_users=pd.merge(
df_ratings,df_users,on='UserID',how='inner'
)
df_ratings_users.head()
df_ratings_users_movies=pd.merge(
df_ratings_users,df_movies,left_on='MovieID',right_on='MovieID',how='inner'
)
df_ratings_users_movies.head()
2、理解merge时数量的对齐关系以下关系要正确理解:
. one-to-one:—对一关系,关联的key都是唯一的
.比如(学号,姓名) merge (学号,年龄)
-结果条数为:1*1
. one-to-many:一对多关系,左边唯一key,右边不唯一key
-比如(学号,姓名) merge(学号,[语文成绩、数学成绩、英语成绩])
结果条数为:1*N
. many-to-many : [多对多关系,左边右边都不是唯一的
·比如(学号,[语文成绩、数学成绩、英语成绩])merge (学号,[篮球、足球、乒乓球)
-结果条数为:M*N
2.1 one-to-one 一对一关系的merge
left=pd.DataFrame({
'sno':[11,12,13,14],
'name':['name_a','name_b','name_c','name_d']
}
)
left
right=pd.DataFrame({
'sno':[11,12,13,14],
'age':['21','22','23','24']
})
right
#一对一关系中:结果有四条
pd.merge(left,right,on='sno')
2.2 one-to-many一对多关系的
merge注意:数据会被制
left=pd.DataFrame({
'sno':[11,12,13,14],
'name':['name_a','name_b','name_c','name_d']
}
)
left
right=pd.DataFrame({
'sno':[11,11,11,12,12,13],
'grade':['英语:21','语文:22','数学:23','英语:24',"语文:11","数学:55"]
})
right
#一对多的关系对name名字进行了复制
pd.merge(left,right,on='sno')
2.3 many-to-many多对多关系的merge
注意:结果数量会出现乘法
left=pd.DataFrame({
'sno':[11,11,12,12,12],
'爱好':['篮球','羽毛球','乒乓球','篮球','足球']
}
)
left
right=pd.DataFrame({
'sno':[11,11,11,12,12,13],
'grade':['英语:21','语文:22','数学:23','英语:24',"语文:11","数学:55"]
})
right
pd.merge(left,right,on='sno')
3、理解left join、right join、inner join、outer join的区别
left=pd.DataFrame({
'key':['ko','k1','k2','k3'],
'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3']
})
right=pd.DataFrame({
'key':['k0','k1','k2','k3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3']
})
left
3.1 inner join,默认
左边和右边的key都有,才会出现在结果里
pd.merge(left,right,how='inner')
3.2 left join
左边的都会出现在结果里,右边的如果无法匹配则为Null
pd.merge(left,right,how='left')
3.3 right join
右边的都会出现在结果里,左边的如果无法匹配则为Nul
pd.merge(left,right,how='right')
3.4 outer join
左边、右边的都会出现在结果里,如果无法匹配则为Null
pd.merge(left,right,how='outer')
4、如果出现非Key的字段重名怎么办
left=pd.DataFrame({
'key':['K0','k1','k2','k3'],
'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3']
})
right=pd.DataFrame({
'key':['K0','K1','K4','K5'],
'A':['A10','A11','A12','A13'],
'D':['D0','D1','D4','D5']
})
left
pd.merge(left,right,on='key')
suffixes=('name1','name2')可以直接设置重名函数的函数名
pd.merge(left,right,on='key',suffixes=('_left','_right'))
#suffixes=('name1','name2')可以直接设置重名函数的函数名
pd.merge(left,right,on='key',suffixes=('_left','_right'))