python对数据进行处理分析

数据规整化:合并

  • 进行多个键的合并
left = DataFrame({'key1' : ['foo','foo','bar'],'key2' :['one','two','on
    ...: e'],'lval' : [1,2,3]})   
    
right = DataFrame({'key1' : ['foo','foo','bar','bar'],'key2' : ['one',
     ...: 'one','one','two'],'rval' : [4,5,6,7]})  

pd.merge(left,right,on = ['key1','key2'],how = 'outer')
  • 对重复列名的处理
pd.merge(left,right,on = 'key1') 

pd.merge(left,right,on = 'key1',suffixes = ('_left','_right'))  
  • 索引上的合并
left = DataFrame({'key' :['a','b','a','a','b','c'],'value' : range(6)}
     ...: )        
   
 right1 = DataFrame({'group_val' : [3.5,7]},index = ['a','b'])

pd.merge(left1,right1,left_on = 'key',right_index = True)  

pd.merge(left1,right1,left_on = 'key',right_index = True,how = 'outer'
     ...: )                                                                     
right3 = DataFrame({'group_val' : [3.5,7,8]},index = ['a','b','c'])  

pd.merge(left1,right3,left_on = 'key',right_index = True) 
  • 对于层次化索引的数据:
lefth = DataFrame({'key1' : ['Ohio','Ohio','Ohio','Nevada','Nevada'], 
     ...: 'key2' : [2000,2001,2002,2001,2002],'data' : np.arange(5.)}) 

 righth = DataFrame(np.arange(12).reshape((6,2)),index = [['Nevada','Ne
     ...: vada','Ohio','Ohio','Ohio','Ohio'],[2001,2000,2000,2000,2001,2002]],co
     ...: lumns = ['event1','event2']) 

pd.merge(lefth,righth,left_on = ['key1','key2'],right_index = True)   

      错误示例:
pd.merge(lefth,righth,left_on ='key1,right_index = True) 

最后,我们尝试一下同时合并双方的索引:

left2 = DataFrame([[1.,2.],[3.,4.],[5.,6.]],index = ['a','c','e'],colu
     ...: mns = ['Ohio','Nevada'])  

 right2 = DataFrame([[7.,8.],[9.,10.],[11.,12.],[13,14]],index = ['b','
     ...: c','d','e'],columns = ['Missourri','Alabama']) 

pd.merge(left2,right2,how = 'outer',left_index = True,right_index = Tr
     ...: ue) 

DataFrame还有一个join实例方法,可以更方便的实现按索引合并:

left2.join(right2,how = 'outer')

当然,还可以join还支持参数DataFrame的索引跟调用者DataFrame的某个列之间的连接:

right1 = DataFrame({'group_val' : [3.5,7]},index = ['a','b'])     

left1 = DataFrame({'key' :['a','b','a','a','b','c'],'value' : range(6)
     ...: }) 

left1.join(right1,on = 'key',how = 'inner')  

同时,还可以向join传入一组Dataframe,进行多组连接:

left2 = DataFrame([[1.,2.],[3.,4.],[5.,6.]],index = ['a','c','e'],colu
     ...: mns = ['Ohio','Nevada'])    

right2 = DataFrame([[7.,8.],[9.,10.],[11.,12.],[13,14]],index = ['b','
     ...: c','d','e'],columns = ['Missourri','Alabama']) 

another = DataFrame([[7.,8.],[9.,10.],[11.,12.],[16.,17.]],index = ['a
     ...: ','c','e','f'],columns = ['New','Oregon'])

left2.join([right2,another])   
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

聆一

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值