Machine Learning 6: Pandas 6: Advanced Processing 3: Merging Data

I. Advanced Processing: Merging

Learning objectives

  • Objectives
    • Use pd.concat to combine data
    • Use pd.merge to combine data
  • Application

If your data is spread across several tables, you will sometimes need to combine their contents before you can analyze them.

1 Combining data with pd.concat

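  • pd.concat([data1, data2], axis=1)
    • Concatenates along rows or along columns: axis=0 (the default) stacks the objects vertically and aligns them on their column labels; axis=1 places them side by side and aligns them on their row index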
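A minimal sketch of the two axis settings, assuming two small hypothetical frames df1 and df2 (they are not part of the original data). With axis=0 the rows are stacked and the columns "a" and "b" are aligned by label; with axis=1 the frames sit side by side and the rows are aligned by index label, so a label present in only one frame gets NaN in the other frame's column:

```python
import pandas as pd

# Hypothetical frames with different columns and partly overlapping row labels
df1 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])

# axis=0: stack rows; columns are aligned by label (union of "a" and "b")
stacked = pd.concat([df1, df2], axis=0)
print(stacked.shape)  # (4, 2)

# axis=1: place side by side; rows are aligned by index label (union of x, y, z)
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)  # (3, 2)
print(side_by_side)  # row "x" has NaN in column "b", row "z" has NaN in column "a"
```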

For example, we can combine the one-hot encoding produced earlier with the original data:


# Concatenate side by side, aligning rows on the row index
pd.concat([data, dummies], axis=1)
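A self-contained sketch of that workflow, under stated assumptions: the stock frame, the bin edges and the prefix "rise" are hypothetical stand-ins, since the earlier one-hot step is not shown in this section. pd.cut buckets each day's percentage change, and pd.get_dummies turns the buckets into indicator columns that keep the original index:

```python
import pandas as pd

# Hypothetical stock data: one percentage change per trading day
data = pd.DataFrame(
    {"p_change": [2.68, -1.5, 0.4, -4.2]},
    index=["2018-02-27", "2018-02-26", "2018-02-23", "2018-02-22"],
)

# Bucket the change into intervals (assumed bin edges, for illustration only)
bins = [-100, -3, 0, 3, 100]
buckets = pd.cut(data["p_change"], bins)

# One indicator column per interval; "rise" is a hypothetical prefix
dummies = pd.get_dummies(buckets, prefix="rise")

# dummies shares data's index, so axis=1 lines the rows up day by day
result = pd.concat([data, dummies], axis=1)
print(result)
```

Because dummies is derived from a column of data, both frames carry the same date index, so every row lines up and no NaN is introduced.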

2 pd.merge

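  • pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)
    • Merges either on key columns shared by both frames or on separately named keys for the left and right frames
    • left: A DataFrame object
    • right: Another DataFrame object
    • on: Columns (names) to join on. Must be found in both the left and right DataFrame objects.
    • left_on=None, right_on=None: the key columns to use from the left and right frames respectively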
| Merge method | SQL Join Name | Description |
|---|---|---|
| left | LEFT OUTER JOIN | Use keys from left frame only |
| right | RIGHT OUTER JOIN | Use keys from right frame only |
| outer | FULL OUTER JOIN | Use union of keys from both frames |
| inner | INNER JOIN | Use intersection of keys from both frames |
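The left_on and right_on parameters listed above are not exercised by the examples in this section. A minimal sketch, using two hypothetical tables (orders and customers, invented for illustration) whose key column is called cust_id on the left and id on the right:

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
orders = pd.DataFrame({"cust_id": [1, 2, 2, 4], "amount": [10, 20, 30, 40]})
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

# left_on / right_on name the key column on each side separately
merged = pd.merge(orders, customers, how="left", left_on="cust_id", right_on="id")
print(merged)
```

Both key columns survive in the result. Order 4 has no matching customer, so its id and name are NaN (which also turns id into a float column).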

2.1 Merging with pd.merge

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                        'key2': ['K0', 'K1', 'K0', 'K1'],
                        'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                        'key2': ['K0', 'K0', 'K0', 'K0'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']})

# Inner join by default
result = pd.merge(left, right, on=['key1', 'key2'])


  • Left join
result = pd.merge(left, right, how='left', on=['key1', 'key2'])


  • Right join
result = pd.merge(left, right, how='right', on=['key1', 'key2'])


  • Outer join
result = pd.merge(left, right, how='outer', on=['key1', 'key2'])

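As a quick check of how the four how values differ, this sketch rebuilds the same left and right frames and counts the rows each join type returns:

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Number of rows produced by each join type on the same two frames
counts = {how: len(pd.merge(left, right, how=how, on=['key1', 'key2']))
          for how in ['inner', 'left', 'right', 'outer']}
print(counts)  # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}
```

The counts follow from the keys: (K1, K0) appears once on the left and twice on the right, so every join that keeps it yields two rows for it. inner keeps only the two shared key pairs (3 rows), left keeps all 4 left rows plus the duplicate (5), right keeps the 4 right rows, and outer additionally keeps the unmatched pairs (K0, K1), (K2, K1) and (K2, K0) (6).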

3 Summary

  • pd.concat([data1, data2], axis=**) (know)
  • pd.merge(left, right, how=, on=) (know)
    • how: the type of join to perform
    • on: the key column(s) to join on

II. Case Study

Merging data

In [38]:

data.head()

Out[38]:

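|  | open | high | close | low | volume | price_change | p_change | ma5 | ma10 | ma20 | v_ma5 | v_ma10 | v_ma20 | turnover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-02-27 | 23.53 | 25.88 | 24.16 | 23.53 | 95578.03 | 0.63 | 2.68 | 22.942 | 22.142 | 22.875 | 53782.64 | 46738.65 | 55576.11 | 2.39 |
| 2018-02-26 | 22.80 | 23.78 | 23.53 | 22.80 | 60985.11 | 0.69 | 3.02 | 22.406 | 21.955 | 22.942 | 40827.52 | 42736.34 | 56007.50 | 1.53 |
| 2018-02-23 | 22.88 | 23.37 | 22.82 | 22.71 | 52914.01 | 0.54 | 2.42 | 21.938 | 21.929 | 23.022 | 35119.58 | 41871.97 | 56372.85 | 1.32 |
| 2018-02-22 | 22.25 | 22.76 | 22.28 | 22.02 | 36105.01 | 0.36 | 1.64 | 21.446 | 21.909 | 23.137 | 35397.58 | 39904.78 | 60149.60 | 0.90 |
| 2018-02-14 | 21.49 | 21.99 | 21.92 | 21.48 | 23331.04 | 0.44 | 2.05 | 21.366 | 21.923 | 23.253 | 33590.21 | 42935.74 | 61716.11 | 0.58 |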

In [39]:

data_dummies.head()

Out[39]:

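|  | best_28 | best_34 | best_35 | best_51 | best_57 | best_188 | best_215 |
|---|---|---|---|---|---|---|---|
| (0, 3] | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| (-3, 0] | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| (3, 5] | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| (-5, -3] | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| (5, 7] | 0 | 0 | 1 | 0 | 0 | 0 | 0 |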

In [50]:

data_con = pd.concat([data,data_dummies],axis = 1)
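  • pd.concat([df1, df2], axis=1): appends df2's columns after df1's columns, aligning the rows on the index. Here data is indexed by date while data_dummies is indexed by interval, so no row labels match: the result contains one row for every label from either index, with NaN wherever a frame has no value for that row.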
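The NaN-filled output of this cell comes from index alignment. A small sketch with two hypothetical frames (prices and flags, invented for illustration) whose row labels never overlap reproduces the same effect:

```python
import pandas as pd

# Hypothetical frames whose row labels do not overlap at all
prices = pd.DataFrame({"close": [24.16, 23.53]}, index=["2018-02-27", "2018-02-26"])
flags = pd.DataFrame({"flag": [1, 0]}, index=["(0, 3]", "(-3, 0]"])

# axis=1 aligns on the index; with no shared labels, every row is half empty
combined = pd.concat([prices, flags], axis=1)
print(combined.shape)                  # (4, 2): union of the four labels, two columns
print(combined.isna().sum().tolist())  # [2, 2]
```

For the dummy columns to land on the same rows as the prices, the dummies would need to be built from a Series that shares data's date index.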

In [51]:

data_con

Out[51]:

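|  | open | high | close | low | volume | price_change | p_change | ma5 | ma10 | ma20 | ... | v_ma10 | v_ma20 | turnover | best_28 | best_34 | best_35 | best_51 | best_57 | best_188 | best_215 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (-100, -7] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| (-7, -5] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| (-5, -3] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| (-3, 0] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| (0, 3] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2018-02-14 | 21.49 | 21.99 | 21.92 | 21.48 | 23331.04 | 0.44 | 2.05 | 21.366 | 21.923 | 23.253 | ... | 42935.74 | 61716.11 | 0.58 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2018-02-22 | 22.25 | 22.76 | 22.28 | 22.02 | 36105.01 | 0.36 | 1.64 | 21.446 | 21.909 | 23.137 | ... | 39904.78 | 60149.60 | 0.90 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2018-02-23 | 22.88 | 23.37 | 22.82 | 22.71 | 52914.01 | 0.54 | 2.42 | 21.938 | 21.929 | 23.022 | ... | 41871.97 | 56372.85 | 1.32 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2018-02-26 | 22.80 | 23.78 | 23.53 | 22.80 | 60985.11 | 0.69 | 3.02 | 22.406 | 21.955 | 22.942 | ... | 42736.34 | 56007.50 | 1.53 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2018-02-27 | 23.53 | 25.88 | 24.16 | 23.53 | 95578.03 | 0.63 | 2.68 | 22.942 | 22.142 | 22.875 | ... | 46738.65 | 55576.11 | 2.39 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |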

651 rows × 21 columns

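  • Inner join with merge(): rows are matched on their key values. A row is kept only if the same key combination (for example key1=K0, key2=K0) appears in both the left and the right table, and the columns from both tables are kept for it.
  • If a key in the left table matches two rows in the right table, both matches are kept, so the left row appears twice (see K1/K0 in the output below).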
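A minimal sketch of the one-to-many case, using hypothetical single-key frames rather than the two-key example below. The left row with key K1 matches two right rows, so it is repeated once per match, while K0 and K2 (present on only one side) are dropped:

```python
import pandas as pd

# One left row with key "K1", two right rows with key "K1"
left = pd.DataFrame({"key": ["K0", "K1"], "A": ["A0", "A1"]})
right = pd.DataFrame({"key": ["K1", "K1", "K2"], "C": ["C1", "C2", "C3"]})

inner = pd.merge(left, right, on="key")  # how="inner" is the default
print(inner)  # two rows: (K1, A1, C1) and (K1, A1, C2)
```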

In [52]:

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                        'key2': ['K0', 'K1', 'K0', 'K1'],
                        'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                        'key2': ['K0', 'K0', 'K0', 'K0'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']})

In [57]:

pd.merge(left,right,on = ['key1','key2'])

Out[57]:

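|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | K0 | A2 | B2 | C1 | D1 |
| 2 | K1 | K0 | A2 | B2 | C2 | D2 |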
  • 1. Inner join

In [58]:

pd.merge(left,right,on = ['key1','key2'],how = "inner")

Out[58]:

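|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | K0 | A2 | B2 | C1 | D1 |
| 2 | K1 | K0 | A2 | B2 | C2 | D2 |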
  • 2. Outer join

In [59]:

pd.merge(left,right,on = ['key1','key2'],how = "outer")

Out[59]:

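|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K0 | K1 | A1 | B1 | NaN | NaN |
| 2 | K1 | K0 | A2 | B2 | C1 | D1 |
| 3 | K1 | K0 | A2 | B2 | C2 | D2 |
| 4 | K2 | K1 | A3 | B3 | NaN | NaN |
| 5 | K2 | K0 | NaN | NaN | C3 | D3 |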
  • 3. Left join

In [60]:

pd.merge(left,right,on = ['key1','key2'],how = "left")

Out[60]:

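|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K0 | K1 | A1 | B1 | NaN | NaN |
| 2 | K1 | K0 | A2 | B2 | C1 | D1 |
| 3 | K1 | K0 | A2 | B2 | C2 | D2 |
| 4 | K2 | K1 | A3 | B3 | NaN | NaN |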
  • 4. Right join

In [61]:

pd.merge(left,right,on = ['key1','key2'],how = "right")

Out[61]:

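|  | key1 | key2 | A | B | C | D |
|---|---|---|---|---|---|---|
| 0 | K0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | K0 | A2 | B2 | C1 | D1 |
| 2 | K1 | K0 | A2 | B2 | C2 | D2 |
| 3 | K2 | K0 | NaN | NaN | C3 | D3 |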