数据分析第三篇——Pandas之数据的合并

3.8 高级处理—合并

  • 3.8.1 pd.concat()实现合并——按方向
  • 3.8.2 pd.merge()实现合并——按索引

如果数据由多张表组成,那么有时需要将不同的内容合并在一起分析

3.8.1 pd.concat()实现合并(按方向)

  • pd.concat([data1, data2], axis=0)
    • 按行或列索引进行合并,axis=1为水平拼接(按列索引的方向),axis默认为0(按行索引的方向,竖直拼接)

比如将刚才处理好的one-hot编码与原数据合并

stock.head()
trade_datecloseopenhighlowpre_closechangepct_chgvolamount
0202003132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436.0393019665.2
1202003122923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457.0328209202.4
2202003112968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970.0378766619.0
3202003102996.76182918.93473000.29632904.79892943.290753.47111.8167393296648.0425017184.8
4202003092943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736.0438143854.6
stock_change.head()
涨跌幅_(-600, -300]涨跌幅_(-300, 0]涨跌幅_(0, 300]涨跌幅_(300, 600]涨跌幅_(600, 900]
001000
101000
201000
300100
401000
# 沿列索引的方向水平拼接
pd.concat([stock_change, stock], axis=1)
涨跌幅_(-600, -300]涨跌幅_(-300, 0]涨跌幅_(0, 300]涨跌幅_(300, 600]涨跌幅_(600, 900]trade_datecloseopenhighlowpre_closechangepct_chgvolamount
001000202003132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436.03.930197e+08
101000202003122923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457.03.282092e+08
201000202003112968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970.03.787666e+08
300100202003102996.76182918.93473000.29632904.79892943.290753.47111.8167393296648.04.250172e+08
401000202003092943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736.04.381439e+08
................................................
69970100019910719136.7000137.6600138.5400136.6600137.1700-0.4700-0.342610823.05.242826e+03
69980010019910718137.1700137.1700137.1700135.8100135.81001.36001.0014847.04.644160e+02
69990010019910717135.8100135.8100135.8100135.3900134.47001.34000.9965660.03.975240e+02
70000010019910716134.4700134.3900134.4700133.1400133.14001.33000.99892796.01.328502e+03
70010010019910715133.1400133.9000134.1000131.8700132.80000.34000.256011938.05.534900e+03

7002 rows × 15 columns

# 沿行索引的方向数值拼接
pd.concat([stock, stock_change], axis=0)
trade_datecloseopenhighlowpre_closechangepct_chgvolamount涨跌幅_(-600, -300]涨跌幅_(-300, 0]涨跌幅_(0, 300]涨跌幅_(300, 600]涨跌幅_(600, 900]
020200313.02887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436.0393019665.2NaNNaNNaNNaNNaN
120200312.02923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457.0328209202.4NaNNaNNaNNaNNaN
220200311.02968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970.0378766619.0NaNNaNNaNNaNNaN
320200310.02996.76182918.93473000.29632904.79892943.290753.47111.8167393296648.0425017184.8NaNNaNNaNNaNNaN
420200309.02943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736.0438143854.6NaNNaNNaNNaNNaN
................................................
6997NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0.01.00.00.00.0
6998NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0.00.01.00.00.0
6999NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0.00.01.00.00.0
7000NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0.00.01.00.00.0
7001NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0.00.01.00.00.0

14004 rows × 15 columns

上例中,不同的字段在合并后的表中都出现,不存在值的字段处得值为NaN

3.8.2 pd.merge()实现合并(按索引)

  • pd.merge(left, right, how=‘inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=(’_x’,‘y’), copy=True, indicator=False, validate=None)
    • 可以指定按照两组数据的共同键值对合并或左右各自
    • left:A DataFrame object
    • right: Another DataFrame object
    • how: 默认为inner
    • on: [索引]——指定依据哪个索引合并
    • left_on=None, right_on=None: 指定左右键

how = ‘inner’:内连接,只保留键的值相同的行
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-vRq588Ni-1586742394676)(attachment:image.png)]

how = ‘left’: 以左边的表为主连接
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5VnX61g8-1586742394678)(attachment:image.png)]

how = ‘outer’: 外连接,只要键的值存在,相应的行就会被保留
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-w94Lif3l-1586742394678)(attachment:image.png)]

left = pd.DataFrame({'key1':['K0', 'K0', 'K1', 'K2'],
                     'key2':['K0', 'K1', 'K0', 'K1'],
                     'A'   :['A0', 'A1', 'A2', 'A3'],
                     'B'   :['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1':['K0', 'K1', 'K1', 'K2'],
                      'key2':['K0', 'K0', 'K0', 'K0'],
                      'C'   :['C0', 'C1', 'C2', 'C3'],
                      'D'   :['D0', 'D1', 'D2', 'D3']})
left
key1key2AB
0K0K0A0B0
1K0K1A1B1
2K1K0A2B2
3K2K1A3B3
right
key1key2CD
0K0K0C0D0
1K1K0C1D1
2K1K0C2D2
3K2K0C3D3
# how='nner'内连接
# 只保留键的值相同的行,两个DataFrame的字段都出现
pd.merge(left, right, how='inner', on=['key1', 'key2'])
key1key2ABCD
0K0K0A0B0C0D0
1K1K0A2B2C1D1
2K1K0A2B2C2D2
# how='left'左连接
# 左表中的所有行一定出现,右表中键与坐标不同的行的内容显示为NaN
pd.merge(left, right, how='left', on=['key1', 'key2'])
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
# how='outer'外连接
# 只要键的值存在,其相应的行就会被保留
pd.merge(left, right, how='outer', on=['key1', 'key2'])
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
5K2K0NaNNaNC3D3
  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值