DataFrame horizontal concatenation: grouping, concatenation, and statistical operations on DataFrame in the Python 3 pandas library (basic notes)

Viewing DataFrame information (output shown after each call)

import pandas as pd
import numpy as np

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000]
})

print(salaries.columns)
# Index(['Bonus', 'Salary', 'Year', 'name'], dtype='object')

print(salaries.info())

RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
Bonus     8 non-null int64
Salary    8 non-null int64
Year      8 non-null int64
name      8 non-null object
dtypes: int64(3), object(1)
memory usage: 336.0+ bytes

None

print(salaries.describe())

               Bonus        Salary         Year
count       8.000000  8.000000e+00     8.000000
mean   131000.000000  1.631437e+06  2016.500000
std    152851.935826  3.416521e+06     0.534522
min      3000.000000  3.000000e+03  2016.000000
25%     16250.000000  1.587500e+04  2016.000000
50%     60000.000000  5.124995e+05  2016.500000
75%    225000.000000  9.999990e+05  2017.000000
max    400000.000000  9.999999e+06  2017.000000

salaries = salaries[['name', 'Year', 'Salary', 'Bonus']]

# Dict order was not preserved here, so set the column order explicitly

print(salaries)

    name  Year   Salary   Bonus
0   BOSS  2016   999999  100000
1  Lilei  2016    20000   20000
2  Lilei  2016    25000   20000
3    Han  2016     3000    5000
4   BOSS  2017  9999999  200000
5   BOSS  2017   999999  300000
6    Han  2017     3500    3000
7   BOSS  2017   999999  400000
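
As an alternative to reindexing the columns afterwards, the order can also be fixed when the DataFrame is built. The following is a minimal sketch, assuming the same data as above; the variable names `data` and `ordered` are introduced here for illustration only. On Python 3.7+ with a recent pandas, dict insertion order is normally preserved anyway, so this mostly matters on older setups or when a different order is wanted.

```python
import pandas as pd

# Same data as in the article; `data` and `ordered` are illustrative names.
data = {
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
}

# The columns= argument selects and orders the dict keys at construction time.
ordered = pd.DataFrame(data, columns=['name', 'Year', 'Salary', 'Bonus'])

print(list(ordered.columns))  # ['name', 'Year', 'Salary', 'Bonus']
```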

Grouping with groupby

group_by_name = salaries.groupby('name')
print(type(group_by_name))

Inspecting the composition of group_by_name with the groups attribute

print(group_by_name.groups)  # groups attribute
print(len(group_by_name.groups))

{'Han': Int64Index([3, 6], dtype='int64'),
 'BOSS': Int64Index([0, 4, 5, 7], dtype='int64'),
 'Lilei': Int64Index([1, 2], dtype='int64')}
3
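
Besides `len(group_by_name.groups)`, the number of groups and the number of rows per group can be read directly from the groupby object. A small sketch, assuming the same `salaries` data as above; `sizes` is a name introduced here for illustration.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})
group_by_name = salaries.groupby('name')

# ngroups is the number of distinct keys; same value as len(group_by_name.groups).
print(group_by_name.ngroups)  # 3

# size() returns a Series with the row count of each group, indexed by name.
sizes = group_by_name.size()
print(sizes.to_dict())  # {'BOSS': 4, 'Han': 2, 'Lilei': 2}
```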

Viewing each group

for name, group in group_by_name:
    print(name)
    print(group)

BOSS
   name  Year   Salary   Bonus
0  BOSS  2016   999999  100000
4  BOSS  2017  9999999  200000
5  BOSS  2017   999999  300000
7  BOSS  2017   999999  400000
Han
  name  Year  Salary  Bonus
3  Han  2016    3000   5000
6  Han  2017    3500   3000
Lilei
    name  Year  Salary  Bonus
1  Lilei  2016   20000  20000
2  Lilei  2016   25000  20000
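
The same loop can do more than print. Each `group` is an ordinary DataFrame, so per-group values can be collected while iterating. The sketch below assumes the same `salaries` data; the dictionary `max_salary` is a hypothetical name used only for this illustration.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})

# Collect the highest Salary seen for each name while iterating over the groups.
max_salary = {}
for name, group in salaries.groupby('name'):
    # `group` is the sub-DataFrame holding only the rows for this name.
    max_salary[name] = int(group['Salary'].max())

print(max_salary)  # {'BOSS': 9999999, 'Han': 3500, 'Lilei': 25000}
```

For a single statistic like this, calling the aggregation on the groupby object directly is usually shorter; the loop form is mainly useful when each group needs custom handling.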

Selecting a single group

print(group_by_name.get_group('Lilei'))

    name  Year  Salary  Bonus
1  Lilei  2016   20000  20000
2  Lilei  2016   25000  20000

print(group_by_name.get_group('BOSS'))

   name  Year   Salary   Bonus
0  BOSS  2016   999999  100000
4  BOSS  2017  9999999  200000
5  BOSS  2017   999999  300000
7  BOSS  2017   999999  400000
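
For comparison, `get_group` picks out the same rows as a boolean filter on the grouping column. This is a rough sketch assuming the same `salaries` data; `lilei_group` and `lilei_mask` are illustrative names.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})

# Rows for one key, taken from the groupby object.
lilei_group = salaries.groupby('name').get_group('Lilei')

# The same rows, selected with a boolean mask on the original DataFrame.
lilei_mask = salaries[salaries['name'] == 'Lilei']

print(list(lilei_group.index))  # [1, 2]
print(list(lilei_mask.index))   # [1, 2]
```

The filter is enough for a one-off lookup; `get_group` is convenient when the groupby object already exists and several groups will be pulled from it.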

1.) Grouping by one column

1.a) After grouping by one column, apply one statistical operation to the remaining columns

print(group_by_name.sum())  # sum over rows with the same name

       Year    Salary    Bonus
name
BOSS   8067  12999996  1000000
Han    4033      6500     8000
Lilei  4032     45000    40000

print(group_by_name[['Salary', 'Bonus']].sum())

        Salary    Bonus
name
BOSS  12999996  1000000
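
The same pattern works for other statistics. As a sketch, assuming the same `salaries` data, selecting columns first and then calling `mean()` gives the per-name averages; `mean_by_name` is a name introduced here for illustration.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})

# Select the columns of interest, then average them within each name.
mean_by_name = salaries.groupby('name')[['Salary', 'Bonus']].mean()

print(mean_by_name.loc['BOSS', 'Salary'])  # 3249999.0
print(mean_by_name.loc['Han', 'Bonus'])    # 4000.0
```

Selecting the columns before aggregating avoids summing or averaging columns such as Year, where the result has little meaning.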
