Inspecting DataFrame information (annotated code)
import pandas as pd
import numpy as np
salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000]
})
print(salaries.columns)
# Index(['Bonus', 'Salary', 'Year', 'name'], dtype='object')
# (older pandas sorted dict keys alphabetically; pandas >= 0.23 on Python 3.6+ keeps the dict's insertion order)
salaries.info()  # info() prints its report itself and returns None, so wrapping it in print() only adds a stray "None"
RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
Bonus 8 non-null int64
Salary 8 non-null int64
Year 8 non-null int64
name 8 non-null object
dtypes: int64(3), object(1)
memory usage: 336.0+ bytes
print(salaries.describe())
Bonus Salary Year
count 8.000000 8.000000e+00 8.000000
mean 131000.000000 1.631437e+06 2016.500000
std 152851.935826 3.416521e+06 0.534522
min 3000.000000 3.000000e+03 2016.000000
25% 16250.000000 1.587500e+04 2016.000000
50% 60000.000000 5.124995e+05 2016.500000
75% 225000.000000 9.999990e+05 2017.000000
max 400000.000000 9.999999e+06 2017.000000
salaries = salaries[['name', 'Year', 'Salary', 'Bonus']]
# Dicts were unordered in older Python/pandas versions, so set the column order explicitly
print(salaries)
name Year Salary Bonus
0 BOSS 2016 999999 100000
1 Lilei 2016 20000 20000
2 Lilei 2016 25000 20000
3 Han 2016 3000 5000
4 BOSS 2017 9999999 200000
5 BOSS 2017 999999 300000
6 Han 2017 3500 3000
7 BOSS 2017 999999 400000
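To double-check the inspection steps above in code, here is a minimal sketch. It assumes pandas >= 0.23 on Python 3.7+, where the dict's insertion order is kept, so the explicit reorder is no longer strictly needed; the variable names are illustrative.

```python
import pandas as pd

# Same data as above
salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})

# With insertion order preserved, the columns already come out as name, Year, Salary, Bonus
column_order = list(salaries.columns)

# describe() summarizes only numeric columns by default, so 'name' is left out
summary = salaries.describe()
bonus_mean = summary.loc['mean', 'Bonus']

# Cross-check the reported mean by hand: total bonus divided by the row count
manual_mean = salaries['Bonus'].sum() / len(salaries)

print(column_order)
print(bonus_mean, manual_mean)
```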
Group by
group_by_name = salaries.groupby('name')
print(type(group_by_name))
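A `groupby()` call does not compute anything by itself; it returns a `DataFrameGroupBy` object that records how the rows are split. The sketch below, assuming a recent pandas (`ngroups` and `size()` are standard GroupBy members), inspects that object before any aggregation runs.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})

group_by_name = salaries.groupby('name')

# The GroupBy object only describes the split; no aggregation has run yet
type_name = type(group_by_name).__name__
n_groups = group_by_name.ngroups          # number of distinct names
rows_per_group = group_by_name.size()     # Series: name -> row count, keys sorted

print(type_name, n_groups)
print(rows_per_group)
```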
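Inspecting the composition of group_by_name with the groups attribute
print(group_by_name.groups)  # groups is an attribute (dict: group name -> row index labels), not a method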
print(len(group_by_name.groups))
{'Han': Int64Index([3, 6], dtype='int64'),
'BOSS': Int64Index([0, 4, 5, 7], dtype='int64'),
'Lilei': Int64Index([1, 2], dtype='int64')}
3
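Because `groups` maps each name to row index labels, those labels can be passed straight back to `.loc` to pull out a group's rows. A small sketch, assuming the same `salaries` data; the variable names are illustrative.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})
group_by_name = salaries.groupby('name')

groups = group_by_name.groups
# Convert each Index of labels to a plain list for easier reading
as_lists = {name: list(labels) for name, labels in groups.items()}

# The values are index labels, so .loc returns exactly that group's rows
boss_rows = salaries.loc[groups['BOSS']]

print(as_lists)
print(boss_rows)
```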
Iterating over the groups
for name, group in group_by_name:
    print(name)
    print(group)
BOSS
name Year Salary Bonus
0 BOSS 2016 999999 100000
4 BOSS 2017 9999999 200000
5 BOSS 2017 999999 300000
7 BOSS 2017 999999 400000
Han
name Year Salary Bonus
3 Han 2016 3000 5000
6 Han 2017 3500 3000
Lilei
name Year Salary Bonus
1 Lilei 2016 20000 20000
2 Lilei 2016 25000 20000
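Iterating yields `(name, sub-DataFrame)` pairs in sorted key order, so a per-group loop can reproduce a built-in aggregation. The sketch below is illustrative only (in practice the built-in method is simpler and faster); it compares a hand-written loop with `group_by_name['Salary'].max()`.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})
group_by_name = salaries.groupby('name')

# Hand-written loop: each iteration gives the key and that key's rows
max_salary_loop = {}
for name, group in group_by_name:
    max_salary_loop[name] = group['Salary'].max()

# Built-in equivalent: select the column, then aggregate per group
max_salary_builtin = group_by_name['Salary'].max().to_dict()

print(max_salary_loop)
print(max_salary_builtin)
```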
Selecting a single group
print(group_by_name.get_group('Lilei'))
name Year Salary Bonus
1 Lilei 2016 20000 20000
2 Lilei 2016 25000 20000
print(group_by_name.get_group('BOSS'))
name Year Salary Bonus
0 BOSS 2016 999999 100000
4 BOSS 2017 9999999 200000
5 BOSS 2017 999999 300000
7 BOSS 2017 999999 400000
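`get_group()` returns the same rows as a boolean mask on the key column. A minimal sketch comparing the two, assuming the same `salaries` data:

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})
group_by_name = salaries.groupby('name')

# Rows for one key via the GroupBy object
lilei_from_group = group_by_name.get_group('Lilei')

# The same rows via a plain boolean filter on the key column
lilei_from_mask = salaries[salaries['name'] == 'Lilei']

# equals() compares values, index and column labels, and dtypes
same_rows = lilei_from_group.equals(lilei_from_mask)
print(same_rows)
```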
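1.) Grouping by one column
1.a) After grouping by one column, apply a single statistic to each of the remaining columns
print(group_by_name.sum())  # sum the rows that share the same name (Year gets summed too)
Year Salary Bonus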
name
BOSS 8067 12999996 1000000
Han 4033 6500 8000
Lilei 4032 45000 40000
print(group_by_name[['Salary', 'Bonus']].sum())  # restrict the sum to Salary and Bonus
Salary Bonus
name
BOSS 12999996 1000000
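Selecting columns before aggregating avoids computing a meaningless sum of `Year`. The sketch below, assuming a recent pandas, checks that selecting first, selecting afterwards, and `agg('sum')` all give the same table.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [999999, 20000, 25000, 3000, 9999999, 999999, 3500, 999999],
    'Bonus': [100000, 20000, 20000, 5000, 200000, 300000, 3000, 400000],
})
group_by_name = salaries.groupby('name')

# Select the columns first: only Salary and Bonus are summed
selected_first = group_by_name[['Salary', 'Bonus']].sum()

# Same numbers, but Year is summed first and then discarded
selected_after = group_by_name.sum()[['Salary', 'Bonus']]

# agg() with a function name is another way to request one statistic
via_agg = group_by_name[['Salary', 'Bonus']].agg('sum')

print(selected_first)
```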