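Grouping and aggregation: the main topics are as follows:
1. Descriptive statistics, such as min() (minimum), max() (maximum), median() (median), mean() (mean),
and quantile() (quantiles). For example, quantile([0.1, 0.2, 0.5, 0.8]) returns the 10th, 20th,
50th (also called the median), and 80th percentiles.
2. Group-wise operations with groupby, similar to GROUP BY in SQL.
3. Aggregation methods: agg, apply, transform, and so on.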
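The three topics above can be sketched on a small frame. This is a minimal illustration, not the article's own code: the frame `df` and the variable names are assumptions, and the values simply mirror the name and money columns of the sample data, stored as integers.

```python
import pandas as pd

# Assumed toy frame mirroring the name and money columns, with integer values.
df = pd.DataFrame({
    "name": ["LOL", "LOL", "CS", "CS", "VB"],
    "money": [900, 487, 868, 988, 777],
})

# 1. Descriptive statistics on a single column.
money = df["money"]
stats = {
    "min": money.min(),
    "max": money.max(),
    "median": money.median(),
    "mean": money.mean(),
}
# quantile() takes one fraction, or a list of fractions, between 0 and 1.
quantiles = money.quantile([0.1, 0.2, 0.5, 0.8])

# 2. Group rows by a key column, similar to SQL GROUP BY.
grouped = df.groupby("name")["money"]

# 3a. agg(): one row per group, one column per aggregation.
summary = grouped.agg(["min", "max", "mean"])

# 3b. transform(): the result has the same length as the input,
#     so each row receives its own group's mean.
df["group_mean"] = grouped.transform("mean")

# 3c. apply(): run an arbitrary function on each group; here, the range.
spread = grouped.apply(lambda s: s.max() - s.min())

print(stats)
print(quantiles)
print(summary)
print(df)
print(spread)
```

Roughly speaking, agg is the choice when each group should collapse to a single row, transform when the group-level result has to be written back next to the original rows, and apply when neither fits.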
An example follows:
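os.chdir('path/to/data')  # placeholder: the directory where the data is stored
sales = pd.read_csv('app.csv', dtype={'year': float})
# dtype takes a dict; here the values in the year column are read as floats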
Select the columns you need (when a dataset is very large, we keep only the variables we want).
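One way to select columns is to do it while reading the file. The sketch below assumes a made-up CSV held in memory, since the real contents of app.csv are not shown; `csv_text`, `sales_subset`, and `year_money` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for app.csv.
csv_text = """year,n_sale,s_sale,name,money
2000,89,24,LOL,900
2001,44,34,LOL,487
2008,22,333,CS,868
"""

# usecols keeps only the listed columns; dtype maps column names to types.
sales_subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["year", "name", "money"],
    dtype={"year": float},
)

# Columns can also be selected after loading by passing a list of names.
year_money = sales_subset[["year", "money"]]

print(sales_subset.dtypes)
print(year_money)
```

Selecting with usecols avoids loading unwanted columns at all, which may matter for large files; selecting after loading is simpler when the full frame is needed elsewhere.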
import pandas as pd
import numpy as np
import os
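# Sample rows; every value is a string. Note that the name list shadows Python's built-in list type.
list = [['2000', '89', '24', '34', '78', 'LOL', '900'],
        ['2001', '44', '34', '343', '34', 'LOL', '487'],
        ['2008', '22', '333', '34', '66', 'CS', '868'],
        ['2010', '322', '434', '342', '676', 'CS', '988'],
        ['2018', '356', '445', '666', '777', 'VB', '777']]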
list
[['2000', '89', '24', '34', '78', 'LOL', '900'],
 ['2001', '44', '34', '343', '34', 'LOL', '487'],
 ['2008', '22', '333', '34', '66', 'CS', '868'],
 ['2010', '322', '434', '342', '676', 'CS', '988'],
 ['2018', '356', '445', '666', '777', 'VB', '777']]
os.chdir(r'C:\data')  # raw string so the backslash is not read as an escape character
sales = pd.DataFrame(list, columns=['year', 'n_sale', 's_sale',
                                    'china_sale', 'e_sale', 'name', 'money'])
sales
year n_sale s_sale china_sale e_sale name money
0 2000 89 24 34 78