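Datawhale Task02: pandas Basics Check-in

As a beginner, I learned a lot from this task. It was a lot to take in at once, so I will come back to the exercises later. P.S. The authors of the course material are really impressive; much respect.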

import numpy as np
import pandas as pd 
pd.__version__
'1.1.5'

2.1 Reading and Writing Files

2.1.1 Reading Files

df_csv = pd.read_csv(r'D:\Ajupyter\iris.csv'
                     ,usecols = ['Sepal_Length','Sepal_Width']
                     ,index_col = ['Sepal_Length']
                    )
df_csv
              Sepal_Width
Sepal_Length
5.1           3.5
4.9           3.0
4.7           3.2
4.6           3.1
5.0           3.6
...           ...
6.7           3.0
6.3           2.5
6.5           3.0
6.2           3.4
5.9           3.0

150 rows × 1 columns

df_txt = pd.read_csv(r'C:\Users\86198\Desktop\数据\平台查询的企业.txt')
df_txt
     企业名
0    北京爱钱帮财富科技有限公司
1    烟台艾利互金网络信息服务有限公司
2    成都伟品信息技术服务有限公司
3    北京朴素磐石投资管理有限公司
4    宝蓝财富科技有限公司
..   ...
126  北京中金丰联信息技术股份有限公司
127  广州中青金服互联网金融信息服务有限公司
128  上海顽色投资管理有限公司
129  杭州上陈金融服务外包有限公司
130  杭州飞牛科技有限公司

131 rows × 1 columns

df_excel = pd.read_excel(r'C:\Users\86198\Desktop\数据\批量查询_859.xls',parse_dates = ['成立日期'],nrows = 3)
df_excel
   企业名称  登记状态  法定代表人  注册资本  成立日期  核准日期  所属省份  所属城市  所属区县  电话  ...  注册号  组织机构代码  参保人数  企业类型  所属行业  曾用名  网址  企业地址  最新年报地址  经营范围
0  宝蓝财富(天津)科技有限公司  存续  胡德荣  5500万元人民币  2014-04-04  2018-04-04  天津市  天津市  滨海新区  022-23757986  ...  120193000087395  09366780-7  43  有限责任公司  科技推广和应用服务业  -  http://www.batiaoyu.com  天津滨海高新区华苑产业区(环外)海泰创新六路2号3-2-601  天津市河西区解放南路与浯水道交口喜年广场5-201  软件技术开发、咨询、服务、转让;商务信息咨询;计算机系统集成;财务咨询;企业管理咨询;批发和...
1  深圳市兴荣欣科技有限责任公司  存续  汪超  50万元人民币  2014-09-01  2020-08-04  广东省  深圳市  龙岗区  13802572770  ...  440306111217767  31197532-3  -  有限责任公司  批发业  -  -  深圳市龙岗区南湾街道南岭村社区南园路4号文峰华庭1栋B座210  深圳市光明新区公明街道长圳村长圳路西八巷10号  一般经营项目是:LED灯饰的销售;国内贸易;货物及技术进出口;软件的开发与销售;游戏软件的开...
2  北京顺信益信息技术有限公司  存续  王维虎  10000万元人民币  2015-07-13  2018-06-01  北京市  北京市  海淀区  010-56855134  ...  110108019480505  34438049-X  12  有限责任公司(自然人投资或控股)  软件和信息技术服务业  -  -  北京市海淀区万寿路甲12号北京万寿宾馆B座北侧三层1325  北京市海淀区马甸东路19号15层1815  技术开发、技术服务、技术咨询、技术推广;销售机械设备、电子产品、工艺品;经济贸易咨询;企业策...

3 rows × 25 columns

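Summary: header=None means the first row is not used as the column names; usecols selects which columns to read (all columns by default); nrows is the number of data rows to read; index_col uses one or several columns as the index; parse_dates converts the listed columns to datetime columns.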
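To try these parameters without any file on disk, here is a minimal sketch that reads from an in-memory string. The CSV text, the column names and the variable names demo and raw are all made up for illustration; they are not from the course data.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a real file
csv_text = (
    "name,score,joined\n"
    "Ann,90,2020-01-05\n"
    "Bob,85,2020-02-10\n"
    "Cid,70,2020-03-15\n"
)

demo = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['name', 'joined'],   # read only these two columns
    index_col='name',             # use the 'name' column as the index
    parse_dates=['joined'],       # convert 'joined' to a datetime column
    nrows=2,                      # read only the first 2 data rows
)

# header=None treats the first line as data, so columns are numbered 0, 1, 2
raw = pd.read_csv(io.StringIO(csv_text), header=None)

print(demo)
print(raw)
```

With header=None the original header line shows up as row 0 of the data, which is why raw has one more row than the file has records.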

pd.read_table(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_table_special_sep.txt')

   col1 |||| col2
0  TS |||| This is an apple.
1  GQ |||| My name is Bob.
2  WT |||| Well done!
3  PT |||| May I help you?
pd.read_table(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_table_special_sep.txt'
             ,sep = r'\|\|\|\|',engine = 'python')  # sep is interpreted as a regular expression, so each | is escaped
   col1  col2
0  TS    This is an apple.
1  GQ    My name is Bob.
2  WT    Well done!
3  PT    May I help you?
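The regular-expression separator can be reproduced on an in-memory string. This is only a sketch: the text below imitates the file above, and the extra \s* around the separator is my own addition (not in the course material) so that the spaces around |||| do not end up inside the column names and values.

```python
import io
import pandas as pd

# Made-up text that mimics the '||||'-separated file above
text = (
    "col1 |||| col2\n"
    "TS |||| This is an apple.\n"
    "GQ |||| My name is Bob.\n"
)

# '|' is a regex metacharacter, so each one is escaped;
# \s* swallows the surrounding spaces (an assumption of this sketch)
df_sep = pd.read_table(io.StringIO(text), sep=r'\s*\|\|\|\|\s*', engine='python')

print(df_sep)
```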

2.1.2 Writing Data

df_csv.to_csv(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_csv_saved.csv',index = False)  # the paths point at my own files
df_excel.to_excel(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_excel_saved.xlsx',index = False)
df_txt.to_csv(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_txt_saved.txt',index = False)
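To see what the index argument changes, here is a small sketch that writes to a string instead of a file (to_csv returns the text when no path is given). The frame df_small is made up for illustration.

```python
import io
import pandas as pd

df_small = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

with_index = df_small.to_csv()                # default: the index is written as the first column
without_index = df_small.to_csv(index=False)  # the index is left out

# Reading the index-free text back gives the original columns again
restored = pd.read_csv(io.StringIO(without_index))

print(with_index)
print(without_index)
```

If the index is written and the file is later read back without index_col, it comes back as an extra unnamed column, which is usually why the index is dropped when saving.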

2.2 Basic Data Structures

2.2.1 Series

s = pd.Series(data = [100,'a',{'阿信':'最帅'}]   # 'my_index' is the name of the index; name is the name of the Series
              ,index = pd.Index(['id1',20,'third'],name = 'my_index')
              ,dtype = 'object'
              ,name = 'my_name'
              
)
s   
my_index
id1               100
20                  a
third    {'阿信': '最帅'}
Name: my_name, dtype: object
s.values
array([100, 'a', {'阿信': '最帅'}], dtype=object)
s.dtype
dtype('O')
s.index
Index(['id1', 20, 'third'], dtype='object', name='my_index')
s.name
'my_name'
s.shape
(3,)

2.2.2 DataFrame

data = [[1,'a',1.2],[2,'b',2.2],[3,'c',3.2]]
df = pd.DataFrame(data = data
                 ,index = ['row_%d'%i for i in range(3)]  # build the row index
                 ,columns=['col_0','col_1','col_2'])
df
       col_0  col_1  col_2
row_0  1      a      1.2
row_1  2      b      2.2
row_2  3      c      3.2
df = pd.DataFrame(data = {'col_0':[1,2,3],'col_1':list('abc'),'col_2':[1.2,2.2,3.2]}
                 ,index = ['row_%d'%i for i in range(3)])
df
       col_0  col_1  col_2
row_0  1      a      1.2
row_1  2      b      2.2
row_2  3      c      3.2
df['col_0']
row_0    1
row_1    2
row_2    3
Name: col_0, dtype: int64
df[['col_0','col_1']]

       col_0  col_1
row_0  1      a
row_1  2      b
row_2  3      c
df.values
array([[1, 'a', 1.2],
       [2, 'b', 2.2],
       [3, 'c', 3.2]], dtype=object)
df.index
Index(['row_0', 'row_1', 'row_2'], dtype='object')
df.columns
Index(['col_0', 'col_1', 'col_2'], dtype='object')
df.dtypes
col_0      int64
col_1     object
col_2    float64
dtype: object
df.shape
(3, 3)
df.T
       row_0  row_1  row_2
col_0  1      2      3
col_1  a      b      c
col_2  1.2    2.2    3.2
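One detail from the cells above that is easy to miss: selecting with a single label gives a Series, while selecting with a list of labels gives a DataFrame. A small sketch, using a made-up frame named df_small:

```python
import pandas as pd

df_small = pd.DataFrame({'col_0': [1, 2, 3], 'col_1': list('abc')},
                        index=['row_%d' % i for i in range(3)])

one_col = df_small['col_0']       # a single label returns a Series
sub_df = df_small[['col_0']]      # a list of labels returns a DataFrame, even with one column
transposed = df_small.T           # rows and columns swap places

print(type(one_col).__name__, type(sub_df).__name__)
print(transposed.shape)
```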

2.3 Common Basic Functions

import pandas as pd
df = pd.read_csv('joyful-pandas-master/data/learn_pandas.csv')
df
     School                         Grade      Name            Gender  Height  Weight  Transfer  Test_Number  Test_Date   Time_Record
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N         1            2019/10/5   0:04:34
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N         1            2019/9/4    0:04:20
2    Shanghai Jiao Tong University  Senior     Mei Sun         Male    188.9   89.0    N         2            2019/9/12   0:05:22
3    Fudan University               Sophomore  Xiaojuan Sun    Female  NaN     41.0    N         2            2020/1/3    0:04:08
4    Fudan University               Sophomore  Gaojuan You     Male    174.0   74.0    N         2            2019/11/6   0:05:22
..   ...                            ...        ...             ...     ...     ...     ...       ...          ...         ...
195  Fudan University               Junior     Xiaojuan Sun    Female  153.9   46.0    N         2            2019/10/17  0:04:31
196  Tsinghua University            Senior     Li Zhao         Female  160.9   50.0    N         3            2019/9/22   0:04:03
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N         1            2020/1/5    0:04:48
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N         2            2020/1/7    0:04:58
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N         1            2019/11/6   0:05:05

200 rows × 10 columns

df.columns
Index(['School', 'Grade', 'Name', 'Gender', 'Height', 'Weight', 'Transfer',
       'Test_Number', 'Test_Date', 'Time_Record'],
      dtype='object')
#df = df[df.columns[:7]]
df.columns[:7]  # names of the first 7 columns
Index(['School', 'Grade', 'Name', 'Gender', 'Height', 'Weight', 'Transfer'], dtype='object')
df[df.columns[:7]]  # use those names to select the first 7 columns; the commented-out line above assigns the result back to df
     School                         Grade      Name            Gender  Height  Weight  Transfer
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N
2    Shanghai Jiao Tong University  Senior     Mei Sun         Male    188.9   89.0    N
3    Fudan University               Sophomore  Xiaojuan Sun    Female  NaN     41.0    N
4    Fudan University               Sophomore  Gaojuan You     Male    174.0   74.0    N
..   ...                            ...        ...             ...     ...     ...     ...
195  Fudan University               Junior     Xiaojuan Sun    Female  153.9   46.0    N
196  Tsinghua University            Senior     Li Zhao         Female  160.9   50.0    N
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N

200 rows × 7 columns

2.3.1 Summary Functions

df.head(2)
     School                         Grade      Name            Gender  Height  Weight  Transfer
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N
df.tail(3)
     School                         Grade      Name            Gender  Height  Weight  Transfer
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N
df.head()
     School                         Grade      Name            Gender  Height  Weight  Transfer
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N
2    Shanghai Jiao Tong University  Senior     Mei Sun         Male    188.9   89.0    N
3    Fudan University               Sophomore  Xiaojuan Sun    Female  NaN     41.0    N
4    Fudan University               Sophomore  Gaojuan You     Male    174.0   74.0    N
df.tail()
     School                         Grade      Name            Gender  Height  Weight  Transfer
195  Fudan University               Junior     Xiaojuan Sun    Female  153.9   46.0    N
196  Tsinghua University            Senior     Li Zhao         Female  160.9   50.0    N
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N

Summary: head and tail return the first n rows and the last n rows respectively; if n is not given in the parentheses, the default is 5.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   School    200 non-null    object 
 1   Grade     200 non-null    object 
 2   Name      200 non-null    object 
 3   Gender    200 non-null    object 
 4   Height    183 non-null    float64
 5   Weight    189 non-null    float64
 6   Transfer  188 non-null    object 
dtypes: float64(2), object(5)
memory usage: 11.1+ KB
df.describe()
       Height      Weight
count  183.000000  189.000000
mean   163.218033  55.015873
std    8.608879    12.824294
min    145.400000  34.000000
25%    157.150000  46.000000
50%    161.900000  51.000000
75%    167.500000  65.000000
max    193.900000  89.000000

Summary: info returns basic information about the table, and describe returns the main statistics of its numeric columns.

2.3.2 Descriptive Statistics Functions

df_demo = df[['Height','Weight']]
df_demo.mean()  # mean
Height    163.218033
Weight     55.015873
dtype: float64
df_demo.max()  # maximum
Height    193.9
Weight     89.0
dtype: float64
df_demo.quantile()  # quantile; the default q=0.5 is the median
Height    161.9
Weight     51.0
Name: 0.5, dtype: float64
df_demo.quantile(0.75)  # upper quartile
Height    167.5
Weight     65.0
Name: 0.75, dtype: float64
df_demo.count()  # number of non-missing values
Height    183
Weight    189
dtype: int64
df_demo.idxmax()  # index label of the maximum value
Height    193
Weight      2
dtype: int64
# Aggregation functions take an axis argument: axis=0 (the default) aggregates each column, axis=1 aggregates each row
df_demo.mean(axis = 1).head()  # the mean of one student's height and weight, which has no real meaning
0    102.45
1    118.25
2    138.95
3     41.00
4    124.00
dtype: float64
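The axis argument is easier to see on a tiny frame where the numbers can be checked by hand. A sketch with made-up values (df_small is not the course data):

```python
import pandas as pd

df_small = pd.DataFrame({'Height': [160.0, 170.0], 'Weight': [50.0, 70.0]})

col_means = df_small.mean()         # axis=0: one value per column
row_means = df_small.mean(axis=1)   # axis=1: one value per row

print(col_means)   # Height 165.0, Weight 60.0
print(row_means)   # 105.0 and 120.0
```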

2.3.3 Unique-Value Functions

df['School'].unique()
array(['Shanghai Jiao Tong University', 'Peking University',
       'Fudan University', 'Tsinghua University'], dtype=object)
df['School'].nunique()
4

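Summary: unique returns the distinct values in a column (which categories occur), and nunique returns how many distinct values there are.

df['School'].value_counts()  # value_counts returns each unique value together with its frequency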
Tsinghua University              69
Shanghai Jiao Tong University    57
Fudan University                 40
Peking University                34
Name: School, dtype: int64
df_demo = df[['Gender','Transfer','Name']]
df_demo.drop_duplicates(['Gender','Transfer'],keep = 'last')  # de-duplicate on the combination of Gender and Transfer; it runs on df_demo, so the Name column is kept in the result
     Gender  Transfer  Name
147  Male    NaN       Juan You
150  Male    Y         Chengpeng You
169  Female  Y         Chengquan Qin
194  Female  NaN       Yanmei Qian
197  Female  N         Chengqiang Chu
199  Male    N         Chunpeng Lv
df_demo.drop_duplicates(['Name','Gender'],keep = False).head(100)  # keep=False drops every row that has a duplicate
     Gender  Transfer  Name
0    Female  N         Gaopeng Yang
1    Male    N         Changqiang You
2    Male    N         Mei Sun
4    Male    N         Gaojuan You
5    Female  N         Xiaoli Qian
..   ...     ...       ...
115  Female  N         Gaofeng Sun
116  Male    N         Feng Zhao
117  Male    N         Chunli Zhao
119  Female  N         Peng Zhang
120  Female  NaN       Peng Han

100 rows × 3 columns

df['School'].drop_duplicates()  # also works on a Series
0    Shanghai Jiao Tong University
1                Peking University
3                 Fudan University
5              Tsinghua University
Name: School, dtype: object
df_demo.duplicated(['Gender','Transfer']).head(100)  # True marks a repeated row, False a first occurrence; drop_duplicates removes the flagged rows according to keep and returns the rest
0     False
1     False
2      True
3      True
4      True
      ...  
95     True
96     True
97     True
98     True
99     True
Length: 100, dtype: bool
df['School'].duplicated().head(100)  # also works on a Series
0     False
1     False
2      True
3     False
4      True
      ...  
95     True
96     True
97     True
98     True
99     True
Name: School, Length: 100, dtype: bool
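The link between duplicated and drop_duplicates can be checked directly: filtering out the rows that duplicated flags gives the same result as drop_duplicates. A sketch on a made-up four-row frame:

```python
import pandas as pd

df_small = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M'],
    'Transfer': ['N', 'N', 'N', 'Y'],
    'Name': ['A', 'B', 'C', 'D'],
})
keys = ['Gender', 'Transfer']

# keep='first' is the default: later repeats of a key combination are flagged True
flags = df_small.duplicated(keys)

# Dropping the flagged rows by hand reproduces drop_duplicates
manual = df_small[~flags]
dropped = df_small.drop_duplicates(keys)

# keep=False removes every row whose key combination occurs more than once
only_unique = df_small.drop_duplicates(keys, keep=False)

print(flags.tolist())
print(only_unique)
```

Here rows 0 and 2 share ('F', 'N'), so row 2 is flagged with the default keep, and both disappear with keep=False.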

2.3.4 Replacement Functions

Summary: mapping replacement (replace), logical replacement (where, mask), numeric replacement (abs, clip, round)

df['Gender'].replace({'Female':0,'Male':1}).head(100)
0     0
1     1
2     1
3     0
4     1
     ..
95    1
96    0
97    0
98    1
99    1
Name: Gender, Length: 100, dtype: int64
df['Gender'].replace(['Female','Male'],[0,1]).head(100)
0     0
1     1
2     1
3     0
4     1
     ..
95    1
96    0
97    0
98    1
99    1
Name: Gender, Length: 100, dtype: int64
s = pd.Series(['a',1,'b',2,1,1,'a'])
s
0    a
1    1
2    b
3    2
4    1
5    1
6    a
dtype: object
s.replace([1,2],method='ffill')  # replace every 1 and 2 with the nearest preceding value that was not itself replaced ('a' or 'b' here)
0    a
1    a
2    b
3    b
4    b
5    b
6    a
dtype: object
s.replace([1,2],method='bfill')  # bfill uses the nearest following value: the first 1 becomes 'b', the 2 and the later 1s become 'a'
0    a
1    b
2    b
3    a
4    a
5    a
6    a
dtype: object
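As far as I know, newer pandas releases deprecate the method argument of replace, so the two calls above may warn or fail outside the 1.1.5 used here. The following sketch gets the same results in a different way (my own workaround, not from the course material): hide the values with mask, then fill the gaps with ffill or bfill.

```python
import pandas as pd

s_mixed = pd.Series(['a', 1, 'b', 2, 1, 1, 'a'])

# Turn every 1 and 2 into a missing value first
hidden = s_mixed.mask(s_mixed.isin([1, 2]))

forward = hidden.ffill()    # fill from the nearest earlier kept value
backward = hidden.bfill()   # fill from the nearest later kept value

print(forward.tolist())
print(backward.tolist())
```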
# Logical replacement
s = pd.Series([-1,2,100,-50])
s.where(s<0)  # values are replaced where the condition is False
0    -1.0
1     NaN
2     NaN
3   -50.0
dtype: float64
s.where(s<0,100)
0     -1
1    100
2    100
3    -50
dtype: int64
s.mask(s<0)  # values are replaced where the condition is True
0      NaN
1      2.0
2    100.0
3      NaN
dtype: float64
s.mask(s<0,10)
0     10
1      2
2    100
3     10
dtype: int64
s_condition = pd.Series([True,False,False,True],index = s.index)
s.mask(s_condition,-50)  # replaced with -50 where True; where False the original value is kept
0    -50
1      2
2    100
3    -50
dtype: int64
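where and mask are mirror images of each other: where replaces the entries whose condition is False, mask replaces the entries whose condition is True. A short sketch that checks this on the same numbers (the variable names are mine):

```python
import pandas as pd

s_num = pd.Series([-1, 2, 100, -50])
cond = s_num < 0

kept_negative = s_num.where(cond, 100)   # replace where cond is False
same_result = s_num.mask(~cond, 100)     # replace where ~cond is True

print(kept_negative.tolist())
print(kept_negative.equals(same_result))
```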
# Numeric replacement
s = pd.Series([-1,3.5515,100,-50])
s.round()  # the argument is the number of decimal places; the default 0 rounds to the nearest integer
0     -1.0
1      4.0
2    100.0
3    -50.0
dtype: float64
s.abs()
0      1.0000
1      3.5515
2    100.0000
3     50.0000
dtype: float64
s
0     -1.0000
1      3.5515
2    100.0000
3    -50.0000
dtype: float64
s.clip(0,5)  # values below 0 become 0, values above 5 become 5
0    0.0000
1    3.5515
2    5.0000
3    0.0000
dtype: float64

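Parameters: lower : float or array_like, default None

Minimum threshold. All values below this threshold are set to it.

upper : float or array_like, default None

Maximum threshold. All values above this threshold are set to it.

axis : int or str axis name, optional

Align the object with lower and upper along the given axis.

inplace : bool, default False

Whether to perform the operation in place on the data.

Returns:

Series or DataFrame

Same type as the calling object, with the values outside the clip boundaries replaced.

Reference: https://www.cjavapy.com/article/330/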

import pandas as pd
data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}  
df = pd.DataFrame(data)
df
   col_0  col_1
0  9      -2
1  -3     -7
2  0      6
3  -1     8
4  5      -5

t = pd.Series([2, -4, -1, 6, 3])
t
0    2
1   -4
2   -1
3    6
4    3
dtype: int64
df
   col_0  col_1
0  9      -2
1  -3     -7
2  0      6
3  -1     8
4  5      -5
df.clip(t, t + 4, axis=0)  # an example I found online; I do not fully understand it yet
   col_0  col_1
0  6      2
1  -3     -4
2  0      3
3  6      8
4  5      3
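One way to read the clip call above: with axis=0 the bounds are matched to the rows, so row i is limited to the interval from t[i] to t[i] + 4. The sketch below rebuilds the frame and repeats the computation row by row to check that reading; the loop is only for illustration and is not how pandas does it internally.

```python
import pandas as pd

df_clip = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
t = pd.Series([2, -4, -1, 6, 3])

clipped = df_clip.clip(t, t + 4, axis=0)

# Same computation written out by hand: row i is limited to [t[i], t[i] + 4]
manual = df_clip.copy()
for i in df_clip.index:
    manual.loc[i] = df_clip.loc[i].clip(t[i], t[i] + 4)

print(clipped)
print((manual.values == clipped.values).all())
```

For example, row 0 has t[0] = 2, so its values 9 and -2 are limited to the range 2 to 6 and become 6 and 2.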

2.3.5 Sorting Functions

df = pd.read_csv('joyful-pandas-master/data/learn_pandas.csv')
df_demo = df[['Grade','Name','Height','Weight']].set_index(['Grade','Name'])
df_demo.sort_values('Height').head()
                            Height  Weight
Grade      Name
Junior     Xiaoli Chu       145.4   34.0
Senior     Gaomei Lv        147.3   34.0
Sophomore  Peng Han         147.8   34.0
Senior     Changli Lv       148.7   41.0
Sophomore  Changjuan You    150.5   40.0
df_demo.sort_values('Height',ascending = False).head()
                            Height  Weight
Grade      Name
Senior     Xiaoqiang Qin    193.9   79.0
           Mei Sun          188.9   89.0
           Gaoli Zhao       186.5   83.0
Freshman   Qiang Han        185.3   87.0
Senior     Qiang Zheng      183.9   87.0
df_demo.sort_values(['Weight','Height'],ascending=[True,False]).head(100)  # Weight is sorted ascending; rows with the same Weight are sorted by Height descending

                            Height  Weight
Grade      Name
Sophomore  Peng Han         147.8   34.0
Senior     Gaomei Lv        147.3   34.0
Junior     Xiaoli Chu       145.4   34.0
Sophomore  Qiang Zhou       150.5   36.0
Freshman   Yanqiang Xu      152.4   38.0
           Qiang Han        151.8   38.0
Senior     Chengpeng Zheng  151.7   38.0
Sophomore  Mei Xu           154.2   39.0
Freshman   Xiaoquan Sun     154.6   40.0
Sophomore  Qiang Sun        154.3   40.0
Senior     Juan You         154.0   40.0
Sophomore  Changjuan You    150.5   40.0
Senior     Yanli Zhang      154.2   41.0
           Changli Lv       148.7   41.0
Sophomore  Xiaojuan Sun     NaN     41.0
Freshman   Gaojuan Qin      NaN     41.0
           Gaoquan Sun      156.8   42.0
Senior     Xiaopeng Chu     156.5   42.0
Junior     Qiang Lv         152.1   42.0
Sophomore  Xiaoqiang Feng   157.0   43.0
Junior     Gaoqiang Zhou    156.8   43.0
Freshman   Xiaoli Xu        156.5   43.0
Sophomore  Feng Qian        156.4   43.0
Freshman   Quan Chu         154.7   43.0
Junior     Xiaoquan Lv      153.2   43.0
Freshman   Qiang Zhang      152.7   43.0
           Gaofeng Zhao     152.2   43.0
Sophomore  Changmei Xu      151.6   43.0
Freshman   Chunmei Wang     151.2   43.0
           Feng Yang        158.9   44.0
Senior     Quan Xu          157.0   44.0
Junior     Mei Zhang        156.5   44.0
           Chengpeng Zhao   156.0   44.0
Freshman   Li Lv            155.2   44.0
Junior     Gaojuan Qian     154.8   44.0
           Chunmei Han      153.2   44.0
Sophomore  Chunpeng Shi     152.9   44.0
Senior     Gaojuan Zhao     151.5   44.0
Junior     Yanpeng Han      NaN     44.0
Freshman   Changquan Chu    159.6   45.0
Junior     Xiaofeng You     158.5   45.0
Sophomore  Xiaoquan Zhang   158.3   45.0
Senior     Chengqiang Chu   153.9   45.0
Freshman   Xiaoli Lv        152.5   45.0
Junior     Xiaofeng Zhao    159.9   46.0
Freshman   Gaopeng Yang     158.9   46.0
           Gaoli Feng       157.4   46.0
Senior     Changmei Sun     155.3   46.0
           Xiaopeng Qian    154.3   46.0
Junior     Xiaojuan Sun     153.9   46.0
           Changjuan You    161.4   47.0
Freshman   Chengquan Chu    161.3   47.0
Senior     Juan Zhao        161.2   47.0
Junior     Yanli Zhang      160.6   47.0
Senior     Juan Zhang       159.9   47.0
           Chunjuan Xu      159.8   47.0
Junior     Chunjuan Zhang   158.9   47.0
Senior     Xiaopeng Lv      158.4   47.0
Sophomore  Xiaomei Shi      157.9   47.0
Senior     Juan Qin         156.0   47.0
           Gaoli Wu         155.7   47.0
           Feng Zhou        155.6   47.0
Freshman   Changli Zhang    163.0   48.0
           Gaopeng Shi      162.9   48.0
Junior     Gaofeng Sun      162.8   48.0
Sophomore  Yanfeng Qian     160.1   48.0
Junior     Qiang Wang       157.5   48.0
           Gaoli Xu         157.3   48.0
Senior     Peng You         NaN     48.0
Junior     Yanli You        NaN     48.0
Freshman   Yanjuan Han      163.7   49.0
Senior     Feng Zheng       162.6   49.0
Junior     Xiaojuan Zhao    160.3   49.0
Senior     Yanmei Qian      160.3   49.0
Junior     Changjuan Xu     159.6   49.0
           Yanjuan Lv       159.3   49.0
Freshman   Xiaomei Yang     159.3   49.0
           Xiaofeng Qian    158.5   49.0
           Changqiang Yang  156.0   49.0
Senior     Qiang Chu        162.4   50.0
Junior     Gaoqiang Qian    161.9   50.0
Senior     Mei Zheng        161.1   50.0
           Li Zhao          160.9   50.0
           Xiaojuan Qian    160.6   50.0
Junior     Mei Sun          159.5   50.0
Senior     Quan Qian        159.0   50.0
Junior     Feng Zheng       165.6   51.0
           Li Chu           165.2   51.0
           Xiaojuan Qian    164.7   51.0
Freshman   Li Wu            164.3   51.0
           Yanqiang Feng    162.3   51.0
Junior     Chengquan Shi    160.8   51.0
           Xiaopeng Zhou    160.2   51.0
           Feng Zhao        159.0   51.0
Freshman   Xiaoli Qian      158.0   51.0
Junior     Gaoquan Shen     158.0   51.0
Sophomore  Chunpeng Lv      155.7   51.0
Senior     Mei Feng         NaN     51.0
Junior     Gaoquan Chu      NaN     51.0
Senior     Feng Yang        167.0   52.0
df_demo.sort_values(['Height','Weight'],ascending=[True,False]).head(100)  # Height is sorted ascending; rows with the same Height are sorted by Weight descending
                            Height  Weight
Grade      Name
Junior     Xiaoli Chu       145.4   34.0
Senior     Gaomei Lv        147.3   34.0
Sophomore  Peng Han         147.8   34.0
Senior     Changli Lv       148.7   41.0
Sophomore  Changjuan You    150.5   40.0
           Qiang Zhou       150.5   36.0
Freshman   Chunmei Wang     151.2   43.0
Senior     Gaojuan Zhao     151.5   44.0
Sophomore  Changmei Xu      151.6   43.0
Senior     Chengpeng Zheng  151.7   38.0
Freshman   Qiang Han        151.8   38.0
Junior     Qiang Lv         152.1   42.0
Freshman   Gaofeng Zhao     152.2   43.0
           Yanqiang Xu      152.4   38.0
           Xiaoli Lv        152.5   45.0
           Qiang Zhang      152.7   43.0
Sophomore  Chunpeng Shi     152.9   44.0
Junior     Chunmei Han      153.2   44.0
           Xiaoquan Lv      153.2   43.0
Senior     Mei Chen         153.6   NaN
Junior     Xiaojuan Sun     153.9   46.0
Senior     Chengqiang Chu   153.9   45.0
           Juan You         154.0   40.0
           Yanli Zhang      154.2   41.0
Sophomore  Mei Xu           154.2   39.0
Senior     Xiaopeng Qian    154.3   46.0
Sophomore  Qiang Sun        154.3   40.0
Freshman   Xiaoquan Sun     154.6   40.0
           Quan Chu         154.7   43.0
Junior     Gaojuan Qian     154.8   44.0
Freshman   Li Lv            155.2   44.0
Senior     Changmei Sun     155.3   46.0
           Feng Zhou        155.6   47.0
Sophomore  Chunpeng Lv      155.7   51.0
Senior     Gaoli Wu         155.7   47.0
Freshman   Changqiang Yang  156.0   49.0
Senior     Juan Qin         156.0   47.0
Junior     Chengpeng Zhao   156.0   44.0
Sophomore  Feng Qian        156.4   43.0
Junior     Mei Zhang        156.5   44.0
Freshman   Xiaoli Xu        156.5   43.0
Senior     Xiaopeng Chu     156.5   42.0
Junior     Gaoqiang Zhou    156.8   43.0
Freshman   Gaoquan Sun      156.8   42.0
Senior     Quan Xu          157.0   44.0
Sophomore  Xiaoqiang Feng   157.0   43.0
Junior     Gaoli Xu         157.3   48.0
Freshman   Gaoli Feng       157.4   46.0
Junior     Qiang Wang       157.5   48.0
Senior     Qiang Shi        157.7   NaN
Sophomore  Xiaomei Shi      157.9   47.0
Freshman   Xiaoli Qian      158.0   51.0
Junior     Gaoquan Shen     158.0   51.0
Sophomore  Xiaoquan Zhang   158.3   45.0
Senior     Xiaopeng Lv      158.4   47.0
Freshman   Xiaofeng Qian    158.5   49.0
Junior     Xiaofeng You     158.5   45.0
           Chunjuan Zhang   158.9   47.0
Freshman   Gaopeng Yang     158.9   46.0
           Feng Yang        158.9   44.0
Junior     Feng Zhao        159.0   51.0
Senior     Quan Qian        159.0   50.0
Junior     Yanjuan Lv       159.3   49.0
Freshman   Xiaomei Yang     159.3   49.0
Senior     Gaopeng Qin      159.4   52.0
Junior     Mei Sun          159.5   50.0
           Changjuan Xu     159.6   49.0
Freshman   Changquan Chu    159.6   45.0
Senior     Chunjuan Xu      159.8   47.0
           Juan Zhang       159.9   47.0
Junior     Xiaofeng Zhao    159.9   46.0
           Xiaopeng Shen    160.1   53.0
Sophomore  Yanfeng Qian     160.1   48.0
Junior     Xiaopeng Zhou    160.2   51.0
           Xiaojuan Zhao    160.3   49.0
Senior     Yanmei Qian      160.3   49.0
Junior     Quan Zhao        160.6   53.0
Senior     Xiaojuan Qian    160.6   50.0
Junior     Yanli Zhang      160.6   47.0
           Chengquan Qin    160.7   52.0
Sophomore  Xiaoqiang Qin    160.8   54.0
Junior     Chengquan Shi    160.8   51.0
           Qiang Sun        160.8   NaN
Senior     Li Zhao          160.9   50.0
Freshman   Xiaopeng Zhao    161.0   53.0
Senior     Mei Zheng        161.1   50.0
           Juan Zhao        161.2   47.0
Freshman   Chengquan Chu    161.3   47.0
Junior     Changjuan You    161.4   47.0
Senior     Li Xu            161.5   53.0
           Chunpeng Qian    161.6   NaN
Junior     Xiaopeng Sun     161.9   54.0
           Gaoqiang Qian    161.9   50.0
           Chunquan Xu      162.1   54.0
Freshman   Yanqiang Feng    162.3   51.0
           Xiaojuan Chu     162.4   58.0
Senior     Qiang Chu        162.4   50.0
Freshman   Peng Wu          162.5   53.0
           Qiang Chu        162.5   52.0
Senior     Feng Zheng       162.6   49.0
df_demo.sort_index(level=['Grade','Name'],ascending=[True,False]).head(100)  # sort by the index levels: Grade ascending, Name descending
                            Height  Weight
Grade      Name
Freshman   Yanquan Wang     163.5   55.0
           Yanqiang Xu      152.4   38.0
           Yanqiang Feng    162.3   51.0
           Yanpeng Lv       NaN     65.0
           Yanli Zhang      165.1   52.0
           Yanjuan Zhao     NaN     53.0
           Yanjuan Han      163.7   49.0
           Xiaoquan Sun     154.6   40.0
           Xiaopeng Zhou    174.1   74.0
           Xiaopeng Zhao    161.0   53.0
           Xiaopeng Han     164.1   53.0
           Xiaomei Yang     159.3   49.0
           Xiaoli Xu        156.5   43.0
           Xiaoli Qian      158.0   51.0
           Xiaoli Lv        152.5   45.0
           Xiaojuan Qin     NaN     79.0
           Xiaojuan Chu     162.4   58.0
           Xiaofeng Qian    158.5   49.0
           Quan Chu         154.7   43.0
           Qiang Zhang      152.7   43.0
           Qiang Shi        164.5   52.0
           Qiang Han        185.3   87.0
           Qiang Han        151.8   38.0
           Qiang Feng       178.9   80.0
           Qiang Chu        162.5   52.0
           Peng Zhang       163.1   NaN
           Peng Wu          162.5   53.0
           Li Wu            164.3   51.0
           Li Lv            155.2   44.0
           Juan Zhang       168.6   55.0
           Gaoquan Xu       NaN     52.0
           Gaoquan Sun      156.8   42.0
           Gaoqiang Qin     170.2   63.0
           Gaopeng Yang     158.9   46.0
           Gaopeng Shi      162.9   48.0
           Gaoli Zhao       175.4   78.0
           Gaoli Feng       157.4   46.0
           Gaojuan Qin      NaN     41.0
           Gaofeng Zhao     152.2   43.0
           Feng Yang        158.9   44.0
           Feng Wang        176.3   74.0
           Chunmei Wang     151.2   43.0
           Chunmei Shi      164.9   52.0
           Chunli Zhao      180.2   83.0
           Chengquan Chu    161.3   47.0
           Changquan Chu    159.6   45.0
           Changqiang You   166.5   70.0
           Changqiang Yang  156.0   49.0
           Changpeng Zhao   181.3   83.0
           Changmei Lv      172.2   75.0
           Changmei Feng    163.8   56.0
           Changli Zhang    163.0   48.0
Junior     Yanpeng Han      NaN     44.0
           Yanmei Yang      167.7   57.0
           Yanli Zhang      160.6   47.0
           Yanli You        NaN     48.0
           Yanli Wang       169.9   67.0
           Yanjuan Lv       159.3   49.0
           Yanfeng Qian     178.7   75.0
           Xiaoquan Lv      153.2   43.0
           Xiaoqiang Qin    170.1   68.0
           Xiaopeng Zhou    160.2   51.0
           Xiaopeng Sun     161.9   54.0
           Xiaopeng Shen    160.1   53.0
           Xiaoli Wang      171.4   70.0
           Xiaoli Chu       145.4   34.0
           Xiaojuan Zhao    160.3   49.0
           Xiaojuan Sun     153.9   46.0
           Xiaojuan Qian    164.7   51.0
           Xiaofeng Zhao    159.9   46.0
           Xiaofeng You     158.5   45.0
           Quan Zhao        160.6   53.0
           Qiang You        170.0   56.0
           Qiang Wang       157.5   48.0
           Qiang Sun        163.1   53.0
           Qiang Sun        160.8   NaN
           Qiang Lv         152.1   42.0
           Peng Wang        162.8   65.0
           Mei Zhang        156.5   44.0
           Mei Sun          159.5   50.0
           Li Sun           166.6   54.0
           Li Chu           165.2   51.0
           Juan Xu          164.8   NaN
           Gaoquan Zhou     166.8   70.0
           Gaoquan Shen     158.0   51.0
           Gaoquan Chu      NaN     51.0
           Gaoqiang Zhou    156.8   43.0
           Gaoqiang Qin     167.1   71.0
           Gaoqiang Qian    161.9   50.0
           Gaoli Xu         157.3   48.0
           Gaojuan Qian     154.8   44.0
           Gaofeng Sun      162.8   48.0
           Feng Zheng       165.6   51.0
           Feng Zhao        159.0   51.0
           Chunquan Xu      162.1   54.0
           Chunqiang Chu    168.6   72.0
           Chunmei Han      153.2   44.0
           Chunjuan Zhang   158.9   47.0
           Chunfeng Zhao    173.4   72.0
           Chengquan Shi    160.8   51.0
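The long outputs above make the tie-breaking hard to see, so here is a sketch on four made-up rows where the expected order can be worked out by hand. The frame and names are hypothetical.

```python
import pandas as pd

df_small = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Height': [150.0, 160.0, 155.0, 158.0],
    'Weight': [40.0, 40.0, 38.0, 40.0],
})

# Weight ascending first; rows that tie on Weight are ordered by Height descending
by_values = df_small.sort_values(['Weight', 'Height'], ascending=[True, False])

# The same idea on index levels: Weight ascending, then Name descending
by_index = (df_small.set_index(['Weight', 'Name'])
                    .sort_index(level=['Weight', 'Name'], ascending=[True, False]))

print(by_values['Name'].tolist())
print(by_index.index.get_level_values('Name').tolist())
```

C has the smallest Weight and comes first in both results; the three rows with Weight 40.0 are then ordered by the second key.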

2.3.6 The apply Method

df_demo = df[['Height','Weight']]
def my_mean(x):
    res = x.mean()
    return res
df_demo.apply(my_mean)
Height    163.218033
Weight     55.015873
dtype: float64
df_demo.apply(lambda x:x.mean())
Height    163.218033
Weight     55.015873
dtype: float64
df_demo.apply(lambda x:x.mean(),axis = 1).head()
0    102.45
1    118.25
2    138.95
3     41.00
4    124.00
dtype: float64
df_demo.apply(lambda x:(x-x.mean()).abs().mean())  # a different function from the ones above: the mean absolute deviation of each column
Height     6.707229
Weight    10.391870
dtype: float64
df_demo.mad()  # built-in mean absolute deviation in the pandas 1.1.5 used here (removed in pandas 2.0)
Height     6.707229
Weight    10.391870
dtype: float64
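What apply passes to the function depends on axis: each column by default, each row with axis=1. The sketch below uses a named function instead of a lambda and a small made-up frame so the results can be checked by hand; mean_abs_dev is my own helper name.

```python
import pandas as pd

df_small = pd.DataFrame({'Height': [150.0, 160.0, 170.0],
                         'Weight': [40.0, 50.0, 90.0]})

def mean_abs_dev(values):
    """Mean absolute deviation of one Series (a column or a row)."""
    return (values - values.mean()).abs().mean()

by_column = df_small.apply(mean_abs_dev)          # the function receives each column
by_row = df_small.apply(mean_abs_dev, axis=1)     # the function receives each row

print(by_column)
print(by_row)
```

For Weight the mean is 60, the absolute deviations are 20, 10 and 30, so the result is 20.0.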

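2.4 Window Objects

https://www.gairuo.com/p/pandas-window-functions

A "window" can be understood as a set of rows: each window is one set. Statistical analysis often needs different windows. For example, a department may be split into groups, and averages or rankings are then computed per group. For ordered data such as time series, we might take every 5 days or every month as one group and then sort or compute the median within it.
rolling(10) looks a lot like groupby, but it does not split the data into groups. Instead it creates a sliding window object that moves across the data with a width of 10 (days), and statistics are then computed on each window.

2.4.1 Rolling Windows

rolling returns a rolling window object; its most important parameter is the window size, window.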

s = pd.Series([1,2,3,4,5])
roller = s.rolling(window = 3)
roller
Rolling [window=3,center=False,axis=0]
roller.mean()  # https://blog.csdn.net/qsx123432/article/details/111396542 explains this very clearly
0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
dtype: float64
roller.sum()
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
dtype: float64
s2 = pd.Series([1,2,6,16,30])
roller.cov(s2)
0     NaN
1     NaN
2     2.5
3     7.0
4    12.0
dtype: float64
roller.corr(s2)
0         NaN
1         NaN
2    0.944911
3    0.970725
4    0.995402
dtype: float64
# Pass a custom function through apply
roller.apply(lambda x:x.mean())

0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
dtype: float64
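To convince myself what a window of 3 means, here is a sketch that recomputes the rolling mean by hand: position i uses the elements at positions i-2, i-1 and i, and the first two positions have no complete window. The min_periods line is an extra I added; it is a real rolling parameter but was not used in the cells above.

```python
import pandas as pd

s_roll = pd.Series([1, 2, 3, 4, 5])
window = 3

rolled = s_roll.rolling(window=window).mean()

# Hand-written equivalent: NaN until a full window is available
manual = [
    float('nan') if i < window - 1 else s_roll.iloc[i - window + 1:i + 1].mean()
    for i in range(len(s_roll))
]

# min_periods=1 lets the incomplete windows at the start produce a value too
partial = s_roll.rolling(window=window, min_periods=1).mean()

print(rolled.tolist())
print(manual)
print(partial.tolist())
```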
a = pd.Series([1,3,6,10,15])
a.shift(2)  # take the value of the element 2 positions earlier
0    NaN
1    NaN
2    1.0
3    3.0
4    6.0
dtype: float64
a.diff(3)  # difference from the element 3 positions earlier
0     NaN
1     NaN
2     NaN
3     9.0
4    12.0
dtype: float64
a.pct_change()  # growth rate relative to the element n positions earlier (n defaults to 1)
0         NaN
1    2.000000
2    1.000000
3    0.666667
4    0.500000
dtype: float64
a.shift(-1)  # take the value of the next element
0     3.0
1     6.0
2    10.0
3    15.0
4     NaN
dtype: float64
a.diff(-2)  # difference from the element 2 positions later
0   -5.0
1   -7.0
2   -9.0
3    NaN
4    NaN
dtype: float64
a

0     1
1     3
2     6
3    10
4    15
dtype: int64
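diff and pct_change can both be written in terms of shift, which helped me remember what they do. A sketch on the same numbers (a_demo and the *_manual names are mine):

```python
import pandas as pd

a_demo = pd.Series([1, 3, 6, 10, 15])

diff_3 = a_demo.diff(3)
diff_manual = a_demo - a_demo.shift(3)        # same as diff(3)

pct = a_demo.pct_change()
pct_manual = a_demo / a_demo.shift(1) - 1     # same as pct_change()

print(diff_3.tolist())
print(pct.tolist())
```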

2.4.2 Expanding Windows

s = pd.Series([1,3,6,10])
s.expanding().mean()
0    1.000000
1    2.000000
2    3.333333
3    5.000000
dtype: float64
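An expanding window starts at the first element and grows by one element at each step, so the expanding mean is the running sum divided by the number of elements seen so far. A sketch that checks this on the same series (the manual computation is my own illustration):

```python
import pandas as pd

s_exp = pd.Series([1, 3, 6, 10])

expanding_mean = s_exp.expanding().mean()

# By hand: cumulative sum divided by the count of elements so far (1, 2, 3, 4)
counts = pd.Series(range(1, len(s_exp) + 1), index=s_exp.index)
manual_mean = s_exp.cumsum() / counts

print(expanding_mean.tolist())
print(manual_mean.tolist())
```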

