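Theory:
describe(): a quick statistical summary of each numeric column. It reports the following statistics:
count, number of non-null values
mean, arithmetic mean
std, standard deviation
min, minimum
25%, first quartile (25th percentile)
50%, second quartile (50th percentile, i.e. the median)
75%, third quartile (75th percentile)
max, maximum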
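A minimal sketch of how count treats missing values and which columns describe() picks up by default. The toy DataFrame and its column names are made up for illustration and are not part of the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numeric column with a missing value, one text column.
df = pd.DataFrame({
    "score": [1.0, 2.0, np.nan, 4.0],
    "name": ["a", "b", "c", "d"],
})

summary = df.describe()  # by default only numeric columns are summarised
print(summary)

# count ignores the NaN, so it reports 3 values rather than 4
score_count = summary.loc["count", "score"]
# the text column does not appear in the default summary
summarised_columns = list(summary.columns)
```

Passing include="all" to describe() also summarises the text column, with statistics such as unique, top and freq in place of the numeric ones.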
quantile(q):
returns the value at quantile q, where q must lie in [0, 1]; the default is q=0.5 (the median)
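When q falls between two data points, pandas interpolates linearly by default. A small worked sketch on a made-up Series (the values are hypothetical):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Position of the quantile in the sorted data: q * (n - 1) = 0.25 * 3 = 0.75,
# i.e. three quarters of the way from 10 to 20.
q25 = s.quantile(0.25)

# With no argument q defaults to 0.5: halfway between 20 and 30.
q50 = s.quantile()

print(q25, q50)
```

The interpolation parameter of quantile() selects other rules, such as "lower", "higher" or "nearest".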
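Common statistical methods:
sum(), sum
mean(), mean
median(), median
count(), number of non-null values
Note: the methods above skip missing values (NaN) by default
max(), maximum
min(), minimum
idxmax(), index label of the maximum
idxmin(), index label of the minimum
Note: in older pandas releases Series.argmax() and Series.argmin() were deprecated aliases of idxmax() and idxmin(); since pandas 1.0 they return the integer position of the extreme value rather than its index label
mad(), mean absolute deviation, a measure of how far the values are spread around their mean (deprecated in pandas 1.5 and removed in pandas 2.0)
var(), variance
std(), standard deviation
cumsum(), cumulative sum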
Section 15 Common statistical methods (1) -- describe, quantile
In [1]:
import pandas as pd
In [2]:
data = pd.read_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_happiness.csv')
data.head()
Out[2]:
| | Country | Region | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Denmark | Western Europe | 1 | 7.526 | 7.460 | 7.592 | 1.44178 | 1.16374 | 0.79504 | 0.57941 | 0.44453 | 0.36171 | 2.73939 |
1 | Switzerland | Western Europe | 2 | 7.509 | 7.428 | 7.590 | 1.52733 | 1.14524 | 0.86303 | 0.58557 | 0.41203 | 0.28083 | 2.69463 |
2 | Iceland | Western Europe | 3 | 7.501 | 7.333 | 7.669 | 1.42666 | 1.18326 | 0.86733 | 0.56624 | 0.14975 | 0.47678 | 2.83137 |
3 | Norway | Western Europe | 4 | 7.498 | 7.421 | 7.575 | 1.57744 | 1.12690 | 0.79579 | 0.59609 | 0.35776 | 0.37895 | 2.66465 |
4 | Finland | Western Europe | 5 | 7.413 | 7.351 | 7.475 | 1.40598 | 1.13464 | 0.81091 | 0.57104 | 0.41004 | 0.25492 | 2.82596 |
In [3]:
data.describe()
Out[3]:
| | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 | 157.000000 |
mean | 78.980892 | 5.382185 | 5.282395 | 5.481975 | 0.953880 | 0.793621 | 0.557619 | 0.370994 | 0.137624 | 0.242635 | 2.325807 |
std | 45.466030 | 1.141674 | 1.148043 | 1.136493 | 0.412595 | 0.266706 | 0.229349 | 0.145507 | 0.111038 | 0.133756 | 0.542220 |
min | 1.000000 | 2.905000 | 2.732000 | 3.078000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.817890 |
25% | 40.000000 | 4.404000 | 4.327000 | 4.465000 | 0.670240 | 0.641840 | 0.382910 | 0.257480 | 0.061260 | 0.154570 | 2.031710 |
50% | 79.000000 | 5.314000 | 5.237000 | 5.419000 | 1.027800 | 0.841420 | 0.596590 | 0.397470 | 0.105470 | 0.222450 | 2.290740 |
75% | 118.000000 | 6.269000 | 6.154000 | 6.434000 | 1.279640 | 1.021520 | 0.729930 | 0.484530 | 0.175540 | 0.311850 | 2.664650 |
max | 157.000000 | 7.526000 | 7.460000 | 7.669000 | 1.824270 | 1.183260 | 0.952770 | 0.608480 | 0.505210 | 0.819710 | 3.837720 |
In [4]:
data.quantile(q=0.5)
Out[4]:
Happiness Rank                   79.00000
Happiness Score                   5.31400
Lower Confidence Interval         5.23700
Upper Confidence Interval         5.41900
Economy (GDP per Capita)          1.02780
Family                            0.84142
Health (Life Expectancy)          0.59659
Freedom                           0.39747
Trust (Government Corruption)     0.10547
Generosity                        0.22245
Dystopia Residual                 2.29074
Name: 0.5, dtype: float64
In [5]:
data.quantile(q=0.25)
Out[5]:
Happiness Rank                   40.00000
Happiness Score                   4.40400
Lower Confidence Interval         4.32700
Upper Confidence Interval         4.46500
Economy (GDP per Capita)          0.67024
Family                            0.64184
Health (Life Expectancy)          0.38291
Freedom                           0.25748
Trust (Government Corruption)     0.06126
Generosity                        0.15457
Dystopia Residual                 2.03171
Name: 0.25, dtype: float64
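The two quantile() outputs above match the 50% and 25% rows of the describe() table. A minimal sketch of that relationship, using a made-up two-column DataFrame as a stand-in for the happiness data (the values are hypothetical), and passing a list of quantiles in one call:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for two numeric columns of the happiness data.
df = pd.DataFrame({
    "score": [7.5, 6.0, 5.3, 4.4, 2.9],
    "gdp": [1.4, 1.2, 1.0, 0.6, 0.1],
})

# A list of quantiles returns a DataFrame indexed by q.
quartiles = df.quantile([0.25, 0.5, 0.75])

# The same three rows taken from describe().
described = df.describe().loc[["25%", "50%", "75%"]]

# Compare values only; the row labels differ (0.25 vs "25%").
same = np.allclose(quartiles.to_numpy(), described.to_numpy())
print(quartiles)
print(same)
```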
In [6]:
import pandas as pd
In [7]:
data = pd.read_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\log.csv')
data.head()
Out[7]:
| | time | user | video | playback position | paused | volume |
---|---|---|---|---|---|---|
0 | 1469974424 | cheryl | intro.html | 5 | False | 10.0 |
1 | 1469974454 | cheryl | intro.html | 6 | NaN | NaN |
2 | 1469974544 | cheryl | intro.html | 9 | NaN | NaN |
3 | 1469974574 | cheryl | intro.html | 10 | NaN | NaN |
4 | 1469977514 | bob | intro.html | 1 | NaN | NaN |
In [8]:
data.sum()  # sum of each column
Out[8]:
time                 48509194942
user                 cherylcherylcherylcherylbobbobbobbobcherylcher...
video                intro.htmlintro.htmlintro.htmlintro.htmlintro....
playback position    429
paused               1
volume               35
dtype: object
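In the output above the text columns are "summed" by string concatenation, which is rarely what is wanted. A minimal sketch of restricting a reduction to numeric columns with numeric_only=True, on a made-up miniature of the log data (the rows are hypothetical). From pandas 2.0 on, reductions such as mean() and median() raise a TypeError on text columns unless numeric_only=True is passed, so the same argument applies to the cells that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the log data.
log = pd.DataFrame({
    "user": ["cheryl", "cheryl", "bob"],
    "playback position": [5, 6, 1],
    "volume": [10.0, np.nan, np.nan],
})

# Only numeric columns are reduced; the text column is skipped.
numeric_totals = log.sum(numeric_only=True)
numeric_means = log.mean(numeric_only=True)

print(numeric_totals)
print(numeric_means)
```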
In [9]:
data.mean()  # mean of each column
Out[9]:
time                 1.469976e+09
playback position    1.300000e+01
paused               3.333333e-01
volume               8.750000e+00
dtype: float64
In [10]:
data.median()  # median of each column
Out[10]:
time                 1.469975e+09
playback position    1.000000e+01
paused               0.000000e+00
volume               1.000000e+01
dtype: float64
In [11]:
data.count()  # number of non-null values in each column
Out[11]:
time                 33
user                 33
video                33
playback position    33
paused                3
volume                4
dtype: int64
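The paused and volume counts are much lower than 33 because most of their entries are NaN. A minimal sketch, on a made-up Series, of how count() relates to the number of missing values and how the skipna parameter changes a reduction:

```python
import numpy as np
import pandas as pd

# Hypothetical volume readings with gaps.
volume = pd.Series([10.0, np.nan, np.nan, 5.0])

non_null = volume.count()        # values that are present
missing = volume.isna().sum()    # values that are NaN

mean_skip = volume.mean()               # NaN skipped: (10 + 5) / 2
mean_keep = volume.mean(skipna=False)   # NaN propagates into the result

print(non_null, missing, mean_skip, mean_keep)
```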
In [12]:
import pandas as pd
In [13]:
data = pd.read_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_happiness.csv')
data.head()
Out[13]:
| | Country | Region | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Denmark | Western Europe | 1 | 7.526 | 7.460 | 7.592 | 1.44178 | 1.16374 | 0.79504 | 0.57941 | 0.44453 | 0.36171 | 2.73939 |
1 | Switzerland | Western Europe | 2 | 7.509 | 7.428 | 7.590 | 1.52733 | 1.14524 | 0.86303 | 0.58557 | 0.41203 | 0.28083 | 2.69463 |
2 | Iceland | Western Europe | 3 | 7.501 | 7.333 | 7.669 | 1.42666 | 1.18326 | 0.86733 | 0.56624 | 0.14975 | 0.47678 | 2.83137 |
3 | Norway | Western Europe | 4 | 7.498 | 7.421 | 7.575 | 1.57744 | 1.12690 | 0.79579 | 0.59609 | 0.35776 | 0.37895 | 2.66465 |
4 | Finland | Western Europe | 5 | 7.413 | 7.351 | 7.475 | 1.40598 | 1.13464 | 0.81091 | 0.57104 | 0.41004 | 0.25492 | 2.82596 |
In [14]:
data.max()
Out[14]:
Country                          Zimbabwe
Region                           Western Europe
Happiness Rank                   157
Happiness Score                  7.526
Lower Confidence Interval        7.46
Upper Confidence Interval        7.669
Economy (GDP per Capita)         1.82427
Family                           1.18326
Health (Life Expectancy)         0.95277
Freedom                          0.60848
Trust (Government Corruption)    0.50521
Generosity                       0.81971
Dystopia Residual                3.83772
dtype: object
In [15]:
data.min()
Out[15]:
Country                          Afghanistan
Region                           Australia and New Zealand
Happiness Rank                   1
Happiness Score                  2.905
Lower Confidence Interval        2.732
Upper Confidence Interval        3.078
Economy (GDP per Capita)         0
Family                           0
Health (Life Expectancy)         0
Freedom                          0
Trust (Government Corruption)    0
Generosity                       0
Dystopia Residual                0.81789
dtype: object
In [17]:
data['Happiness Score'].idxmax()
Out[17]:
0
In [18]:
data['Happiness Score'].idxmin()
Out[18]:
156
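With the default 0..n-1 index used here, the label returned by idxmax() happens to coincide with the row position. A minimal sketch of the difference on a Series with string labels (the country scores are made-up values), assuming pandas 1.0 or later, where argmax() returns an integer position:

```python
import pandas as pd

# Hypothetical scores indexed by country name.
scores = pd.Series([5.3, 7.5, 2.9], index=["Spain", "Denmark", "Burundi"])

label_of_max = scores.idxmax()      # index label of the largest value
position_of_max = scores.argmax()   # integer position of the largest value
label_of_min = scores.idxmin()

print(label_of_max, position_of_max, label_of_min)
```

The label can be passed to .loc and the position to .iloc to retrieve the corresponding row.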
In [21]:
data.mad()  # mean absolute deviation
Out[21]:
Happiness Rank                   39.254899
Happiness Score                   0.955256
Lower Confidence Interval         0.957480
Upper Confidence Interval         0.953032
Economy (GDP per Capita)          0.342828
Family                            0.211727
Health (Life Expectancy)          0.188426
Freedom                           0.119887
Trust (Government Corruption)     0.084441
Generosity                        0.102143
Dystopia Residual                 0.413041
dtype: float64
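DataFrame.mad() was deprecated in pandas 1.5 and removed in pandas 2.0, so the cell above fails on current releases. A minimal sketch of computing the same quantity by hand, the mean of the absolute deviations from the column mean, on a made-up DataFrame (the column names and values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 6.0],
    "b": [2.0, 2.0, 2.0, 2.0],
})

# Subtract each column's mean, take absolute values, then average per column.
mad = (df - df.mean()).abs().mean()

print(mad)
```

For column a the mean is 3, the absolute deviations are 2, 1, 0 and 3, and their average is 1.5; column b is constant, so its deviation is 0.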
In [22]:
data.var()  # variance
Out[22]:
Happiness Rank                   2067.159889
Happiness Score                     1.303418
Lower Confidence Interval           1.318002
Upper Confidence Interval           1.291617
Economy (GDP per Capita)            0.170235
Family                              0.071132
Health (Life Expectancy)            0.052601
Freedom                             0.021172
Trust (Government Corruption)       0.012329
Generosity                          0.017891
Dystopia Residual                   0.294003
dtype: float64
In [23]:
data.std()  # standard deviation
Out[23]:
Happiness Rank                   45.466030
Happiness Score                   1.141674
Lower Confidence Interval         1.148043
Upper Confidence Interval         1.136493
Economy (GDP per Capita)          0.412595
Family                            0.266706
Health (Life Expectancy)          0.229349
Freedom                           0.145507
Trust (Government Corruption)     0.111038
Generosity                        0.133756
Dystopia Residual                 0.542220
dtype: float64
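The standard deviation is the square root of the variance; for Happiness Rank, 45.466030 squared is roughly 2067.16, which matches the var() output. Both methods use the sample formula (ddof=1, dividing by n - 1) by default. A minimal sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_var = s.var()             # default ddof=1: divides by n - 1
population_var = s.var(ddof=0)   # divides by n
sample_std = s.std()             # square root of the sample variance

print(sample_var, population_var, sample_std)
```

Here the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.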
In [24]:
data.cumsum()  # cumulative sum
Out[24]:
| | Country | Region | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Denmark | Western Europe | 1 | 7.526 | 7.460 | 7.592 | 1.44178 | 1.16374 | 0.79504 | 0.57941 | 0.44453 | 0.36171 | 2.73939 |
1 | DenmarkSwitzerland | Western EuropeWestern Europe | 3 | 15.035 | 14.888 | 15.182 | 2.96911 | 2.30898 | 1.65807 | 1.16498 | 0.85656 | 0.64254 | 5.43402 |
2 | DenmarkSwitzerlandIceland | Western EuropeWestern EuropeWestern Europe | 6 | 22.536 | 22.221 | 22.851 | 4.39577 | 3.49224 | 2.52540 | 1.73122 | 1.00631 | 1.11932 | 8.26539 |
3 | DenmarkSwitzerlandIcelandNorway | Western EuropeWestern EuropeWestern EuropeWest... | 10 | 30.034 | 29.642 | 30.426 | 5.97321 | 4.61914 | 3.32119 | 2.32731 | 1.36407 | 1.49827 | 10.93004 |
4 | DenmarkSwitzerlandIcelandNorwayFinland | Western EuropeWestern EuropeWestern EuropeWest... | 15 | 37.447 | 36.993 | 37.901 | 7.37919 | 5.75378 | 4.13210 | 2.89835 | 1.77411 | 1.75319 | 13.75600 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
152 | DenmarkSwitzerlandIcelandNorwayFinlandCanadaNe... | Western EuropeWestern EuropeWestern EuropeWest... | 11778 | 832.366 | 817.188 | 847.544 | 148.28013 | 124.10506 | 86.33722 | 57.62264 | 21.15342 | 36.91896 | 357.94872 |
153 | DenmarkSwitzerlandIcelandNorwayFinlandCanadaNe... | Western EuropeWestern EuropeWestern EuropeWest... | 11932 | 835.726 | 820.476 | 850.976 | 148.66240 | 124.21543 | 86.51066 | 57.78694 | 21.22454 | 37.23164 | 360.09430 |
154 | DenmarkSwitzerlandIcelandNorwayFinlandCanadaNe... | Western EuropeWestern EuropeWestern EuropeWest... | 12087 | 839.029 | 823.668 | 854.390 | 148.94363 | 124.21543 | 86.75877 | 58.13372 | 21.34041 | 37.40681 | 362.22970 |
155 | DenmarkSwitzerlandIcelandNorwayFinlandCanadaNe... | Western EuropeWestern EuropeWestern EuropeWest... | 12243 | 842.098 | 826.604 | 857.592 | 149.69082 | 124.36409 | 87.38871 | 58.20284 | 21.51274 | 37.89078 | 363.04759 |
156 | DenmarkSwitzerlandIcelandNorwayFinlandCanadaNe... | Western EuropeWestern EuropeWestern EuropeWest... | 12400 | 845.003 | 829.336 | 860.670 | 149.75913 | 124.59851 | 87.54618 | 58.24604 | 21.60693 | 38.09368 | 365.15163 |
157 rows × 13 columns
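In the table above the Country and Region columns are accumulated by string concatenation, which is not meaningful. A minimal sketch of taking the running total over numeric columns only, using select_dtypes on the first three rows of the happiness data shown earlier (only three of the columns are reproduced here):

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Denmark", "Switzerland", "Iceland"],
    "Happiness Rank": [1, 2, 3],
    "Happiness Score": [7.526, 7.509, 7.501],
})

# Keep numeric columns, then accumulate down the rows.
running = df.select_dtypes(include="number").cumsum()

print(running)
```

The last row of the running totals (6 and 22.536) agrees with row 2 of the full cumsum() output.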