Overview
import numpy as np
import pandas as pd
pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of reductions or summary statistics: methods that extract a single value (like the sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods found on NumPy arrays, they have built-in handling for missing data. (In short: pandas provides a set of common statistical functions whose input is usually the values of a Series, or the rows or columns of a DataFrame. Notably, missing values are handled for you and left out of the calculation.) Consider a small DataFrame:
df = pd.DataFrame([
[1.4, np.nan],
[7.6, -4.5],
[np.nan, np.nan],
[3, -1.5]
],
index=list('abcd'), columns=['one', 'two'])
df
   one  two
a  1.4  NaN
b  7.6 -4.5
c  NaN  NaN
d  3.0 -1.5
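Calling DataFrame's sum method returns a Series containing column sums:
"Default is axis=0: reduce down the rows, giving one result per column; missing values are skipped"
df.sum()
df.mean()
"When computing the mean, NaN values are not counted as samples"
'Default is axis=0: reduce down the rows, giving one result per column; missing values are skipped'
one    12.0
two    -6.0
dtype: float64
one    4.0
two   -3.0
dtype: float64
'When computing the mean, NaN values are not counted as samples'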
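To make the contrast with NumPy concrete, here is a minimal sketch. The array `values` is made up for illustration and simply reuses the numbers from column `one`. A plain NumPy reduction propagates NaN, whereas the pandas method skips it by default.

```python
import numpy as np
import pandas as pd

# Illustrative data: the values of column "one" from the example DataFrame
values = np.array([1.4, 7.6, np.nan, 3.0])

# A plain NumPy reduction propagates NaN
numpy_mean = np.mean(values)

# NumPy's NaN-aware variant has to be requested explicitly
numpy_nanmean = np.nanmean(values)

# pandas skips NaN by default (skipna=True)
pandas_mean = pd.Series(values).mean()

print(numpy_mean, numpy_nanmean, pandas_mean)
```

Here `np.nanmean` and `Series.mean` agree because both divide the sum of the three valid values by 3, not by 4.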
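Passing axis='columns' or axis=1 sums across the columns instead, giving one result per row.
"Row-wise statistics: axis=1 reduces across the columns, moving to the right"
df.sum(axis=1)
'Row-wise statistics: axis=1 reduces across the columns, moving to the right'
a    1.4
b    3.1
c    0.0
d    1.5
dtype: float64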
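NA values are excluded unless the entire slice (row or column in this case) is NA. For an all-NA slice, mean returns NaN, while sum returns 0.0 in recent pandas versions (see row c in the df.sum(axis=1) output). Skipping of NA values can be disabled with the skipna option. (In short: statistical calculations automatically skip missing values and do not count them as samples.)
"Missing values are skipped by default; to propagate them, set skipna=False explicitly"
df.mean(skipna=False, axis='columns')  # reduce across the columns: one value per row
'Missing values are skipped by default; to propagate them, set skipna=False explicitly'
a     NaN
b    1.55
c     NaN
d    0.75
dtype: float64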
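The following sketch puts the different NA behaviours of `sum` side by side on the same DataFrame. The `min_count` parameter is not covered in the text above. It is shown here as one possible way to get NaN for an all-NA row, on the assumption that a reasonably recent pandas version is installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

# Default (skipna=True): NaN is skipped, so the all-NaN row "c" sums to 0.0
row_sums = df.sum(axis=1)

# min_count=1 requires at least one valid value, otherwise the result is NaN
row_sums_strict = df.sum(axis=1, min_count=1)

# skipna=False: a single NaN in the row makes the whole result NaN
row_sums_no_skip = df.sum(axis=1, skipna=False)

print(pd.DataFrame({
    "default": row_sums,
    "min_count=1": row_sums_strict,
    "skipna=False": row_sums_no_skip,
}))
```

Which variant is appropriate depends on whether a missing measurement should count as "no contribution" or as "result unknown".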
See Table 5-7 for a list of common options for each reduction method.
Option    Description
axis      Axis to reduce over; 0 for DataFrame's rows and 1 for columns
skipna    Exclude missing values; True by default
level     Reduce grouped by level if the axis is hierarchically indexed (MultiIndex)
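As a hedged illustration of the `level` idea: in pandas 2.0 and later, the `level` argument is no longer accepted by reductions such as `sum`, and grouping on the index level gives the equivalent result. The MultiIndex frame below (`frame`, with level names `key` and `num`) is invented purely for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index, made up for illustration
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)], names=["key", "num"]
)
frame = pd.DataFrame(
    {"one": [1.0, 2.0, np.nan, 4.0], "two": [10.0, np.nan, 30.0, 40.0]},
    index=index,
)

# Reduce within each value of the outer level; NaN is skipped as usual
by_key = frame.groupby(level="key").sum()
print(by_key)
```

On older pandas versions, `frame.sum(level="key")` expressed the same reduction directly.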
Some methods, like idxmin and idxmax, return indirect statistics, such as the index label where the minimum or maximum value is attained.
"idxmax() returns the first index label at which the maximum value occurs"
df.idxmax()
'idxmax() returns the first index label at which the maximum value occurs'
one    b
two    d
dtype: object
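A short sketch of how these labels can be used, on the same example DataFrame: `idxmin` is the counterpart of `idxmax`, and the returned label can be passed to `loc` to retrieve the value itself. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

# Index label of the largest and smallest value in each column (NaN skipped)
max_labels = df.idxmax()
min_labels = df.idxmin()

# The label can be used to look the value up again
largest_one = df.loc[max_labels["one"], "one"]

print(max_labels, min_labels, largest_one, sep="\n")
```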
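Other methods are accumulations. For example, cumsum computes a cumulative sum, along axis=0 (down the rows) by default:
"Cumulative sum, axis=0 by default, skipping NA"
df.cumsum()
"axis=1 can also be given to accumulate across the columns"
df.cumsum(axis=1)
'Cumulative sum, axis=0 by default, skipping NA'
    one  two
a   1.4  NaN
b   9.0 -4.5
c   NaN  NaN
d  12.0 -6.0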
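The output of the row-wise call is not shown above, so here is a minimal sketch of it on the same DataFrame, together with `cummax` as a second accumulation. Note how NaN positions stay NaN in the result while the running total carries on past them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

# Accumulate across the columns: "two" becomes one + two for each row
running_rows = df.cumsum(axis=1)

# Running maximum down column "one"; the NaN at "c" is kept in place
running_max = df["one"].cummax()

print(running_rows)
print(running_max)
```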