Descriptive Statistics in Python: A pandas Example

Overview

import numpy as np

import pandas as pd

pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of reductions or summary statistics: methods that extract a single value (like the sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods found on NumPy arrays, they have built-in handling for missing data: missing values are left out of the calculation. Consider a small DataFrame:

df = pd.DataFrame([

[1.4, np.nan],

[7.6, -4.5],

[np.nan, np.nan],

[3, -1.5]

],

index=list('abcd'), columns=['one', 'two'])

df

     one  two
a    1.4  NaN
b    7.6 -4.5
c    NaN  NaN
d    3.0 -1.5

Calling DataFrame's sum method returns a Series containing column sums:

# Default axis=0: reduce down the rows, producing one value per column; missing values are skipped
df.sum()

one    12.0
two    -6.0
dtype: float64

# NaN values are not counted as samples when computing the mean
df.mean()

one    4.0
two   -3.0
dtype: float64
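The contrast with NumPy mentioned earlier can be sketched as follows. This is a minimal illustration, assuming the same values as column one; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Same values as column 'one' of the example DataFrame
values = np.array([1.4, 7.6, np.nan, 3.0])

numpy_sum = np.sum(values)            # NaN propagates through a plain NumPy reduction
numpy_nansum = np.nansum(values)      # the NaN-aware variant must be requested explicitly
pandas_sum = pd.Series(values).sum()  # pandas skips NaN by default (skipna=True)

print(numpy_sum, numpy_nansum, pandas_sum)
```

Only the plain NumPy reduction returns NaN here; the other two ignore the missing entry.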

Passing axis='columns' or axis=1 sums across the columns instead, producing one value per row:

# Row-wise statistics: axis=1 reduces across the columns (left to right)
df.sum(axis=1)

a    1.4
b    3.1
c    0.0
d    1.5
dtype: float64
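One way to check what a row-wise sum does is to rebuild it by hand. The sketch below assumes the same DataFrame as above and treats a skipped NaN as 0, which is equivalent for a sum; manual is a name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

row_sums = df.sum(axis=1)              # one value per row label
same_by_name = df.sum(axis="columns")  # string alias for axis=1

# Skipping NaN in a sum is equivalent to treating it as 0 before adding
manual = df["one"].fillna(0) + df["two"].fillna(0)

print(row_sums.equals(same_by_name))
print(np.allclose(row_sums, manual))
```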

NA values are excluded unless the entire slice (row or column in this case) is NA: mean then returns NaN, while sum returns 0.0, as row c above shows. Skipping NA values can be disabled with the skipna option:

# Missing values are skipped by default; pass skipna=False to let them propagate
df.mean(skipna=False, axis='columns')  # across the columns: one value per row

a     NaN
b    1.55
c     NaN
d    0.75
dtype: float64
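For sum, an all-NA slice yields 0.0 rather than NaN. If NaN is wanted in that case, the min_count argument of sum is one option. The sketch below assumes the same DataFrame as above and compares three settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

default_sum = df.sum(axis=1)                # all-NaN row 'c' sums to 0.0
strict_sum = df.sum(axis=1, min_count=1)    # require at least one valid value, else NaN
no_skip_sum = df.sum(axis=1, skipna=False)  # any NaN in the row makes the result NaN

print(pd.DataFrame({"default": default_sum,
                    "min_count=1": strict_sum,
                    "skipna=False": no_skip_sum}))
```

With min_count=1 only row c becomes NaN, whereas skipna=False also turns row a into NaN because it contains one missing value.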

See Table 5-7 for a list of common options for each reduction method.

Option   Description
axis     Axis to reduce over; 0 for DataFrame's rows and 1 for columns
skipna   Exclude missing values; True by default
level    Reduce grouped by level if the axis is hierarchically indexed (MultiIndex)
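The level option deserves a caveat: as far as I am aware, recent pandas versions (2.0 and later) no longer accept level in reduction methods, and grouping by an index level is the usual replacement. The sketch below uses hypothetical data with a two-level index (the level names key and n are invented for the example).

```python
import pandas as pd

# Hypothetical data: outer index level 'key', inner index level 'n'
index = pd.MultiIndex.from_arrays(
    [["x", "x", "y", "y"], [1, 2, 1, 2]], names=["key", "n"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Reduce within each value of the 'key' level
by_key = s.groupby(level="key").sum()
print(by_key)
```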

Some methods, like idxmax and idxmin, return indirect statistics like the index value where the minimum or maximum values are attained:

# idxmax() returns the index label of the first occurrence of the maximum
df.idxmax()

one    b
two    d
dtype: object
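The labels returned by idxmax and idxmin can be fed back into loc to retrieve the corresponding values. A small sketch, assuming the same DataFrame as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

max_labels = df.idxmax()  # label of the largest value in each column
min_labels = df.idxmin()  # label of the smallest value in each column

# Look up the value stored at the label reported for column 'one'
max_one = df.loc[max_labels["one"], "one"]

print(max_labels.tolist(), min_labels.tolist(), max_one)
```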

Other methods are accumulations:

# Cumulative sum down the rows (default axis=0); NaN values are skipped
df.cumsum()

     one  two
a    1.4  NaN
b    9.0 -4.5
c    NaN  NaN
d   12.0 -6.0

# axis=1 can also be specified to accumulate across the columns
df.cumsum(axis=1)
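Accumulations keep the shape of the input: a NaN stays NaN in its own position but does not reset the running total. The sketch below, assuming the same DataFrame as above, shows an accumulation across the columns and a second accumulation method, cummax.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

across = df.cumsum(axis=1)  # running total from left to right within each row
running_max = df.cummax()   # running maximum down each column

print(across)
print(running_max)
```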
