Python Descriptive Statistics Examples: Descriptive Statistics with Pandas

weixin_39859394

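Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39859394/article/details/111433292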

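This article introduces the descriptive-statistics methods in the pandas library, including DataFrame functions such as sum, mean, and describe. It explains how missing values are handled and works through examples that compute column quantiles, means, standard deviations, and other statistics, as well as correlation and covariance. It also covers pairwise correlation with corrwith, and methods such as value_counts and unique for extracting unique values and counting frequencies.

Summary generated automatically by CSDN.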

Introduction

import numpy as np
import pandas as pd

pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of reductions or summary statistics: methods that extract a single value (like the sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods found on NumPy arrays, they have built-in handling for missing data. Consider a small DataFrame. (Note: pandas provides a set of common statistical functions whose input is usually the values of a Series, or the rows or columns of a DataFrame. Notably, pandas handles missing values for you: they are left out of the calculation.)

df = pd.DataFrame([
    [1.4, np.nan],
    [7.6, -4.5],
    [np.nan, np.nan],
    [3, -1.5]
],
    index=list('abcd'), columns=['one', 'two'])

df

   one  two
a  1.4  NaN
b  7.6 -4.5
c  NaN  NaN
d  3.0 -1.5

Calling DataFrame's sum method returns a Series containing column sums:

# Default axis=0: reduce down the rows, producing one value per column; missing values are skipped
df.sum()
# NaN values are not counted in the sample when computing the mean
df.mean()

one    12.0
two    -6.0
dtype: float64

one    4.0
two   -3.0
dtype: float64
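To make the contrast with NumPy concrete, here is a minimal sketch comparing a plain NumPy reduction with the pandas one on the same values. The variable names (values, arr, ser) are illustrative and not from the original article.

```python
import numpy as np
import pandas as pd

# Same data as column 'one' of the example DataFrame
values = [1.4, 7.6, np.nan, 3.0]

arr = np.array(values)
ser = pd.Series(values)

numpy_sum = arr.sum()            # NaN propagates through the plain NumPy reduction
nan_aware_sum = np.nansum(arr)   # NumPy's explicit NaN-skipping variant
pandas_sum = ser.sum()           # pandas skips NaN by default (skipna=True)
pandas_mean = ser.mean()         # averaged over the 3 non-missing values only

print(numpy_sum, nan_aware_sum, pandas_sum, pandas_mean)
```

With NumPy you have to opt in to NaN-skipping (np.nansum, np.nanmean), whereas pandas does it by default and lets you opt out with skipna=False.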

Passing axis='columns' or axis=1 sums across the columns instead, producing one value per row:

# Per-row statistics: axis=1 reduces across the columns (left to right)
df.sum(axis=1)

a    1.4
b    3.1
c    0.0
d    1.5
dtype: float64
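The axis argument is easy to mix up, so the following sketch shows both spellings side by side and checks which labels end up in the result. It rebuilds the example DataFrame so that it runs on its own; by_column and by_row are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

# axis=0 (alias 'index') collapses the rows: one result per column
by_column = df.sum(axis="index")

# axis=1 (alias 'columns') collapses the columns: one result per row
by_row = df.sum(axis="columns")

print(by_column.index.tolist())  # labels of the surviving axis: the columns
print(by_row.index.tolist())     # labels of the surviving axis: the rows
```

A useful way to remember it: the axis you name is the one that disappears, and the result is indexed by the other one.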

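NA values are excluded unless the entire slice (row or column in this case) is NA, in which case mean returns NaN, while sum returns 0.0 in pandas 0.22 and later (row c above). In other words, statistics automatically ignore missing values and do not count them in the sample. This can be disabled with the skipna option:

# Missing values are skipped by default; pass skipna=False to make them propagate
df.mean(skipna=False, axis='columns')  # reduce across the columns: one value per row

a     NaN
b    1.55
c     NaN
d    0.75
dtype: float64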
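The sketch below puts the missing-value behaviours next to each other for row-wise reductions. It also uses the min_count argument of sum, which the original text does not mention; it is shown here only as one way to get NaN rather than 0.0 for an all-NA row. The result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

# Default: NaN is skipped; an all-NaN row sums to 0.0
row_sum_default = df.sum(axis=1)

# Require at least one valid value, otherwise the result is NaN
row_sum_min1 = df.sum(axis=1, min_count=1)

# Default mean: row 'a' uses its single valid value, row 'c' has none and is NaN
row_mean_default = df.mean(axis=1)

# skipna=False: any NaN in the row makes the result NaN
row_mean_strict = df.mean(axis=1, skipna=False)

summary = pd.DataFrame({
    "sum": row_sum_default,
    "sum_min_count_1": row_sum_min1,
    "mean": row_mean_default,
    "mean_skipna_False": row_mean_strict,
})
print(summary)
```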

See Table 5-7 for a list of common options for each reduction method.

Option    Description
axis      Axis to reduce over: 0 for DataFrame's rows and 1 for columns
skipna    Exclude missing values; True by default
level     Reduce grouped by level if the axis is hierarchically indexed (MultiIndex)
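A note on the level option: in pandas 2.0 and later, reduction methods no longer accept level, and grouping on the index level is the equivalent spelling. The sketch below assumes a small two-level index invented for illustration (the level names "group" and "item" and the values are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical hierarchically indexed Series, for illustration only
idx = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)],
    names=["group", "item"],
)
s = pd.Series([1.0, 2.0, np.nan, 4.0], index=idx)

# Reduce within each value of the outer level; NaN is skipped as usual
per_group = s.groupby(level="group").sum()
print(per_group)
```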

Some methods, like idxmax and idxmin, return indirect statistics like the index label at which the minimum or maximum value is attained.

# idxmax() returns the index label of the first occurrence of the maximum
df.idxmax()

one    b
two    d
dtype: object
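As a quick check of what "indirect statistic" means, the following sketch looks the returned labels back up in the DataFrame and compares them with the direct max() result. The helper names (max_labels, looked_up) are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

max_labels = df.idxmax()   # row label of each column's maximum
min_labels = df.idxmin()   # row label of each column's minimum

# Use the labels to fetch the values themselves
looked_up = pd.Series(
    {col: df.loc[max_labels[col], col] for col in df.columns}
)

print(max_labels.tolist(), min_labels.tolist())
print(looked_up)
print(df.max())
```

The looked-up values match df.max(), which confirms that idxmax reports where the maximum sits rather than the maximum itself.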

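Other methods are accumulations, such as the cumulative sum, which by default runs along axis=0 (down the rows):

# Cumulative sum, default axis=0; NA values are skipped
df.cumsum()
# axis=1 can also be specified to accumulate across the columns
df.cumsum(axis=1)

Output of df.cumsum():

    one  two
a   1.4  NaN
b   9.0 -4.5
c   NaN  NaN
d  12.0 -6.0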
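The sketch below runs the accumulation in both directions and also with skipna=False, to show how a missing value affects the running total in each case. It rebuilds the example DataFrame so that it runs on its own; the result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.6, -4.5], [np.nan, np.nan], [3, -1.5]],
    index=list("abcd"),
    columns=["one", "two"],
)

# Down each column: NaN positions stay NaN, but the running total continues after them
col_cumsum = df.cumsum()

# Across each row: column 'two' holds one + two where both are present
row_cumsum = df.cumsum(axis=1)

# skipna=False: once a NaN is met, every later entry in that column is NaN
strict_cumsum = df.cumsum(skipna=False)

print(col_cumsum)
print(row_cumsum)
print(strict_cumsum)
```

Unlike a reduction, an accumulation keeps the original shape, so the missing positions remain visible in the result instead of being silently dropped.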
