pandas study notes: quickly getting a statistical summary with df.describe()

qq_41183513

Tags: python, data mining, data analysis

Article link: https://blog.csdn.net/qq_41183513/article/details/121650256

Quickly getting a statistical summary: df.describe()

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6, 4),columns=list('ABCD'))
print(df)

[out]:
          A         B         C         D
0  0.262919 -1.892326  0.877485 -0.665933
1  0.940333  0.135307 -0.604675 -1.544919
2  1.625278  0.784647 -0.477392  1.463082
3  0.538317 -0.457268  0.871012 -0.070698
4 -1.276471 -0.856285  1.071984 -1.914982
5 -1.044564 -0.789176  0.898808 -0.581169

# df.describe()
# Summarize numeric columns
print(df.describe())

[out]:
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.174302 -0.512517  0.439537 -0.552436
std    1.133391  0.916946  0.764202  1.195797
min   -1.276471 -1.892326 -0.604675 -1.914982
25%   -0.717693 -0.839507 -0.140291 -1.325173
50%    0.400618 -0.623222  0.874249 -0.623551
75%    0.839829 -0.012836  0.893478 -0.198316
max    1.625278  0.784647  1.071984  1.463082

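count: number of valid (non-null) values in the column
unique: number of distinct values (reported for non-numeric data only)
std: standard deviation (pandas uses the sample standard deviation, ddof=1)
min: minimum value
25%: first quartile (25th percentile)
50%: second quartile (50th percentile, i.e. the median)
75%: third quartile (75th percentile)
max: maximum value
mean: arithmetic mean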
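As a rough cross-check of the definitions above, the sketch below recomputes each row of describe() by hand for a single column. The data and the name `manual` are made up for illustration and are not from the original post; it assumes pandas' defaults, where std is the sample standard deviation (ddof=1) and percentiles use linear interpolation.

```python
import numpy as np
import pandas as pd

# Small illustrative column with one missing value (hypothetical data)
s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan], name="A")

# Recompute every statistic that describe() reports for numeric data
manual = pd.Series({
    "count": s.count(),        # counts non-null values only
    "mean": s.mean(),
    "std": s.std(),            # sample standard deviation (ddof=1)
    "min": s.min(),
    "25%": s.quantile(0.25),
    "50%": s.quantile(0.50),   # the median
    "75%": s.quantile(0.75),
    "max": s.max(),
})

summary = s.describe()
print(summary)
# Check that the hand-computed values agree with describe()
print(np.allclose(summary.values, manual.values))
```

Note that the NaN entry is skipped everywhere, which is why count is 4 rather than 5.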

# Summarize strings (this is a Series, so it is named s rather than df)
s = pd.Series(['a', 'a', 'b', 'a', 'b', 'b', 'b'])
print(s)
print(s.describe())

[out]:
0    a
1    a
2    b
3    a
4    b
5    b
6    b
dtype: object
count     7
unique    2
top       b
freq      4
dtype: object

top: the most frequently occurring value
freq: the number of times the top value occurs
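To make the meaning of unique, top and freq concrete, here is a minimal sketch that derives them from value_counts() and nunique() on the same string data. The variable names (`counts`, `top`, `freq`, `n_unique`) are chosen for illustration only.

```python
import pandas as pd

s = pd.Series(['a', 'a', 'b', 'a', 'b', 'b', 'b'])

# value_counts() sorts by descending frequency, so the first entry is the mode
counts = s.value_counts()
top = counts.index[0]      # most frequent value
freq = counts.iloc[0]      # how many times it occurs
n_unique = s.nunique()     # number of distinct values

desc = s.describe()
print(top, freq, n_unique)
print(desc['top'], desc['freq'], desc['unique'])
```

Both print statements should report the same three values, since 'b' occurs four times and 'a' three times.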

# Describe a DataFrame. By default only numeric columns are returned
df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
                   'numeric': [1, 2, 3],
                   'object': ['a', 'b', 'c']
                  })

print(df)
print(df.describe())

[out]:
  categorical  numeric object
0           d        1      a
1           e        2      b
2           f        3      c

       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

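# Summarize all columns; statistics that do not apply to a column's type are NaN
df.describe(include='all')
# Describe one specific column
df.numeric.describe()
# Include only numeric columns
df.describe(include=[np.number])
# Include only string (object) columns
df.describe(include=[object])
# Include only categorical columns
df.describe(include=['category'])
# Exclude numeric columns
df.describe(exclude=[np.number])
# Exclude object columns
df.describe(exclude=[object])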
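The sketch below runs the same include/exclude variants and records which columns each one keeps, which may be easier to compare than the full tables. It rebuilds the example DataFrame, with one assumption that differs from the original: the 'object' column is given an explicit object dtype, in case a newer pandas version would otherwise infer a dedicated string dtype. The names `selections` and `described` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'categorical': pd.Categorical(['d', 'e', 'f']),
    'numeric': [1, 2, 3],
    # Assumption: force object dtype so include=[object] matches this column
    'object': pd.Series(['a', 'b', 'c'], dtype=object),
})

# Each describe() variant from the notes, keyed by a readable label
selections = {
    "default": df.describe(),
    "include='all'": df.describe(include='all'),
    "include=[np.number]": df.describe(include=[np.number]),
    "include=[object]": df.describe(include=[object]),
    "include=['category']": df.describe(include=['category']),
    "exclude=[np.number]": df.describe(exclude=[np.number]),
    "exclude=[object]": df.describe(exclude=[object]),
}

# Record which columns survive each selection
described = {label: list(result.columns) for label, result in selections.items()}
for label, cols in described.items():
    print(f"{label:22} -> {cols}")

# With include='all', statistics that do not apply to a dtype are NaN
all_desc = selections["include='all'"]
print(all_desc.loc['mean', 'object'], all_desc.loc['unique', 'numeric'])
```

The last line illustrates the NaN filling: mean is undefined for the string column, and unique is not reported for the numeric column.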