Python describe parameters: pandas DataFrame.describe usage and code examples

weixin_39776298

Article link: https://blog.csdn.net/weixin_39776298/article/details/111742747

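Generates descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

Analyzes both numeric and object Series, as well as DataFrame column sets of mixed data types. The output varies depending on what is provided. See the notes below for more detail.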
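To make the NaN exclusion concrete, here is a minimal sketch. The sample values are made up for illustration; only `Series.describe` itself comes from the text above.

```python
import numpy as np
import pandas as pd

# Four entries, one of which is missing (illustrative data).
s = pd.Series([1.0, 2.0, np.nan, 4.0])
summary = s.describe()

# count reflects only the three non-missing values,
# and mean is computed over those same three values.
print(len(s))            # 4
print(summary["count"])  # 3.0
print(summary["mean"])   # 7 / 3, about 2.33
```

The missing entry is skipped rather than propagated, so none of the reported statistics is NaN here.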

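Parameters:

percentiles: list-like of numbers, optional. The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

include: 'all', list-like of dtypes, or None (default), optional. A white list of data types to include in the result. Ignored for Series. The options are:

'all': All columns of the input will be included in the output.

A list-like of dtypes: Limits the results to the provided data types. To limit the result to numeric types, submit numpy.number. To limit it instead to object columns, submit the object data type (older documentation names numpy.object, an alias that was removed in NumPy 1.24). Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.

None (default): The result will include all numeric columns.

exclude: list-like of dtypes or None (default), optional. A black list of data types to omit from the result. Ignored for Series. The options are:

A list-like of dtypes: Excludes the provided data types from the result. To exclude numeric types, submit numpy.number. To exclude object columns, submit the object data type (numpy.object in older documentation). Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.

None (default): The result will exclude nothing.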
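The percentiles parameter is not exercised by the examples further down, so here is a small sketch of it. The data and the chosen percentiles are illustrative assumptions; 0.5 is passed explicitly so that the median row does not depend on any implicit behavior.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th, 50th and 90th percentiles instead of the default quartiles.
summary = s.describe(percentiles=[0.1, 0.5, 0.9])

# The percentile rows are labeled with the requested percentages.
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
print(summary["10%"], summary["90%"])  # approximately 1.4 and 4.6
```

The values between data points come from linear interpolation, which is why the 10th percentile of 1..5 falls at 1.4 rather than on an observed value.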

Returns:

Series or DataFrame: Summary statistics of the Series or DataFrame provided.

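Notes:

For numeric data, the result's index will include count, mean, std, min, and max, as well as the lower, 50th, and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.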
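The numeric rows can be cross-checked against the corresponding Series methods. This is a minimal sketch with made-up data; it compares the 50% row with `Series.median`, the std row with `Series.std` (the sample standard deviation, `ddof=1`), and the 25% row with `Series.quantile`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 10])
summary = s.describe()

# Each pair should print the same number twice.
print(summary["50%"], s.median())        # median
print(summary["std"], s.std())           # sample standard deviation (ddof=1)
print(summary["25%"], s.quantile(0.25))  # lower quartile
```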

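For object data (e.g. strings or timestamps), the result's index will include count, unique, top, and freq. top is the most common value, and freq is that value's frequency. Timestamps also include the first and last items. This timestamp behavior describes older pandas releases; since pandas 2.0, datetime data is summarized like numeric data instead.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.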
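One way to read top and freq is as the first row of `value_counts()`. The sketch below uses illustrative data and also includes a tied case, for which it deliberately does not assume which value is reported as top.

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "c"])
summary = s.describe()
counts = s.value_counts()

# top/freq correspond to the most frequent value and how often it occurs.
print(summary["top"], summary["freq"])  # a 2
print(counts.index[0], counts.iloc[0])  # a 2

# With a tie, top is one of the tied values; which one is not guaranteed.
tied = pd.Series(["x", "y"]).describe()
print(tied["top"] in {"x", "y"}, tied["freq"])  # True 1
```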

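For mixed data types provided via a DataFrame, the default is to return only an analysis of the numeric columns. If the DataFrame consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of the attributes of each type.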
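The fallback for frames without numeric columns is easy to miss, so here is a small sketch of it. The frame `df_text` and its column names are hypothetical, chosen to mirror the example frame used later in this article.

```python
import pandas as pd

# A frame with no numeric columns at all (illustrative data).
df_text = pd.DataFrame({
    "categorical": pd.Categorical(["d", "e", "f"]),
    "object": ["a", "b", "c"],
})

# With the default include=None, both columns are still described.
summary = df_text.describe()
print(summary.columns.tolist())  # ['categorical', 'object']
print(summary.index.tolist())    # ['count', 'unique', 'top', 'freq']
```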

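The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. These parameters are ignored when analyzing a Series.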
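As a quick check that include has no effect on a Series, the sketch below passes a filter that would match nothing in a DataFrame and compares the result with a plain call. The data is illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

plain = s.describe()
# 'category' does not match this integer Series, yet the summary is unchanged.
filtered = s.describe(include=["category"])

print(plain.equals(filtered))  # True
```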

Examples:

Describing a numeric Series.

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

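Describing a timestamp Series. The output below is what older pandas versions print; since pandas 2.0, datetime data is summarized with count, mean, min, the percentiles, and max instead of unique, top, freq, first, and last.

>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                       3
unique                      2
top       2010-01-01 00:00:00
freq                        2
first     2000-01-01 00:00:00
last      2010-01-01 00:00:00
dtype: object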
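Because the rows reported for timestamps depend on the installed pandas version, it can help to inspect the index labels directly rather than rely on a printed example. A minimal sketch, reusing the same three dates:

```python
import numpy as np
import pandas as pd

s = pd.Series([
    np.datetime64("2000-01-01"),
    np.datetime64("2010-01-01"),
    np.datetime64("2010-01-01"),
])

labels = s.describe().index.tolist()
print(pd.__version__, labels)

# 'mean' appears when datetimes are summarized like numbers;
# 'first' and 'last' appear when they are summarized like objects.
numeric_style = "mean" in labels
print("numeric-style summary:", numeric_style)
```

No expected output is shown because the labels differ between pandas versions.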

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      c
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

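Including only string columns in a DataFrame description. The built-in object is used here because the np.object alias was removed in NumPy 1.24.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         c
freq        1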

Including only categorical columns in a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              f
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      c
freq             1      1

Excluding object columns from a DataFrame description. As above, the built-in object replaces the removed np.object alias.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
