The info() and describe() Functions in Pandas

ac同学

Article link: https://blog.csdn.net/qq_40305043/article/details/104862499

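This article walks through two pandas functions, info() and describe(). The former prints an overview of a DataFrame, including data types, non-null counts, and memory usage. The latter generates descriptive statistics: numeric statistics such as the mean, standard deviation, and percentiles, and categorical statistics such as the count and the number of distinct values.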

For these two functions, I'll start with the explanations from the official pandas documentation: info() and describe().

$\color{red}{1.\,\,\,\text{The info() function}}$

info() prints a concise summary of a DataFrame: the dtype of the index and of each column, the number of non-null values, and the memory usage.

1.1 Parameters of info()

DataFrame.info(self, verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)

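Parameter	Description
self	self appears only in methods defined on a class; ordinary functions do not take it. For more on self, see $\to$ https://www.cnblogs.com/huangbiquan/p/7741016.html
verbose: bool, optional	Whether to print the full summary. True shows information for every column; False prints a shortened summary. By default, the setting in pandas.options.display.max_info_columns is followed.
buf: writable buffer, defaults to sys.stdout	Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to process the output further. To store the output of DataFrame.info() in a variable, see $\to$ https://blog.csdn.net/qq_34105362/article/details/90056765.
max_cols: int, optional	When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.
memory_usage: bool, str, optional	Whether to display the total memory usage of the DataFrame elements (including the index). By default, this follows pandas.options.display.memory_usage, which is True. True always shows memory usage; False never shows it; 'deep' is equivalent to True with deep introspection, so the real memory used by object columns is computed instead of estimated.
null_counts: bool, optional	Whether to show the non-null counts. True always shows the counts and False never does. By default, counts are shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. In pandas 1.2 this parameter was renamed show_counts; null_counts was deprecated at that point and removed in pandas 2.0.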
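A minimal sketch of several parameters from the table, assuming pandas 1.2 or later (so it uses show_counts rather than null_counts). It captures the summary through buf instead of printing to sys.stdout, requests exact memory usage with 'deep', and uses max_cols to force the truncated output. Note that info() itself returns None; the text only ends up in the buffer.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "int_col": [1, 2, 3, 4, 5],
    "text_col": ["alpha", "beta", "gamma", "delta", "epsilon"],
    "float_col": [0.0, 0.25, 0.5, 0.75, 1.0],
})

# info() returns None; the summary text is written to `buf` (sys.stdout by default).
buffer = io.StringIO()
df.info(buf=buffer, verbose=True, memory_usage="deep", show_counts=True)
full_summary = buffer.getvalue()

# With 3 columns and max_cols=2, info() switches to the truncated output,
# which reports the column range instead of one line per column.
short_buffer = io.StringIO()
df.info(buf=short_buffer, max_cols=2)
short_summary = short_buffer.getvalue()

print(full_summary)
print(short_summary)
```

With memory_usage="deep" the memory line has no trailing "+", because the figure is measured rather than estimated.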

1.2 Example of info()

import pandas as pd

# (1) Define a DataFrame
int_values = [1, 2, 3, 4, 5]
text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
                   "float_col": float_values})
df

Output:

   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

# (2) Call info()
df.info(verbose=True)

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
int_col      5 non-null int64
text_col     5 non-null object
float_col    5 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes
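The non-null counts are the part of this output that matters most once data has gaps. The sketch below uses a hypothetical DataFrame with one missing value and checks that the per-column counts info() reports agree with DataFrame.count(). On pandas 1.0 and later the column table is laid out with "#", "Column", "Non-Null Count" and "Dtype" headers, but the counts read the same way.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: float_col has one missing value
df = pd.DataFrame({
    "int_col": [1, 2, 3, 4, 5],
    "float_col": [0.0, 0.25, np.nan, 0.75, 1.0],
})

buffer = io.StringIO()
df.info(buf=buffer)  # small frame, so non-null counts are shown by default
print(buffer.getvalue())

# DataFrame.count() returns the same per-column non-null counts as a Series
non_null = df.count()
print(non_null)
```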

$\color{red}{2.\,\,\,\text{The describe() function}}$

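describe() generates descriptive statistics. For numeric data these include the count, mean, standard deviation, minimum, maximum, and percentiles; for object or categorical data they include the count, the number of distinct values (unique), the most frequent value (top), and how often it occurs (freq). The output varies depending on what is provided.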

2.1 Parameters of describe()

DataFrame.describe(self, percentiles=None, include=None, exclude=None)

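Parameter	Description
percentiles: list-like of numbers, optional	The percentiles to include in the output. All values should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include: 'all', list-like of dtypes or None (default), optional	A whitelist of data types to include in the result. 'all': all columns of the input are included in the output. A list-like of dtypes: limits the result to the provided data types. None (default): the result includes all numeric columns.
exclude: list-like of dtypes or None (default), optional	A blacklist of data types to omit from the result. A list-like of dtypes: excludes the provided data types from the result. None (default): nothing is excluded.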
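A short sketch of the three parameters, reusing the mixed-dtype DataFrame from the official examples. The percentile values and dtype lists here are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "categorical": pd.Categorical(["d", "e", "f"]),
    "numeric": [1, 2, 3],
    "object": ["a", "b", "c"],
})

# Custom percentiles: rows 10%, 50% and 90% replace the default 25%/50%/75%.
custom = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom)

# Whitelist: describe only the category-dtype column.
only_cat = df.describe(include=["category"])
print(only_cat)

# Blacklist: describe every column except the numeric ones.
no_numbers = df.describe(exclude=[np.number])
print(no_numbers)
```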

2.2 Examples of describe()

2.2.1 Describing a numeric Series.

s = pd.Series([1, 2, 3])
s.describe()

Output:

count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64
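Each number above can be reproduced with an individual Series method. The sketch below is a consistency check, not a description of how describe() is implemented internally: it shows that std is the sample standard deviation (ddof=1) and that the percentile rows match Series.quantile with its default linear interpolation.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
desc = s.describe()

# Sample standard deviation (ddof=1):
# sqrt(((1-2)**2 + (2-2)**2 + (3-2)**2) / (3 - 1)) = 1.0
print(desc["std"], s.std(ddof=1))

# The population standard deviation (ddof=0) would be smaller, about 0.816
print(s.std(ddof=0))

# Linear interpolation between sorted values: 25% lies halfway between 1 and 2
print(desc["25%"], s.quantile(0.25))
```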

2.2.2 Describing a categorical Series.

s = pd.Series(['a', 'a', 'b', 'c'])
s.describe()

Output:

count     4
unique    3
top       a
freq      2
dtype: object
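The four categorical statistics map onto familiar Series methods, as this sketch shows. One caveat from the pandas documentation: if several values tie for the highest count, which one is reported as top is chosen arbitrarily, so the top value should not be relied on when ties are possible.

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "c"])
desc = s.describe()

counts = s.value_counts()  # value -> frequency, most frequent first
manual = {
    "count": s.count(),      # number of non-null values
    "unique": s.nunique(),   # number of distinct non-null values
    "top": counts.index[0],  # most frequent value
    "freq": counts.iloc[0],  # how many times it occurs
}
print(desc)
print(manual)
```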

2.2.3 Describing a DataFrame. By default only numeric fields are returned.

df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
                   'numeric': [1, 2, 3],
                   'object': ['a', 'b', 'c']
                  })
df.describe()

Output:

       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0
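The default behaviour can be checked directly: describe() with no include/exclude gives the same result as selecting the numeric columns first. The sketch also shows a related rule: when a DataFrame has no numeric columns at all, describe() falls back to summarizing every column with count, unique, top and freq. The text_only frame is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({
    "categorical": pd.Categorical(["d", "e", "f"]),
    "numeric": [1, 2, 3],
    "object": ["a", "b", "c"],
})

# Default describe(): only the numeric column is summarized ...
default_desc = df.describe()
# ... which matches describing explicitly selected numeric columns.
explicit_desc = df.select_dtypes(include="number").describe()
print(default_desc.equals(explicit_desc))

# Hypothetical frame with no numeric columns: describe() summarizes all of them.
text_only = pd.DataFrame({"object": ["a", "b", "c"]})
print(text_only.describe())
```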

2.2.4 Describing all columns of a DataFrame regardless of data type.

df.describe(include='all')

Output:

       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      c
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN