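0. Problem
pandas describe() output does not appear when the script is run
Example: in the Alibaba Tianchi industrial steam volume prediction task, PyCharm printed nothing when generating tabular summary statistics of the data.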
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings("ignore")
# Read the data
train_data_file = "./zhengqi_train.txt"
test_data_file = "./zhengqi_test.txt"
train_data = pd.read_csv(train_data_file, sep='\t', encoding='utf-8')
test_data = pd.read_csv(test_data_file, sep='\t', encoding='utf-8')
# Inspect the feature variables of the training set
train_data.info()
# View summary statistics
train_data.describe()
test_data.describe()
########################################################################################
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2888 entries, 0 to 2887
Data columns (total 39 columns):
#   Column  Non-Null Count  Dtype
--- ------  --------------  -----
0   V0      2888 non-null   float64
1   V1      2888 non-null   float64
2   V2      2888 non-null   float64
3   V3      2888 non-null   float64
4   V4      2888 non-null   float64
5   V5      2888 non-null   float64
6   V6      2888 non-null   float64
7   V7      2888 non-null   float64
8   V8      2888 non-null   float64
9   V9      2888 non-null   float64
10  V10     2888 non-null   float64
11  V11     2888 non-null   float64
12  V12     2888 non-null   float64
13  V13     2888 non-null   float64
14  V14     2888 non-null   float64
15  V15     2888 non-null   float64
16  V16     2888 non-null   float64
17  V17     2888 non-null   float64
18  V18     2888 non-null   float64
19  V19     2888 non-null   float64
20  V20     2888 non-null   float64
21  V21     2888 non-null   float64
22  V22     2888 non-null   float64
23  V23     2888 non-null   float64
24  V24     2888 non-null   float64
25  V25     2888 non-null   float64
26  V26     2888 non-null   float64
27  V27     2888 non-null   float64
28  V28     2888 non-null   float64
29  V29     2888 non-null   float64
30  V30     2888 non-null   float64
31  V31     2888 non-null   float64
32  V32     2888 non-null   float64
33  V33     2888 non-null   float64
34  V34     2888 non-null   float64
35  V35     2888 non-null   float64
36  V36     2888 non-null   float64
37  V37     2888 non-null   float64
38  target  2888 non-null   float64
dtypes: float64(39)
memory usage: 880.1 KB
Process finished with exit code 0
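Only the training set's feature information from info() is displayed; the describe() statistics never appear. The reason is that info() writes its report to the console itself and returns None, whereas describe() prints nothing and only returns a new DataFrame. An interactive console or Jupyter Notebook echoes the value of a bare expression, but when the file is executed as a script (PyCharm's Run command) that value is simply discarded, so nothing is shown.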
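To see why info() shows up while describe() does not, here is a minimal sketch on a small hypothetical DataFrame (the column values are made up, not taken from the steam data). It checks what each call returns:

```python
import pandas as pd

# Hypothetical toy data standing in for zhengqi_train.txt (values are made up)
df = pd.DataFrame({"V0": [0.5, -1.2, 0.3], "target": [0.1, 0.4, -0.2]})

# info() writes its report straight to the console and returns None
info_result = df.info()

# describe() writes nothing; it only returns a new DataFrame of statistics
summary = df.describe()

print(info_result is None)                # True
print(isinstance(summary, pd.DataFrame))  # True

# A bare `df.describe()` line in a script throws this DataFrame away,
# so print it explicitly to see it
print(summary)
```

The same reasoning applies to any method that returns a result instead of printing it (head(), corr(), and so on): in a script, wrap the call in print().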
1. Solution
# View summary statistics
print(train_data.describe())  # wrap the call in print() so the result is shown
print(test_data.describe())
#####################################################################################
# Output
                V0           V1  ...          V37       target
count  2888.000000  2888.000000  ...  2888.000000  2888.000000
mean      0.123048     0.056068  ...    -0.130330     0.126353
std       0.928031     0.941515  ...     1.017196     0.983966
min      -4.335000    -5.122000  ...    -3.630000    -3.044000
25%      -0.297000    -0.226250  ...    -0.798250    -0.350250
50%       0.359000     0.272500  ...    -0.185500     0.313000
75%       0.726000     0.599000  ...     0.495250     0.793250
max       2.121000     1.918000  ...     3.000000     2.538000

[8 rows x 39 columns]

                V0           V1  ...          V36          V37
count  1925.000000  1925.000000  ...  1925.000000  1925.000000
mean     -0.184404    -0.083912  ...    -0.046270     0.195735
std       1.073333     1.076670  ...     1.040854     0.940599
min      -4.814000    -5.488000  ...    -2.608000    -3.346000
25%      -0.664000    -0.451000  ...    -0.593000    -0.432000
50%       0.065000     0.195000  ...     0.083000     0.152000
75%       0.549000     0.589000  ...     0.651000     0.797000
max       2.100000     2.120000  ...     2.861000     3.021000

[8 rows x 38 columns]
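The "..." in the output above is pandas truncating wide frames for display, so most columns are hidden. If every column is needed, one option (a sketch; the 39-column random frame below is a hypothetical stand-in for the training set) is to lift the display.max_columns limit temporarily with pd.option_context, or to render the table with DataFrame.to_string(), which does not truncate by default:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame with the same 39 column names as the training set
# (random values, not the real data)
rng = np.random.default_rng(0)
cols = [f"V{i}" for i in range(38)] + ["target"]
df = pd.DataFrame(rng.normal(size=(100, len(cols))), columns=cols)

summary = df.describe()

# Option 1: lift the column limit only inside this block; pandas may still
# wrap the columns over several lines to fit the console width
with pd.option_context("display.max_columns", None):
    print(summary)

# Option 2: render the complete table as one string (no "..." truncation)
text = summary.to_string()
print(text)
```

Option 1 restores the previous settings once the with block ends; calling pd.set_option("display.max_columns", None) at the top of the script would instead affect every later print.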
2. Key concepts
2.1 pandas
A Python software library for data manipulation and analysis.
conda install pandas  # install the pandas package
conda remove pandas   # uninstall the pandas package
2.2 describe
The pandas docstring for describe() reads:
Generate descriptive statistics.
Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values. Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided.
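The behaviour summarized above can be checked on a tiny hypothetical frame with one numeric column containing a NaN and one string column (the names temp and grade are made up for illustration). By default describe() summarizes only the numeric column and leaves the NaN out of count; a string (object) column gets count, unique, top and freq instead; include="all" combines both:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: a numeric column with a NaN, plus a text column
df = pd.DataFrame({
    "temp": [1.0, 2.0, np.nan, 4.0],
    "grade": ["a", "b", "a", "c"],
})

# Default: only numeric columns are summarized; the NaN is not counted
num_stats = df.describe()
print(num_stats)

# Object (string) data: count / unique / top / freq instead of mean, std, quantiles
obj_stats = df["grade"].describe()
print(obj_stats)

# include="all": every column is summarized; statistics that don't apply are NaN
all_stats = df.describe(include="all")
print(all_stats)
```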