describe() matplotlib

萌新待开发

Category column: ◍' Python '◍ Tags: machine learning, python

Article link: https://blog.csdn.net/qq_44785318/article/details/113729935

describe

Example

describe(include=['O']) for categorical features

Observing relationships between features with matplotlib

Tabular analysis

seaborn

describe

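Numeric data has many statistical properties, so by default describe() summarizes only the numeric columns. It reports the count, mean, standard deviation, quartiles, maximum, and minimum. The rows available in this example are:

count	number of non-null values
mean	mean
std	standard deviation
min	minimum
25%	first quartile
50%	second quartile (median)
75%	third quartile
max	maximum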

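Example

import pandas as pd
train = pd.read_csv('train.csv')  # assumed local path to the Kaggle Titanic training set
train.info()

Result: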
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

train['Age']

Result:
0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: Age, Length: 891, dtype: float64

train['Age'].describe()

Result:
count    714.000000
mean      29.699118
std       14.526497
min        0.420000
25%       20.125000
50%       28.000000
75%       38.000000
max       80.000000
Name: Age, dtype: float64
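
In the output above, count is 714 even though the Series has 891 rows, because describe() skips missing values. The following is a minimal sketch of that behaviour on hypothetical toy data (the values and the name age are illustrative and do not come from the Titanic file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train['Age']; one value is missing
age = pd.Series([22.0, 38.0, 26.0, np.nan, 35.0], name="Age")

summary = age.describe()

# count ignores NaN, so it can be smaller than the length of the Series
n_rows = len(age)
n_missing = age.isna().sum()
print(n_rows, summary["count"], n_missing)

# The 50% row is the median of the non-missing values
print(summary["50%"], age.median())
```

Comparing len() with the count row is a quick way to spot columns that have missing values.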

train['Age'].mean()

Result:
29.69911764705882

# Without parentheses, mean is not called; the expression evaluates to the bound method itself
train['Age'].mean

Result:
<bound method NDFrame._add_numeric_operations.<locals>.mean of 0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: Age, Length: 891, dtype: float64>
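
The two results above differ only in the parentheses. The following small sketch, on hypothetical toy values, shows the difference between referencing a method and calling it:

```python
import pandas as pd

ages = pd.Series([22.0, 38.0, 26.0])  # hypothetical toy values

method = ages.mean   # no parentheses: a reference to the bound method
value = ages.mean()  # parentheses: the method runs and returns a number

print(callable(method))   # the reference can still be called later
print(method() == value)  # calling it gives the same number
```

If a "bound method" string shows up where a number was expected, the missing parentheses are the usual cause.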

describe(include=['O']) for categorical features

train.describe(include=['O'])

Result:

                                         Name   Sex Ticket Cabin Embarked
count                                     891   891    891   204      889
unique                                    891     2    681   147        3
top     Abelson, Mrs. Samuel (Hannah Wizosky)  male   1601    G6        S
freq                                        1   577      7     4      644
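
For object columns the summary rows change to count, unique, top (the most frequent value) and freq (how often top occurs). The following is a minimal sketch on a hypothetical toy frame; the columns are built with dtype=object explicitly so that include=['O'] selects them regardless of the pandas version's default string dtype:

```python
import pandas as pd

# Hypothetical toy frame: two object columns and one numeric column
df = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "male", "male"], dtype=object),
    "Embarked": pd.Series(["S", "C", "S", None], dtype=object),
    "Fare": [7.25, 71.28, 8.05, 8.46],
})

cat_summary = df.describe(include=["O"])
print(cat_summary)

# top/freq agree with the first entry of value_counts()
sex_counts = df["Sex"].value_counts()
print(sex_counts.index[0], sex_counts.iloc[0])
```

The numeric Fare column is left out of the summary, and the missing Embarked value is excluded from count.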

Observing relationships between features with matplotlib

View the distribution of the numeric columns.

%matplotlib inline
import matplotlib.pyplot as plt

# figsize is the (width, height) of the whole figure in inches, not of each subplot
train.hist(figsize=(15,10))

Result:

array([[<AxesSubplot:title={'center':'PassengerId'}>,
        <AxesSubplot:title={'center':'Survived'}>,
        <AxesSubplot:title={'center':'Pclass'}>],
       [<AxesSubplot:title={'center':'Age'}>,
        <AxesSubplot:title={'center':'SibSp'}>,
        <AxesSubplot:title={'center':'Parch'}>],
       [<AxesSubplot:title={'center':'Fare'}>, <AxesSubplot:>,
        <AxesSubplot:>]], dtype=object)

The histograms show the distribution of ages, of fares, and of the number of siblings/spouses aboard (SibSp).
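
DataFrame.hist draws one subplot per numeric column and skips the rest, which is why the 3x3 grid above has seven titled plots and two empty ones. The following is a small sketch on a hypothetical toy frame; the Agg backend is an assumption made so the code also runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

# Hypothetical toy frame: two numeric columns and one object column
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2, 27, 14],
    "Fare": [7.25, 71.28, 7.92, 53.1, 51.86, 21.07, 11.13, 30.07],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "female", "female"],
})

axes = df.hist(figsize=(8, 3), bins=5)

# Collect the titles of the subplots that were actually drawn
titles = sorted(ax.get_title() for ax in axes.ravel() if ax.get_title())
print(titles)
```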

Tabular analysis

# Pclass: relationship between passenger class and survival
train[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean().sort_values(by='Survived', ascending=False)

Result:

   Pclass  Survived
0       1  0.629630
1       2  0.472826
2       3  0.242363

# SibSp: relationship between the number of siblings/spouses aboard and survival
train[["SibSp", "Survived"]].groupby(['SibSp'], as_index=False).mean().sort_values(by='Survived', ascending=False)

Result:
   SibSp  Survived
1      1  0.535885
2      2  0.464286
0      0  0.345395
3      3  0.250000
4      4  0.166667
5      5  0.000000
6      8  0.000000
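
Because Survived is coded as 0/1, the group mean in the tables above is the survival rate of that group. A rate such as 0.000000 may rest on very few passengers, so one option is to show the group size next to the mean. The following is a minimal sketch on hypothetical toy data (not the Titanic values):

```python
import pandas as pd

# Hypothetical toy data: passenger class and a 0/1 survival flag
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

rate = (
    df.groupby("Pclass")["Survived"]
      .agg(["mean", "count"])   # mean = survival rate, count = group size
      .reset_index()
      .sort_values(by="mean", ascending=False)
)
print(rate)
```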

seaborn

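# Age: relationship between age and survival
import seaborn as sns

# Load the data and lay the subplots out horizontally
g = sns.FacetGrid(train, col='Survived')  # col: one column of subplots per value of Survived; row does the same along rows

g.map(plt.hist, 'Age', bins=20)  # plt.hist draws a histogram of Age; bins is the number of bins in each subplot

Result: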
<seaborn.axisgrid.FacetGrid at 0x7fe80ae9c190>
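
The faceted histogram counts passengers per age bin, separately for each value of Survived. The same counts can be read as a table, sketched here with pd.cut and pd.crosstab on hypothetical toy data; the four bin edges are an arbitrary choice for this sketch and differ from the bins=20 used above:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Age":      [2, 15, 22, 24, 31, 38, 45, 61],
    "Survived": [1, 0, 0, 1, 1, 1, 0, 0],
})

# Assign every age to an interval, then count rows per (interval, Survived)
age_bin = pd.cut(df["Age"], bins=[0, 20, 40, 60, 80])
table = pd.crosstab(age_bin, df["Survived"])
print(table)
```

Each column of the table corresponds to one facet of the plot, and each row to one bar.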

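# Age & Pclass: relationship between age and survival within each passenger class, splitting the data further by class
grid = sns.FacetGrid(train, col="Survived", row="Pclass", height=2.2, aspect=1.6)  # height: height of each facet in inches (named size before seaborn 0.9); aspect: width/height ratio of each facet

grid.map(plt.hist, "Age", alpha=.5, bins=20)
grid.add_legend()

Result: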
<seaborn.axisgrid.FacetGrid at 0x7fe80d1e2a10>

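The x-axis of every plot is Age. The y-axis of the first plot is the number of class-1 passengers who did not survive, the y-axis of the second is the number of class-1 passengers who survived, and so on.

Observations:

Class 3 has the most passengers, and also the most deaths.
Infants in class 2 and class 3 have a relatively high survival rate.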

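# Relate the categorical features to the target
grid = sns.FacetGrid(train, row='Embarked', height=2.2, aspect=1.6)
# Pin order and hue_order so that every facet uses the same category positions and colors
grid.map(sns.pointplot, 'Pclass', 'Survived', 'Sex', palette='deep', order=[1, 2, 3], hue_order=['male', 'female'])
grid.add_legend()

Result:
<seaborn.axisgrid.FacetGrid at 0x7fe80d3bcfd0>

Observations:

Women have a higher survival rate overall. When the plot is drawn without order and hue_order, Embarked=C looks like an exception, with men appearing to survive at a higher rate. This may reflect a correlation between Pclass and Embarked, but it can also be an artifact of the hue levels being matched inconsistently across facets, so it is worth checking against a grouped table before drawing conclusions.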
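
A plot of this kind can be cross-checked with two small tables: the survival rate per port and sex, and the class mix per port. The following is a minimal sketch on hypothetical toy data; the numbers are made up for illustration and say nothing about the real Titanic passengers:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Embarked": ["C", "C", "C", "C", "S", "S", "S", "S"],
    "Sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "Pclass":   [1, 1, 1, 3, 2, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Survival rate for every (port, sex) pair: rows are ports, columns are sexes
by_port_sex = df.groupby(["Embarked", "Sex"])["Survived"].mean().unstack("Sex")
print(by_port_sex)

# Share of each passenger class within every port (each row sums to 1)
class_mix = pd.crosstab(df["Embarked"], df["Pclass"], normalize="index")
print(class_mix)
```

If the first table disagrees with the plot, the plot is the more likely culprit. The second table shows whether one port is dominated by a particular class.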
