kdeplot, rugplot, and distplot in seaborn

小瓶盖的猪猪侠

Tags: data analysis

Article link: https://blog.csdn.net/qq_29983883/article/details/116355055

kdeplot

def kdeplot(
    x=None,  # Allow positional x, because behavior will not change with reorg
    *,
    y=None,
    shade=None,  # Note "soft" deprecation, explained below
    vertical=False,  # Deprecated
    kernel=None,  # Deprecated
    bw=None,  # Deprecated
    gridsize=200,  # TODO maybe depend on uni/bivariate?
    cut=3, clip=None, legend=True, cumulative=False,
    shade_lowest=None,  # Deprecated, controlled with levels now
    cbar=False, cbar_ax=None, cbar_kws=None,
    ax=None,

    # New params
    weights=None,  # TODO note that weights is grouped with semantics
    hue=None, palette=None, hue_order=None, hue_norm=None,
    multiple="layer", common_norm=True, common_grid=False,
    levels=10, thresh=.05,
    bw_method="scott", bw_adjust=1, log_scale=None,
    color=None, fill=None,

    # Renamed params
    data=None, data2=None,

    **kwargs,
):

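These descriptions follow the older (pre-0.11) API, which is why several of the parameters are marked Deprecated in the signature above.

data: 1-D array; in the univariate case, the only variable.
data2: same format as data; omitted for a univariate plot, used as the second variable for a bivariate plot.
shade: bool; whether to fill the area under the KDE curve with color; True means fill (newer versions: fill).
vertical: bool; only effective for univariate input; whether to swap the x and y axes.
kernel: string; the kernel used for the density estimate, default 'gau' (Gaussian); for bivariate data only the Gaussian kernel is supported.
legend: bool; whether to add a legend to the plot.
cumulative: bool; whether to plot the cumulative distribution of the KDE; default False.
shade_lowest: bool; whether to shade the lowest contour region of the KDE, mainly useful when comparing several distributions on the same axes; default True.
cbar: bool; whether to add a colorbar on the right side when drawing a bivariate KDE.
color: string; color of the KDE curve, same as the color argument of plt.plot(), e.g. 'r' for red.
cmap: string; colormap for the filled density regions, given as a matplotlib colormap name such as 'Blues' (plt.plot() itself has no cmap argument).
n_levels: int; only effective for bivariate input; number of contour levels, i.e. how many closed contours appear in the plot (newer versions: levels).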

Let's look at a few examples to see how these kdeplot parameters are used in practice:

import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop this line when running a plain Python script
%matplotlib inline
# Load seaborn's built-in iris dataset as a DataFrame
iris = sns.load_dataset('iris')
print(iris.columns)
# Rows belonging to the setosa species
setosa = iris.loc[iris.species == "setosa"].reset_index(drop=True)
# Rows belonging to the virginica species
virginica = iris.loc[iris.species == "virginica"].reset_index(drop=True)

ax = sns.kdeplot(iris.petal_width)
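To make it concrete what the curve above is, here is a hand-written Gaussian KDE with Scott's bandwidth rule (the bw_method="scott" default in the signature), checked against scipy.stats.gaussian_kde, which seaborn relies on for its estimate as far as I know. The sample is synthetic stand-in data, and gaussian_kde_scott is a hypothetical helper name:

```python
import numpy as np
from scipy import stats

def gaussian_kde_scott(data, grid):
    """Evaluate a 1-D Gaussian KDE with Scott's bandwidth on `grid`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    h = n ** (-1 / 5) * data.std(ddof=1)   # Scott's rule in one dimension
    # Place one normal bump of width h on every observation and average them
    z = (grid[:, None] - data[None, :]) / h
    return stats.norm.pdf(z).sum(axis=1) / (n * h)

rng = np.random.default_rng(1)
sample = rng.normal(loc=1.2, scale=0.7, size=150)   # stand-in for iris.petal_width
grid = np.linspace(sample.min() - 1, sample.max() + 1, 200)

manual = gaussian_kde_scott(sample, grid)
reference = stats.gaussian_kde(sample, bw_method="scott")(grid)
```

In seaborn 0.11+, bw_adjust multiplies this bandwidth (values above 1 give a smoother curve), and cut=3 extends the plotted grid about three bandwidths past the data range, to my understanding.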

ax = sns.kdeplot(iris.petal_width,shade=True,color='r')

Plot the cumulative distribution instead:

ax = sns.kdeplot(iris.petal_width,
                 shade=True,
                 color='r',
                 cumulative=True)
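The cumulative curve is the running integral of the estimated density, so it rises from about 0 to about 1. A sketch of that computation with scipy's gaussian_kde.integrate_box_1d, assuming synthetic data; I believe seaborn's cumulative=True computes something equivalent, but this is not its code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=1.2, scale=0.7, size=150)  # illustrative data
kde = stats.gaussian_kde(sample)                    # Scott's rule by default

grid = np.linspace(sample.min() - 1, sample.max() + 1, 200)
# Cumulative KDE: probability mass of the estimated density up to each grid point
cdf = np.array([kde.integrate_box_1d(-np.inf, g) for g in grid])
```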

Swap the x and y axes:

ax = sns.kdeplot(iris.petal_width,
                 shade=True,
                 color='r',
                 vertical=True)

rugplot

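rugplot is very plain: it marks where each value of a 1-D array actually lies, drawing the recorded values along the axis without any mathematical fitting. Compared with kdeplot, it shows the raw, discrete distribution of the data. Its main parameters are: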

def rugplot(
    x=None,  # Allow positional x, because behavior won't change
    *,
    height=.025, axis=None, ax=None,

    # New parameters
    data=None, y=None, hue=None,
    palette=None, hue_order=None, hue_norm=None,
    expand_margins=True,
    legend=True,  # TODO or maybe default to False?

    # Renamed parameter
    a=None,

    **kwargs
):

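a: 1-D array of observations (renamed to x in newer versions).
height: height of each tick as a proportion of the axes; the default is 0.025 in the signature above (older versions used 0.05).
axis: string; the axis along which the ticks are drawn, default 'x'. Newer versions deprecate it; pass the data as y= to draw the rug along the y axis.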
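The idea behind a rug fits in a few lines of plain matplotlib: one short vertical segment per observation, positioned in data coordinates along x but sized in axes coordinates along y, so height reads as a fraction of the axes. This is a simplified sketch, not seaborn's implementation; simple_rug is a hypothetical helper and the data are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

def simple_rug(values, ax, height=0.025, **kwargs):
    """Draw one short vertical tick per observation along the bottom of `ax`."""
    values = np.asarray(values, dtype=float)
    # Blended transform: x in data coordinates, y in axes fraction (0 = bottom edge)
    return ax.vlines(values, 0, height, transform=ax.get_xaxis_transform(), **kwargs)

rng = np.random.default_rng(5)
petal_length_like = rng.uniform(1, 7, size=50)  # made-up data for illustration
fig, ax = plt.subplots()
ticks = simple_rug(petal_length_like, ax, height=0.05, color="k")
```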

import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop this line when running a plain Python script
%matplotlib inline
# Load seaborn's built-in iris dataset as a DataFrame
iris = sns.load_dataset('iris')
print(iris.columns)
# Rows belonging to the setosa species
setosa = iris.loc[iris.species == "setosa"].reset_index(drop=True)
# Rows belonging to the virginica species
virginica = iris.loc[iris.species == "virginica"].reset_index(drop=True)
ax = sns.rugplot(iris.petal_length)

Change the height and color of the ticks:

ax = sns.rugplot(iris.petal_length,
                 color='r',
                 height=0.2)

distplot

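distplot in seaborn mainly draws a histogram of a single variable, and it can also overlay parts of kdeplot and rugplot on top of the histogram, which makes it a powerful and practical function. Its main parameters are: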

def distplot(a=None, bins=None, hist=True, kde=True, rug=False, fit=None,
             hist_kws=None, kde_kws=None, rug_kws=None, fit_kws=None,
             color=None, vertical=False, norm_hist=False, axlabel=None,
             label=None, ax=None, x=None):

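a: 1-D array; the single variable to analyze.
bins: int; number of histogram bins; default None, in which case the number of bins is chosen by the Freedman-Diaconis rule.
hist: bool; whether to draw the histogram; default True.
kde: bool; whether to draw the KDE curve; default True.
rug: bool; whether to draw the rugplot part; default False.
fit: a distribution from scipy.stats; its parameters are estimated from the observed variable and the fitted distribution is drawn regardless of how well it matches the data (an example follows below); default None, i.e. no fit.
hist_kws, kde_kws, rug_kws, fit_kws: each accepts a dictionary whose key-value pairs are parameter names and values of the corresponding underlying function; examples follow below.
color: color of every element except the curve drawn by fit.
vertical: bool; whether to swap the x and y axes; default False, i.e. no swap.
norm_hist: bool; what the bar heights mean: True shows density, False shows the number of observations in each bin; default False. When kde or fit is enabled, the histogram is shown as density regardless.
label: the legend label for the plot.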
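The Freedman-Diaconis rule mentioned under bins picks a bin width h = 2 * IQR * n^(-1/3) and then uses ceil(range / h) bins. A sketch of that rule; I believe distplot additionally caps the result at 50 bins, so the cap here is an assumption, and freedman_diaconis_bins is a hypothetical name:

```python
import numpy as np

def freedman_diaconis_bins(values, max_bins=50):
    """Number of histogram bins from the Freedman-Diaconis rule, capped at max_bins."""
    a = np.asarray(values, dtype=float)
    if a.size < 2:
        return 1
    iqr = np.subtract(*np.percentile(a, [75, 25]))
    width = 2 * iqr / a.size ** (1 / 3)      # bin width h = 2 * IQR * n^(-1/3)
    if width == 0:                           # degenerate IQR: fall back to sqrt(n)
        return min(int(np.sqrt(a.size)), max_bins)
    return min(int(np.ceil((a.max() - a.min()) / width)), max_bins)

data = np.arange(1, 101)
print(freedman_diaconis_bins(data))   # 5
```

NumPy implements the same rule as bins="fd" in np.histogram_bin_edges, without the cap.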

import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop this line when running a plain Python script
%matplotlib inline
# Load seaborn's built-in iris dataset as a DataFrame
iris = sns.load_dataset('iris')
print(iris.columns)
# Rows belonging to the setosa species
setosa = iris.loc[iris.species == "setosa"].reset_index(drop=True)
# Rows belonging to the virginica species
virginica = iris.loc[iris.species == "virginica"].reset_index(drop=True)
ax = sns.distplot(iris.petal_length)

Change the color of all elements, draw the rug, and set bins to 20:

ax = sns.distplot(iris.petal_length,color='r',
                 rug=True,
                 bins=20)

On top of the previous plot, force-fit a chi-squared distribution and use the fit_kws dictionary to draw the fitted curve in green:

from scipy.stats import chi2                
ax = sns.distplot(iris.petal_length,color='r',
                 rug=True,
                 bins=20,
                 fit=chi2,
                 fit_kws={'color':'g'})
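As I understand distplot's source, fit= calls the distribution's fit() method on the data (maximum likelihood) and then draws pdf(x, *params) over the histogram. A sketch of those two steps with scipy.stats, assuming synthetic chi-squared samples; fitted_pdf is a hypothetical helper:

```python
import numpy as np
from scipy import stats

def fitted_pdf(data, dist, num=200):
    """Fit `dist` by maximum likelihood and evaluate its pdf on an evenly spaced grid."""
    data = np.asarray(data, dtype=float)
    params = dist.fit(data)                 # for chi2 this is (df, loc, scale)
    grid = np.linspace(data.min(), data.max(), num)
    return grid, dist.pdf(grid, *params), params

rng = np.random.default_rng(8)
sample = rng.chisquare(df=4, size=300)      # illustrative data, not iris
grid, pdf, params = fitted_pdf(sample, stats.chi2)
```

The fit happens whether or not the chosen distribution suits the data, which is why the text calls it a forced fit.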

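Set norm_hist=False so that the y axis shows counts instead of density (kde and fit must both be turned off here, otherwise the y axis still shows density). Use hist_kws to pass a dictionary that adjusts the color and transparency of the histogram, and rug_kws to pass a dictionary that sets the color of the rug ticks: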

ax = sns.distplot(iris.petal_length,color='r',
                 rug=True,
                 kde=False,
                 bins=20,
                 fit=None,
                 hist_kws={'alpha':0.6,'color':'orange'},
                 rug_kws={'color':'g'},
                 norm_hist=False)
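The two y-axis scales differ by a fixed rescaling: density = count / (n * bin width), so the bar areas sum to 1. A small NumPy check of that relationship with made-up data; in seaborn 0.11+, histplot exposes the same choice as stat="count" versus stat="density":

```python
import numpy as np

rng = np.random.default_rng(9)
values = rng.uniform(1, 7, size=150)                      # illustrative data
counts, edges = np.histogram(values, bins=20)             # like norm_hist=False: raw counts
density, _ = np.histogram(values, bins=20, density=True)  # like norm_hist=True: densities
widths = np.diff(edges)
```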
