Python Data Visualization | 7. How Seaborn Makes Distributions More Intuitive

Latest recommended article published on 2023-01-14 22:15:20

AI阅读和图谱

Categories: Artificial Intelligence, python

Article link: https://blog.csdn.net/qq_34740277/article/details/119887521

%matplotlib inline
import numpy as np
import pandas as pd
from scipy import stats, integrate
from warnings import filterwarnings
filterwarnings('ignore')
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(color_codes=True)
np.random.seed(sum(map(ord, "distributions")))

Univariate distributions

Histograms

The quickest and most convenient way:

x = np.random.normal(size=100)
sns.distplot(x, kde=True)
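# The kernel density estimate (kde) is True by default.
# Note: distplot has been deprecated since seaborn 0.11 in favour of histplot and displot.

[Figure: histogram of x with a KDE curve overlaid]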

# Want a finer-grained picture?
# Adjust bins!
sns.distplot(x, kde=False, bins=30)
# bins=30 gives thirty bars.

[Figure: histogram of x with 30 bins]

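# Want to see the individual observations as well?
sns.distplot(x, kde=False, bins=30, rug=True)
# rug controls whether a small tick is drawn for each observation (a rug plot along the axis).
# From the docstring: "Whether to draw a rugplot on the support axis."

[Figure: 30-bin histogram of x with a rug plot along the x-axis]

What is the benefit of seeing the individual observations?
Answer: they guide you towards a suitable number of bins.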
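
To make the effect of bins concrete, here is a small sketch that uses NumPy only. It counts the same sample into different numbers of bins and asks np.histogram_bin_edges for a rule-of-thumb choice. The seed and the "fd" (Freedman-Diaconis) rule are assumptions made for this illustration; the article itself only sets bins by hand.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
x = rng.normal(size=100)

# Same data, different resolution: every observation lands in exactly one bin.
counts_10, edges_10 = np.histogram(x, bins=10)
counts_30, edges_30 = np.histogram(x, bins=30)

# A data-driven suggestion for the bin edges (Freedman-Diaconis rule).
edges_fd = np.histogram_bin_edges(x, bins="fd")

print("10 bins, width %.2f" % (edges_10[1] - edges_10[0]))
print("30 bins, width %.2f" % (edges_30[1] - edges_30[0]))
print("fd rule suggests %d bins" % (len(edges_fd) - 1))
```

With only 100 observations, 30 bins leaves many bars holding just a few points, which is exactly what the rug makes visible.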

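Note: whenever kde is enabled, a default bandwidth is used. It is chosen from the data by a rule of thumb (Scott's rule, the 'scott' label in the KDE section below) and comes out at a few tenths for this sample.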

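Kernel density estimation (KDE)

KDE estimates the shape of the probability density function from the observations. What is it good for? Once the shape suggests a distribution family, you can solve for that family's parameters (the method of undetermined coefficients) to obtain the density function.

The steps of kernel density estimation:

Approximate each observation with a normal curve centred on it
Add up the normal curves of all observations
Normalize, so that the area under the result is 1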
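
The three steps above can be sketched directly in NumPy. This is an illustration rather than seaborn's actual implementation; the bandwidth h = 0.4, the seed, and the evaluation grid are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
h = 0.4                                # bandwidth; an assumed value for illustration

grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 500)

# Step 1: one normal curve (mean = observation, std = h) per observation.
z = (grid[:, None] - x[None, :]) / h
bumps = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))   # shape (500, 100)

# Step 2: add the curves up.  Step 3: divide by n so the total area is 1.
density = bumps.sum(axis=1) / len(x)

area = np.sum(density) * (grid[1] - grid[0])              # Riemann-sum check
print(round(area, 3))
```

The printed area should be very close to 1; it falls slightly short only because the grid cuts off the far tails.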

How do you draw it in seaborn?

sns.kdeplot(x)

[Figure: KDE curve of x]

bandwidth: the width of the normal curves used for the approximation.
The larger the bandwidth, the smoother and flatter the curve.
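
A quick numerical check of that claim, sketched with scipy.stats.gaussian_kde instead of seaborn. Note the assumption: gaussian_kde treats a scalar bw_method as a factor multiplied by the sample standard deviation, which may not match the bw convention of the seaborn version used in this article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
grid = np.linspace(-5, 5, 401)

# For gaussian_kde a scalar bw_method is a factor: the kernel's standard
# deviation is that factor times the sample standard deviation.
peaks = {}
for factor in (0.2, 2.0):
    kde = stats.gaussian_kde(x, bw_method=factor)
    peaks[factor] = kde(grid).max()

scott = stats.gaussian_kde(x)           # default bandwidth rule is Scott's
print("Scott factor for n=100: %.3f" % scott.factor)
print("peak height at 0.2: %.3f, at 2.0: %.3f" % (peaks[0.2], peaks[2.0]))
```

The wide kernel spreads each observation's mass over a larger range, so the peak of the estimated density is lower.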

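sns.kdeplot(x, label="bw: 'scott'")
sns.kdeplot(x, bw=.2, label="bw: 0.2")
sns.kdeplot(x, bw=2, label="bw: 2") # over-smoothed
plt.legend()
# Note: seaborn 0.11 deprecated bw in favour of bw_method and bw_adjust.

[Figure: KDE curves of x for the default (Scott) bandwidth, bw=0.2, and bw=2]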

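Fitting model parameters

x = np.random.gamma(6, size=200)  # a sample from a gamma distribution
sns.distplot(x,
             kde=True,
             fit=stats.gamma
            )  # our tentative guess is a gamma distribution

[Figure: histogram of x with the KDE curve and the fitted gamma density]

The blue line is the KDE curve that sns.distplot(x) draws.
The black line is the gamma density that sns.distplot(x, fit=stats.gamma) adds, with its parameters fitted to x.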
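
As I understand it, the fit argument calls the distribution's fit method on the data and draws the resulting density. The sketch below does that by hand with SciPy. Pinning the location with floc=0 is an assumption made here to keep the fit stable; it is not something the article's call does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(6, size=200)

# Maximum-likelihood fit; returns (shape, loc, scale) for the gamma family.
# floc=0 pins the location at zero, an assumption made for this sketch.
shape, loc, scale = stats.gamma.fit(x, floc=0)

# Evaluate the fitted density on a grid, ready to be plotted over a histogram.
grid = np.linspace(0, x.max(), 200)
fitted_pdf = stats.gamma.pdf(grid, shape, loc=loc, scale=scale)
print("shape=%.2f scale=%.2f" % (shape, scale))
```

The fitted shape should land near the true value of 6, with some scatter because the sample has only 200 points.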

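Bivariate distributions

mean, cov = [0, 1], [(1, 0.5), (0.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
# np.random.multivariate_normal() draws from a multivariate normal distribution with the given mean and covariance.
# The means are 0 and 1, both variances are 1, and the covariance between x and y is 0.5 (so their correlation coefficient is also 0.5).
df = pd.DataFrame(data, columns=["x", "y"])
df.head()

[Output: the first five rows of df]
Two correlated normal variables.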
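
A sanity check on the generated data, sketched with NumPy only: the sample statistics should sit close to the mean and covariance that were requested. The seed is an assumption added for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, 0.5), (0.5, 1)]
data = rng.multivariate_normal(mean, cov, 200)   # shape (200, 2)

sample_mean = data.mean(axis=0)                  # should be near [0, 1]
sample_cov = np.cov(data, rowvar=False)          # 2x2 covariance matrix
sample_corr = np.corrcoef(data, rowvar=False)[0, 1]
print(sample_mean, sample_cov, sample_corr, sep="\n")
```

Because both variances are 1, the covariance and the correlation coefficient coincide, which is why 0.5 shows up in both.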

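Scatter plots

For two correlated variables, the excellent sns.jointplot() function is available:

sns.jointplot(x="x", y="y", data=df).annotate(stats.pearsonr)
# Note: JointGrid.annotate was deprecated in seaborn 0.9 and removed in later releases, so this call needs an older seaborn.

[Figure: joint plot of x and y with marginal histograms]
What the figure shows: the scatter plot of x against y, the histograms of x and y, the Pearson correlation coefficient (pearsonr), and the p-value (the probability of seeing a correlation at least this strong if x and y were actually uncorrelated; the smaller it is, the stronger the evidence of a real correlation).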

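About the Pearson correlation coefficient
- Related links: Discussion of Similarity Metrics, zh.wikipedia.org
- How pearsonr is computed:

$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$

A simple classification of correlation strength (by absolute value):
- 0.8-1.0 very strong correlation
- 0.6-0.8 strong correlation
- 0.4-0.6 moderate correlation
- 0.2-0.4 weak correlation
- 0.0-0.2 very weak or no correlation
Pearson correlation game: http://guessthecorrelation.com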
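
The formula can be checked against scipy.stats.pearsonr in a few lines. This is a sketch; the seed and sample size are assumptions carried over from the earlier examples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 1], [(1, 0.5), (0.5, 1)], 200).T

# rho = cov(X, Y) / (sigma_X * sigma_Y); use the same ddof in numerator and denominator.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

r_scipy, p_value = stats.pearsonr(x, y)
print("manual r = %.4f, scipy r = %.4f, p = %.2g" % (r_manual, r_scipy, p_value))
```

The two values of r agree up to floating-point rounding, since the ddof factors cancel in the ratio.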

Hexbin plots

x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sns.axes_style("ticks"):
    sns.jointplot(x=x, y=y, kind="hex").annotate(stats.pearsonr)
    # kind selects the plot type (hex = hexagonal bins)
# np.random.multivariate_normal(mean, cov, 10).T

[Figure: hexbin joint plot of x and y]
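
A hexbin plot is a two-dimensional histogram: the plane is tiled with hexagons and each one is coloured by how many points fall inside it. The sketch below uses matplotlib's hexbin directly to expose those counts; gridsize=20, the seed, and the Agg backend are assumed values for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 1], [(1, 0.5), (0.5, 1)], 1000).T

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20)   # returns a collection of hexagons
counts = hb.get_array()             # number of points in each hexagon

print("hexagons:", len(counts))
print("fullest hexagon holds", int(counts.max()), "points")
```

The fullest hexagons sit near the centre of the cloud, around (0, 1), which is where the darkest cells appear in the plot.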

Kernel density estimation

# Contour style
sns.jointplot(x="x", y="y", data=df, kind="kde").annotate(stats.pearsonr)

[Figure: joint plot with KDE contours and marginal KDE curves]

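f, ax = plt.subplots(figsize=(8, 8))  # figure and axes
sns.kdeplot(df.x, df.y, ax=ax, shade=False)
# shade=False: no fill, so only the contour lines are drawn
sns.rugplot(df.x, color="b", ax=ax)
sns.rugplot(df.y, vertical=True, ax=ax, color="r")
# sns.rugplot draws only the rug; vertical=True places the rug along the y-axis

[Figure: bivariate KDE contour lines with rug plots on both axes]
For a more continuous, dreamlike effect: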

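f, ax = plt.subplots(figsize=(6, 6))
# cubehelix colour scheme: brightness is proportional to intensity; it was designed for astronomical images. http://www.mrao.cam.ac.uk/~dag/CUBEHELIX/
cmap = sns.cubehelix_palette(as_cmap=True, dark=1, light=0)
# cmap: colour map
sns.kdeplot(df.x, df.y, cmap=cmap, n_levels=60, shade=True, ax=ax)

[Figure: filled bivariate KDE with 60 levels in a cubehelix colour map]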

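g = sns.jointplot(x="x", y="y", data=df, kind="kde", color="m")
g.plot_joint(plt.scatter, c="w", s=30, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0) # make the background of the central plot transparent
g.set_axis_labels("$X$", "$Y$") # LaTeX

[Figure: KDE joint plot with the observations overlaid as white "+" markers]
Note: a one-dimensional KDE plot is mainly used to guess the distribution; if a two-dimensional one shows several distinct centres, that suggests clustering-related work.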

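Pairwise relationships in a dataset

iris = pd.read_csv("iris.csv") # the iris dataset
iris.head()

[Output: the first five rows of iris]

sns.pairplot(iris)  # by default: histograms on the diagonal, scatter plots off the diagonal

[Figure: pair plot of the iris dataset]
Pairwise relationships between the attributes, plus a histogram of each attribute.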

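g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot) # diagonal: one plot per single attribute
g.map_offdiag(sns.kdeplot, cmap="Blues_d", n_levels=20) # off-diagonal: the relationship between two attributes

[Figure: PairGrid of the iris dataset with KDE curves on the diagonal and KDE contours off the diagonal]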

Summary

distplot(bins, rug, fit)
kdeplot(bw)
jointplot(kind)
pairplot
Source code: follow the WeChat official account "AI阅读知识图谱" and reply "Python数据可视化" to get all the code published so far.
