Python Data Visualization: Seaborn Plotting Summary 2

北山啦

Article link: https://blog.csdn.net/qq_45176548/article/details/117305614

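Seaborn is a higher-level API built on top of matplotlib that makes plotting easier. It also works closely with numpy and pandas data structures and with the statistical routines in scipy and statsmodels.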


Types

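Relational plots
1. relplot(): the interface for relational plots; it wraps the two functions below, selected with the kind parameter
2. scatterplot(): scatter plot
3. lineplot(): line plot
Categorical plots
1. catplot(): the interface for categorical plots; it wraps the eight functions below, selected with the kind parameter
2. stripplot(): categorical scatter plot
3. swarmplot(): categorical scatter plot that also shows the density of the distribution
4. boxplot(): box plot
5. violinplot(): violin plot
6. boxenplot(): enhanced box plot
7. pointplot(): point plot
8. barplot(): bar plot
9. countplot(): count plot
Distribution plots
1. jointplot(): bivariate relationship plot
2. pairplot(): grid of pairwise variable relationships
3. distplot(): histogram with a density estimate (deprecated since seaborn 0.11 in favor of histplot() and displot())
4. kdeplot(): kernel density estimate plot
5. rugplot(): draws each data point in an array as a tick on an axis
Regression plots
1. lmplot(): regression model plot
2. regplot(): linear regression plot
3. residplot(): residual plot of a linear regression
Matrix plots
1. heatmap(): heatmap
2. clustermap(): clustered heatmap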

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple seaborn

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

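Seaborn provides a set of parameters that control the scale of plot elements.

It also has five built-in styles: darkgrid, whitegrid, dark, white, and ticks. Each suits different applications and personal preferences. The default theme is darkgrid.

The call below uses set() to reset the parameters to their defaults and select the ticks style: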

sns.set(style="ticks")
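
As a rough illustration of what the styles and the scaling parameters change, the sketch below only inspects the rc dictionaries returned by `sns.axes_style()` and `sns.plotting_context()` without drawing anything. The names `STYLES`, `CONTEXTS`, `grid_enabled` and `font_sizes` are illustrative and not part of seaborn; the four context names are the presets seaborn ships for scaling.

```python
import seaborn as sns

STYLES = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
CONTEXTS = ["paper", "notebook", "talk", "poster"]

# axes_style() returns the matplotlib rc parameters a style would apply,
# without touching the global state.
grid_enabled = {name: bool(sns.axes_style(name)["axes.grid"]) for name in STYLES}

# plotting_context() does the same for the parameters that scale plot elements.
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in CONTEXTS}

for name in STYLES:
    print(f"{name:>9}: grid={'on' if grid_enabled[name] else 'off'}")
for name in CONTEXTS:
    print(f"{name:>9}: font.size={font_sizes[name]}")
```

Passing one of these names to `sns.set(style=..., context=...)` applies the corresponding parameters globally.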

boxplot

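A box plot, also called a box-and-whisker plot, is a statistical chart that shows how a set of data is spread out. It displays the maximum, minimum, median, and the upper and lower quartiles of the data.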
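
The five statistics a box plot summarizes can be computed directly. This is a minimal sketch with NumPy; `five_number_summary` is a hypothetical helper written for illustration, not a seaborn function.

```python
import numpy as np

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a 1-D sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return (float(values.min()), float(q1), float(median),
            float(q3), float(values.max()))

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

Note that seaborn, like matplotlib, by default draws the whiskers only out to 1.5 times the interquartile range and marks points beyond them individually, so the whisker ends are not always the raw minimum and maximum.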

"""
Grouped boxplots
================
"""
sns.set(style="ticks", palette="pastel")

# Load the example tips dataset
tips = pd.read_csv("./seaborn-data-master/tips.csv")

# Draw a nested boxplot to show bills by day and time
sns.boxplot(x="day", y="total_bill",
            hue="smoker", palette=["m", "g"],
            data=tips)
sns.despine(offset=10, trim=True)

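violinplot

violinplot plays a role similar to boxplot: it shows the distribution of quantitative data across the levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which every component corresponds to actual data points, a violin plot is built on a kernel density estimate of the underlying distribution.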
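
To make the idea of a kernel density estimate concrete, here is a hand-rolled Gaussian KDE in NumPy. It is only a sketch: `gaussian_kde_1d` is a hypothetical helper, and the bandwidth rule used here (sample standard deviation times n^(-1/5)) is an assumption chosen for illustration, not necessarily what seaborn uses internally.

```python
import numpy as np

def gaussian_kde_1d(sample, grid, bandwidth):
    """Average one Gaussian bump per observation, evaluated on `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rs = np.random.RandomState(0)
sample = rs.normal(0, 2, 40)

# Assumed bandwidth rule, for illustration only
bandwidth = sample.std(ddof=1) * len(sample) ** (-1 / 5)
grid = np.linspace(sample.min() - 5 * bandwidth,
                   sample.max() + 5 * bandwidth, 400)
density = gaussian_kde_1d(sample, grid, bandwidth)

# A density should integrate to roughly 1 (simple Riemann sum)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"area under the estimated density: {area:.3f}")
```

The outline of each violin is such a smoothed curve, mirrored around the category axis.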

"""
Violinplots with observations
=============================

"""


sns.set()

# Create a random dataset across several variables
rs = np.random.RandomState(0)
n, p = 40, 8
d = rs.normal(0, 2, (n, p))
d += np.log(np.arange(1, p + 1)) * -5 + 10

# Use cubehelix to get a custom sequential palette
pal = sns.cubehelix_palette(p, rot=-.5, dark=.3)

# Show each distribution with both violins and points
sns.violinplot(data=d, palette=pal, inner="points")


"""
Grouped violinplots with split violins
======================================
"""
sns.set(style="whitegrid", palette="pastel", color_codes=True)

# Load the example tips dataset
tips = pd.read_csv("./seaborn-data-master/tips.csv")

# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(x="day", y="total_bill", hue="smoker",
               split=True, inner="quart",
               palette={"Yes": "y", "No": "b"},
               data=tips)
sns.despine(left=True)

"""
Violinplot from a wide-form dataset
===================================

"""

sns.set(style="whitegrid")

# Load the example dataset of brain network correlations
df = pd.read_csv("./seaborn-data-master/brain_networks.csv", header=[0, 1, 2], index_col=0)

# Pull out a specific subset of networks
used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

# Compute the correlation matrix and average over networks
corr_df = df.corr().groupby(level="network").mean()
corr_df.index = corr_df.index.astype(int)
corr_df = corr_df.sort_index().T

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 6))

# Draw a violinplot with a narrower bandwidth than the default
sns.violinplot(data=corr_df, palette="Set3", bw=.2, cut=1, linewidth=1)

# Finalize the figure
ax.set(ylim=(-.7, 1.05))
sns.despine(left=True, bottom=True)

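heatmap

A heatmap encodes the values of a matrix as colors. Applied to a correlation matrix, it shows how strongly each pair of features in a data table is related.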
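
The pairwise matrix that such a heatmap displays can be sketched with pandas. The data below is synthetic and the column names `a`, `b`, `c` are made up for illustration.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)
base = rs.normal(size=200)
features = pd.DataFrame({
    "a": base,
    "b": base + rs.normal(scale=0.5, size=200),  # strongly related to "a"
    "c": rs.normal(size=200),                    # unrelated noise
})

# Pearson correlation between every pair of columns
corr = features.corr()
print(corr.round(2))
```

The result is symmetric with ones on the diagonal, which is why correlation heatmaps are often drawn with one triangle masked out.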

"""
Annotated heatmaps
==================

"""
sns.set()

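# Load the example flights dataset and convert it from long-form to wide-form
flights_long = pd.read_csv("./seaborn-data-master/flights.csv")
# pivot() takes keyword-only arguments in pandas 2.0 and later
flights = flights_long.pivot(index="month", columns="year", values="passengers")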

# Draw a heatmap with the numeric values in each cell
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
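
The long-to-wide reshaping that produces the heatmap input can be seen on a tiny table. This is a sketch with toy values; only the column names follow the flights example.

```python
import pandas as pd

# Long form: one row per (year, month) observation
long_form = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide form: months as rows, years as columns, passengers as cell values
wide_form = long_form.pivot(index="month", columns="year", values="passengers")
print(wide_form)
```

heatmap() expects this wide layout, where the row and column labels become the two axes.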


"""
Plotting a diagonal correlation matrix
======================================

"""
from string import ascii_letters

sns.set(style="white")

# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 26)),
                 columns=list(ascii_letters[26:]))

# Compute the correlation matrix
corr = d.corr()

# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)  # np.bool was removed in NumPy 1.24
mask[np.triu_indices_from(mask)] = True

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
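
How the upper-triangle mask is laid out is easier to see on a small matrix. A minimal sketch, with `n = 4` chosen arbitrarily; it also checks that `np.triu` gives the same result as the index-based construction.

```python
import numpy as np

n = 4

# Index-based construction: start from all False, then set the upper
# triangle (diagonal included) to True
mask = np.zeros((n, n), dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Equivalent one-liner
mask_alt = np.triu(np.ones((n, n), dtype=bool))

print(mask.astype(int))
```

heatmap() leaves the cells where the mask is True blank, so only the lower triangle of the correlation matrix is drawn.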

jointplot

Plots the relationship between two variables together with the marginal distribution of each.

"""
Joint kernel density estimate
=============================
"""
sns.set(style="white")

# Generate a random correlated bivariate dataset
rs = np.random.RandomState(5)
mean = [0, 0]
cov = [(1, .5), (.5, 1)]
x1, x2 = rs.multivariate_normal(mean, cov, 500).T
x1 = pd.Series(x1, name="$X_1$")
x2 = pd.Series(x2, name="$X_2$")

# Show the joint distribution using kernel density estimation
# x and y must be passed as keywords in seaborn 0.12 and later
g = sns.jointplot(x=x1, y=x2, kind="kde", height=7, space=0)

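Hexbin plot

The bivariate analogue of a histogram is known as a "hexbin" plot, because it shows the number of observations that fall within each hexagonal bin. It works best with relatively large datasets.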
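
The binning itself can be inspected with matplotlib's `Axes.hexbin`, which is the kind of plot `kind="hex"` produces. This is a sketch: the `gridsize=20` value is an arbitrary assumption, and the Agg backend line is only there so it runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)

fig, ax = plt.subplots()
hexes = ax.hexbin(x, y, gridsize=20)

# One count per hexagon; together they account for every observation
counts = hexes.get_array()
total_points = int(counts.sum())
print(total_points, "observations spread over", len(counts), "hexagons")
plt.close(fig)
```

Each hexagon is then colored by its count, just as each bar of a histogram is sized by its count.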

"""
Hexbin plot with marginal distributions
=======================================
"""
sns.set(style="ticks")

rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)

# x and y must be passed as keywords in seaborn 0.12 and later
sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")

"""
Linear regression with marginal distributions
=============================================

"""

sns.set(style="darkgrid")

tips = pd.read_csv("./seaborn-data-master/tips.csv")
g = sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg",
                  xlim=(0, 60), ylim=(0, 12), color="m", height=7)

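barplot

A bar plot represents an estimate of central tendency for a numeric variable by the height of each rectangle, and uses error bars to give some indication of the uncertainty around that estimate.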
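
By default seaborn's barplot shows the mean with a bootstrapped 95% confidence interval. The sketch below reproduces that idea by hand on synthetic data; `bootstrap_ci` is a hypothetical helper, not seaborn's implementation.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Return (mean, low, high): the sample mean and a percentile bootstrap CI."""
    values = np.asarray(values, dtype=float)
    rs = np.random.RandomState(seed)
    # Resample with replacement n_boot times and take the mean of each resample
    idx = rs.randint(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return float(values.mean()), float(low), float(high)

sample = np.random.RandomState(1).normal(20, 8, 60)  # synthetic data
mean, low, high = bootstrap_ci(sample)
print(f"bar height = {mean:.2f}, error bar = [{low:.2f}, {high:.2f}]")
```

The bar height corresponds to `mean` and the error bar spans `low` to `high`.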

"""
Horizontal bar plots
====================
"""
sns.set(style="whitegrid")

# Load the example car crash dataset
crashes = pd.read_csv("./seaborn-data-master/car_crashes.csv").sort_values("total", ascending=False)

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(6, 15))
# Plot the total crashes
sns.set_color_codes("pastel")
sns.barplot(x="total", y="abbrev", data=crashes,
            label="Total", color="b")

# Plot the crashes where alcohol was involved
sns.set_color_codes("muted")
sns.barplot(x="alcohol", y="abbrev", data=crashes,
            label="Alcohol-involved", color="b")

# Add a legend and informative axis label
ax.legend(ncol=2, loc="lower right", frameon=True)
ax.set(xlim=(0, 24), ylabel="",
       xlabel="Automobile collisions per billion miles")
sns.despine(left=True, bottom=True)

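catplot

The interface for categorical plots; setting the kind parameter selects one of the eight plots below:

stripplot(): categorical scatter plot

swarmplot(): categorical scatter plot that also shows the density of the distribution

boxplot(): box plot

violinplot(): violin plot

boxenplot(): enhanced box plot

pointplot(): point plot

barplot(): bar plot

countplot(): count plot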
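
The correspondence between the kind values and the eight functions can be written down as a small lookup table. The dictionary `CATPLOT_KINDS` is only an illustration; seaborn does not expose a table under that name.

```python
import seaborn as sns

# kind value accepted by catplot() -> name of the matching axes-level function
CATPLOT_KINDS = {
    "strip": "stripplot",
    "swarm": "swarmplot",
    "box": "boxplot",
    "violin": "violinplot",
    "boxen": "boxenplot",
    "point": "pointplot",
    "bar": "barplot",
    "count": "countplot",
}

# Look each function up on the seaborn module to confirm it exists
resolved = {kind: getattr(sns, name) for kind, name in CATPLOT_KINDS.items()}
for kind, func in resolved.items():
    print(f'catplot(kind="{kind}") -> sns.{func.__name__}()')
```

The practical difference is that catplot() returns a FacetGrid and can split the data across subplots with col and row, while the individual functions draw onto a single Axes.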

"""
Grouped barplots
================
"""
sns.set(style="whitegrid")

# Load the example Titanic dataset
titanic = pd.read_csv("./seaborn-data-master/titanic.csv")

# Draw a nested barplot to show survival for class and sex
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic,
                height=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_ylabels("survival probability")


"""
Plotting a three-way ANOVA
==========================

"""

sns.set(style="whitegrid")

# Load the example exercise dataset
df = pd.read_csv("./seaborn-data-master/exercise.csv")

# Draw a pointplot to show pulse as a function of three categorical factors
g = sns.catplot(x="time", y="pulse", hue="kind", col="diet",
                capsize=.2, palette="YlGnBu_d", height=6, aspect=.75,
                kind="point", data=df)
g.despine(left=True)

pointplot

Point plot

"""
Conditional means with observations
===================================

"""
sns.set(style="whitegrid")
iris = pd.read_csv("./seaborn-data-master/iris.csv")

# "Melt" the dataset to "long-form" or "tidy" representation
iris = pd.melt(iris, "species", var_name="measurement")

# Initialize the figure
f, ax = plt.subplots()
sns.despine(bottom=True, left=True)

# Show each observation with a scatterplot
sns.stripplot(x="value", y="measurement", hue="species",
              data=iris, dodge=True, jitter=True,
              alpha=.25, zorder=1)

# Show the conditional means
sns.pointplot(x="value", y="measurement", hue="species",
              data=iris, dodge=.532, join=False, palette="dark",
              markers="d", scale=.75, ci=None)

# Improve the legend 
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[3:], labels[3:], title="species",
          handletextpad=0, columnspacing=1,
          loc="lower right", ncol=3, frameon=True)
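
The melt step used in the example above turns a wide table into the long-form layout that the hue and x/y arguments expect. A minimal sketch on a two-row toy table; the values are made up and only the column names mirror the iris dataset.

```python
import pandas as pd

# Wide form: one column per measurement
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# Long form: one row per (species, measurement) pair
long_form = pd.melt(wide, "species", var_name="measurement")
print(long_form)
```

Every non-identifier column becomes a value of the new measurement column, with its number stored in value.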

scatterplot

Scatter plot

"""
Scatterplot with categorical and numerical semantics
====================================================
"""
sns.set(style="whitegrid")

# Load the example diamonds dataset
diamonds = pd.read_csv("./seaborn-data-master/diamonds.csv")

# Draw a scatter plot while assigning point colors and sizes to different
# variables in the dataset
f, ax = plt.subplots(figsize=(6.5, 6.5))
sns.despine(f, left=True, bottom=True)
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
sns.scatterplot(x="carat", y="price",
                hue="clarity", size="depth",
                palette="ch:r=-.2,d=.3_r",
                hue_order=clarity_ranking,
                sizes=(1, 8), linewidth=0,
                data=diamonds, ax=ax)

boxenplot

Enhanced box plot

"""
Plotting large distributions
============================

"""
sns.set(style="whitegrid")

diamonds = pd.read_csv("./seaborn-data-master/diamonds.csv")
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

sns.boxenplot(x="clarity", y="carat",
              color="b", order=clarity_ranking,
              scale="linear", data=diamonds)

Scatterplot

Scatter plot

"""
Scatterplot with continuous hues and sizes
==========================================

"""

sns.set()

# Load the example planets dataset
planets = pd.read_csv("./seaborn-data-master/planets.csv")

cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
ax = sns.scatterplot(x="distance", y="orbital_period",
                     hue="year", size="mass",
                     palette=cmap, sizes=(10, 200),
                     data=planets)

"""
Scatterplot with marginal ticks
===============================
"""
sns.set(style="white", color_codes=True)

# Generate a random bivariate dataset
rs = np.random.RandomState(9)
mean = [0, 0]
cov = [(1, 0), (0, 2)]
x, y = rs.multivariate_normal(mean, cov, 100).T

# Use JointGrid directly to draw a custom plot
# x and y must be passed as keywords in seaborn 0.12 and later
grid = sns.JointGrid(x=x, y=y, space=0, height=6, ratio=50)
grid.plot_joint(plt.scatter, color="g")
grid.plot_marginals(sns.rugplot, height=1, color="g")

PairGrid

A grid of subplots for plotting pairwise relationships in a dataset.

"""
Paired density and scatterplot matrix
=====================================

"""

sns.set(style="white")

df = pd.read_csv("./seaborn-data-master/iris.csv")

g = sns.PairGrid(df, diag_sharey=False)
g.map_lower(sns.kdeplot)
g.map_upper(sns.scatterplot)
g.map_diag(sns.kdeplot, lw=3)

"""
Paired categorical plots
========================

"""
sns.set(style="whitegrid")

# Load the example Titanic dataset
titanic = pd.read_csv("./seaborn-data-master/titanic.csv")

# Set up a grid to plot survival probability against several variables
g = sns.PairGrid(titanic, y_vars="survived",
                 x_vars=["class", "sex", "who", "alone"],
                 height=5, aspect=.5)

# Draw a seaborn pointplot onto each Axes
g.map(sns.pointplot, scale=1.3, errwidth=4, color="xkcd:plum")
g.set(ylim=(0, 1))
sns.despine(fig=g.fig, left=True)

residplot

Residual plot of a linear regression
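
What the residuals are can be sketched without seaborn: fit a straight line and subtract its predictions from the observations. The data generation below follows the residplot example in this section; using `np.polyfit` for the fit is an assumption for illustration, not a description of seaborn's internals.

```python
import numpy as np

# Example data with y ~ x, generated as in the residplot example
rs = np.random.RandomState(7)
x = rs.normal(2, 1, 75)
y = 2 + 1.5 * x + rs.normal(0, 2, 75)

# Ordinary least-squares line, then observed minus fitted values
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"mean residual={residuals.mean():.2e}")
```

With an intercept in the model the residuals average to zero, so a residual plot is read by looking for structure around the zero line rather than for an overall offset.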

"""
Plotting model residuals
========================

"""

sns.set(style="whitegrid")

# Make an example dataset with y ~ x
rs = np.random.RandomState(7)
x = rs.normal(2, 1, 75)
y = 2 + 1.5 * x + rs.normal(0, 2, 75)

# Plot the residuals after fitting a linear model
# x and y must be passed as keywords in seaborn 0.12 and later
sns.residplot(x=x, y=y, lowess=True, color="g")

"""
Scatterplot with varying point sizes and hues
==============================================

"""
sns.set(style="white")

# Load the example mpg dataset
mpg = pd.read_csv("./seaborn-data-master/mpg.csv")

# Plot miles per gallon against horsepower with other semantics
sns.relplot(x="horsepower", y="mpg", hue="origin", size="weight",
            sizes=(40, 400), alpha=.5, palette="muted",
            height=6, data=mpg)

swarmplot

A categorical scatter plot that also shows the density of the distribution

"""
Scatterplot with categorical variables
======================================

"""

sns.set(style="whitegrid", palette="muted")

# Load the example iris dataset
iris = pd.read_csv("./seaborn-data-master/iris.csv")

# "Melt" the dataset to "long-form" or "tidy" representation
iris = pd.melt(iris, "species", var_name="measurement")

# Draw a categorical scatterplot to show each observation
sns.swarmplot(x="measurement", y="value", hue="species",
              palette=["r", "c", "y"], data=iris)

pairplot

Grid of pairwise variable relationships

"""
Scatterplot Matrix
==================

"""

sns.set(style="ticks")

df = pd.read_csv("./seaborn-data-master/iris.csv")
sns.pairplot(df, hue="species")

clustermap

Clustered heatmap

"""
Discovering structure in heatmap data
=====================================

"""

sns.set()

# Load the brain networks example dataset
df = pd.read_csv("./seaborn-data-master/brain_networks.csv", header=[0, 1, 2], index_col=0)

# Select a subset of the networks
used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

# Create a categorical palette to identify the networks
network_pal = sns.husl_palette(8, s=.45)
network_lut = dict(zip(map(str, used_networks), network_pal))

# Convert the palette to vectors that will be drawn on the side of the matrix
networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks, index=df.columns).map(network_lut)

# Draw the full plot
sns.clustermap(df.corr(), center=0, cmap="vlag",
               row_colors=network_colors, col_colors=network_colors,
               linewidths=.75, figsize=(13, 13))

That is the end of this summary. If it helped you, feel free to like and follow.
