Common Seaborn Plot Types for Data Analysis | Plotting to Find Trends, Relationships, and Distributions | 20-Minute Crash Course | Kaggle Learning (Part 1)


Sany 何灿

Article link: https://blog.csdn.net/SanyHo/article/details/105171231


Trends

- A trend is defined as a pattern of change.
- sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.
Relationship

- There are many different chart types that you can use to understand relationships between variables in your data.
- sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
- sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
- sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
- sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
- sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
- sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.
Distribution

- We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.
- sns.distplot - Histograms show the distribution of a single numerical variable.
- sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
- sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

1. Line Chart

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Path of the file to read
spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data
spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)

spotify_data.tail()

	Shape of You	Despacito	Something Just Like This	HUMBLE.	Unforgettable
Date
2018-01-05	4492978	3450315.0	2408365.0	2685857.0	2869783.0
2018-01-06	4416476	3394284.0	2188035.0	2559044.0	2743748.0
2018-01-07	4009104	3020789.0	1908129.0	2350985.0	2441045.0
2018-01-08	4135505	2755266.0	2023251.0	2523265.0	2622693.0
2018-01-09	4168506	2791601.0	2058016.0	2727678.0	2627334.0
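
The two read_csv arguments above do most of the work: index_col="Date" turns the Date column into the row index, and parse_dates=True converts that index into real timestamps. As a rough sketch, the snippet below parses the five tail() rows from an in-memory string instead of ../input/spotify.csv; csv_text and sample are names made up for this illustration.

```python
import io

import pandas as pd

# The five rows from the tail() output above, typed in as CSV text.
# csv_text and sample are illustrative names, not part of the original notebook.
csv_text = """Date,Shape of You,Despacito,Something Just Like This,HUMBLE.,Unforgettable
2018-01-05,4492978,3450315.0,2408365.0,2685857.0,2869783.0
2018-01-06,4416476,3394284.0,2188035.0,2559044.0,2743748.0
2018-01-07,4009104,3020789.0,1908129.0,2350985.0,2441045.0
2018-01-08,4135505,2755266.0,2023251.0,2523265.0,2622693.0
2018-01-09,4168506,2791601.0,2058016.0,2727678.0,2627334.0
"""

# Same arguments as the original read_csv call, applied to the in-memory text
sample = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# The index is a DatetimeIndex, so rows can be looked up by timestamp
print(type(sample.index).__name__)
print(sample.loc[pd.Timestamp("2018-01-09"), "Despacito"])
```

Because the index holds timestamps rather than strings, sns.lineplot can place the dates on a proper time axis.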

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2bb6f98>


# Set the width and height of the figure
# sets the size of the figure to 14 inches (in width) by 6 inches (in height)
plt.figure(figsize=(14, 6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2a74780>


Changing styles

# Seaborn has five different themes: (1) "darkgrid", (2) "whitegrid", (3) "dark", (4) "white", and (5) "ticks"
# Change the style of the figure to the "dark" theme
sns.set_style("dark")

# Line chart 
plt.figure(figsize=(12,6))
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7f5faa4bc828>


Plot a subset of the data

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")


2. Bar Charts

# Print the data (flight_data is assumed to have been read with pd.read_csv, using the Month column as the index)
flight_data

	AA	AS	B6	DL	EV	F9	HA	MQ	NK	OO	UA	US	VX	WN
Month
1	6.955843	-0.320888	7.347281	-2.043847	8.537497	18.357238	3.512640	18.164974	11.398054	10.889894	6.352729	3.107457	1.420702	3.389466
2	7.530204	-0.782923	18.657673	5.614745	10.417236	27.424179	6.029967	21.301627	16.474466	9.588895	7.260662	7.114455	7.784410	3.501363
3	6.693587	-0.544731	10.741317	2.077965	6.730101	20.074855	3.468383	11.018418	10.039118	3.181693	4.892212	3.330787	5.348207	3.263341
4	4.931778	-3.009003	2.780105	0.083343	4.821253	12.640440	0.011022	5.131228	8.766224	3.223796	4.376092	2.660290	0.995507	2.996399
5	5.173878	-1.716398	-0.709019	0.149333	7.724290	13.007554	0.826426	5.466790	22.397347	4.141162	6.827695	0.681605	7.102021	5.680777
6	8.191017	-0.220621	5.047155	4.419594	13.952793	19.712951	0.882786	9.639323	35.561501	8.338477	16.932663	5.766296	5.779415	10.743462
7	3.870440	0.377408	5.841454	1.204862	6.926421	14.464543	2.001586	3.980289	14.352382	6.790333	10.262551	NaN	7.135773	10.504942
8	3.193907	2.503899	9.280950	0.653114	5.154422	9.175737	7.448029	1.896565	20.519018	5.606689	5.014041	NaN	5.106221	5.532108
9	-1.432732	-1.813800	3.539154	-3.703377	0.851062	0.978460	3.696915	-2.167268	8.000101	1.530896	-1.794265	NaN	0.070998	-1.336260
10	-0.580930	-2.993617	3.676787	-5.011516	2.303760	0.082127	0.467074	-3.735054	6.810736	1.750897	-2.456542	NaN	2.254278	-0.688851
11	0.772630	-1.916516	1.418299	-3.175414	4.415930	11.164527	-2.719894	0.220061	7.543881	4.925548	0.281064	NaN	0.116370	0.995684
12	4.149684	-1.846681	13.839290	2.504595	6.685176	9.346221	-1.706475	0.662486	12.733123	10.947612	7.012079	NaN	13.498720	6.720893

# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")

Text(0, 0.5, 'Arrival delay (in minutes)')


3. Heatmap

# Set the width and height of the figure
plt.figure(figsize=(14,7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")

Text(0.5, 42.0, 'Airline')
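
Because the delays can be negative (early arrivals) as well as positive, a diverging colormap centered at zero may make the table easier to read. The sketch below is one possible variation, not the original notebook's code: flights is a hand-typed 3x3 excerpt of the table above (months 1 to 3, airlines AA, AS, B6), and the fmt, cmap, and center settings are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Excerpt of flight_data: first three months, first three airlines
flights = pd.DataFrame(
    {
        "AA": [6.955843, 7.530204, 6.693587],
        "AS": [-0.320888, -0.782923, -0.544731],
        "B6": [7.347281, 18.657673, 10.741317],
    },
    index=pd.Index([1, 2, 3], name="Month"),
)

fig, ax = plt.subplots(figsize=(6, 4))

# annot=True writes each value into its cell; fmt=".1f" rounds the label
# center=0 pins the middle of the diverging colormap to "on time"
sns.heatmap(data=flights, annot=True, fmt=".1f", cmap="coolwarm", center=0, ax=ax)

ax.set_title("Average Arrival Delay, by Month (excerpt)")
ax.set_xlabel("Airline")
```

With this setup, early arrivals and delays fall on opposite sides of the color scale instead of sharing one gradient.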


4. Scatter Plots

insurance_data.head()

	age	sex	bmi	children	smoker	region	charges
0	19	female	27.900	0	yes	southwest	16884.92400
1	18	male	33.770	1	no	southeast	1725.55230
2	28	male	33.000	3	no	southeast	4449.46200
3	33	male	22.705	0	no	northwest	21984.47061
4	32	male	28.880	0	no	northwest	3866.85520

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f2300048>


# Add a regression line
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f222c588>


Color-coded scatter plots

# Color-code the points by 'smoker' and plot the other two columns ('bmi', 'charges') on the axes
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f19b49e8>


# Add two regression lines, one for smokers and one for nonsmokers
# sns.lmplot takes column names rather than the columns themselves:
# instead of x=insurance_data['bmi'], we write x="bmi"; likewise y="charges" and hue="smoker".
# The dataset itself is passed with data=insurance_data.

sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)

<seaborn.axisgrid.FacetGrid at 0x7f44f192d668>
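
The column-name style used by sns.lmplot also works with sns.scatterplot, which can be a little tidier than passing each Series separately. A minimal sketch, assuming a frame with the same columns as insurance_data; insurance here is rebuilt by hand from the five head() rows shown above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five rows from insurance_data.head(), typed in by hand
insurance = pd.DataFrame(
    {
        "age": [19, 18, 28, 33, 32],
        "sex": ["female", "male", "male", "male", "male"],
        "bmi": [27.900, 33.770, 33.000, 22.705, 28.880],
        "children": [0, 1, 3, 0, 0],
        "smoker": ["yes", "no", "no", "no", "no"],
        "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
        "charges": [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520],
    }
)

fig, ax = plt.subplots()

# Column names plus data= instead of x=insurance['bmi'], y=insurance['charges'], ...
sns.scatterplot(x="bmi", y="charges", hue="smoker", data=insurance, ax=ax)
```

The axis labels are taken from the column names, so the plot is labeled "bmi" and "charges" without extra calls.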


sns.swarmplot(x=insurance_data['smoker'],
              y=insurance_data['charges'])


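5. Histograms

# Print the first five rows (iris_data is assumed to be loaded already, indexed by Id)
iris_data.head()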

	Sepal Length (cm)	Sepal Width (cm)	Petal Length (cm)	Petal Width (cm)	Species
Id
1	5.1	3.5	1.4	0.2	Iris-setosa
2	4.9	3.0	1.4	0.2	Iris-setosa
3	4.7	3.2	1.3	0.2	Iris-setosa
4	4.6	3.1	1.5	0.2	Iris-setosa
5	5.0	3.6	1.4	0.2	Iris-setosa

# 'a=' selects the column to plot
# kde=False is something we'll always provide when creating a histogram; leaving it out overlays a KDE curve on the bars.
# Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot is its replacement for histograms.
sns.distplot(a=iris_data['Petal Length (cm)'], kde=False)

<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5b1da20>


Color-coded plots

# Histograms for each species
sns.distplot(a=iris_set_data['Petal Length (cm)'], label="Iris-setosa", kde=False)
sns.distplot(a=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", kde=False)
sns.distplot(a=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", kde=False)

# Add title
plt.title("Histogram of Petal Lengths, by Species")

# Force legend to appear
plt.legend()

<matplotlib.legend.Legend at 0x7f96c5849470>


6. Density plots

# A kernel density estimate (KDE) plot is like a smoothed histogram
# shade=True colors the area below the curve (newer seaborn releases use fill=True instead)
sns.kdeplot(data=iris_data['Petal Length (cm)'], shade=True)

<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5a664e0>


# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")

<seaborn.axisgrid.JointGrid at 0x7f96c59cbef0>

The color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.

The curve at the top of the figure is a KDE plot for the data on the x-axis (in this case, iris_data['Petal Length (cm)']), and
the curve on the right of the figure is a KDE plot for the data on the y-axis (in this case, iris_data['Sepal Width (cm)']).

Color-coded plots

# KDE plots for each species
sns.kdeplot(data=iris_set_data['Petal Length (cm)'], label="Iris-setosa", shade=True)
sns.kdeplot(data=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", shade=True)
sns.kdeplot(data=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", shade=True)

# Add title
plt.title("Distribution of Petal Lengths, by Species")

Text(0.5, 1.0, 'Distribution of Petal Lengths, by Species')

