Seaborn：Python

weixin_26713521

Original link: https://towardsdatascience.com/seaborn-python-8563c3d0ad41

Seaborn is a data visualization library built on top of matplotlib and closely integrated with pandas data structures in Python. Visualization is central to Seaborn, which helps in exploring and understanding data.

Some familiarity with NumPy, Matplotlib, and pandas is needed to learn Seaborn.

Seaborn offers the following functionality:

A dataset-oriented API for examining relationships between variables.
Automatic estimation and plotting of linear regression models.
High-level abstractions for multi-plot grids.
Visualization of univariate and bivariate distributions.

These are only some of the features Seaborn offers; there are many more to explore.

To start using Seaborn, import it with:

import seaborn as sns

With Seaborn we can draw a wide variety of plots, such as:

Distribution Plots
Pie Chart & Bar Chart
Scatter Plots
Pair Plots
Heatmaps

Throughout this article, we use the Google Play Store dataset downloaded from Kaggle.

1. Distribution Plots

A distribution plot in Seaborn is comparable to a histogram in Matplotlib, and the two offer very similar functionality. Instead of the frequency counts shown by a histogram, the y-axis here shows an approximate probability density.

We will use sns.distplot() in the code to plot distribution graphs. (In Seaborn 0.11 and later, distplot() is deprecated in favor of sns.histplot() and sns.displot().)

Before going further, let's first load our dataset.

Accessing the dataset from our system
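The loading step can be sketched roughly as follows. This is only an illustration: the file name mentioned in the comment, the variable name `pstore`, and the sample rows are assumptions, and a small in-memory CSV stands in for the real Kaggle file so that the snippet runs on its own.

```python
import io

import pandas as pd

# A few made-up rows standing in for the Kaggle CSV (hypothetical values).
sample_csv = io.StringIO(
    "App,Category,Rating,Reviews,Size,Price,Content Rating\n"
    "App A,GAME,4.5,1200,25.0,0.0,Everyone\n"
    "App B,TOOLS,3.9,300,8.5,0.0,Teen\n"
    "App C,GAME,4.1,5400,60.0,2.99,Everyone 10+\n"
)

# With the real file this would be something like
# pd.read_csv("googleplaystore.csv"), where the file name is an assumption.
pstore = pd.read_csv(sample_csv)

print(pstore.head())   # first rows of the table
print(pstore.shape)    # (number of rows, number of columns)
```

Calling head() and shape right after loading is a quick way to confirm that the columns were parsed as expected.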

The dataset looks like this:

[Figure: Google Play Store dataset from Kaggle]

Now, let's see what the distribution plot looks like for the 'Rating' column of the above dataset.

Code for the Rating column distribution plot

The distribution plot for the Rating column looks like this:

Here, the curve drawn over the histogram is the kernel density estimate (KDE), an approximate probability density curve.

As with histograms in matplotlib, we can change the number of bins in a distribution plot to make the graph easier to read.

We just have to pass the number of bins in the code:

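# Change the number of bins and turn off the KDE curve
# (inp1 is the DataFrame holding the dataset)
import matplotlib.pyplot as plt

sns.distplot(inp1.Rating, bins=20, kde=False)
plt.show()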

Now, the graph looks like this:

The above graph has no probability density curve. To remove the curve, we just have to pass 'kde=False' in the code.

As in matplotlib, we can also give the distribution plot a title and change the color of the bins. Let's see the code for that:

The distribution graph for the same Rating column now looks like this:

Styling the Seaborn graphs

One of the biggest advantages of using Seaborn is that it offers a wide range of ready-made styling options for our graphs.

These are the available styles, as listed by matplotlib's plt.style.available (many of them derived from Seaborn):

'Solarize_Light2',
 '_classic_test_patch',
 'bmh',
 'classic',
 'dark_background',
 'fast',
 'fivethirtyeight',
 'ggplot',
 'grayscale',
 'seaborn',
 'seaborn-bright',
 'seaborn-colorblind',
 'seaborn-dark',
 'seaborn-dark-palette',
 'seaborn-darkgrid',
 'seaborn-deep',
 'seaborn-muted',
 'seaborn-notebook',
 'seaborn-paper',
 'seaborn-pastel',
 'seaborn-poster',
 'seaborn-talk',
 'seaborn-ticks',
 'seaborn-white',
 'seaborn-whitegrid',
 'tableau-colorblind10'

We just have to write one line of code to apply any of these styles to our graph.

After applying the dark background to our graph, the distribution plot looks like this:

2. Pie Chart & Bar Chart

A pie chart is generally used to analyze how a numeric variable is split across different categories.

In the dataset we are using, we'll analyze how the top 4 categories in the Content Rating column compare.

First, we'll do some data cleaning on the Content Rating column and check which categories it contains.
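Checking the categories can be done with pandas' value_counts(). The sketch below uses a small made-up column, so the counts are illustrative and not those of the real dataset.

```python
import pandas as pd

# Made-up Content Rating values standing in for the real column.
pstore = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 3 + ["Mature 17+"] * 2
                      + ["Everyone 10+"] * 2 + ["Adults only 18+", "Unrated"]
})

# Number of apps per category, largest first.
counts = pstore["Content Rating"].value_counts()
print(counts)
```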

Now, the list of categories is:

As the above output shows, the counts for "Adults only 18+" and "Unrated" are significantly lower than the others, so we'll drop those categories from Content Rating and update the dataset.
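Dropping the rare categories could be done with a boolean mask, as in this sketch. The data is made up, and the variable names are assumptions.

```python
import pandas as pd

# Made-up Content Rating values standing in for the real column.
pstore = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 3 + ["Mature 17+"] * 2
                      + ["Everyone 10+"] * 2 + ["Adults only 18+", "Unrated"]
})

# Keep only the rows whose category is not in the list of rare ones.
rare = ["Adults only 18+", "Unrated"]
pstore = pstore[~pstore["Content Rating"].isin(rare)].reset_index(drop=True)

print(pstore["Content Rating"].value_counts())
```

reset_index(drop=True) renumbers the remaining rows so that the index has no gaps.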

After the update, the categories present in the "Content Rating" column are:

Now, let's plot a pie chart for the categories present in the Content Rating column.
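Seaborn itself has no pie chart function, so a pie chart is typically drawn through pandas or matplotlib. The following sketch uses pandas' plotting interface on made-up data; the original code may have differed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up Content Rating values (the four categories kept after cleaning).
pstore = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 3
                      + ["Mature 17+"] * 2 + ["Everyone 10+"] * 2
})
counts = pstore["Content Rating"].value_counts()

# One wedge per category; autopct prints the percentage inside each wedge.
fig, ax = plt.subplots()
counts.plot.pie(ax=ax, autopct="%1.1f%%")
```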

The pie chart for the above code looks like the following:

From the above pie chart, we cannot reliably tell which of "Everyone 10+" and "Mature 17+" is larger. It is very difficult to assess the difference between two categories when their values are close to each other.

We can overcome this by plotting the same data as a bar chart.
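A bar chart of the same counts might be drawn as below. This sketch uses sns.countplot() on made-up data; the original article may well have used pandas' or matplotlib's bar plot instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up Content Rating values (the four categories kept after cleaning).
pstore = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 3
                      + ["Mature 17+"] * 2 + ["Everyone 10+"] * 2
})
counts = pstore["Content Rating"].value_counts()

# countplot counts the rows per category; order sorts bars by frequency.
fig, ax = plt.subplots()
sns.countplot(data=pstore, x="Content Rating", order=list(counts.index), ax=ax)
```

Because bar heights share a common baseline, small differences between categories are easier to read than in a pie chart.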

Now, the bar chart looks like the following:

As with the pie chart, we can customize the bar chart too, with different bar colors, a chart title, etc.

3. Scatter Plots

Up until now, we have been dealing with only a single numeric column from the dataset at a time, such as Rating, Reviews, or Size. But what if we have to infer a relationship between two numeric columns, say "Rating and Size" or "Rating and Reviews"?

A scatter plot is used when we want to plot the relationship between any two numeric columns of a dataset. Scatter plots are among the most widely used visualization tools in machine learning.

Let's see what the scatter plot looks like for two numeric columns in the dataset, "Rating" and "Size". First we'll plot the graph using matplotlib, and then we'll see how it looks in Seaborn.

Scatter Plot using matplotlib

# Import the necessary libraries
import matplotlib.pyplot as plt

# Plot the scatter plot (pstore is the DataFrame holding the dataset)
plt.scatter(pstore.Size, pstore.Rating)
plt.show()

Now, the plot looks like this:

Scatter Plot using Seaborn

We will use sns.jointplot() in the code to draw a scatter plot along with histograms, and sns.scatterplot() to draw only the scatter plot.

The scatter plot for the above code looks like this:

The main advantage of using jointplot in Seaborn is that we get both the scatter plot and the histograms in the same graph.

If we want to see only the scatter plot, we just replace "jointplot" with "scatterplot" in the code.
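The scatter-only variant is sketched below on the same kind of synthetic data. Unlike jointplot(), sns.scatterplot() returns a single matplotlib Axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic Size and Rating values stand in for the real columns.
rng = np.random.default_rng(1)
pstore = pd.DataFrame({
    "Size": rng.uniform(1, 100, 200),
    "Rating": rng.normal(4.1, 0.4, 200).clip(1, 5),
})

# Only the scatter plot, without the marginal histograms.
ax = sns.scatterplot(data=pstore, x="Size", y="Rating")
```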

Regression Plot

Regression plots draw a regression line between two numeric variables on top of the jointplot (scatter plot) and help to visualize their linear relationship.
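As a rough sketch, a regression line can be drawn with sns.regplot(); passing kind="reg" to sns.jointplot() gives the same line together with the marginal plots. The Price and Rating values here are synthetic, so the trend they show says nothing about the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic Price and Rating values with a mild upward trend built in.
rng = np.random.default_rng(2)
price = rng.uniform(0, 20, 150)
rating = (4.0 + 0.02 * price + rng.normal(0, 0.3, 150)).clip(1, 5)
pstore = pd.DataFrame({"Price": price, "Rating": rating})

# Scatter plot plus a fitted straight line; ci=None skips the
# bootstrapped confidence band to keep the example fast.
ax = sns.regplot(data=pstore, x="Price", y="Rating", ci=None)
```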

The graph looks like the following:

From the above graph, we can infer that the Rating increases steadily as the Price of the apps increases.

4. Pair Plots

Pair plots are used when we want to see the relationship patterns among three or more numeric variables. For example, if we want to see how a company's sales are affected by three different factors, pair plots will be very helpful.

Let's create a pair plot for the Reviews, Size, Price, and Rating columns of our dataset.

We will use sns.pairplot() in the code to plot multiple scatter plots at once.

The output for the above code looks like this:

For the non-diagonal views, the graph is a scatter plot between two numeric variables.
For the diagonal views, it plots a histogram, since both axes (x, y) show the same variable.

5. Heatmaps

A heatmap represents data in a 2-dimensional form. The goal of a heatmap is to summarize information in a colored graph, using colors and color intensities to visualize a range of values.

Most of us have seen the following type of graphic in a football match:

Heatmaps in Seaborn create exactly this type of graph.

We'll use sns.heatmap() to plot the visualization.

When we have data like the following, we can create a heatmap:

The above table is created using a pivot table from pandas. You can see how pivot tables are created in my previous article on pandas.

Now, let's see how we can create a heatmap for the above table.
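The steps can be sketched as follows. The layout of the pivot table (mean Rating by Content Rating and Category) is an assumption made for illustration, since the original table is not shown, and the rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up rows; the choice of index and columns below is hypothetical.
pstore = pd.DataFrame({
    "Content Rating": ["Everyone", "Everyone", "Everyone", "Teen", "Teen",
                       "Mature 17+", "Mature 17+"],
    "Category": ["GAME", "GAME", "TOOLS", "GAME", "TOOLS", "GAME", "TOOLS"],
    "Rating": [4.0, 4.4, 4.0, 3.8, 4.2, 4.5, 3.5],
})

# Mean rating for every (Content Rating, Category) combination.
heat = pd.pivot_table(pstore, values="Rating", index="Content Rating",
                      columns="Category", aggfunc="mean")

# Each cell of the table becomes one colored square.
ax = sns.heatmap(heat)
```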

In the above code, we have saved the data in the new variable "heat".

The heatmap looks like the following:

We can apply some customization to the above graph and change the color gradient so that the highest values are darker and the lowest values are lighter.

The updated code will be something like this:
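A customized version might look like the sketch below. The "Greens" colormap is one possible choice that runs from light (low values) to dark (high values); the table contents are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# A small made-up table standing in for the pivot table "heat".
heat = pd.DataFrame(
    {"GAME": [4.2, 4.5, 3.8], "TOOLS": [4.0, 3.5, 4.2]},
    index=["Everyone", "Mature 17+", "Teen"],
)

# cmap picks the color gradient; annot=True writes each value in its cell.
ax = sns.heatmap(heat, cmap="Greens", annot=True)
```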

The heatmap for the updated code looks like this:

In the code we have passed "annot=True", which means that each cell in the graph displays its value. If we don't mention annot in our code, it defaults to False.

Seaborn also supports other types of graphs, such as line plots, bar graphs, and stacked bar charts. However, these don't offer anything different from the ones created with matplotlib.

Conclusion

So, this is how Seaborn works in Python, along with the different types of graphs we can create with it. As already mentioned, Seaborn is built on top of the matplotlib library. So, if we are already familiar with Matplotlib and its functions, we can easily build Seaborn graphs and explore more advanced concepts.

Thank you for reading and Happy Coding!!!

Check out my previous articles about Python here

Translated from: https://towardsdatascience.com/seaborn-python-8563c3d0ad41
