Python: Data Visualization with Pandas

Gingkens

Category: Python. Tags: Python, data visualization, Pandas

Article link: https://blog.csdn.net/qq_34859482/article/details/80592741

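1. Introduction to Pandas

Data visualization is really a form of data analysis, and for data analysis in Python, pandas is the obvious tool of choice. It is a powerful data analysis package that also has plotting built in, so data processing, analysis, and visualization all live in one library. The plotting functions in pandas cover most everyday needs, which saves much of the work of building the same charts by hand with a library such as matplotlib.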

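pandas is typically used to quickly draw the following kinds of plots:

'line' for line plots
'bar' or 'barh' for bar plots
'hist' for histograms
'box' for box plots
'area' for area plots
'scatter' for scatter plots
'pie' for pie plots

If you already know Matplotlib, you can start using these right away. If not, the walkthrough below should get you up to speed quickly.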

2. The Dataset

Before visualizing the data, let's take a look at the dataset.
[Image: preview of the dataset]
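Since the dataset preview is only available as an image, here is a minimal sketch of a stand-in table with the same apparent layout: countries as the index, one column per year (string labels such as '2013'), and a Total column. The numbers are random placeholders, and the assumption that `df` is simply `df_can` without the Total column is mine, inferred from the remark in section 3.1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset: the values are random placeholders,
# not real immigration figures.
years = [str(y) for y in range(1980, 2014)]      # column labels are strings, e.g. '2013'
countries = ['Albania', 'Algeria', 'Argentina']  # row labels (index)

rng = np.random.default_rng(0)
df_can = pd.DataFrame(
    rng.integers(0, 5000, size=(len(countries), len(years))),
    index=countries,
    columns=years,
)
df_can['Total'] = df_can[years].sum(axis=1)  # per-country total over all years

# Assumption: the line plots use only the year columns, without 'Total'.
df = df_can[years]
print(df_can.head())
```

With a table shaped like this, `df.loc['Algeria']` is a Series indexed by year, which is what the line plot in section 3.1 expects.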

3. Visualization

3.1 Line Plots

The following code draws a line plot.

import matplotlib.pyplot as plt

df.loc['Algeria'].plot(kind='line', label='Algeria')  # select the row for Algeria
plt.legend(loc='upper left')

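[Image: line plot for Algeria]
With a single line of code we plotted the row whose index is 'Algeria'. Note that I excluded the Total column beforehand.

Not that impressive yet? Take a look at the following code and its result first: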

df.T[['Albania', 'Algeria', 'Argentina']].plot(kind='line')

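[Image: line plot for Albania, Algeria, and Argentina]
With very little effort we drew three lines, and the legend was generated automatically. This is already more convenient than plain matplotlib. One thing to note is that I transposed the dataset, that is, swapped its rows and columns. The new dataset looks like this:
[Image: the transposed dataset]

When plotting data with multiple rows, pandas uses the index as the X axis and the values in each column as the Y values, and draws each column as its own line. In other words, it draws one line per column, which is why the transpose is needed here.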
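To make the "one line per column" rule concrete, here is a small sketch on a made-up 3 x 4 table (the numbers are illustrative only). Plotting the table as-is gives one line per year, while plotting the transpose gives one line per country.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import pandas as pd

# Made-up numbers: 3 countries (rows) x 4 years (columns).
wide = pd.DataFrame(
    [[10, 12, 15, 11],
     [80, 70, 75, 90],
     [40, 45, 30, 35]],
    index=['Albania', 'Algeria', 'Argentina'],
    columns=['1980', '1981', '1982', '1983'],
)

ax_wide = wide.plot(kind='line')    # countries on X, one line per year column
ax_long = wide.T.plot(kind='line')  # years on X, one line per country

print(len(ax_wide.get_lines()), len(ax_long.get_lines()))
```

The first axes holds four lines (one per year) and the second holds three (one per country), with the country names picked up as legend entries.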

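3.2 Histograms

Consider the following code:

import numpy as np
count, bin_edges = np.histogram(df_can['2013'])
# Splits the data into 10 equal-width bins: count holds the number of data points in each bin,
# and bin_edges holds the bin boundaries. The result is shown in the image below.

[Image: the resulting count and bin_edges arrays]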

df_can['2013'].plot(kind='hist', figsize=(8, 5), xticks=bin_edges)

plt.title('Histogram of Immigration from 195 countries in 2013') # add a title to the histogram
plt.ylabel('Number of Countries') # add y-label
plt.xlabel('Number of Immigrants') # add x-label

plt.show()

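When drawing a histogram, pandas also splits the data into 10 bins by default, which matches what np.histogram does, so no extra data needs to be passed in: specifying kind='hist' is enough. Passing xticks is optional, but without it the tick marks on the X axis generally do not line up with the bin edges, which makes the bins harder to read.
[Image: histogram of immigration in 2013]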
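The relationship between np.histogram and the pandas histogram can be checked on placeholder data. The sketch below assumes a Series of 195 random values standing in for the 2013 column; it is not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Placeholder values standing in for df_can['2013'] (195 countries).
rng = np.random.default_rng(1)
values = pd.Series(rng.integers(0, 40000, size=195), name='2013')

count, bin_edges = np.histogram(values)  # bins=10 by default
print(len(count), len(bin_edges))        # 10 counts need 11 edges

# pandas also uses 10 bins by default; xticks puts a tick on every bin edge.
ax = values.plot(kind='hist', xticks=bin_edges)
```

Each of the 10 bars drawn by pandas corresponds to one entry of `count`, and the ticks sit exactly on the values in `bin_edges`.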

Just as with line plots, we can draw several histograms at once. Again, the data needs to be transposed first.

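# years is assumed to hold the year column labels ('1980' through '2013'); it is not defined in this post
df_t = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()
df_t.head()

The first five rows are:
[Image: first five rows of df_t]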

df_t.plot(kind='hist', figsize=(10, 6))

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')

plt.show()

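[Image: histograms for Denmark, Norway, and Sweden]
The plot does contain three histograms. However, some of the data seems to be hidden behind other bars and cannot be seen, which makes this a poor chart. To fix that, we pass a few extra parameters to plot so that the covered data stays visible.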

count, bin_edges = np.histogram(df_t, 15)

# un-stacked histogram
df_t.plot(kind ='hist', 
          figsize=(10, 6),
          bins=15,
          alpha=0.6,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen']
         )

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')

plt.show()

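Above we changed the number of bins and, more importantly, passed the alpha (transparency) parameter, which makes the previously covered data visible.
[Image: semi-transparent histograms for Denmark, Norway, and Sweden]
Another option is to stack the histograms: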

df_t.plot(kind ='hist', 
          figsize=(10, 6),
          bins=15,
          stacked=True,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen']
         )

[Image: stacked histogram for Denmark, Norway, and Sweden]
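The difference between the two fixes can be seen side by side. This sketch uses random placeholder counts for the three countries (not the real data) and draws the overlapping, semi-transparent version next to the stacked one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: 34 yearly values per country, all drawn from the same range.
rng = np.random.default_rng(2)
years = [str(y) for y in range(1980, 2014)]
df_t = pd.DataFrame(
    {name: rng.integers(50, 300, size=len(years))
     for name in ['Denmark', 'Norway', 'Sweden']},
    index=years,
)

fig, (ax_overlay, ax_stacked) = plt.subplots(1, 2, figsize=(12, 5))

# Overlapping bars all start at y=0; alpha lets the bars behind show through.
df_t.plot(kind='hist', bins=15, alpha=0.6, ax=ax_overlay, title='alpha=0.6')

# Stacked bars start where the previous column's bar ended, so nothing is hidden.
df_t.plot(kind='hist', bins=15, stacked=True, ax=ax_stacked, title='stacked=True')
```

With alpha, the bar heights can still be read off the Y axis directly for each country. With stacking, the total height of each bin is easy to read, but the individual contributions have to be judged by segment length.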

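3.3 Bar Charts

First, let's look at the data we are going to visualize.

df_iceland = df_can.loc['Iceland', years]
df_iceland.head()

The data is Iceland's immigration figures for 1980-2013; only the first five rows are shown below.
[Image: first five rows of df_iceland]
Drawing a bar chart is simple. The code is as follows: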

# plot the data
df_iceland.plot(kind='bar', figsize=(10, 6))
df_iceland.plot(kind='line')  # drawn on the current axes, on top of the bars
plt.xlabel('Year') # add an x-label to the plot
plt.ylabel('Number of immigrants') # add a y-label to the plot
plt.title('Icelandic immigrants to Canada from 1980 to 2013') # add a title to the plot

plt.show()

Bar charts can be vertical or horizontal; the one above is vertical.
[Image: vertical bar chart of Icelandic immigration]

df_iceland.plot(kind='barh', figsize=(10, 6))

To get a horizontal bar chart, simply change kind='bar' to kind='barh'.
[Image: horizontal bar chart of Icelandic immigration]
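As a quick check of what the switch changes, the sketch below plots the same made-up Series both ways. In the vertical chart the values become bar heights; in the horizontal chart they become bar widths.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly values, for illustration only.
s = pd.Series([17, 33, 10, 9, 13], index=['1980', '1981', '1982', '1983', '1984'])

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))
s.plot(kind='bar', ax=ax_v)    # vertical: values map to bar heights
s.plot(kind='barh', ax=ax_h)   # horizontal: values map to bar widths

heights = [p.get_height() for p in ax_v.patches]
widths = [p.get_width() for p in ax_h.patches]
print(heights, widths)
```

Horizontal bars tend to be the better choice when the category labels are long, since the labels are then written left to right along the Y axis.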

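3.4 Pie Charts

First, let's look at the data that will go into the pie chart:
[Image: preview of df_continents]
The pie chart is drawn with the following code: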

# autopct formats the percentage shown on each wedge; startangle sets where the first wedge starts
df_continents['Total'].plot(kind='pie',
                            figsize=(5, 6),
                            autopct='%1.f%%', # add in percentages
                            startangle=90,     # start angle 90° (Africa)
                            shadow=True,       # add shadow      
                            )

plt.title('Immigration to Canada by Continent [1980 - 2013]')
plt.axis('equal') # Sets the pie chart to look like a circle.

plt.show()

[Image: pie chart of immigration by continent]
Some of the labels overlap. To fix this, we need to pass a few more parameters:

colors_list = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']
explode_list = [0.1, 0, 0, 0, 0.1, 0.1] # ratio for each continent with which to offset each wedge.

df_continents['Total'].plot(kind='pie',
                            figsize=(15, 6),
                            autopct='%1.1f%%', 
                            startangle=90,    
                            shadow=True,       
                            labels=None,         # turn off labels on pie chart
                            pctdistance=1.12,    # the ratio between the center of each pie slice and the start of the text generated by autopct 
                            colors=colors_list,  # add custom colors
                            explode=explode_list # 'explode' lowest 3 continents
                            )

# scale the title up by 12% to match pctdistance
plt.title('Immigration to Canada by Continent [1980 - 2013]', y=1.12) 

plt.axis('equal') 

# add legend
plt.legend(labels=df_continents.index, loc='upper left') 

plt.show()

[Image: pie chart with legend, exploded wedges, and percentages placed outside the pie]
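To see what autopct actually prints, the following sketch builds a small placeholder Series and compares each wedge's share of the total with the text drawn on the chart. The continent names and totals are assumptions chosen so the percentages come out round; they are not the real figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import pandas as pd

# Placeholder totals (sum = 1000), not real data.
totals = pd.Series(
    [100, 500, 200, 50, 100, 50],
    index=['Africa', 'Asia', 'Europe', 'Latin America', 'Northern America', 'Oceania'],
    name='Total',
)
share = totals / totals.sum() * 100  # each wedge's percentage of the whole

ax = totals.plot(
    kind='pie',
    autopct='%1.1f%%',                # format each share with one decimal place
    startangle=90,
    labels=None,                      # hide wedge labels; the legend names them instead
    pctdistance=1.12,                 # > 1 places the percentages outside the pie
    explode=[0.1, 0, 0, 0.1, 0, 0.1], # pull three of the wedges outward
)
ax.legend(labels=list(totals.index), loc='upper left')

pct_texts = [t.get_text() for t in ax.texts if t.get_text().endswith('%')]
print(share.round(1).tolist(), pct_texts)
```

Moving the wedge names into a legend and pushing the percentages outside the pie is what removes the overlap; explode only adds visual separation for the wedges it offsets.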

3.5 Area Plots

The data used is shown below:
[Image: preview of df_CI]

fig, (ax1, ax2) = plt.subplots(2)
df_CI.plot(kind='area', stacked=False, ax=ax1)
df_CI.plot(kind='area', ax=ax2)

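In ax1 (stacked=False), the result looks like drawing the lines for India and China and then filling in the area under each one. In ax2, stacked keeps its default value of True, so the values are added on top of each other, in the DataFrame's column order from left to right.
[Image: unstacked (top) and stacked (bottom) area plots for India and China]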
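The stacking order can be verified numerically. In this sketch (made-up values, with India as the first column), the upper boundary of the second area in the stacked plot equals the row-wise cumulative sum of the columns, while in the unstacked plot it is just the raw China values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up values, for illustration only. Column order decides the stacking order.
df_CI = pd.DataFrame(
    {'India': [10, 20, 30, 25], 'China': [5, 15, 10, 20]},
    index=[1980, 1981, 1982, 1983],
)

fig, (ax1, ax2) = plt.subplots(2, sharex=True)
df_CI.plot(kind='area', stacked=False, ax=ax1)  # each area is filled from 0
df_CI.plot(kind='area', ax=ax2)                 # stacked=True is the default for area plots

# Row-wise cumulative sum: the boundaries of the stacked areas.
stacked_top = df_CI.cumsum(axis=1)
print(stacked_top)
```

Stacked area plots in pandas require each column to be all non-negative or all non-positive, which is worth checking before relying on the default.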

3.6 Box Plots

An overview of the data:
[Image: summary of df_CI]

df_CI.plot(kind='box')

[Image: box plots for India and China]
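A box plot is a drawing of a few summary statistics, so it can help to compute them directly. The sketch below uses made-up values and matplotlib's default whisker rule (1.5 times the interquartile range) to find which points would be drawn as outliers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import pandas as pd

# Made-up values; India contains one obvious outlier.
df_CI = pd.DataFrame({'India': [1, 2, 3, 4, 100], 'China': [2, 4, 6, 8, 10]})

q1 = df_CI.quantile(0.25)          # bottom edge of each box
q3 = df_CI.quantile(0.75)          # top edge of each box
iqr = q3 - q1                      # box height
lower_fence = q1 - 1.5 * iqr       # whiskers stop at the last point inside the fences
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are drawn as individual markers (fliers).
outliers = df_CI[(df_CI < lower_fence) | (df_CI > upper_fence)]
print(outliers.dropna(how='all'))

ax = df_CI.plot(kind='box')
```

`df_CI.describe()` reports the same quartiles along with the minimum, median, and maximum, which is a quick way to sanity-check the plot.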

3.7 Scatter Plots

A sample of the data:
[Image: preview of df_tot]

df_tot.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')

plt.title('Total Immigration to Canada from 1980 - 2013')
plt.xlabel('Year')
plt.ylabel('Number of Immigrants')

plt.show()
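The post does not show how df_tot was built. One plausible way, sketched here as an assumption, is to sum the year columns of the wide table over all countries and reshape the result into two columns named year and total. The wide table below is filled with random placeholder numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd

# Placeholder wide table: countries as rows, string year labels as columns.
rng = np.random.default_rng(3)
years = [str(y) for y in range(1980, 2014)]
df_can = pd.DataFrame(
    rng.integers(0, 5000, size=(3, len(years))),
    index=['Denmark', 'Norway', 'Sweden'],
    columns=years,
)

# Assumed construction of df_tot: one row per year with the total over all countries.
df_tot = (
    df_can[years].sum(axis=0)   # column-wise sum -> one total per year
    .rename_axis('year')
    .reset_index(name='total')
)
df_tot['year'] = df_tot['year'].astype(int)  # numeric years give a proper numeric X axis

ax = df_tot.plot(kind='scatter', x='year', y='total', figsize=(10, 6), color='darkblue')
```

Unlike the other plot kinds, 'scatter' needs both x and y to be named columns of a DataFrame, which is why the yearly totals are moved out of the index into a regular column.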