Python seaborn (sns) visualization tips (notes from my learning)

只想做打工人

Article link: https://blog.csdn.net/weixin_43848469/article/details/112100860

This post records some tips I picked up while learning seaborn (sns) for making plots look better.
References:
https://www.kaggle.com/maksymshkliarevskyi/acea-smart-water-eda-prediction
https://www.kaggle.com/subinium/kaggle-2020-visualization-analysis
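1. Hiding spines: seaborn plots, like any matplotlib Axes, draw a border line (a spine) on every side. The top and right spines are the ones most often removed, which this code does: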

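import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))  # figure size in inches; adjust as needed
ax.spines['top'].set_visible(False)     # hide the top spine
ax.spines['right'].set_visible(False)   # hide the right spine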

[Figure]

[Figure]
With the top and right spines removed, the plot clearly looks cleaner.
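A self-contained sketch of the same idea, assuming an off-screen Agg backend and made-up line data: the left Axes hides the spines by hand, the right one uses seaborn's `sns.despine()`, which by default removes exactly the top and right spines.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
for ax in (ax1, ax2):
    ax.plot([1, 2, 3], [2, 1, 3])  # hypothetical data

# Option 1: hide the spines by hand
for side in ("top", "right"):
    ax1.spines[side].set_visible(False)

# Option 2: seaborn helper; top=True and right=True are its defaults
sns.despine(ax=ax2)
```

`sns.despine()` also accepts `left=True` / `bottom=True` when you want to strip more sides.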

2. Fonts: this post mainly uses the 'serif' font family, but pick whatever you like. The code is below, followed by a comparison:

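    plt.setp(ax.get_xticklabels(), fontfamily='serif')  # apply the serif family to the existing x tick labels
    plt.setp(ax.get_yticklabels(), fontfamily='serif')  # same for the y tick labels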

[Figure: font comparison]
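To produce a comparison like this yourself, here is a sketch that draws the same made-up bar chart three times, each with a different generic family name ('serif', 'sans-serif', 'monospace'), which matplotlib maps to whatever matching fonts are installed. `plt.setp` sets `fontfamily` on every tick-label Text object.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

families = ["serif", "sans-serif", "monospace"]
fig, axes = plt.subplots(1, len(families), figsize=(9, 3))
for ax, family in zip(axes, families):
    ax.bar(["a", "b", "c"], [3, 1, 2])  # hypothetical data
    ax.set_title(family, fontfamily=family)
    # apply the family to every existing tick-label Text object
    plt.setp(ax.get_xticklabels(), fontfamily=family)
    plt.setp(ax.get_yticklabels(), fontfamily=family)
fig.canvas.draw()  # render once so the labels are laid out
```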
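3. Adding grid lines: they extend the ticks of the x-axis (vertical lines) or the y-axis (horizontal lines) across the whole plot. Code: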

ax.grid(axis='x',linestyle = '-', alpha = 0.9,color='g')

Result:

[Figure]
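With matplotlib's default `rcParams['axes.axisbelow'] = 'line'`, grid lines are drawn above bars. A small sketch on made-up bar data that draws horizontal (y-axis) grid lines and pushes them behind the bars with `ax.set_axisbelow(True)`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(["A", "B", "C"], [5, 3, 4], color="#004c70")  # hypothetical counts
ax.grid(axis="y", linestyle="-", alpha=0.5)  # horizontal lines at the y ticks only
ax.set_axisbelow(True)  # draw grid lines (and ticks) underneath the bars
```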

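4. A nicer correlation matrix. A default one looks like this:
[Figure: default correlation heatmap]
It can be improved in three ways: hide the upper triangle, show the values in the cells, and change the colormap: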

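import numpy as np
import pandas as pd
import seaborn as sns
data = pd.read_csv("*.csv")   # load the data ("*.csv" stands for your own file; correlations need numeric columns)
mask = np.triu(np.ones_like(data.corr(), dtype=bool))  # boolean matrix: True on/above the diagonal (hidden), False below
sns.heatmap(data.corr(), mask=mask, annot=True, cmap='viridis')   # annot=True writes the values; any cmap works, viridis looks nice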

Result:
[Figure: lower-triangle correlation heatmap with values]
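To see what the mask does, here is a self-contained sketch on random data (the column names are made up). `np.triu` keeps the upper triangle, diagonal included, of a boolean array of ones, so those cells are True and seaborn hides them; only the lower triangle is drawn. `np.triu(..., k=1)` would keep the diagonal visible.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])  # hypothetical data
corr = data.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))  # True on and above the diagonal
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="viridis", ax=ax)
```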

5. Time-series plots: to tidy them up, keep only the bottom axis and add horizontal grid lines, as in the figure below:
[Figure]
Code:

import matplotlib.pyplot as plt
ax = plt.gca()  # get the current Axes
ax.grid(axis='y', linestyle='-', alpha=0.4)  # horizontal grid lines only
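The snippet above adds the grid; to also keep only the bottom axis, one option is `sns.despine(left=True)`, which hides the top and right spines by default plus the left one. A sketch on a made-up daily series:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

dates = pd.date_range("2020-01-01", periods=60, freq="D")  # hypothetical time series
values = np.cumsum(np.random.default_rng(1).normal(size=60))
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(dates, values, linewidth=2)
ax.grid(axis="y", linestyle="-", alpha=0.4)  # horizontal reference lines
sns.despine(ax=ax, left=True)  # hide top, right and left; keep only the bottom spine
ax.tick_params(axis="y", length=0)  # optional: no tick marks where the left spine was
```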

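6. Choosing colors in seaborn. Well-chosen colors make a plot look much better, but I often didn't know how to pick them, until I came across the palette parameter.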

The palette parameter of sns.countplot:

palette : palette name, list, or dict
Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.

In Python you can create a palette with sns.color_palette().
For the parameters of color_palette, see:
https://seaborn.pydata.org/generated/seaborn.color_palette.html
It can be used like this:

color = sns.color_palette("Blues")  # a palette of blue shades
sns.countplot(x=data['Q1'], palette=color)

[Figure]
For how to choose among different palettes, see this article: https://zhuanlan.zhihu.com/p/27471537
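`color_palette()` returns a list of RGB tuples, so you can inspect it or request a specific number of colors with `n_colors`. The docs quoted above also accept a dict mapping each hue level to a color. A sketch with hypothetical 'Q1' (age group) and 'Q2' (gender) columns:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

blues = sns.color_palette("Blues", n_colors=4)  # 4 shades sampled from the Blues colormap

data = pd.DataFrame({  # hypothetical survey answers
    "Q1": ["18-21", "22-24", "22-24", "25-29", "25-29", "25-29"],
    "Q2": ["Man", "Woman", "Man", "Man", "Woman", "Man"],
})
# palette as a dict: one explicit color per hue level
palette = {"Man": "#004c70", "Woman": "#990000"}
fig, ax = plt.subplots(figsize=(5, 3))
sns.countplot(data=data, x="Q1", hue="Q2", palette=palette, ax=ax)
```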

7. Count plots: they count how many times each category occurs, and both sns.countplot() and ax.bar can draw them.
Adding text labels to the bars:

![Count plot with text labels](https://img-blog.csdnimg.cn/202101112219169.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80Mzg0ODQ2OQ==,size_16,color_FFFFFF,t_70)

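import matplotlib.pyplot as plt
import seaborn as sns

def plot_count(data, color, xlabel, ylabel, title):  # draw a count plot; data is a pandas Series
    data_q1 = data.value_counts().sort_index()  # count of each category, sorted by category
    ax = plt.gca()
    sns.countplot(x=data, order=data_q1.index,  # same bar order as data_q1
                  palette=sns.color_palette(color, n_colors=len(data_q1)), ax=ax)
    ax.grid(axis='y', linestyle='-', alpha=0.8)
    ax.set_title(title, fontsize=15, fontfamily='serif')
    ax.set_xlabel(xlabel, fontsize=10, fontfamily='serif')
    ax.set_ylabel(ylabel, fontsize=10, fontfamily='serif')
    # use annotate to write each count above its bar (bars sit at x = 0, 1, 2, ...)
    for i, count in enumerate(data_q1.values):
        ax.annotate(f'{count}', xy=(i, count), ha='center', va='bottom',
                    fontsize=9, fontfamily='serif')
    sns.despine(left=True, bottom=True)   # remove the spines: the key step
    plt.tight_layout()
    plt.show()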

[Figure]
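Since `ax.bar` is mentioned as the other option: matplotlib 3.4 and later provide `ax.bar_label()`, which writes each bar's value without computing positions by hand. A sketch using `ax.bar` on counts from a made-up Series:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

answers = pd.Series(["Python", "R", "Python", "SQL", "Python", "SQL"])  # hypothetical responses
counts = answers.value_counts().sort_index()  # Python: 3, R: 1, SQL: 2
fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.bar(counts.index, counts.values, color="#004c70")
labels = ax.bar_label(bars, padding=3, fontfamily="serif")  # one text label per bar
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```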
8. Legend position and layout
Use ax.legend(); in everyday use these are the parameters I change most:

ax.legend(loc='lower center', ncol=4, bbox_to_anchor=(0.48, -0.48))  # move the legend below the plot, 4 entries per row

[Figure: legend placed below the plot]
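`bbox_to_anchor` is the point, in Axes coordinates ((0, 0) is the plot's lower-left corner, (1, 1) its upper-right), where the legend's `loc` corner is placed, so `loc='lower center'` with a negative y puts the legend under the Axes. A sketch with four made-up series:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
for k in range(4):
    ax.plot([0, 1, 2], [k, k + 1, k + 0.5], label=f"series {k}")  # hypothetical lines
# the legend's lower-center point sits at x=0.5, y=-0.35 in Axes coordinates: below the plot
leg = ax.legend(loc="lower center", ncol=4, bbox_to_anchor=(0.5, -0.35), frameon=False)
```

When saving such a figure, `fig.savefig(..., bbox_inches="tight")` keeps a legend placed outside the Axes from being cut off.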

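8. Proportion plots (charts showing what share each part makes up), e.g. the share of each energy source within a state. When the ranking is based on a different metric, for example showing the male/female split for the top 10 countries, you first sort the countries by population (here, number of respondents) to get their index, and then use that index to select those rows.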
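A sketch of that ranking step on toy data, assuming a survey DataFrame with a country column 'Q3' and a gender column 'Q2' as in the code below: build the per-country table, sort the row totals to get the index, then select with `.loc`. Real data would take the first 10; the toy data keeps 2.

```python
import pandas as pd

survey = pd.DataFrame({  # hypothetical responses
    "Q3": ["India", "India", "India", "USA", "USA", "Brazil"],
    "Q2": ["Man", "Woman", "Man", "Man", "Woman", "Man"],
})
counts = survey.groupby("Q3")["Q2"].value_counts().unstack(fill_value=0)  # countries x genders
top_index = counts.sum(axis=1).sort_values(ascending=False).index[:2]  # ranked by respondents
top_counts = counts.loc[top_index]  # the selected rows, in ranking order
```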

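9. Using unstack:
Reference: https://blog.csdn.net/S_o_l_o_n/article/details/80917211
Simply put, unstack moves a row-index level into the columns, while stack moves the column labels into a row-index level. The two figures below illustrate both operations.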
[Figure]

[Figure]
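A tiny example of both directions on a two-level index, using a few sample counts: `unstack()` moves the inner level (Q2) into the columns, and `stack()` moves it back into the rows.

```python
import pandas as pd

s = pd.Series(  # sample counts with a two-level index (country, gender)
    [111, 20, 182, 38],
    index=pd.MultiIndex.from_tuples(
        [("Argentina", "Man"), ("Argentina", "Woman"),
         ("Australia", "Man"), ("Australia", "Woman")],
        names=["Q3", "Q2"],
    ),
)
wide = s.unstack()   # rows: Q3, columns: Q2
back = wide.stack()  # the Q2 columns return as the inner row-index level
```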
unstack works well together with .groupby, because groupby often produces a multi-level row index. Take this code:

a=data.groupby(['Q3'])['Q2'].value_counts()

>>>
Q3                        Q2                     
Argentina                 Man                        111
                          Woman                       20
                          Prefer not to say            3
Australia                 Man                        182
                          Woman                       38

After adding unstack:

a=data.groupby(['Q3'])['Q2'].value_counts().unstack()

>>>
Q2                                                   ETC     Man   Woman
Q3                                                                      
Argentina                                            3.0   111.0    20.0
Australia                                           11.0   182.0    38.0
Bangladesh                                           1.0   118.0    24.0
Belarus                                              1.0    46.0    12.0
Belgium                                              NaN    50.0    10.0
Brazil                                               2.0   599.0    93.0
Canada                                               5.0   225.0    71.0

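In index terms, this turns the two-level MultiIndex into a single row index, with the Q2 level moved into the columns; a single index makes the table much easier to read.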
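`unstack()` leaves NaN wherever a combination never occurs (Belgium has no 'ETC' answers in the table above), and that NaN carries over into later calculations. If a missing combination simply means zero respondents, `unstack(fill_value=0)` fills those cells with 0. A sketch on made-up rows:

```python
import pandas as pd

data = pd.DataFrame({  # hypothetical survey rows
    "Q3": ["Belgium", "Belgium", "Brazil", "Brazil", "Brazil"],
    "Q2": ["Man", "Woman", "Man", "Woman", "ETC"],
})
with_nan = data.groupby("Q3")["Q2"].value_counts().unstack()               # Belgium/ETC is NaN
with_zero = data.groupby("Q3")["Q2"].value_counts().unstack(fill_value=0)  # Belgium/ETC is 0
```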

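Computing the proportions is also simple. To get each country's gender ratio from the table above, first compute the number of respondents per country as a column, then divide each row by that total: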

a = data[['Q2', 'Q3']].groupby(['Q3'])['Q2'].value_counts().unstack()  # the table above
sex_count = a.sum(axis=1)  # number of respondents per country
sex_ratio = (a.T / sex_count).T[['Man', 'Woman', 'ETC']]  # divide each row by its total to get proportions
print(sex_ratio)  # print the proportions

>>>
Q2                                                       Man  ...       ETC
Q3                                                            ...          
Argentina                                           0.828358  ...  0.022388
Australia                                           0.787879  ...  0.047619
Bangladesh                                          0.825175  ...  0.006993
Belarus                                             0.779661  ...  0.016949
Belgium                                             0.833333  ...       NaN
Brazil                                              0.863112  ...  0.002882
Canada                                              0.747508  ...  0.016611
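Dividing through the transpose works; pandas can also divide row by row directly with `DataFrame.div(..., axis=0)`, and `pd.crosstab(..., normalize='index')` builds the same proportion table in one call. A sketch on made-up rows that checks the two agree:

```python
import pandas as pd

data = pd.DataFrame({  # hypothetical survey rows
    "Q3": ["Argentina"] * 4 + ["Canada"] * 2,
    "Q2": ["Man", "Man", "Man", "Woman", "Man", "Woman"],
})
counts = data.groupby("Q3")["Q2"].value_counts().unstack(fill_value=0)
ratio_div = counts.div(counts.sum(axis=1), axis=0)  # each row divided by its own total
ratio_ct = pd.crosstab(data["Q3"], data["Q2"], normalize="index")  # same table in one call
```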

The plotting code is below. It is fairly long, so go through it in detail when you actually need it.

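import matplotlib.pyplot as plt

# data_q2q3_ratio: a Man/Woman/ETC proportion table like sex_ratio above, limited to the top 10 countries
fig, ax = plt.subplots(1, 1, figsize=(12, 6))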

ax.barh(data_q2q3_ratio.index, data_q2q3_ratio['Man'], 
        color='#004c70', alpha=0.7, label='Man')
ax.barh(data_q2q3_ratio.index, data_q2q3_ratio['Woman'], left=data_q2q3_ratio['Man'], 
        color='#990000', alpha=0.7, label='Woman')
ax.barh(data_q2q3_ratio.index, data_q2q3_ratio['ETC'], left=data_q2q3_ratio['Man']+data_q2q3_ratio['Woman'], 
        color='#4a4a4a', alpha=0.7, label='ETC')

ax.set_xlim(0, 1)
ax.set_xticks([])
ax.set_yticklabels(data_q2q3_ratio.index, fontfamily='serif', fontsize=11)

# male percentage

for i in data_q2q3_ratio.index:
    ax.annotate(f"{data_q2q3_ratio['Man'][i]*100:.3}%", 
                   xy=(data_q2q3_ratio['Man'][i]/2, i),
                   va = 'center', ha='center',fontsize=9, fontweight='light', fontfamily='serif',
                   color='white')

# female percentage
for i in data_q2q3_ratio.index:
    ax.annotate(f"{data_q2q3_ratio['Woman'][i]*100:.3}%", 
                   xy=(data_q2q3_ratio['Man'][i]+data_q2q3_ratio['Woman'][i]/2, i),
                   va = 'center', ha='center',fontsize=9, fontweight='light', fontfamily='serif',
                   color='white')
    

fig.text(0.13, 0.95, 'Top10 Country : Gender Distribution', fontsize=15, fontweight='bold', fontfamily='serif')   
fig.text(0.131, 0.91, 'Percent Stacked Bar Chart', fontsize=12,fontfamily='serif')   

for s in ['top', 'left', 'right', 'bottom']:
    ax.spines[s].set_visible(False)
    
ax.legend(loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.06))
plt.show()

Result:
[Figure: Top10 Country : Gender Distribution, percent stacked bar chart]
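The three `barh` calls differ only in the column and the `left` offset, so they can also be written as a loop that keeps a running left edge per row. A sketch with a made-up two-country table standing in for `data_q2q3_ratio`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

ratio = pd.DataFrame(  # hypothetical proportions; each row sums to 1
    {"Man": [0.8, 0.7], "Woman": [0.15, 0.25], "ETC": [0.05, 0.05]},
    index=["Argentina", "Canada"],
)
colors = {"Man": "#004c70", "Woman": "#990000", "ETC": "#4a4a4a"}
fig, ax = plt.subplots(figsize=(8, 3))
left = pd.Series(0.0, index=ratio.index)  # running left edge of each row
for col in ["Man", "Woman", "ETC"]:
    ax.barh(ratio.index, ratio[col], left=left, color=colors[col], alpha=0.7, label=col)
    for country in ratio.index:  # percentage label centered in its segment
        ax.annotate(f"{ratio.loc[country, col] * 100:.1f}%",
                    xy=(left[country] + ratio.loc[country, col] / 2, country),
                    va="center", ha="center", fontsize=9, color="white")
    left = left + ratio[col]  # the next segment starts where this one ends
ax.set_xlim(0, 1)
ax.legend(loc="lower center", ncol=3, bbox_to_anchor=(0.5, -0.3))
```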

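10. Line-plot tips for time series: draw thicker lines, put the legend in a sensible place, turn off scientific notation on the y-axis, and rotate the x-axis tick labels (the code below uses 90 degrees).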

Code:

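import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# data_consumption_sum: long-format DataFrame with columns 'Year', 'Data' and 'StateCode'
ax = plt.gca()
formatter = mpl.ticker.ScalarFormatter(useOffset=False, useMathText=True)
formatter.set_scientific(False)  # plain numbers instead of scientific notation
sns.lineplot(data=data_consumption_sum, x='Year', y='Data', hue='StateCode', legend='full', linewidth=3, ax=ax)  # linewidth=3: thicker lines
ax.yaxis.set_major_formatter(formatter)
ax.set_xlabel("")  # drop the x-axis label to keep the plot clean
ax.legend(loc='lower center', ncol=4, bbox_to_anchor=(0.5, -0.2), markerscale=3, shadow=True)
ax.grid(axis='y', linestyle='-', alpha=0.8)  # horizontal grid lines
plt.xticks(rotation=90, size=10)  # rotate the x tick labels
sns.despine(left=True, bottom=True)  # remove the spines: the key step
plt.show()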

Result:

[Figure]

11. A chart that looks like a game character's attribute panel. I don't know what it's called, so I'm just saving the link:
[Figure]
https://www.kaggle.com/kgiangnguyen/alice-in-the-data-science-world

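12. Defining your own plotting style. I often change font sizes, colors and line colors to my liking, so I wanted a way to set them once instead of repeating the same code. The link below explains how; here are some of my own settings.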

Reference: https://matplotlib.org/3.3.3/tutorials/introductory/customizing.html

To change the font family, weight and size:

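import matplotlib.pyplot as plt
plt.rcParams['savefig.dpi'] = 300  # resolution (dots per inch) of saved figures
plt.rcParams['figure.dpi'] = 300   # resolution of figures shown on screen
font = {'family': 'serif',  # make all text serif and bold
        'weight': 'bold',
}
plt.rc('font', **font)  # same as matplotlib.rc('font', **font)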
SMALL_SIZE = 8
MEDIUM_SIZE = 10
BIGGER_SIZE = 12

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title
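Instead of repeating these rcParams in every script, matplotlib can read the same settings from a style sheet (a `.mplstyle` file with one `key: value` per line) and apply it with `plt.style.use()`, or only temporarily with `plt.style.context()`. A sketch that writes a hypothetical `my_style.mplstyle` to a temporary directory and applies it inside a `with` block:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

style_text = """
font.family: serif
font.weight: bold
font.size: 8
axes.titlesize: 8
axes.labelsize: 10
legend.fontsize: 8
figure.titlesize: 12
axes.spines.top: False
axes.spines.right: False
"""
path = os.path.join(tempfile.mkdtemp(), "my_style.mplstyle")  # hypothetical file name
with open(path, "w") as f:
    f.write(style_text)

with plt.style.context(path):  # the settings apply only inside this block
    inside = (plt.rcParams["font.family"], plt.rcParams["axes.spines.top"])
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3])
outside = plt.rcParams["axes.spines.top"]  # restored after the block
```

Saving the file once and calling `plt.style.use(path)` at the top of a notebook applies it for the whole session.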
