Drawing box plots (boxplot) with matplotlib

bxttttt

Last modified 2024-04-22 19:02:23


First published 2023-10-10 17:42:12

Article link: https://blog.csdn.net/weixin_61728385/article/details/133749872


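In machine learning I often need to visualize data. This post is about the box plot (boxplot), which I use a lot in data analysis, so I wrote a small helper function to make it quicker to reuse.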

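A box plot has the following advantages:

1. It shows the distribution of the data graphically, at a glance: the box spans the first to the third quartile, with a line at the median.

2. It shows whether the data are symmetric or skewed, for example from where the median sits inside the box and how long each whisker is.

3. It shows outliers: by default, matplotlib draws points lying more than 1.5 × IQR (interquartile range) beyond the box as individual markers.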
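To make points 1 and 3 concrete, here is a rough sketch of the numbers a box plot summarizes: the quartiles Q1 and Q3 (the box edges), the median (the line inside the box), the whisker ends, and the outliers. It assumes matplotlib's default rule, where whiskers reach the most extreme data points within 1.5 × IQR of the box and anything beyond is an outlier ("flier"). `box_stats` is a hypothetical helper written for illustration, not part of matplotlib.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Hypothetical helper: the summary numbers a box plot draws.

    Assumes matplotlib's default whisker rule (whis = 1.5 times the IQR).
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1                      # interquartile range = height of the box
    low_fence = q1 - whis * iqr        # points below this are outliers
    high_fence = q3 + whis * iqr       # points above this are outliers
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),   # whiskers end at the most extreme
        "whisker_high": inside.max(),  # data points that are not outliers
        "outliers": x[(x < low_fence) | (x > high_fence)],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["q1"], stats["median"], stats["q3"])  # 3.25 5.5 7.75
print(stats["whisker_high"], stats["outliers"])   # 9.0 [100.]
```

The single large value 100 does not stretch the whisker. It is reported as an outlier, which is exactly what the plot would show as a lone marker. matplotlib exposes comparable numbers through `matplotlib.cbook.boxplot_stats`, which can be used to cross-check a sketch like this.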

First, import matplotlib:

import matplotlib.pyplot as plt

Below is a simple helper function for drawing a box plot. It can be used as-is.

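def boxplot_and_savefig(data, labels, colors, xlabel, ylabel, title,
                        return_img_path="", dpi=200, save=False):
    fig, ax = plt.subplots(figsize=(9, 4))
    # patch_artist=True draws each box as a filled patch, so it can be colored below.
    # Note: matplotlib 3.9 renamed `labels` to `tick_labels`; `labels` is deprecated since then.
    box_plot = ax.boxplot(data, vert=True, patch_artist=True, labels=labels)
    ax.set_title(title)
    # Give each box its own fill color (colors are matched to boxes in order).
    for patch, color in zip(box_plot["boxes"], colors):
        patch.set_facecolor(color)
    ax.yaxis.grid(True)  # horizontal grid lines make the values easier to read
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if save:
        # Save before plt.show(): once the window is closed, the figure may be gone.
        fig.savefig(return_img_path, dpi=dpi)
    plt.show()
    plt.close(fig)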

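data: a 2-D array or a sequence of 1-D arrays. A 2-D array gets one box per column. A sequence gets one box per array.

For example, I need to visualize the AUC values of 7 different methods on 34 different datasets.

So my data contains 7 arrays (one per method), and each array holds 34 values (one per dataset).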
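Both input forms give one box per method. The following minimal check uses randomly generated scores as a stand-in for the real AUC table. The numbers are made up and only the shapes matter. A 2-D array of shape (34, 7) is split by column, while a list of seven 1-D arrays is split by array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in for the real results: 34 datasets (rows) x 7 methods (columns).
scores = rng.uniform(0.5, 1.0, size=(34, 7))

fig, (ax1, ax2) = plt.subplots(1, 2)
# Form 1: a 2-D array -> one box per COLUMN.
bp_2d = ax1.boxplot(scores)
# Form 2: a list of 1-D arrays -> one box per array (the form used in this post).
columns = [scores[:, j] for j in range(scores.shape[1])]
bp_list = ax2.boxplot(columns)
plt.close(fig)

print(len(bp_2d["boxes"]), len(bp_list["boxes"]))  # 7 7
```

If the table were stored the other way round, with shape (7, 34) and one row per method, passing it directly would produce 34 boxes. Transposing it first (`scores.T`) restores one box per method.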

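labels: a 1-D sequence with one label per box, that is, one per array in data (one per method in the example above). So labels has the same length as data.

colors: a 1-D sequence with the fill color for each label, so labels and colors must also have the same length.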
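The function pairs boxes and colors with `zip`, which stops at the shorter sequence. As a result, a `colors` list that is too short fails silently: the extra boxes keep matplotlib's default fill. A list that is too long just has its extra colors ignored. Below is a small guard that could be called at the top of `boxplot_and_savefig`. `check_boxplot_inputs` is a hypothetical helper, not part of the original function, and it assumes `data` is a list of 1-D arrays as in this post.

```python
def check_boxplot_inputs(data, labels, colors):
    """Hypothetical guard: exactly one label and one color per box.

    Assumes `data` is a list of 1-D arrays (one box per array).
    """
    n_boxes = len(data)
    if len(labels) != n_boxes:
        raise ValueError(f"expected {n_boxes} labels, got {len(labels)}")
    if len(colors) != n_boxes:
        raise ValueError(f"expected {n_boxes} colors, got {len(colors)}")
    return n_boxes


# The silent failure: zip() stops at the shorter input, so box3 is never paired.
boxes = ["box1", "box2", "box3"]
colors = ["#b4a6ca", "#1f5681"]
print(list(zip(boxes, colors)))  # [('box1', '#b4a6ca'), ('box2', '#1f5681')]

try:
    check_boxplot_inputs([[1, 2], [3, 4], [5, 6]], ["a", "b", "c"], colors)
except ValueError as err:
    print(err)  # expected 3 colors, got 2
```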

xlabel: a string, the x-axis label.

ylabel: a string, the y-axis label.

title: a string, the figure title.

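return_img_path: a string, the path of the output image file, e.g. "auc_test.png". The extension can be omitted, in which case the image is saved as .png by default. It is only used when save=True.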
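The extension behavior is easy to confirm. When the file name has no suffix, matplotlib's `savefig` falls back to its default format (`rcParams["savefig.format"]`, which is "png" unless changed) and appends the matching extension. The following sketch writes into a temporary directory, so it leaves no files behind.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.boxplot([[1, 2, 3, 4, 5]])

with tempfile.TemporaryDirectory() as tmp:
    path_without_suffix = os.path.join(tmp, "auc_test")  # no extension given
    fig.savefig(path_without_suffix)
    saved_files = sorted(os.listdir(tmp))
plt.close(fig)

print(saved_files)  # ['auc_test.png'] with the default savefig.format
```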

dpi: a number (int or float), the resolution of the saved image in dots per inch.
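For a raster format such as PNG, the saved image's pixel size is the figure size in inches times `dpi`, as long as options like `bbox_inches="tight"` are not used. With the function's `figsize=(9, 4)` and its default `dpi=200`, that gives 1800 × 800 pixels. The following sketch checks this by reading the width and height from the PNG header using only the standard library.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(9, 4))  # same figure size as boxplot_and_savefig
ax.boxplot([[1, 2, 3, 4, 5]])

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=200)
plt.close(fig)

# A PNG starts with an 8-byte signature, then the IHDR chunk
# (4-byte length, 4-byte type); width and height follow as big-endian uint32.
width, height = struct.unpack(">II", buf.getvalue()[16:24])
print(width, height)  # 1800 800
```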

save: a boolean. If save=True, the figure is saved to return_img_path. Otherwise it is only displayed.

The complete code:

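import matplotlib.pyplot as plt
import pandas as pd  # reading .xlsx files also requires the openpyxl package

import excel_numpy  # the author's own helper module (not shown in this post)


def boxplot_and_savefig(data, labels, colors, xlabel, ylabel, title,
                        return_img_path="", dpi=200, save=False):
    fig, ax = plt.subplots(figsize=(9, 4))
    # patch_artist=True draws each box as a filled patch, so it can be colored below.
    # Note: matplotlib 3.9 renamed `labels` to `tick_labels`; `labels` is deprecated since then.
    box_plot = ax.boxplot(data, vert=True, patch_artist=True, labels=labels)
    ax.set_title(title)
    # Give each box its own fill color (colors are matched to boxes in order).
    for patch, color in zip(box_plot["boxes"], colors):
        patch.set_facecolor(color)
    ax.yaxis.grid(True)  # horizontal grid lines make the values easier to read
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if save:
        # Save before plt.show(): once the window is closed, the figure may be gone.
        fig.savefig(return_img_path, dpi=dpi)
    plt.show()
    plt.close(fig)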

if __name__ == '__main__':
    file_name = "7种方法auc.xlsx"
    df = pd.read_excel(file_name)
    labels = ["dt", "svm", "mlp", "adaboost", "bagging", "knn", "rf"]
    xlabel = "method"
    ylabel = "auc"
    experiment_focused_unfocused = []
    colors = ["#b4a6ca", "#1f5681", "#ae5e52", "#6b9ac8", "#66ce63", "#c9e2e3", "#f8c689"]
    for i in range(len(labels)):  # one Excel column per method
        # My own helper: reads a whole column of the Excel sheet and returns it as a
        # NumPy array (with NaNs removed). You can ignore it, or see my earlier post.
        experiment_focused_unfocused.append(excel_numpy.excel_col_to_numpy(df, 0, i, reduce_nan=True))
    print(experiment_focused_unfocused)
    # Output:
    # [array([0.94736842, 0.91111111, 0.92857143, 0.87202381, 0.93218623,
    #         0.93336484, 0.9125, 0.97452935, 0.86447368, 0.91891892,
    #         0.84615385, 0.89534884, 0.96428571, 0.8902439, 0.9650974,
    #         0.8902439, 0.8625, 0.825, 0.90789474, 0.875,
    #         0.8846875, 0.9249531, 0.93243243, 0.8875, 0.80263158,
    #         0.85714286, 0.92105263, 0.84625, 0.89772727, 0.855,
    #         0.90697674, 0.89487179, 0.91644132, 0.86386555]),
    #  array([0.56140351, 0.46814815, 0.393134, 0.42440476, 0.64439946,
    #         0.50519849, 0.58141026, 0.55481728, 0.44407895, 0.33747261,
    #         0.43523997, 0.63131027, 0.46400886, 0.42044135, 0.36742424,
    #         0.52012195, 0.47743902, 0.54679487, 0.55735493, 0.31666667,
    #         0.4925, 0.36710444, 0.56006006, 0.425625, 0.44703104,
    #         0.6202381, 0.44905533, 0.413125, 0.41331924, 0.51375,
    #         0.49894292, 0.44519231, 0.45619254, 0.47058824]),
    #  array([0.7928475, 0.67802469, 0.85437431, 0.76190476, 0.73549258,
    #         0.66233459, 0.5275641, 0.64977852, 0.89802632, 0.48648649,
    #         0.58974359, 0.79296653, 0.64507198, 0.63414634, 0.81385281,
    #         0.74817073, 0.73536585, 0.73557692, 0.67948718, 0.72371795,
    #         0.73, 0.67229518, 0.6493994, 0.756875, 0.70748988,
    #         0.71785714, 0.65890688, 0.685, 0.64376321, 0.6040625,
    #         0.68604651, 0.68942308, 0.65143321, 0.66008403]),
    #  array([0.91261808, 0.82592593, 0.87873754, 0.86755952, 0.82523617,
    #         0.85964083, 0.88461538, 0.89977852, 0.84407895, 0.9284149,
    #         0.81722551, 0.81792399, 0.82834994, 0.80052265, 0.8633658,
    #         0.82347561, 0.7554878, 0.81442308, 0.89439946, 0.83397436,
    #         0.8165625, 0.86397749, 0.92454955, 0.865, 0.77968961,
    #         0.87678571, 0.8900135, 0.8184375, 0.81210359, 0.7825,
    #         0.84434461, 0.84102564, 0.80638183, 0.81428571]),
    #  array([0.96626181, 0.93777778, 0.96040975, 0.93571429, 0.92780027,
    #         0.93029301, 0.9224359, 0.9521041, 0.89473684, 0.96018992,
    #         0.91781723, 0.94980147, 0.98228128, 0.93437863, 0.90313853,
    #         0.90823171, 0.90030488, 0.9150641, 0.91902834, 0.92435897,
    #         0.935, 0.9518449, 0.98310811, 0.9240625, 0.88326586,
    #         0.91517857, 0.9402834, 0.9046875, 0.94001057, 0.8884375,
    #         0.93446089, 0.93782051, 0.91860465, 0.87352941]),
    #  array([0.87719298, 0.76790123, 0.83859358, 0.73184524, 0.81106613,
    #         0.81568998, 0.80032051, 0.86655592, 0.79407895, 0.86157779,
    #         0.70940171, 0.73681225, 0.79928018, 0.71835075, 0.78977273,
    #         0.7429878, 0.69817073, 0.76474359, 0.79183536, 0.76923077,
    #         0.7515625, 0.79737336, 0.8033033, 0.75, 0.78947368,
    #         0.82321429, 0.865722, 0.766875, 0.74180761, 0.709375,
    #         0.77140592, 0.76987179, 0.74986479, 0.71428571]),
    #  array([0.99527665, 0.98814815, 0.97425249, 0.9639881, 0.97840756,
    #         0.95344991, 0.97628205, 0.99833887, 0.95328947, 0.96932067,
    #         0.97238659, 0.96454906, 0.98394241, 0.96457607, 0.98998918,
    #         0.9597561, 0.96890244, 0.96314103, 0.94703104, 0.9900641,
    #         0.9609375, 0.97373358, 0.99399399, 0.9840625, 0.96018893,
    #         0.96547619, 0.9645749, 0.9646875, 0.97145877, 0.9390625,
    #         0.99339323, 0.98846154, 0.97133586, 0.94831933])]

    # return_img_path has no extension, so the figure is saved as auc_test.png
    boxplot_and_savefig(experiment_focused_unfocused, labels, colors, xlabel=xlabel, ylabel=ylabel,
                        title="auc boxplot", return_img_path="auc_test", save=True)
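The script depends on `excel_numpy.excel_col_to_numpy`, the author's own helper, which is not shown here. If you don't have it, a pandas-only stand-in might look like the sketch below. This is a guess, not the author's code: it assumes that each column of the sheet holds one method's scores and that `reduce_nan=True` means dropping empty cells. `column_to_numpy` is a hypothetical name. To keep the example self-contained, it builds a small DataFrame in memory instead of reading `7种方法auc.xlsx`.

```python
import numpy as np
import pandas as pd


def column_to_numpy(df, col, drop_nan=True):
    """Hypothetical stand-in for excel_numpy.excel_col_to_numpy.

    Returns column `col` (by position) as a 1-D float NumPy array,
    optionally with missing cells removed.
    """
    series = df.iloc[:, col]
    if drop_nan:
        series = series.dropna()
    return series.to_numpy(dtype=float)


# Small in-memory table standing in for pd.read_excel("7种方法auc.xlsx").
df = pd.DataFrame({
    "dt": [0.95, 0.91, 0.93],
    "svm": [0.56, 0.47, np.nan],  # an empty cell, as read from Excel
})
data = [column_to_numpy(df, i) for i in range(df.shape[1])]
print([len(arr) for arr in data])  # [3, 2]
```

If the sheet's header row holds the method names, `list(df.columns)` could replace the hard-coded `labels` list.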

Running the script produces the following box plot:
