Drawing box plots (boxplot) with matplotlib

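Machine learning work often calls for visualizing data. This post covers the box plot: I use it a lot in data analysis, so I wrote a small helper function to make it more convenient to reuse later.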

-

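Box plots have the following advantages:

1. They summarize the distribution of the data graphically (median, quartiles, range), so it can be read at a glance.

2. They reveal how symmetric or skewed the data is.

3. They show outliers explicitly.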
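To make points 1 to 3 concrete: the box spans the first to the third quartile with a line at the median, and with its default whis=1.5, Matplotlib extends the whiskers to the most extreme points lying within 1.5 × IQR of the box and draws anything beyond that as an individual outlier. Below is a minimal numpy sketch of that rule; box_stats is a made-up helper for illustration, and it skips a few edge cases that Matplotlib's own matplotlib.cbook.boxplot_stats handles.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers using the whis * IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_limit = q1 - whis * iqr
    hi_limit = q3 + whis * iqr
    # Whiskers reach the most extreme data points still inside the limits
    inside = x[(x >= lo_limit) & (x <= hi_limit)]
    whislo, whishi = inside.min(), inside.max()
    # Points beyond the limits are drawn individually as outliers ("fliers")
    fliers = x[(x < lo_limit) | (x > hi_limit)]
    return {"q1": q1, "med": med, "q3": q3,
            "whislo": whislo, "whishi": whishi, "fliers": fliers}


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats["med"], stats["fliers"])  # → 5.5 [40.]
```

Here Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the upper limit is 7.75 + 1.5 × 4.5 = 14.5; the value 40 lies beyond it and becomes the single outlier. A box whose median is off-center, or whose whiskers have clearly unequal lengths, is what signals skew (point 2).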

-

First, import matplotlib:

import matplotlib.pyplot as plt

-

Below is a small, ready-to-use function for drawing box plots.

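def boxplot_and_savefig(data,labels,colors,xlabel,ylabel,title,return_img_path="",dpi=200,save=False):
    fig,ax=plt.subplots(figsize=(9,4))
    # patch_artist=True draws the boxes as filled patches so they can be colored;
    # in Matplotlib 3.9+ `labels` is deprecated in favor of `tick_labels`
    box_plot=ax.boxplot(data,vert=True,patch_artist=True,labels=labels)
    ax.set_title(title)
    # give each box its own fill color (zip stops at the shorter sequence)
    for patch,color in zip(box_plot["boxes"],colors):
        patch.set_facecolor(color)
    ax.yaxis.grid(True)  # horizontal grid lines make the values easier to read
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if save:
        if not return_img_path:
            raise ValueError("return_img_path is required when save=True")
        fig.savefig(return_img_path,dpi=dpi)
    plt.show()
    plt.close(fig)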

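data: a 2D array, or a sequence of vectors (one vector per box). If a 2D array is passed, Matplotlib draws one box per column.

        For example, say I want to visualize the AUC values of 7 different methods on 34 different datasets.

        Then data contains 7 arrays, each holding 34 values.

labels: a 1D array holding the label of each box (here, each method), so labels has the same length as data.

colors: a 1D array holding the color for each label, so colors also has the same length as labels.

xlabel: a string, the x-axis label.

ylabel: a string, the y-axis label.

title: a string, the figure title.

return_img_path: a string, the path of the output image file, e.g. "auc_test.png". The extension may be omitted, in which case a .png image is written by default. It must be set when save=True.

dpi: a number (float), the dots per inch, i.e. the resolution of the saved image (default 200).

save: a Boolean. If save=True, the figure is saved to return_img_path; otherwise nothing is written to disk.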
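The parameter list says data can be either a 2D array or a sequence of vectors. The sketch below, using made-up random AUC-like values rather than the post's data, checks that both layouts give 7 boxes, that the 2D form needs shape (34, 7) (one column per method), and why colors should be as long as labels. It calls ax.boxplot directly instead of the function above so that it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_datasets, n_methods = 34, 7

# Layout 1: a list of 7 arrays, one per method (the layout used in this post)
as_list = [rng.uniform(0.3, 1.0, n_datasets) for _ in range(n_methods)]
# Layout 2: a single 2D array; Matplotlib draws one box per COLUMN,
# so the shape must be (n_datasets, n_methods) = (34, 7), not (7, 34)
as_2d = np.column_stack(as_list)

labels = ["dt", "svm", "mlp", "adaboost", "bagging", "knn", "rf"]
colors = ["#b4a6ca", "#1f5681", "#ae5e52", "#6b9ac8", "#66ce63", "#c9e2e3", "#f8c689"]
# zip() silently stops at the shorter sequence, so a short colors list would
# leave the remaining boxes with the default fill color; check up front
assert len(labels) == len(colors) == n_methods

n_boxes = []
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
for ax, data in ((ax1, as_list), (ax2, as_2d)):
    bp = ax.boxplot(data, patch_artist=True)  # filled patches, so they can be colored
    for patch, color in zip(bp["boxes"], colors):
        patch.set_facecolor(color)
    ax.set_xticklabels(labels)  # the boxes sit at x = 1..7
    n_boxes.append(len(bp["boxes"]))
plt.close(fig)

print(as_2d.shape, n_boxes)  # → (34, 7) [7, 7]
```

Passing the transposed (7, 34) array instead would silently produce 34 boxes, one per dataset, which is an easy mistake to make with a 2D input.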

-

The complete code:

import matplotlib.pyplot as plt
import pandas as pd

import excel_numpy  # my own helper module (not part of any library), used below


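def boxplot_and_savefig(data,labels,colors,xlabel,ylabel,title,return_img_path="",dpi=200,save=False):
    fig,ax=plt.subplots(figsize=(9,4))
    # patch_artist=True draws the boxes as filled patches so they can be colored;
    # in Matplotlib 3.9+ `labels` is deprecated in favor of `tick_labels`
    box_plot=ax.boxplot(data,vert=True,patch_artist=True,labels=labels)
    ax.set_title(title)
    # give each box its own fill color (zip stops at the shorter sequence)
    for patch,color in zip(box_plot["boxes"],colors):
        patch.set_facecolor(color)
    ax.yaxis.grid(True)  # horizontal grid lines make the values easier to read
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if save:
        if not return_img_path:
            raise ValueError("return_img_path is required when save=True")
        fig.savefig(return_img_path,dpi=dpi)
    plt.show()
    plt.close(fig)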

if __name__ == '__main__':
    file_name="7种方法auc.xlsx"  # "AUC of the 7 methods"
    df=pd.read_excel(file_name)
    labels=["dt", "svm", "mlp", "adaboost", "bagging", "knn", "rf"]
    xlabel="method"
    ylabel="auc"
    experiment_focused_unfocused=[]
    colors=["#b4a6ca","#1f5681","#ae5e52","#6b9ac8","#66ce63","#c9e2e3","#f8c689"]
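    for i in range(len(labels)):  # one array per method, 7 in total
        experiment_focused_unfocused.append(excel_numpy.excel_col_to_numpy(df,0,i,reduce_nan=True))
        # excel_col_to_numpy is one of my own helpers: it reads an entire column of the Excel sheet and returns it as a numpy array. You can ignore it, or see my earlier blog post for details.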
    print(experiment_focused_unfocused)
    # Output:
    # [array([0.94736842, 0.91111111, 0.92857143, 0.87202381, 0.93218623,
    #         0.93336484, 0.9125, 0.97452935, 0.86447368, 0.91891892,
    #         0.84615385, 0.89534884, 0.96428571, 0.8902439, 0.9650974,
    #         0.8902439, 0.8625, 0.825, 0.90789474, 0.875,
    #         0.8846875, 0.9249531, 0.93243243, 0.8875, 0.80263158,
    #         0.85714286, 0.92105263, 0.84625, 0.89772727, 0.855,
    #         0.90697674, 0.89487179, 0.91644132, 0.86386555]),
    #  array([0.56140351, 0.46814815, 0.393134, 0.42440476, 0.64439946,
    #         0.50519849, 0.58141026, 0.55481728, 0.44407895, 0.33747261,
    #         0.43523997, 0.63131027, 0.46400886, 0.42044135, 0.36742424,
    #         0.52012195, 0.47743902, 0.54679487, 0.55735493, 0.31666667,
    #         0.4925, 0.36710444, 0.56006006, 0.425625, 0.44703104,
    #         0.6202381, 0.44905533, 0.413125, 0.41331924, 0.51375,
    #         0.49894292, 0.44519231, 0.45619254, 0.47058824]),
    #  array([0.7928475, 0.67802469, 0.85437431, 0.76190476, 0.73549258,
    #         0.66233459, 0.5275641, 0.64977852, 0.89802632, 0.48648649,
    #         0.58974359, 0.79296653, 0.64507198, 0.63414634, 0.81385281,
    #         0.74817073, 0.73536585, 0.73557692, 0.67948718, 0.72371795,
    #         0.73, 0.67229518, 0.6493994, 0.756875, 0.70748988,
    #         0.71785714, 0.65890688, 0.685, 0.64376321, 0.6040625,
    #         0.68604651, 0.68942308, 0.65143321, 0.66008403]),
    #  array([0.91261808, 0.82592593, 0.87873754, 0.86755952, 0.82523617,
    #         0.85964083, 0.88461538, 0.89977852, 0.84407895, 0.9284149,
    #         0.81722551, 0.81792399, 0.82834994, 0.80052265, 0.8633658,
    #         0.82347561, 0.7554878, 0.81442308, 0.89439946, 0.83397436,
    #         0.8165625, 0.86397749, 0.92454955, 0.865, 0.77968961,
    #         0.87678571, 0.8900135, 0.8184375, 0.81210359, 0.7825,
    #         0.84434461, 0.84102564, 0.80638183, 0.81428571]),
    #  array([0.96626181, 0.93777778, 0.96040975, 0.93571429, 0.92780027,
    #         0.93029301, 0.9224359, 0.9521041, 0.89473684, 0.96018992,
    #         0.91781723, 0.94980147, 0.98228128, 0.93437863, 0.90313853,
    #         0.90823171, 0.90030488, 0.9150641, 0.91902834, 0.92435897,
    #         0.935, 0.9518449, 0.98310811, 0.9240625, 0.88326586,
    #         0.91517857, 0.9402834, 0.9046875, 0.94001057, 0.8884375,
    #         0.93446089, 0.93782051, 0.91860465, 0.87352941]),
    #  array([0.87719298, 0.76790123, 0.83859358, 0.73184524, 0.81106613,
    #         0.81568998, 0.80032051, 0.86655592, 0.79407895, 0.86157779,
    #         0.70940171, 0.73681225, 0.79928018, 0.71835075, 0.78977273,
    #         0.7429878, 0.69817073, 0.76474359, 0.79183536, 0.76923077,
    #         0.7515625, 0.79737336, 0.8033033, 0.75, 0.78947368,
    #         0.82321429, 0.865722, 0.766875, 0.74180761, 0.709375,
    #         0.77140592, 0.76987179, 0.74986479, 0.71428571]),
    #  array([0.99527665, 0.98814815, 0.97425249, 0.9639881, 0.97840756,
    #         0.95344991, 0.97628205, 0.99833887, 0.95328947, 0.96932067,
    #         0.97238659, 0.96454906, 0.98394241, 0.96457607, 0.98998918,
    #         0.9597561, 0.96890244, 0.96314103, 0.94703104, 0.9900641,
    #         0.9609375, 0.97373358, 0.99399399, 0.9840625, 0.96018893,
    #         0.96547619, 0.9645749, 0.9646875, 0.97145877, 0.9390625,
    #         0.99339323, 0.98846154, 0.97133586, 0.94831933])]

    boxplot_and_savefig(experiment_focused_unfocused,labels,colors,xlabel="method",ylabel="auc",title="auc boxplot",return_img_path="auc_test",save=True)
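The excel_numpy.excel_col_to_numpy helper used above is my own and is not shown here. If you don't have it, a hypothetical stand-in built on pandas could look like the sketch below. It assumes each method's AUC values sit in their own column of the sheet and that empty cells should be dropped, which is presumably what reduce_nan=True does; the name columns_to_arrays is made up for this example, and a small in-memory DataFrame replaces the real Excel file.

```python
import numpy as np
import pandas as pd


def columns_to_arrays(df, n_cols=None):
    """Hypothetical stand-in for excel_numpy.excel_col_to_numpy:
    return one NaN-free float numpy array per column of df."""
    cols = df.columns if n_cols is None else df.columns[:n_cols]
    # Shorter columns come back padded with NaN, so drop those cells
    return [df[c].dropna().to_numpy(dtype=float) for c in cols]


# Small in-memory frame standing in for pd.read_excel("7种方法auc.xlsx")
df = pd.DataFrame({"dt": [0.95, 0.91, np.nan], "svm": [0.56, 0.47, 0.39]})
arrays = columns_to_arrays(df)
print([a.tolist() for a in arrays])  # → [[0.95, 0.91], [0.56, 0.47, 0.39]]
```

In the main block this would replace the loop with something like experiment_focused_unfocused = columns_to_arrays(pd.read_excel(file_name), n_cols=7). Note that reading .xlsx files with pandas requires the openpyxl package to be installed.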


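The resulting plot:

[Figure: "auc boxplot", box plots of the AUC values of the 7 methods; x axis "method", y axis "auc"]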
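The complete code saves the figure with return_img_path="auc_test", i.e. without an extension. Matplotlib then falls back to rcParams["savefig.format"] (png by default) and appends the extension, so the file on disk is auto_test's PNG counterpart, auc_test.png. Its pixel size is figsize × dpi: 9 × 200 = 1800 by 4 × 200 = 800. The sketch below checks both facts in a temporary directory, with dummy data standing in for the AUC values.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "auc_test")  # no extension, as in the post
    fig, ax = plt.subplots(figsize=(9, 4))
    ax.boxplot([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # dummy data, two boxes
    fig.savefig(path, dpi=200)  # format falls back to rcParams["savefig.format"]
    plt.close(fig)
    saved = path + ".png"
    exists = os.path.exists(saved)
    height, width = mpimg.imread(saved).shape[:2]

print(exists, width, height)  # → True 1800 800
```

Passing bbox_inches="tight" to savefig would crop the margins and change these numbers, which is why the function above leaves it at the default.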
