[Python Data Processing] Reading Excel data exported from event files and smoothing it (similar to TensorBoard's smoothing)

EdenGabriel

Last modified 2022-04-06 21:42:02

First published 2020-06-27 11:37:32

Article link: https://blog.csdn.net/qq_38587510/article/details/106979653

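Previous post: [Python Data Processing] Batch-exporting data from event files generated by deep learning training into different sheets of the same Excel workbook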

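The previous post exported the event data produced during training into Excel (my event files come from training a deep learning network). The next step is to plot that data, which starts with reading it back out of Excel.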

Reading the Excel data

I tried two ways of reading Excel data and recommend the pandas package: its DataFrame type is very convenient and flexible to work with.
Here is the demo first, with the analysis below:

import os

import pandas as pd

def readExcel(excelName, sheetName):
    '''
        Use pandas to read excel data
            excelName: name of the excel file
            sheetName: name of the sheet in the excel file
            return: values of the first column (customize as needed)
    '''

    if not os.path.exists(excelName):
        print("Sorry, %s does not exist. Please check!" % excelName)
        raise FileNotFoundError(excelName)

    dataframe = pd.read_excel(excelName, sheet_name=sheetName)
    print("There are %d rows, %d columns (header not counted)."
            % (len(dataframe.index), len(dataframe.columns)))
    print("Labels of columns:", dataframe.columns.values)
    print("----------Describe of columns----------")
    print(dataframe.describe())

    # .ix was removed in pandas 1.0; use .loc (labels) or .iloc (positions) instead
    stepdata = dataframe.loc[:, dataframe.columns.values[0]].values  # all rows of the first column
    print(type(stepdata))

    return stepdata

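pandas provides the read_excel function, which reads a given sheet from a given Excel file. len(dataframe.index) and len(dataframe.columns) then give the number of rows and columns that were read; index refers to the rows and columns to the columns, and neither count includes the header row.
dataframe.columns.values lists the label of each column. dataframe.describe() summarizes each numeric column, returning the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75% by default) and maximum.
Next comes extracting the column we want. Note that dataframe.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions use dataframe.loc or dataframe.iloc instead. The difference is that loc selects by label, while iloc selects by integer position. dataframe.loc[:,dataframe.columns.values[0]].values reads all rows of column 0 (the first column). At this point we have the data we want.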
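
To make the difference between loc and iloc concrete, here is a minimal sketch. The DataFrame and its column names "Step" and "Value" are made up for illustration; they stand in for one exported sheet and are not taken from the actual Excel file.

```python
import pandas as pd

# Hypothetical stand-in for one exported sheet: a step column and a value column.
df = pd.DataFrame({"Step": [0, 10, 20], "Value": [1.5, 1.2, 0.9]})

# Label-based selection: pick the column by its name.
by_label = df.loc[:, "Value"].values

# Position-based selection: pick the same column by its integer position.
by_position = df.iloc[:, 1].values

print(by_label)
print(by_position)
print(df.shape)  # (rows, columns); the header row is not counted
```

Both selections return the same NumPy array here. loc keeps working if the columns are reordered, while iloc keeps working if they are renamed, so which one fits depends on what is stable in your sheets.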

Here are a few pandas resources for reference:
pandas official API documentation
pandas DataFrame basics, Blog 1
pandas DataFrame basics, Blog 2

Another way to read the data is openpyxl's load_workbook. I won't explain it in detail since it is easy to follow.

from openpyxl import load_workbook

workbook = load_workbook("./test.xlsx")
sheets = workbook.sheetnames        # replaces the deprecated get_sheet_names()
booksheet = workbook[sheets[1]]     # replaces the deprecated get_sheet_by_name(); picks the second sheet

rows = booksheet.rows
columns = booksheet.columns
print(booksheet.max_row)
print(booksheet.max_column)

step=[]
sum_score=[]

i = 0
# iterate over all rows
for row in rows:
  i = i + 1
  line = [col.value for col in row]
  cell_data_1 = booksheet.cell(row=i, column=1).value        # value in row i, column 1
  cell_data_2 = booksheet.cell(row=i, column=2).value        # value in row i, column 2
  step.append(cell_data_1)
  sum_score.append(cell_data_2)
# print (step)
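
The same two columns can also be collected without tracking a row counter by hand. The sketch below is only an illustration: it builds a small workbook in memory so that it runs without a file on disk, and the header names and values are invented. It assumes an openpyxl version that supports values_only in iter_rows (2.6 or later).

```python
from openpyxl import Workbook

# Build a tiny in-memory workbook so the sketch runs without ./test.xlsx.
workbook = Workbook()
sheet = workbook.active
sheet.append(["Step", "Value"])  # header row (hypothetical column names)
sheet.append([0, 1.5])
sheet.append([10, 1.2])

step, sum_score = [], []
# min_row=2 skips the header; values_only=True yields plain cell values.
for step_value, score_value in sheet.iter_rows(min_row=2, max_col=2, values_only=True):
    step.append(step_value)
    sum_score.append(score_value)

print(step, sum_score)
```

With a real file, replace the in-memory workbook with load_workbook(...) and select the sheet by name.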

Smoothing the data

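With the data in hand, plotting it directly gives a figure like the one below (generated from event files produced when I trained a deep reinforcement learning algorithm).
[Figure: plot of the raw, unsmoothed data]
So, to make the plot easier to read and nicer to look at, we can smooth the data in the same way as TensorBoard's smoothing option.
TensorBoard source code for data smoothing: line 704
It really comes down to one formula, which looks like this: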

valueSmoothed = last * weight+(1-weight)*i

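weight is the smoothing coefficient and can be set freely, i is the next data point to be smoothed, and last is the previous smoothed value. See the demo below for the details.
Here is the result after smoothing, with a smoothing coefficient of 0.999:
[Figure: plot of the smoothed data, weight 0.999]
Code for smoothing the data: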

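def smoothData(dataList,weight=0.90):
    # weight in [0, 1): larger values give a smoother curve
    last = dataList[0]      # seed with the first data point
    dataSmoothed = []
    for i in dataList:
        # blend the previous smoothed value with the current raw value
        valueSmoothed = last * weight+(1-weight)*i
        dataSmoothed.append(valueSmoothed)
        last = valueSmoothed
    return dataSmoothed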
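
To get a feel for how the weight affects the result, here is a small self-contained sketch. It re-implements the same recurrence under the hypothetical name smooth_data, checks it on a list that is easy to verify by hand, and then applies it to a synthetic noisy signal. The noisy data and the spread helper are made up for illustration and are not the author's training data.

```python
import random

def smooth_data(data_list, weight=0.90):
    """Exponential moving average, seeded with the first data point."""
    last = data_list[0]
    smoothed = []
    for value in data_list:
        last = last * weight + (1 - weight) * value
        smoothed.append(last)
    return smoothed

# Hand-checkable case: weight 0.5 on [1, 3, 5].
print(smooth_data([1, 3, 5], weight=0.5))  # [1.0, 2.0, 3.5]

# Synthetic noisy signal around 1.0 (illustrative data only).
random.seed(0)
noisy = [1.0 + random.uniform(-0.5, 0.5) for _ in range(200)]

def spread(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)

# A weight of 0 leaves the data unchanged; larger weights flatten the curve.
for w in (0.0, 0.6, 0.9):
    print(w, round(spread(smooth_data(noisy, w)), 3))
```

Because every smoothed value is a weighted average of the previous smoothed value and the current point, the smoothed curve stays within the range of the raw data. A very large weight such as 0.999 reacts slowly, so it suits long runs with many points.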