最近帮朋友搞了下毕设的数据处理的问题。核心功能点包括
- 离散序列极值处理
- Excel数据处理
Python处理离散序列极值问题采用了scipy.signal提供了峰值处理函数argrelextrema。
- 极大值:x[signal.argrelextrema(x, np.greater_equal)]
- 极小值:x[signal.argrelextrema(x, np.less_equal)]
由于**_equal对于连续的极值点会计算多次,因此,需要对结果中的连续相等数值进行过滤,采用了itertools.groupbygroupby()函数,该API会把迭代器中相邻的,指定特征重复的元素挑出来放在一起list中。
核心代码如下:
def calData(fileName):
x = np.array(readExeclData(fileName))
greatListResOri = []
# 计算极大值
greatList = x[signal.argrelextrema(x, np.greater_equal)]
for num in greatList:
res = num
if str(num)[-1] == ".":
res = int(str(num)[:-1])
greatListResOri.append(res)
lessListResOri = []
# 计算极小值
lessList = x[signal.argrelextrema(x, np.less_equal)]
for num in lessList:
res = num
if str(num)[-1] == ".":
res = int(str(num)[:-1])
lessListResOri.append(res)
# 去重
greatListRes = [k for k, g in itertools.groupby(greatListResOri)]
lessListRes = [k for k, g in itertools.groupby(lessListResOri)]
对于Excel数据处理问题,首先需要读取数据(本例中数据都在第一列),对于多个文件可以进行批量的文件名称处理。
## 读取Excel数据
def readExeclData(filePath):
data = xlrd.open_workbook(filePath, encoding_override='utf-8')
dataTabel = data.sheets()[0] # 第一列
rows = dataTabel.nrows
cols = dataTabel.ncols
result = []
for index in range(0, rows):
numData = dataTabel.row_values(index)
result.append(numData[0])
# 去重使用,不去重直接返回result
return [k for k, g in itertools.groupby(result)]
# 获取指定目录下的所有excel文件名
def getFileName(path):
f_list = os.listdir(path)
fileList = []
for i in f_list:
if os.path.splitext(i)[1] == '.xlsx' or
os.path.splitext(i)[1] == '.xls':
fileList.append("DataFile/" + i)
return fileList
对于excel的读取和写入,用到了xlrd、xlwt、openpyxl等三方库。
# 根据文件名打开指定的excel
old_workbook = xlrd.open_workbook(fileName)
new_workbook = copy(old_workbook)
# 创建新的sheet
sheet = new_workbook.add_sheet('极大极小值-NEW')
# 在第0行第0列写入内容——极大值
sheet.write(0, 0, "极大值")
j = 1
for i in greatListRes:
sheet.write(j, 0, i)
j = j + 1
sheet.write(0, 1, "极小值")
j = 1
for i in lessListRes:
sheet.write(j, 1, i)
j = j + 1
# 将新写入的数据保存
new_workbook.save(fileName)
下面代码的需求是按照日期对数据做行列变换:
def dealExcelData(fileName):
workBook = xlrd.open_workbook(fileName)
currentSheet = workBook.sheet_by_index(0)
# 获取日期列表
dateList = []
for i in range (1, currentSheet.nrows):
if i % 24 == 0:
dateList.append(currentSheet.cell_value(i, 0)[5:10].replace("-", "/"))
new_workbook = copy(workBook)
sheetList = []
for i in range (1, currentSheet.ncols):
sheetList.append(currentSheet.cell_value(0, i)[:-3])
index = 1 # 列数
for sheet in sheetList:
newsheet = new_workbook.add_sheet(sheet)
j = 1 # 列数
for dateData in dateList:
print(dateData)
newsheet.write(0, j, dateData)
j += 1
for i in range(1, 25):
newsheet.write(i, 0, i)
row = 1
col = 1
for i in range(1, currentSheet.nrows):
newsheet.write(row, col, currentSheet.cell_value(i, index))
row += 1
if i % 24 == 0:
col += 1
row = 1
index += 1
new_workbook.save(fileName)
下面的数据把excel所有sheet中包含δ一列(可能有很多列)的所有数据移动到整个sheet最大行的尾部依次排列开来:
def moveData(fileName):
workBook = openpyxl.open(fileName)
for sheet in workBook.worksheets:
writeCol = 1
maxRow = sheet.max_row + 3
for currentCol in range(1, sheet.max_column):
if sheet.cell(1, currentCol).value == 'δ':
columndata = []
i = 1
while True:
cellvalue = sheet.cell(i, currentCol).value
if cellvalue is None:
break
columndata.append(cellvalue)
i += 1
lastRow = maxRow
# 向末尾写入
for val in columndata:
sheet.cell(lastRow, writeCol, val)
lastRow += 1
writeCol += 1
workBook.save(fileName)