hutool的 ExcelUtil.getReader 超过一分钟,耗时排查
背景,项目中需要对用户上传的excel进行解析,但是表头内容不确定,用户可随意增减
使用hutool工具包进行excel的读取,readAll() 源码中 默认以第一行作为表头,从第二行开始读取数据
ExcelReader reader = ExcelUtil.getReader(new File(filePath));
List<Map<String, Object>> maps = reader.readAll();
public List<Map<String, Object>> readAll() {
return this.read(0, 1, Integer.MAX_VALUE);
}
调用MapSheetReader ## read()方法读取数据,最后会调用IterUtil.toMap(headerList, rowList, true)将数据转换为 key - value 格式
public List<Map<String, Object>> read(Sheet sheet) {
int firstRowNum = sheet.getFirstRowNum();
int lastRowNum = sheet.getLastRowNum();
if (lastRowNum < 0) {
return ListUtil.empty();
} else if (this.headerRowIndex < firstRowNum) {
throw new IndexOutOfBoundsException(StrUtil.format("Header row index {} is lower than first row index {}.", new Object[]{this.headerRowIndex, firstRowNum}));
} else if (this.headerRowIndex > lastRowNum) {
throw new IndexOutOfBoundsException(StrUtil.format("Header row index {} is greater than last row index {}.", new Object[]{this.headerRowIndex, firstRowNum}));
} else {
int startRowIndex = Math.max(this.startRowIndex, firstRowNum);
int endRowIndex = Math.min(this.endRowIndex, lastRowNum);
List<String> headerList = this.aliasHeader(this.readRow(sheet, this.headerRowIndex));
List<Map<String, Object>> result = new ArrayList(endRowIndex - startRowIndex + 1);
for(int i = startRowIndex; i <= endRowIndex; ++i) {
if (i != this.headerRowIndex) {
List<Object> rowList = this.readRow(sheet, i);
if (CollUtil.isNotEmpty(rowList) || !this.ignoreEmptyRow) {
result.add(**IterUtil.toMap(headerList, rowList, true)**);
}
}
}
return result;
}
}
此时问题就出现了,Map 是允许key 和 value 是null的,所以即使表头为空,调用这个 IterUtil.toMap 也不会报错,所以表头中出现大量的列为空字符串时,就出现了耗时问题。
解决方案
获取表头所在行,然后去除后面的空字符串列
private static List<Object> getRows(Sheet next, int rowIndex) {
List<Object> headerList = new ArrayList<>();
Iterator<Cell> cellIterator = next.getRow(rowIndex).cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
String stringCellValue = cell.getStringCellValue();
headerList.add(stringCellValue);
}
return headerList;
}
然后遍历每行记录,调用 IterUtil.toMap(headerList, rowList, true);方法转换
if (headerList.size() == rowLists.size()) {
MapSheetReader mapSheetReader = new MapSheetReader(headerRowIndex, next.getFirstRowNum(), next.getLastRowNum());
read = mapSheetReader.read(next);
} else {
// 处理数据的源码 当headerList 存在表头是空字符串时 , 会引起解析时间非常长
for (int i = next.getFirstRowNum(); i <= next.getLastRowNum(); ++i) {
if (i != headerRowIndex) {
List<Object> rowList = getRows(next, i);
if (CollUtil.isNotEmpty(rowList)) {
Map<String, Object> map = IterUtil.toMap(headerList, rowList, true);
read.add(map);
}
}
}
}