hutool 的excelUtil 读取耗时

要被卷si了

已于 2023-06-29 17:36:11 修改

阅读量1.2k

点赞数 1

文章标签： java

于 2023-06-29 17:34:33 首次发布

本文链接：https://blog.csdn.net/qq_44211776/article/details/131460235

版权

hutool的 ExcelUtil.getReader 超过一分钟，耗时排查

背景，项目中需要对用户上传的excel进行解析，但是表头内容不确定，用户可随意增减

使用hutool工具包进行excel的读取，readAll() 源码中默认以第一行作为表头，从第二行开始读取数据

 	ExcelReader reader = ExcelUtil.getReader(new File(filePath));
 	List<Map<String, Object>> maps = reader.readAll();
 
	public List<Map<String, Object>> readAll() {
        return this.read(0, 1, Integer.MAX_VALUE);
    }

调用MapSheetReader ## read()方法读取数据，最后会调用IterUtil.toMap(headerList, rowList, true)将数据转换为 key - value 格式

public List<Map<String, Object>> read(Sheet sheet) {
        int firstRowNum = sheet.getFirstRowNum();
        int lastRowNum = sheet.getLastRowNum();
        if (lastRowNum < 0) {
            return ListUtil.empty();
        } else if (this.headerRowIndex < firstRowNum) {
            throw new IndexOutOfBoundsException(StrUtil.format("Header row index {} is lower than first row index {}.", new Object[]{this.headerRowIndex, firstRowNum}));
        } else if (this.headerRowIndex > lastRowNum) {
            throw new IndexOutOfBoundsException(StrUtil.format("Header row index {} is greater than last row index {}.", new Object[]{this.headerRowIndex, firstRowNum}));
        } else {
            int startRowIndex = Math.max(this.startRowIndex, firstRowNum);
            int endRowIndex = Math.min(this.endRowIndex, lastRowNum);
            List<String> headerList = this.aliasHeader(this.readRow(sheet, this.headerRowIndex));
            List<Map<String, Object>> result = new ArrayList(endRowIndex - startRowIndex + 1);

            for(int i = startRowIndex; i <= endRowIndex; ++i) {
                if (i != this.headerRowIndex) {
                    List<Object> rowList = this.readRow(sheet, i);
                    if (CollUtil.isNotEmpty(rowList) || !this.ignoreEmptyRow) {
                        result.add(**IterUtil.toMap(headerList, rowList, true)**);
                    }
                }
            }
            return result;
        }
    }

此时问题就出现了，Map 是允许key 和 value 是null的，所以即使表头为空，调用这个 IterUtil.toMap 也不会报错，所以表头中出现大量的列为空字符串时，就出现了耗时问题。

解决方案

获取表头所在行，然后去除后面的空字符串列

private static List<Object> getRows(Sheet next, int rowIndex) {
        List<Object> headerList = new ArrayList<>();
        Iterator<Cell> cellIterator = next.getRow(rowIndex).cellIterator();
        while (cellIterator.hasNext()) {
            Cell cell = cellIterator.next();
            String stringCellValue = cell.getStringCellValue();
            headerList.add(stringCellValue);
        }
        return headerList;
    }

然后遍历每行记录，调用 IterUtil.toMap(headerList, rowList, true);方法转换

		if (headerList.size() == rowLists.size()) {
                  MapSheetReader mapSheetReader = new MapSheetReader(headerRowIndex, next.getFirstRowNum(), next.getLastRowNum());
                  read = mapSheetReader.read(next);
              } else {
                  // 处理数据的源码 当headerList 存在表头是空字符串时 ， 会引起解析时间非常长
                  for (int i = next.getFirstRowNum(); i <= next.getLastRowNum(); ++i) {
                      if (i != headerRowIndex) {
                          List<Object> rowList = getRows(next, i);
                          if (CollUtil.isNotEmpty(rowList)) {
                              Map<String, Object> map = IterUtil.toMap(headerList, rowList, true);
                              read.add(map);
                          }
                      }
                  }
              }