Excel文件解析性能对比（POI，easyexcel，xlsx-streamer）

骥千里

已于 2022-12-04 17:49:16 修改

阅读量3.7k

点赞数 2

文章标签： excel java

于 2022-12-04 17:29:20 首次发布

本文链接：https://blog.csdn.net/jinguo_zhang/article/details/128172244

版权

问题：

在解析excel内容时，由于用户误操作，偶尔会误录入大量空行，如果代码处置不当，进行文档内容解析时，就会占用大量时间和内存，甚至引起服务器内存消耗殆尽，导致服务无法访问，或者引起OOM。

解决思路：

对于常用功能，用户通过excel导入的有效数据行数通常不多，可以通过以下策略管控文件解析功能：

1.先对上传的文件大小进行限制，防止过大的excel文件上传。

2.预先设置一个合理的行数值，解析文件内容前，先校验文件数据行数是否在预设范围内，若超出则提示用户文件行数过大，请用户检查文件内容，清理掉不需要的空行、空对象等。

3.在文件解析过程中，需注意保证解析对象尽量小，尽快入库存储，能够被垃圾回收器回收，避免长时间驻留在内存中，难以回收，引起FGC，甚至OOM。

本文主要就第2点分别使用POI，easyexcel，xlsx-streamer3种包进行比较分析，观察各自的表现。

验证过程：

验证场景设计如下——

1、测试文件：准备2个excel文件（.xlsx），一个只有一列，10行以内的数据，文件大小9K；另一个文件则在第1个测试文件的基础上，追加了140.8万的空行，文件大小5.2M。

2、比较POI，easyexcel，xlsx-streamer3种技术方案分别以File对象和文件流两种方式读写测试文件时的表现。

以下是试验的代码：

pom.xml 文件：

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>excelProject</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>4.1.2</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>4.1.2</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.17.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.17.1</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>easyexcel</artifactId>
            <version>3.1.1</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.monitorjbl</groupId>
            <artifactId>xlsx-streamer</artifactId>
            <version>2.1.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.poi</groupId>
                    <artifactId>poi-ooxml-schemas</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.poi</groupId>
                    <artifactId>poi-ooxml</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.poi</groupId>
                    <artifactId>poi</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.30</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.30</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-nop</artifactId>
            <version>1.7.30</version>
        </dependency>
        <!-- JUNIT5 -->
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>5.8.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.junit.platform</groupId>
            <artifactId>junit-platform-suite</artifactId>
            <version>1.8.0</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

定义Excel文件解析接口

package cn.com.excel;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

public interface ExcelService {
    public static long MAX_ROW_NUM = 5000L;

    public void analysisExcel(InputStream is) throws IOException;

    public void analysisExcel(File file) throws IOException, InvalidFormatException;
}

使用POI包实现行数统计

package cn.com.excel;


import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.*;


public class POIService implements ExcelService {

    public void analysisExcel(InputStream is) throws IOException, FileNotFoundException {

        XSSFWorkbook xsswb = new XSSFWorkbook(is);
        XSSFSheet sheet = xsswb.getSheetAt(0);
        System.out.println("rownum:" + sheet.getPhysicalNumberOfRows());
    }

    public void analysisExcel(File is) throws IOException, FileNotFoundException, InvalidFormatException {

        XSSFWorkbook xsswb = new XSSFWorkbook(is);
        XSSFSheet sheet = xsswb.getSheetAt(0);
        System.out.println("rownum:" + sheet.getPhysicalNumberOfRows());
    }
}

使用easyexcel实现行数统计

package cn.com.excel;

import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.read.listener.ReadListener;

import java.io.File;
import java.io.InputStream;

public class EasyExcelService implements ExcelService {

    public void analysisExcel(InputStream is) {
        MyEasyExcelListener listener = new MyEasyExcelListener<Object>();
        EasyExcel.read(is, listener).useDefaultListener(false).sheet().doRead();
        System.out.println("rownum:" + listener.getCount());
    }

    public void analysisExcel(File file) {
        MyEasyExcelListener listener = new MyEasyExcelListener<Object>();
        EasyExcel.read(file, listener).useDefaultListener(false).sheet().doRead();
        System.out.println("rownum:" + listener.getCount());
    }
}


class MyEasyExcelListener<T extends Object> implements ReadListener<T> {
    private long count = 0L;

    @Override
    public void invoke(T t, AnalysisContext analysisContext) {
        count++;
    }

    @Override
    public void doAfterAllAnalysed(AnalysisContext analysisContext) {

    }

    public long getCount() {
        return this.count;
    }
}

使用xlsx-streamer实现行数统计，该方法以计数器到允许最大行数即中止，若想观察遍历完整excel行数，注释掉isExceedMaxRows方法中的内容即可。

package cn.com.excel;

import com.monitorjbl.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.File;
import java.io.InputStream;

public class XlsxStreamerService implements ExcelService {
    private long maxRows = MAX_ROW_NUM;

    public void setMaxRows(long maxRows) {
        this.maxRows = maxRows;
    }

    public void analysisExcel(InputStream is) {
        Workbook wk = StreamingReader.builder().rowCacheSize(100).bufferSize(1024).open(is);
        Sheet sheet = wk.getSheetAt(0);

        long count = 0L;
        for (Row row : sheet) {
            count++;
            isExceedMaxRows(count);
        }
        System.out.println("rownum:" + count);
    }

    public void analysisExcel(File file) {
        Workbook wk = StreamingReader.builder().rowCacheSize(100).bufferSize(1024).open(file);
        Sheet sheet = wk.getSheetAt(0);

        long count = 0L;
        for (Row row : sheet) {
            count++;
            isExceedMaxRows(count);
        }
        System.out.println("rownum:" + count);
    }

    private void isExceedMaxRows(long count) {
        if (count > this.maxRows) {
            System.out.println("rownum:" + count);
            throw new RuntimeException("已经超出最大行数:" + this.maxRows + "！");
        }
    }
}

性能对比：

通过编写单元测试案例来观察3种技术方案的表现效果。

Excel文件解析接口单元测试案例

package cn.com.excel;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.junit.jupiter.api.Test;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public abstract class ExcelServiceTest {
    protected ExcelService excelService;

    @Test
    public void testEasyStreamReadExcel() throws IOException {
        InputStream is = ExcelServiceTest.this.getClass().getResourceAsStream("/smallFile.xlsx");
        excelService.analysisExcel(is);
    }

    @Test
    public void testHardStreamReadExcel() throws IOException {
        InputStream is = ExcelServiceTest.this.getClass().getResourceAsStream("/bigEmptyFile.xlsx");
        excelService.analysisExcel(is);
    }

    @Test
    public void testEasyFileReadExcel() throws IOException, InvalidFormatException {
        URL url = ExcelServiceTest.this.getClass().getResource("/smallFile.xlsx");
        String excelFilePath = url.getFile();
        extractedFile(excelFilePath);
    }

    @Test
    public void testHardFileReadExcel() throws IOException, InvalidFormatException {
        URL url = ExcelServiceTest.this.getClass().getResource("/bigEmptyFile.xlsx");
        String excelFilePath = url.getFile();
        extractedFile(excelFilePath);
    }

    private void extractedFile(String excelFilePath) throws IOException, InvalidFormatException {
        File file = new File(excelFilePath);
        excelService.analysisExcel(file);
    }
}

POI测试案例：

package cn.com.excel;

public class POIServiceTest extends ExcelServiceTest {
    public POIServiceTest() {
        this.excelService = new POIService();
    }
}

easyexcel测试案例：

package cn.com.excel;

public class EasyExcelServiceTest  extends ExcelServiceTest{
    public EasyExcelServiceTest(){
        this.excelService = new EasyExcelService();
    }
}

xlsx-streamer测试案例：

package cn.com.excel;

public class XlsxStreamerServiceTest extends ExcelServiceTest {
    public XlsxStreamerServiceTest() {
        this.excelService = new XlsxStreamerService();
    }
}

测试案例包ExcelServiceSuite:

package cn.com.excel;

import org.junit.platform.suite.api.SelectClasses;
import org.junit.platform.suite.api.Suite;

@Suite
@SelectClasses({EasyExcelServiceTest.class, POIServiceTest.class, XlsxStreamerServiceTest.class})
public class ExcelServiceSuite {
}

单元测试结果截图，观察不同方案耗时

内存、CPU耗时截图

运行日志——

-----------------------------------------------------------------------------------------------------------------------------------------------
#POI
rownum:4
Cleaning up unclosed ZipFile for archive D:\WorkSpace\javaStudyWS\excelProject\target\classes\smallFile.xlsx
rownum:1048576
Cleaning up unclosed ZipFile for archive D:\WorkSpace\javaStudyWS\excelProject\target\classes\bigEmptyFile.xlsx
rownum:1048576
rownum:4
-----------------------------------------------------------------------------------------------------------------------------------------------
##easyexcel
rownum:3
rownum:5
rownum:5
rownum:3
-----------------------------------------------------------------------------------------------------------------------------------------------
#xlsx-streamer
rownum:4
rownum:5001
java.lang.RuntimeException: 已经超出最大行数:5000！

rownum:5001
java.lang.RuntimeException: 已经超出最大行数:5000！
rownum:4
-----------------------------------------------------------------------------------------------------------------------------------------------

小结

1、从文件对象和文件流输入比较，可以发现，文件流的处理效率通常比文件对象处理效率高，因此，建议大家尽量先用文件流的方式来解析文件。

2、从不同技术方案耗时上看，进行行数统计时，若最大行数取得小，xlsx-streamer只需要轮循到设置的最大行数，无需完整遍历，效率最高；easyexcel次之，需要特别注意的是，easyexcel示例代码会忽略空对象和空值，未统计到总行数中；POI效率最差。从CPU、内存消耗图进行比较，按以上述顺序，从小到大排列，即POI的资源消耗最多。

3、若最大值设置比较大，则xlsx-streamer的效率接近于遍历行数的效率，效率从高到低的顺序则为easyexcel、xlsx-streamer、POI；资源消耗从低到高也是这个顺序。有兴趣的朋友，可以自行注释掉XlsxStreamerService.isExceedMaxRows方法中的内容进行比较观察。

4、因此，easyexcel在大excel文件解析中表现最好，效率最有保证，而且会自动忽略空值和空对象的统计，资源消耗最少，性能最稳定；xlsx-streamer、POI由于都是按行遍历，数据最终转换为POI对象，应尽量避免应用在大文件解析的场景中，以免引起内存OOM或CPU大量消耗，对服务器性能造成较大影响。

5、针对不同的业务场景，excel的行数控制应有所区别，比如列数多、每个单元格内容平均较大的行数应该控制小一些；列数少、每个单元格内容平均较少的，行数可以适当放大一些；行数估计出来后，需要结合服务器本身的CPU运算能力、内存大小进行测试，确保性能消耗在可接受的范围内。

骥千里

关注

2
点赞
踩
10

收藏

觉得还不错? 一键收藏
1
评论
Excel文件解析性能对比（POI，easyexcel，xlsx-streamer）

在解析excel内容时，由于用户误操作，偶尔会误录入大量空行，如果代码处置不当，进行文档内容解析时，就会占用大量时间和内存，甚至引起服务器内存消耗殆尽，导致服务无法访问，或者引起OOM。
复制链接

扫一扫