Importing a Large Excel File in Java: How to Read It in Blocks and Avoid Running Out of Memory

qq_42239069

Last modified: 2023-03-16 10:40:06

Tags: java, backend, excel

First published: 2023-03-15 18:54:53

Article link: https://blog.csdn.net/qq_42239069/article/details/129564027

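When a Java application has to import a large volume of data, it often runs out of memory. Splitting the data into blocks and handling one block at a time is a workable way to keep memory use under control. This article shows how to process Excel data block by block in Java so that it can be imported into a database.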

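Advantage:

Rows are processed one block at a time, so your own processing code never has to hold the extracted data for the whole sheet at once.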

Maven dependency:

For the Maven coordinates, see: https://blog.csdn.net/fengyuyeguirenenen/article/details/128098090

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.0.0</version>
    </dependency>

Java example code:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class ExcelImporter {
    public static final int BLOCK_SIZE = 1000;

    public static void main(String[] args) {
        String fileName = "data.xlsx";
        File file = new File(fileName);
        // Formats any cell type as text and returns "" for a null cell
        DataFormatter formatter = new DataFormatter();

        // WorkbookFactory.create parses the whole workbook into memory;
        // try-with-resources closes the workbook and the stream afterwards
        try (InputStream in = new FileInputStream(file);
             Workbook workbook = WorkbookFactory.create(in)) {
            Sheet sheet = workbook.getSheetAt(0);

            int lastRowNum = sheet.getLastRowNum(); // 0-based index of the last row
            int totalRows = lastRowNum + 1;
            int numBlocks = (int) Math.ceil((double) totalRows / BLOCK_SIZE);

            for (int i = 0; i < numBlocks; i++) {
                int startRow = i * BLOCK_SIZE;
                int endRow = Math.min(startRow + BLOCK_SIZE - 1, lastRowNum);

                for (int j = startRow; j <= endRow; j++) {
                    Row row = sheet.getRow(j); // null when the row is empty
                    if (row != null) {
                        String value1 = formatter.formatCellValue(row.getCell(0));
                        String value2 = formatter.formatCellValue(row.getCell(1));
                        // Parse the other cells here
                        System.out.println(value1 + ", " + value2);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

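Logic:

This example uses the Apache POI library to read the Excel file and walks through the sheet in blocks of BLOCK_SIZE rows (set to 1000 here). For each block, it iterates over every row, reads the cells, parses them, and prints the result. Because rows are handled one block at a time, the processing step never needs to hold the results for the whole sheet at once.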
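The block boundaries are easy to get wrong by one, so here is a minimal, POI-free sketch of the same arithmetic. The class name `BlockRanges` and its methods are made up for this illustration; the formulas are the ones used in the example, with integer ceiling division in place of `Math.ceil`.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: splits row indexes 0..lastRowNum into inclusive [startRow, endRow] blocks. */
class BlockRanges {

    /** Number of blocks needed for totalRows rows (integer ceiling division). */
    static int numBlocks(int totalRows, int blockSize) {
        return (totalRows + blockSize - 1) / blockSize;
    }

    /** Returns each block as {startRow, endRow}, both 0-based and inclusive. */
    static List<int[]> ranges(int lastRowNum, int blockSize) {
        List<int[]> result = new ArrayList<>();
        int totalRows = lastRowNum + 1;
        int blocks = numBlocks(totalRows, blockSize);
        for (int i = 0; i < blocks; i++) {
            int startRow = i * blockSize;
            // The last block is usually shorter, so clamp to lastRowNum
            int endRow = Math.min(startRow + blockSize - 1, lastRowNum);
            result.add(new int[] {startRow, endRow});
        }
        return result;
    }

    public static void main(String[] args) {
        // 2,500 rows (lastRowNum = 2499) with a block size of 1000
        for (int[] r : ranges(2499, 1000)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

For 2,500 rows and a block size of 1000 this gives three blocks: 0 - 999, 1000 - 1999, and 2000 - 2499.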

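Notes:

When WorkbookFactory.create(in) is called, Apache POI parses the Excel file into a set of objects (Sheet, Row, Cell, and so on), and these objects occupy memory. This call parses the entire workbook, so the POI object model for every row is already in memory before the loop starts. What block-wise processing limits is the memory used by your own processing: only the values extracted from the current block need to be held and handled at any one time. If the file is so large that loading the workbook itself exhausts the heap, block-wise processing has to be combined with a streaming reader, such as POI's SAX-based event API (XSSFReader), which never builds the full object model.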
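To make "only the current block is in memory" concrete, here is a rough sketch that streams worksheet XML with the JDK's StAX parser and hands rows on in blocks. It is an illustration under several assumptions, not a replacement for POI: `StreamingSheetReader`, `readInBlocks`, and `sampleSheet` are hypothetical names, the input is a small worksheet built in memory (in a real .xlsx the worksheet XML is an entry inside the ZIP package, typically `xl/worksheets/sheet1.xml`), and only raw `<v>` values are read, so shared-string indexes and inline strings are not resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch: pulls <row> elements from worksheet XML one at a time and emits them in blocks. */
class StreamingSheetReader {

    /** Reads raw <v> values row by row; at most blockSize rows are buffered at once. */
    static void readInBlocks(InputStream sheetXml, int blockSize,
                             Consumer<List<List<String>>> blockHandler) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // ignore DTDs in input
        List<List<String>> block = new ArrayList<>();
        List<String> currentRow = null;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        currentRow = new ArrayList<>();
                    } else if ("v".equals(name) && currentRow != null) {
                        currentRow.add(reader.getElementText()); // raw cell value
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    block.add(currentRow);
                    currentRow = null;
                    if (block.size() == blockSize) {
                        blockHandler.accept(block);
                        block = new ArrayList<>(); // let the finished block be collected
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed worksheet XML", e);
        }
        if (!block.isEmpty()) {
            blockHandler.accept(block); // final, partially filled block
        }
    }

    /** Builds a tiny in-memory worksheet with two numeric cells per row (demo input only). */
    static InputStream sampleSheet(int rows) {
        StringBuilder xml = new StringBuilder(
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>");
        for (int r = 1; r <= rows; r++) {
            xml.append("<row r=\"").append(r).append("\">")
               .append("<c r=\"A").append(r).append("\"><v>").append(r).append("</v></c>")
               .append("<c r=\"B").append(r).append("\"><v>").append(r * 10).append("</v></c>")
               .append("</row>");
        }
        xml.append("</sheetData></worksheet>");
        return new ByteArrayInputStream(xml.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        readInBlocks(sampleSheet(5), 2,
            block -> System.out.println("block of " + block.size() + " rows: " + block));
    }
}
```

With five rows and a block size of 2, the handler is called three times, with 2, 2, and 1 rows. In a real project it is probably better to let POI's event API do the parsing, since it also handles shared strings, styles, and number formats.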

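A poorly chosen algorithm or data structure in the processing step can also end up pulling all the data into memory at once, for example by collecting every parsed row into a single list before anything is written. When handling large volumes of data, choose algorithms and data structures that fit the situation, and take steps to avoid holding everything in memory at the same time.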
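One data structure that keeps memory bounded is a fixed-size buffer that is flushed each time it fills up. The sketch below is one possible shape for it: `BatchBuffer` is a hypothetical helper, and the sink is a stand-in that only prints; in an actual import it would presumably run a batched database insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: holds at most batchSize items and hands each full batch to a sink. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Adds one item and flushes as soon as the buffer is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered items to the sink; call once more after the last row. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, so the sink may keep the list
        buffer.clear();
    }

    public static void main(String[] args) {
        // Stand-in sink: a real import would execute a batched INSERT here
        BatchBuffer<String> rows = new BatchBuffer<>(1000,
            batch -> System.out.println("flushing " + batch.size() + " rows"));
        for (int i = 0; i < 2500; i++) {
            rows.add("row-" + i);
        }
        rows.flush(); // writes the remaining 500 rows
    }
}
```

Feeding 2,500 rows through a buffer of 1000 triggers three flushes of 1000, 1000, and 500 rows, so no more than 1000 parsed rows are ever held by the processing code.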