下面两个方法,在处理 20000行*100列 的数据时,在 1核1G 的服务器上仍然会内存溢出。
方法一:
Office软件一直是一个诲誉参半的软件,广大普通计算机用户用Office来满足日常办公需求,于是就产生了很多生产数据和文档,需要和企业单位的专用办公系统对接,
而Office的解析工作一直是程序员非常头痛的问题,经常招致程序员的谩骂,也被誉为是微软最烂的发明之一。POI的诞生解决了Excel的解析难题(POI即“讨厌的电子表格”,确实很讨厌,我也很讨厌Excel),
但如果用不好POI,也会导致程序出现一些BUG,例如内存溢出,假空行,公式等等问题。下面介绍一种解决POI读取Excel内存溢出的问题。
POI读取Excel有两种模式,一种是用户模式,一种是SAX模式,将xlsx格式的文档转换成CVS格式后再进行处理用户模式相信大家都很清楚,也是POI常用的方式,
用户模式API接口丰富,我们可以很容易的使用POI的API读取Excel,但用户模式消耗的内存很大,当遇到很多sheet、大数据网格、假空行、公式等问题时,很容易导致内存溢出。
POI官方推荐解决内存溢出的方式使用CVS格式解析,我们不可能手工将Excel文件转换成CVS格式再上传,这样做太麻烦了,好再POI给出了xlsx转换CVS的例子,基于这个例子我进行了一下改造,
即可解决用户模式读取Excel内存溢出的问题。
使用poi官网提供的SXSSFWorkbook来生成大文件excel,点击查看官网说明。
下面是根据官网说明,稍作改动的可测试代码:
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.junit.Assert;
import java.io.FileOutputStream;
import java.io.IOException;
/**
* 测试使用SXSSFWorkbook 生成大excel 文件
*
* @author jinxl
* @version 1.0 on 2018/1/24
*/
public class NewExcelWriteTest {
public static void main(String[] args) {
// 需要在内存中保留的最大行数
int memoryMaxRow = 100;
// 测试的最大行数
int maxRow = 100000;
// keep memoryMaxRow rows in memory, exceeding rows will be flushed to disk
// 在内存中保留 memoryMaxRow 行,超过行将被刷新到磁盘
SXSSFWorkbook wb = new SXSSFWorkbook(memoryMaxRow);
Sheet sh = wb.createSheet();
for (int rowNum = 0; rowNum < maxRow; rowNum++) {
Row row = sh.createRow(rowNum);
for (int cellNum = 0; cellNum < 100; cellNum++) {
Cell cell = row.createCell(cellNum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
// Rows with rowNum < (maxRow - memoryMaxRow) are flushed and not accessible
// rowNum < (maxRow - memoryMaxRow) 的行被刷新,无法访问
for (int rowNum = 0; rowNum < maxRow - memoryMaxRow; rowNum++) {
Assert.assertNull(sh.getRow(rowNum));
}
// the last memoryMaxRow rows are still in memory
// 最后 memoryMaxRow 行仍在内存中
for (int rowNum = maxRow - memoryMaxRow; rowNum < maxRow; rowNum++) {
Assert.assertNotNull(sh.getRow(rowNum));
}
FileOutputStream out = null;
try {
out = new FileOutputStream("E://bigExcel.xlsx");
wb.write(out);
} catch (IOException e) {
e.printStackTrace();
} finally {
if (null != out) {
try {
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
// dispose of temporary files backing this workbook on disk
// 在磁盘上处理支持此工作簿的临时文件
wb.dispose();
}
}
方法二:
下面这种不太友好,但也能勉强使用。
官网demo略做调整,下面是可测试的代码:
/* ====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==================================================================== */
import org.apache.poi.openxml4j.opc.internal.ZipHelper;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FillPatternType;
import org.apache.poi.ss.usermodel.HorizontalAlignment;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.usermodel.*;
import java.io.*;
import java.util.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
/**
* Demonstrates a workaround you can use to generate large workbooks and avoid OutOfMemory exception.
* <p>
* Note - You probably <em>don't want to use this approach any more</em>! POI
* now includes the SXSSF which handles all of this for you, you should
* be using that instead! This code remains mostly for historical interest.
* <p>
* See <a "http://poi.apache.org/spreadsheet/how-to.html#sxssf">
* http://poi.apache.org/spreadsheet/how-to.html#sxssf</a>.
* <p>
* If you really want to use this approach, which is also the one that SXSSF
* does for you, it works as follows:
* <p>
* 1. create a template workbook, create sheets and global objects such as cell styles, number formats, etc.
* 2. create an application that streams data in a text file
* 3. Substitute the sheet in the template with the generated data
* <p>
* <p>
* Since 3.8 POI provides a low-memory footprint SXSSF API, which implements
* ths "BigGridDemo" strategy. SXSSF is an API-compatible streaming extension
* of XSSF to be used when very large spreadsheets have to be produced, and
* heap space is limited. SXSSF achieves its low memory footprint by limiting
* access to the rows that are within a sliding window, while XSSF gives access
* to all rows in the document. Older rows that are no longer in the window
* become inaccessible, as they are written to the disk.
* </p>
* See <a "http://poi.apache.org/spreadsheet/how-to.html#sxssf">
* http://poi.apache.org/spreadsheet/how-to.html#sxssf</a>.
*
* @author Yegor Kozlov
*/
public class BigGridDemo {
private static final String XML_ENCODING = "UTF-8";
public static void main(String[] args) throws Exception {
// Step 1. Create a template file. Setup sheets and workbook-level objects such as
// cell styles, number formats, etc.
XSSFWorkbook wb = new XSSFWorkbook();
XSSFSheet sheet = wb.createSheet("Big Grid");
Map<String, XSSFCellStyle> styles = createStyles(wb);
//name of the zip entry holding sheet data, e.g. /xl/worksheets/sheet1.xml
String sheetRef = sheet.getPackagePart().getPartName().getName();
//save the template
try (FileOutputStream os = new FileOutputStream("E://template.xlsx")) {
wb.write(os);
}
//Step 2. Generate XML file.
File tmp = File.createTempFile("sheet", ".xml");
try (Writer fw = new OutputStreamWriter(new FileOutputStream(tmp), XML_ENCODING)) {
generate(fw, styles);
}
//Step 3. Substitute the template entry with the generated data
try (FileOutputStream out = new FileOutputStream("E://big-grid.xlsx")) {
substitute(new File("E://template.xlsx"), tmp, sheetRef.substring(1), out);
}
}
/**
* Create a library of cell styles.
*/
private static Map<String, XSSFCellStyle> createStyles(XSSFWorkbook wb) {
Map<String, XSSFCellStyle> styles = new HashMap<>();
XSSFDataFormat fmt = wb.createDataFormat();
XSSFCellStyle style1 = wb.createCellStyle();
style1.setAlignment(HorizontalAlignment.RIGHT);
style1.setDataFormat(fmt.getFormat("0.0%"));
styles.put("percent", style1);
XSSFCellStyle style2 = wb.createCellStyle();
style2.setAlignment(HorizontalAlignment.CENTER);
style2.setDataFormat(fmt.getFormat("0.0X"));
styles.put("coeff", style2);
XSSFCellStyle style3 = wb.createCellStyle();
style3.setAlignment(HorizontalAlignment.RIGHT);
style3.setDataFormat(fmt.getFormat("$#,##0.00"));
styles.put("currency", style3);
XSSFCellStyle style4 = wb.createCellStyle();
style4.setAlignment(HorizontalAlignment.RIGHT);
style4.setDataFormat(fmt.getFormat("mmm dd"));
styles.put("date", style4);
XSSFCellStyle style5 = wb.createCellStyle();
XSSFFont headerFont = wb.createFont();
headerFont.setBold(true);
style5.setFillForegroundColor(IndexedColors.GREY_25_PERCENT.getIndex());
style5.setFillPattern(FillPatternType.SOLID_FOREGROUND);
style5.setFont(headerFont);
styles.put("header", style5);
return styles;
}
private static void generate(Writer out, Map<String, XSSFCellStyle> styles) throws Exception {
Random rnd = new Random();
Calendar calendar = Calendar.getInstance();
SpreadsheetWriter sw = new SpreadsheetWriter(out);
sw.beginSheet();
//insert header row
sw.insertRow(0);
int styleIndex = styles.get("header").getIndex();
sw.createCell(0, "Title", styleIndex);
sw.createCell(1, "% Change", styleIndex);
sw.createCell(2, "Ratio", styleIndex);
sw.createCell(3, "Expenses", styleIndex);
sw.createCell(4, "Date", styleIndex);
sw.endRow();
//write data rows
for (int rownum = 1; rownum < 100000; rownum++) {
sw.insertRow(rownum);
sw.createCell(0, "Hello, " + rownum + "!");
sw.createCell(1, (double) rnd.nextInt(100) / 100, styles.get("percent").getIndex());
sw.createCell(2, (double) rnd.nextInt(10) / 10, styles.get("coeff").getIndex());
sw.createCell(3, rnd.nextInt(10000), styles.get("currency").getIndex());
sw.createCell(4, calendar, styles.get("date").getIndex());
sw.endRow();
calendar.roll(Calendar.DAY_OF_YEAR, 1);
}
sw.endSheet();
}
/**
* @param zipfile the template file
* @param tmpfile the XML file with the sheet data
* @param entry the name of the sheet entry to substitute, e.g. xl/worksheets/sheet1.xml
* @param out the stream to write the result to
*/
private static void substitute(File zipfile, File tmpfile, String entry, OutputStream out) throws IOException {
try (ZipFile zip = ZipHelper.openZipFile(zipfile)) {
try (ZipOutputStream zos = new ZipOutputStream(out)) {
Enumeration<? extends ZipEntry> en = zip.entries();
while (en.hasMoreElements()) {
ZipEntry ze = en.nextElement();
if (!ze.getName().equals(entry)) {
zos.putNextEntry(new ZipEntry(ze.getName()));
try (InputStream is = zip.getInputStream(ze)) {
copyStream(is, zos);
}
}
}
zos.putNextEntry(new ZipEntry(entry));
try (InputStream is = new FileInputStream(tmpfile)) {
copyStream(is, zos);
}
}
}
}
private static void copyStream(InputStream in, OutputStream out) throws IOException {
byte[] chunk = new byte[1024];
int count;
while ((count = in.read(chunk)) >= 0) {
out.write(chunk, 0, count);
}
}
/**
* Writes spreadsheet data in a Writer.
* (YK: in future it may evolve in a full-featured API for streaming data in Excel)
*/
public static class SpreadsheetWriter {
private final Writer _out;
private int _rownum;
public SpreadsheetWriter(Writer out) {
_out = out;
}
public void beginSheet() throws IOException {
_out.write("<?xml version=\"1.0\" encoding=\"" + XML_ENCODING + "\"?>" +
"<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
_out.write("<sheetData>\n");
}
public void endSheet() throws IOException {
_out.write("</sheetData>");
_out.write("</worksheet>");
}
/**
* Insert a new row
*
* @param rownum 0-based row number
*/
public void insertRow(int rownum) throws IOException {
_out.write("<row r=\"" + (rownum + 1) + "\">\n");
this._rownum = rownum;
}
/**
* Insert row end marker
*/
public void endRow() throws IOException {
_out.write("</row>\n");
}
public void createCell(int columnIndex, String value, int styleIndex) throws IOException {
String ref = new CellReference(_rownum, columnIndex).formatAsString();
_out.write("<c r=\"" + ref + "\" t=\"inlineStr\"");
if (styleIndex != -1) _out.write(" s=\"" + styleIndex + "\"");
_out.write(">");
_out.write("<is><t>" + value + "</t></is>");
_out.write("</c>");
}
public void createCell(int columnIndex, String value) throws IOException {
createCell(columnIndex, value, -1);
}
public void createCell(int columnIndex, double value, int styleIndex) throws IOException {
String ref = new CellReference(_rownum, columnIndex).formatAsString();
_out.write("<c r=\"" + ref + "\" t=\"n\"");
if (styleIndex != -1) _out.write(" s=\"" + styleIndex + "\"");
_out.write(">");
_out.write("<v>" + value + "</v>");
_out.write("</c>");
}
public void createCell(int columnIndex, double value) throws IOException {
createCell(columnIndex, value, -1);
}
public void createCell(int columnIndex, Calendar value, int styleIndex) throws IOException {
createCell(columnIndex, DateUtil.getExcelDate(value, false), styleIndex);
}
}
}
根据方法二改编出来的两个类,如下:
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.util.CellReference;
import java.io.IOException;
import java.io.Writer;
import java.util.Calendar;
/**
* Writes spreadsheet data in a Writer.
* (YK: in future it may evolve in a full-featured API for streaming data in Excel)
* <p>
* 在Writer中写入电子表格数据。
* (YK:将来它可能演变成一个全功能的API,用于在Excel中传输数据)
*
* @author jinxl
* @version 1.0 on 2018/1/23
*/
public class SpreadsheetWriter {
private final Writer outWrite;
private int rowNumber;
public SpreadsheetWriter(Writer out) {
outWrite = out;
init();
}
/**
* 初始化xml 文件
*/
public SpreadsheetWriter init() {
try {
outWrite.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
outWrite.write("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* 开始一个sheet 页
*/
public SpreadsheetWriter beginSheet() {
try {
outWrite.write("<sheetData>\n");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* 结束该sheet 页
*/
public SpreadsheetWriter endSheet() {
try {
outWrite.write("</sheetData>");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* 结束sheet 和工作表
*/
public SpreadsheetWriter endSheetAndWorksheet() {
return endSheet().endWorksheet();
}
/**
* 结束这个工作表
*/
public SpreadsheetWriter endWorksheet() {
try {
outWrite.write("</worksheet>");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* Insert a new row
* 新增一行
*
* @param rowNum 0-based row number 行下标,从0 开始
*/
public SpreadsheetWriter insertRow(int rowNum) {
try {
outWrite.write("<row r=\"" + (rowNum + 1) + "\">\n");
} catch (IOException e) {
throw new RuntimeException(e);
}
this.rowNumber = rowNum;
return this;
}
/**
* Insert row end marker
* 插入行结束标记
*/
public SpreadsheetWriter endRow() {
try {
outWrite.write("</row>\n");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* 创建一个存放字符串的单元格
*
* @param columnIndex 单元格的列下标,从0 开始
* @param value 需要放入单元格的字符串
* @param styleIndex 单元格样式的下标,不指定样式则为-1
*/
public SpreadsheetWriter createCell(int columnIndex, String value, int styleIndex) {
if (null == value) {
value = "";
}
String ref = new CellReference(rowNumber, columnIndex).formatAsString();
try {
outWrite.write("<c r=\"" + ref + "\" t=\"inlineStr\"");
if (styleIndex != -1) {
outWrite.write(" s=\"" + styleIndex + "\"");
}
outWrite.write(">");
outWrite.write("<is><t><![CDATA[" + value + "]]></t></is>");
outWrite.write("</c>");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* 创建一个存放字符串的单元格,使用默认样式
*
* @param columnIndex 单元格的列下标,从0 开始
* @param value 需要放入单元格的字符串
*/
public SpreadsheetWriter createCell(int columnIndex, String value) {
return createCell(columnIndex, value, -1);
}
/**
* 创建一个存放浮点型数值的单元格
*
* @param columnIndex 单元格的列下标,从0 开始
* @param value 需要放入单元格的浮点型数值
* @param styleIndex 单元格样式的下标,不指定样式则为-1
*/
public SpreadsheetWriter createCell(int columnIndex, Double value, int styleIndex) {
if (null == value) {
return createCell(columnIndex, "", styleIndex);
}
String ref = new CellReference(rowNumber, columnIndex).formatAsString();
try {
outWrite.write("<c r=\"" + ref + "\" t=\"n\"");
if (styleIndex != -1) {
outWrite.write(" s=\"" + styleIndex + "\"");
}
outWrite.write(">");
outWrite.write("<v>" + value + "</v>");
outWrite.write("</c>");
} catch (IOException e) {
throw new RuntimeException(e);
}
return this;
}
/**
* 创建一个存放浮点型数值的单元格,使用默认样式
*
* @param columnIndex 单元格的列下标,从0 开始
* @param value 需要放入单元格的浮点型数值
*/
public SpreadsheetWriter createCell(int columnIndex, Double value) {
return createCell(columnIndex, value, -1);
}
/**
* 创建日期格式单元格,使用默认样式
*
* @param columnIndex 单元格的列下标,从0 开始
* @param value 需要放入单元格的日期
*/
public SpreadsheetWriter createCell(int columnIndex, Calendar value) {
return createCell(columnIndex, DateUtil.getExcelDate(value, false), -1);
}
/**
* 批量设置单元格的值
*
* @param startIndex 开始下标
* @param value 单元格的值
*/
public SpreadsheetWriter setCellValues(Integer startIndex, String... value) {
for (int j = 0; j < value.length; j++) {
createCell(startIndex + j, value[j]);
}
return this;
}
/**
* 批量设置单元格的值,从当前行下标为0 的列开始
*
* @param value 单元格的值
*/
public SpreadsheetWriter setCellValues(String... value) {
return setCellValues(0, value);
}
}
import org.apache.poi.openxml4j.opc.internal.ZipHelper;
import java.io.*;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
/**
* @author jinxl
* @version 1.0 on 2018/1/23
*/
public class PoiUtils {
/**
* 把临时文件里的数据替换到模板文件中
*
* @param zipFile the template file
* 模板文件
* @param tmpFile the XML file with the sheet data
* 带有表单数据的XML文件
* @param entry the name of the sheet entry to substitute, e.g. xl/worksheets/sheet1.xml
* 要替换的工作表条目的名称,例如XL /工作表/ sheet1.xml
* @param out the stream to write the result to
* 将结果写入的流
*/
public static void substitute(File zipFile, File tmpFile, String entry, OutputStream out) {
try {
try (ZipFile zip = ZipHelper.openZipFile(zipFile)) {
try (ZipOutputStream zos = new ZipOutputStream(out)) {
Enumeration<? extends ZipEntry> en = zip.entries();
while (en.hasMoreElements()) {
ZipEntry ze = en.nextElement();
if (!ze.getName().equals(entry)) {
zos.putNextEntry(new ZipEntry(ze.getName()));
try (InputStream is = zip.getInputStream(ze)) {
copyStream(is, zos);
}
}
}
zos.putNextEntry(new ZipEntry(entry));
try (InputStream is = new FileInputStream(tmpFile)) {
copyStream(is, zos);
}
}
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
/**
* 流拷贝
*
* @param in 输入流
* @param out 输出流
*/
private static void copyStream(InputStream in, OutputStream out) {
byte[] chunk = new byte[1024];
int count;
try {
while ((count = in.read(chunk)) >= 0) {
out.write(chunk, 0, count);
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
记一次写入excel内存溢出后的资料查找。