最近在项目中遇到这样的场景,将一批json文件(上千个)数据转为list后,按照给定的excel模版格式,填充到模版中生成最终的excel文件。但是因为json文件过多,或者json文件中转为list后的数据量过大造成了oom异常:java.lang.OutOfMemoryError:Java heap space。
原因在于
XSSFWorkbook创建的 book sheet row cell 等,此时是存在内存的并没有持久化,那么随着数据量增大内存的需求量也就增大,那么很大可能就是要 OOM了。
怎么解决呢?这里我将XSSFWorkbook替换为SXSSFWorkbook。因为数据量过大导致内存吃不消无法写文件,有读一批写一批的做法吗? 答案是肯定的。怎么做?此种的情况就是设置内存中的最大存储条数。比如:new SXSSFWookbook(1000),此时当行数达到 1000 时,把内存中的数据写到临时文件中,那么这样就解决了大数据下OOM的问题。
SXSSFWorkbook用来生成海量excel数据文件,主要原理是借助临时存储空间生成excel,
SXSSFWorkbook专门处理大数据,对于大型excel的建立且不会内存溢出。它的原理很简单,用硬盘空间换内存(就像hashmap用空间换时间同样)。 SXSSFWorkbook是streaming版本的XSSFWorkbook,它只会保存最新的excel rows在内存里供查看,在此以前的excel rows都会被写入到硬盘里(Windows电脑的话,是写入到C盘根目录下的temp文件夹)。被写入到硬盘里的rows是不可见的/不可访问的。只有还保存在内存里的才能够被访问到。
注:HSSFWorkbook:03版Office Excel ,新建Excel默认保存的Excel文件格式的后缀是.xls,不可以打开编辑07版的xlsx文件,否则出现乱码或者卡死。行列的上限为65536行,256列。 XSSFWorkbook:07版Office Excel ,新建Excel默认保存的的Excel文件格式后缀是.xlsx,也能打开编辑03版的xls文件。行列的上限为1048575行,16384列。若是数据量超过了此上限,那么能够使用SXSSFWorkbook来导出。实际上上万条数据,甚至上千条数据就能够考虑使用SXSSFWorkbook了。
网络上大多是关于SXSSFWorkbook怎么创建sheet,怎么创建row,与我的需求不大匹配,所以我对自己的实现过程做了整理。
1.创建模版文件
temp.xlxs:
2.工具类
SXSSFUtils:
import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import java.util.Iterator;
public class SXSSFUtils {
/**
* @param fromSheet
* @param toSheet
*/
public static void mergeSheetAllRegion(Sheet fromSheet, Sheet toSheet) {
int num = fromSheet.getNumMergedRegions();
CellRangeAddress cellR = null;
for (int i = 0; i < num; i++) {
cellR = fromSheet.getMergedRegion(i);
toSheet.addMergedRegion(cellR);
}
}
/**
* @param wb
* @param fromCell
* @param toCell
*/
public static void copyCell(SXSSFWorkbook wb, Cell fromCell, Cell toCell) {
CellStyle cellStyle = wb.createCellStyle();
cellStyle.cloneStyleFrom(fromCell.getCellStyle());
toCell.setCellStyle(cellStyle);
if (fromCell.getCellComment() != null) {
toCell.setCellComment(fromCell.getCellComment());
}
CellType fromCellType = fromCell.getCellType();
toCell.setCellType(fromCellType);
if (fromCellType == CellType.NUMERIC) {
if (HSSFDateUtil.isCellDateFormatted(fromCell)) {
toCell.setCellValue(fromCell.getDateCellValue());
} else {
toCell.setCellValue(fromCell.getNumericCellValue());
}
} else if (fromCellType == CellType.STRING) {
toCell.setCellValue(fromCell.getRichStringCellValue());
} else if (fromCellType == CellType.BLANK) {
// nothing21
} else if (fromCellType == CellType.BOOLEAN) {
toCell.setCellValue(fromCell.getBooleanCellValue());
} else if (fromCellType == CellType.ERROR) {
toCell.setCellErrorValue(fromCell.getErrorCellValue());
} else if (fromCellType == CellType.FORMULA) {
toCell.setCellFormula(fromCell.getCellFormula());
} else { // nothing29
}
}
/**
* @param wb
* @param oldRow
* @param toRow
*/
public static void copyRow(SXSSFWorkbook wb, Row oldRow, Row toRow) {
toRow.setHeight(oldRow.getHeight());
for (Iterator cellIt = oldRow.cellIterator(); cellIt.hasNext();) {
Cell tmpCell = (Cell) cellIt.next();
Cell newCell = toRow.createCell(tmpCell.getColumnIndex());
copyCell(wb, tmpCell, newCell);
}
}
/**
* @param wb
* @param fromSheet
* @param toSheet
*/
public static void copySheet(SXSSFWorkbook wb, Sheet fromSheet, Sheet toSheet) {
mergeSheetAllRegion(fromSheet, toSheet);
for (Iterator rowIt = fromSheet.rowIterator(); rowIt.hasNext();) {
Row oldRow = (Row) rowIt.next();
Row newRow = toSheet.createRow(oldRow.getRowNum());
copyRow(wb, oldRow, newRow);
}
}
}
ExcelCellUtils:
import org.apache.poi.ss.usermodel.*;
import java.text.SimpleDateFormat;
import java.util.Date;
public class ExcelCellUtils {
public final static String DATE_OUTPUT_PATTERNS = "yyyy-MM-dd HH:mm:ss";
public final static SimpleDateFormat simpleDateFormat = new SimpleDateFormat(
DATE_OUTPUT_PATTERNS);
public static Object getCellValue(Cell cell) {
Object ret = null;
if (cell == null) return ret;
switch (cell.getCellType()) {
case BLANK:
ret = "";
break;
case BOOLEAN:
ret = cell.getBooleanCellValue();
break;
case ERROR:
ret = null;
break;
case FORMULA:
Workbook wb = cell.getSheet().getWorkbook();
CreationHelper crateHelper = wb.getCreationHelper();
FormulaEvaluator evaluator = crateHelper.createFormulaEvaluator();
ret = getCellValue(evaluator.evaluateInCell(cell));
break;
case NUMERIC:
if (DateUtil.isCellDateFormatted(cell)) {
Date theDate = cell.getDateCellValue();
ret = simpleDateFormat.format(theDate);
} else {
ret = cell.getNumericCellValue();
}
break;
case STRING:
ret = cell.getRichStringCellValue().getString();
break;
default:
ret = "NaN";
}
return ret; // 有必要自行trim
}
}
3.读取Record数据,提取为list后填充到excel模版,生成excel
注:这里的Record本质上为HashMap,根据业务修改下即可。
/**
* 生成Excel
*/
public static void writeTemp(String fileTemp, String filePath, Record dataMap) throws Exception {
long startTime = System.currentTimeMillis();
InputStream stream = null;
XSSFWorkbook rwb = null;
SXSSFWorkbook sxssfWorkbook = null;
BufferedOutputStream outputStream = null;
try {
stream = new FileInputStream(fileTemp);
rwb = new XSSFWorkbook(stream);
sxssfWorkbook = new SXSSFWorkbook();
for (int i = 0; i < rwb.getNumberOfSheets(); i++) {
XSSFSheet sheet = rwb.getSheetAt(i);
SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet(sheet.getSheetName());
SXSSFUtils.copySheet(sxssfWorkbook, sheet, sxssfSheet);
int addRow = 0;
Row headRow = null;
for (int ri = 0; ri <= sheet.getLastRowNum(); ri++) {
Row row = sheet.getRow(ri + addRow);
if (row == null) {
break;
}
boolean isContainList = false;
boolean isContainSpecialChar = false;
for (int c = 0; c < row.getLastCellNum(); c++) {
org.apache.poi.ss.usermodel.Cell cell = row.getCell(c);
if (cell != null) {
if (cell.getCellType() != CellType.FORMULA) {
String strVal = String.valueOf(ExcelCellUtils.getCellValue(cell));
String str = strVal;
if (str.indexOf("{{") > -1 && str.indexOf("}}") > -1) {
isContainSpecialChar = true;
str = str.substring(str.indexOf("{{"), str.indexOf("}}") + 2);
String[] field = str.substring(2, str.indexOf("}}")).split("\\.");
Record dataR = null;
int fiLength = field.length - 1;
headRow = sheet.getRow(ri - 1);
if (field.length == 1) {
//单值{{key}}
dataR = RecordUtils.toRecord(dataMap);
Object value = "";
if (dataR != null) {
//Map取值
value = valueFormat(field[fiLength], dataR);
str = str.replaceAll("\\{", "##");
strVal = strVal.replaceAll("\\{", "##");
str = str.replaceAll("\\}", "##");
strVal = strVal.replaceAll("\\}", "##");
value = strVal.replaceFirst(str, value + "");
}
setCellValue(cell, value);
} else {
//{$list.xxx}}格式处理
if (field[0].indexOf("$") == 0) {
//循环值
List<String> cellList = new ArrayList<>();
for (int c2 = 0; c2 < row.getLastCellNum(); c2++) {
org.apache.poi.ss.usermodel.Cell cell2 = row.getCell(c2);
String str2 = null;
if (cell2 != null) {
str2 = String.valueOf(ExcelCellUtils.getCellValue(cell2));
}
cellList.add(str2);
}
List<Record> dataList = RecordUtils.toList(dataMap.get(field[0].substring(1)));
if (Objects.isNull(dataList)) {
dataList = new ArrayList<>();
}
if (CollectionUtils.isEmpty(dataList)) {
dataList.add(new Record());
}
//模版字符串替换行{{$list.xxx}}
SXSSFRow sxssfReplaceRow = sxssfSheet.getRow(ri);
if (!Objects.isNull(sxssfReplaceRow)) {
sxssfSheet.removeRow(sxssfReplaceRow);
}
for (int in = 0; in < dataList.size(); in++) {
SXSSFRow newRow = sxssfSheet.createRow(ri + in);
if (!Objects.isNull(headRow)) {
newRow.setHeight(headRow.getHeight());
}
dataR = dataList.get(in);
for (int jn = 0; jn < row.getLastCellNum(); jn++) {
org.apache.poi.ss.usermodel.Cell templateCell = row.getCell(jn);
if (templateCell != null) {
org.apache.poi.ss.usermodel.Cell newCell = newRow.createCell(jn);
copyCell(templateCell, newCell);
//设置sxssf cell单元格的值
setSxssfCellValue(newCell, cellList, fiLength, dataR);
}
}
}
addRow += dataList.size();
isContainList = true;
break;
}
}
}
}
}
}
//如果该行没有{{$list.xxx}},并且模版中存在{{xxx}}的行,则进行行拷贝
if (!isContainList && isContainSpecialChar) {
copyRow(row, sxssfSheet.getRow(ri));
}
}
//自动调整列宽
autoCellWidth(sxssfSheet, headRow);
}
outputStream = new BufferedOutputStream(new FileOutputStream(filePath));
sxssfWorkbook.write(outputStream);
outputStream.flush();
} catch (Exception e) {
log.error("{}处理异常:{}", filePath, e.getMessage());
e.printStackTrace();
} finally {
if (stream != null) {
stream.close();
}
if (outputStream != null) {
outputStream.close();
}
if (rwb != null) {
rwb.close();
}
if (sxssfWorkbook != null) {
sxssfWorkbook.dispose();// 释放workbook所占用的所有资源
}
}
log.info("{} 处理完成耗时:{}秒", filePath, (System.currentTimeMillis() - startTime) / 1000);
}
/**
* 自动设置单元格宽度
*
* @param sxssfSheet
* @param headRow
*/
public static void autoCellWidth(SXSSFSheet sxssfSheet, Row headRow) {
sxssfSheet.trackAllColumnsForAutoSizing();
for (int i1 = 0; i1 < headRow.getLastCellNum(); i1++) {
sxssfSheet.autoSizeColumn(i1);
//手动调整列宽,解决中文不能自适应问题
//单元格单行最长支持32*256宽度(每个单元格样式已经设置自动换行,超出即换行)
//设置最低列宽度,列宽约8个中文字符
int width = Math.max(8 * 256, Math.min(32 * 256, sxssfSheet.getColumnWidth(i1) * 12 / 10));
sxssfSheet.setColumnWidth(i1, width);
}
}
/**
* 设置sxssf cell单元格的值
*
* @param newCell
* @param cellList
* @param fiLength
* @param dataR
*/
public static void setSxssfCellValue(Cell newCell, List<String> cellList, int fiLength, Record dataR) {
String cellValue = newCell.getStringCellValue();
if (!StringUtils.isEmpty(cellValue) && cellValue.indexOf("}}") > -1) {
for (int fi1 = 0; fi1 < cellList.size(); fi1++) {
String str = cellList.get(fi1);
if (newCell != null && str.indexOf("}}") > -1 && cellValue.equals(str)) {
String[] field = str.substring(2, str.indexOf("}}")).split("\\.");
for (int fi = 1; fi < fiLength; fi++) {
Object pkValue = dataR.get(field[fi]);
dataR = RecordUtils.toRecord(pkValue);
}
if (dataR != null) {
Object value = valueFormat(field[fiLength], dataR);
setCellValue(newCell, value);
}
break;
}
}
}
}
private static void copyRow(org.apache.poi.ss.usermodel.Row srcRow, org.apache.poi.ss.usermodel.Row destRow) {
for (int jn = 0; jn < srcRow.getLastCellNum(); jn++) {
org.apache.poi.ss.usermodel.Cell templateCell = srcRow.getCell(jn);
if (templateCell != null) {
org.apache.poi.ss.usermodel.Cell newCell = destRow.createCell(jn);
copyCell(templateCell, newCell);
}
}
}
private static void copyCell(org.apache.poi.ss.usermodel.Cell srcCell, org.apache.poi.ss.usermodel.Cell distCell) {
distCell.setCellStyle(srcCell.getCellStyle());
if (srcCell.getCellComment() != null) {
distCell.setCellComment(srcCell.getCellComment());
}
CellType srcCellType = srcCell.getCellType();
distCell.setCellType(srcCellType);
if (srcCellType == CellType.NUMERIC) {
if (HSSFDateUtil.isCellDateFormatted(srcCell)) {
distCell.setCellValue(srcCell.getDateCellValue());
} else {
distCell.setCellValue(srcCell.getNumericCellValue());
}
} else if (srcCellType == CellType.STRING) {
distCell.setCellValue(srcCell.getRichStringCellValue());
} else if (srcCellType == CellType.BLANK) {
// nothing21
} else if (srcCellType == CellType.BOOLEAN) {
distCell.setCellValue(srcCell.getBooleanCellValue());
} else if (srcCellType == CellType.ERROR) {
distCell.setCellErrorValue(srcCell.getErrorCellValue());
} else if (srcCellType == CellType.FORMULA) {
distCell.setCellFormula(srcCell.getCellFormula());
} else { // nothing29
}
}
4.测试
public static void main(String[] args) throws Exception {
Record dataListMap = new Record();
List list = new ArrayList<>();
for (int ls = 0; ls < 1048500; ls++) {
list.add(new Record().set("name", "杨锋" + ls).set("desc", "研发部" + ls));
}
dataListMap.set("list", list);
dataListMap.set("cxsq", "测试");
writeTemp("C:\\Users\\YF\\Desktop\\temp.xlsx", "C:\\Users\\YF\\Desktop\\" + System.currentTimeMillis() + ".xlsx", dataListMap);
}
从下图可以看到,1048500行数据已经成功写入到模版,处理的时间只用了18秒,相当快,并且也没有抛出OOM异常。
最终生成的excel如下图所示:
在方法执行过程中,内存的波动不大,处理完成后内存被回收,内存保持平稳。