java xml内存泄漏,Java Apache-poi,内存泄漏与excel文件

在处理大量Excel文件时,使用Apache POI遇到内存溢出问题。异常为'java.lang.OutOfMemoryError: GC overhead limit exceeded'。尝试使用OPCPackage.open()打开文件仍无法解决问题。解决方法是利用POI的SAX(事件API),在读取过程中不将整个工作簿加载到内存,而是逐行处理,降低内存占用。

I need to read (15000) excel files for my thesis. I'm using apache poi to open and later to analyze them but after around 5000 files I'm getting the following exception and stacktrace:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3044)

at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3065)

at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3263)

at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)

at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)

at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)

at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)

at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4682)

at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)

at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)

at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)

at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)

at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)

at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)

at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)

at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:92)

at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)

at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:173)

at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:165)

at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:417)

at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:382)

at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:178)

at org.apache.poi.xssf.usermodel.XSSFWorkbook.(XSSFWorkbook.java:249)

at org.apache.poi.xssf.usermodel.XSSFWorkbook.(XSSFWorkbook.java:302)

at de.spreadsheet_realtions.analysis.WorkbookAnalysis.analyze(WorkbookAnalysis.java:18)

Code (at the moment just open the file and close the file):

public static void main(String[] args) {

start();

}

public void start(){

File[] files = getAllFiles(Config.folder);

ZipSecureFile.setMinInflateRatio(0.00);

for(File f: files){

analyze(f);

}

}

public void analyze(File file){

Workbook workbook = null;

try {

workbook = new XSSFWorkbook(file); //line 18

} catch (Exception e1) {e1.printStackTrace(); return;}

// later would be here the code to analyze the workbook

try {

workbook.close();

} catch (Exception e) {e.printStackTrace();}

}

I tried also with OPCPackage.open(file) and I got the same result.

What I'm doing wrong or what can I do to solve this problem? Thanks for any help.

EDIT:

The same for the code below.

try (XSSFWorkbook workbook = new XSSFWorkbook(file)){

} catch (Exception e1) {e1.printStackTrace(); return;}

解决方案

Typically, POI has the whole workbook in memory. So, a large workbook requires a different approach.

While writing, one can use SXSSF and most calls are the same, except that only a certain number of rows are in memory.

In your case, you are reading. For this you can use their "event driven" API. The basic idea here is that you do not get the workbook as one huge object. Instead, you get it piecemeal, as it is read, and you can save off as much as you wish into your own data-structure. Or, you can simply process it as you read it and not save very much.

Since this is a lower-level API (driven by the structure of the data being read), there is one approach for XLS and a different approach for XLSX. Look at the POI "How To" page, and find the section titled "XSSF and SAX (Event API)".

That example demonstrates how to detect the value of each cell as it is read in.

(You'll need the xercesImpl.jar on your library path.)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值