pdfbox java.lang.OutOfMemoryError, PDFBox: Working with very large PDFs

I am working with some very large PDFs, some over 7GB in size. The PDFs have up to 20,000 pages and many full-page color images. I'd like to use PDFBox to work with the PDFs, but due to their size I get OutOfMemoryErrors when I attempt to open them.

I'm working with pdfbox-app-1.6.0 on Windows 7, using IntelliJ and Java 6.

First I tried writing a simple program that just opened the PDF in a PDDocument and copied each page over to another PDDocument: http://ideone.com/arKhB

Next I tried using the PDFBox CopyDoc example.

Both examples run out of memory.

I'm assuming this is because PDFBox is trying to read the whole document into memory. Is there a way to have it only open 1 page at a time? I know it would be slower processing, but at the moment I can't process anything.

Solution

In the 2.0.* versions, open the PDF like this:

PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly());

This sets up buffering to use only temporary file(s), with no main memory and no size limit.
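As a minimal sketch of that call in context, the following opens a file with temp-file-only buffering and reads its page count. The class name LargePdfOpen and the method countPages are hypothetical names chosen for illustration, and the try-with-resources syntax assumes Java 7 or later (on Java 6, call doc.close() in a finally block instead).

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox.
public class LargePdfOpen {

    /** Opens the PDF with temp-file-only buffering and returns its page count. */
    static int countPages(File file) throws IOException {
        // Closing the document also releases the scratch file it buffers into.
        try (PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly())) {
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the path to the PDF is passed as the first argument.
        System.out.println(countPages(new File(args[0])));
    }
}
```

If some main memory can be spared, MemoryUsageSetting.setupMixed(maxMainMemoryBytes) is the related option that buffers in memory up to the given limit before spilling to temporary files.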

Update 17.4.2018: More tricks to save memory are described in the FAQ. Not yet described there, but active since 2.0.9, is subsampling (skipping pixel lines/rows), which is enabled with PDFRenderer.setSubsamplingAllowed(true) when rendering. This saves memory for PDF files with huge images.
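A rough sketch of rendering with subsampling enabled might look like the following. It assumes PDFBox 2.0.9 or later and Java 7 or later; the class name SubsampledRender and the method renderPage are hypothetical, and the DPI value is up to the caller.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class, not part of PDFBox.
public class SubsampledRender {

    /** Renders one page (zero-based index) at the given DPI with subsampling allowed. */
    static BufferedImage renderPage(File file, int pageIndex, float dpi) throws IOException {
        try (PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly())) {
            PDFRenderer renderer = new PDFRenderer(doc);
            // Lets the renderer decode large images at reduced resolution.
            renderer.setSubsamplingAllowed(true);
            return renderer.renderImageWithDPI(pageIndex, dpi);
        }
    }
}
```

Rendering one page at a time and discarding each BufferedImage before moving on keeps only a single rendered page in memory.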
