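Yes, iText parses quite a lot of the PDF when it opens it (it doesn't read the contents of stream objects, but that's about it), unless you use a PdfReader constructor that takes a RandomAccessFileOrArray. In that case it reads only the xrefs (which are mostly unavoidable) and parses nothing else until you start requesting specific objects, either directly or through the various calls that need them.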
The first PDF program I ever wrote did exactly this. It opened a PDF and, doing the bare minimum of work necessary, read the number of pages. It didn't even parse the xrefs it didn't have to. I haven't thought about that program in years…
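The "bare minimum" approach described above can be sketched in plain Java without iText. This is a hypothetical illustration, not iText's code or the original program: it assumes a PDF with a single classic (uncompressed) cross-reference table, with no incremental updates (/Prev), no cross-reference or object streams, and no encryption. It reads the xref table, then parses only the trailer, the catalog and the root page-tree node, taking the page count from that node's /Count entry. The class name MinimalPageCount is made up, and the whole file is held in memory for brevity where a real tool would seek within the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical minimal page counter. Assumes one classic xref table, no /Prev,
 * no xref/object streams, no encryption. Parses only trailer, catalog and root /Pages.
 */
final class MinimalPageCount {
    // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets
    private final String pdf;
    private final Map<Integer, Integer> xref = new HashMap<>(); // object number -> byte offset
    private final int rootObj;

    MinimalPageCount(byte[] bytes) {
        pdf = new String(bytes, StandardCharsets.ISO_8859_1);
        // The last "startxref" keyword is followed by the offset of the xref table
        int sx = pdf.lastIndexOf("startxref");
        if (sx < 0) throw new IllegalArgumentException("no startxref");
        int pos = Integer.parseInt(pdf.substring(sx + 9).trim().split("\\s+")[0]);
        int trailerAt = pdf.indexOf("trailer", pos);
        if (!pdf.startsWith("xref", pos) || trailerAt < 0)
            throw new IllegalArgumentException("only classic xref tables are supported");
        // Subsections: "first count", then count entries of "offset generation n|f"
        String[] tok = pdf.substring(pos + 4, trailerAt).trim().split("\\s+");
        for (int i = 0; i < tok.length; ) {
            int first = Integer.parseInt(tok[i++]);
            int count = Integer.parseInt(tok[i++]);
            for (int k = 0; k < count; k++, i += 3) {
                if (tok[i + 2].equals("n")) xref.put(first + k, Integer.parseInt(tok[i]));
            }
        }
        // Only the reference is recorded; the catalog itself is not parsed yet
        rootObj = refAfter(pdf.substring(trailerAt), "/Root");
    }

    // Objects are read only when requested: jump to the xref offset, stop at "endobj"
    private String object(int num) {
        Integer off = xref.get(num);
        if (off == null) throw new IllegalArgumentException("object " + num + " not in xref");
        int end = pdf.indexOf("endobj", off);
        return pdf.substring(off, end < 0 ? pdf.length() : end);
    }

    // Finds "/Key N G R" in text and returns the object number N
    private static int refAfter(String text, String key) {
        Matcher m = Pattern.compile(Pattern.quote(key) + "\\s+(\\d+)\\s+\\d+\\s+R").matcher(text);
        if (!m.find()) throw new IllegalArgumentException(key + " reference not found");
        return Integer.parseInt(m.group(1));
    }

    // Catalog -> root page-tree node -> /Count; individual page objects are never parsed
    int pageCount() {
        int pagesObj = refAfter(object(rootObj), "/Pages");
        Matcher m = Pattern.compile("/Count\\s+(\\d+)").matcher(object(pagesObj));
        if (!m.find()) throw new IllegalArgumentException("/Count not found");
        return Integer.parseInt(m.group(1));
    }
}
```

For example, `new MinimalPageCount(Files.readAllBytes(path)).pageCount()` returns the root node's /Count. Taking the count from the root node is what keeps this cheap: however many pages the document has, only three objects are ever looked at.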
So while it won't be perfectly efficient, going through RandomAccessFileOrArray will be more efficient:
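import java.io.IOException;
import com.itextpdf.text.pdf.PdfReader; // com.lowagie.text.pdf in iText 2.x
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
int efficientPDFPageCount(String path) throws IOException {
    // forceRead=false: don't load the whole file; plainRandomAccess=true: don't memory-map it
    RandomAccessFileOrArray file = new RandomAccessFileOrArray(path, false, true);
    PdfReader reader = new PdfReader(file, null); // partial read; null = no owner password
    try {
        return reader.getNumberOfPages();
    } finally {
        reader.close(); // releases the underlying file
    }
}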
Update:
The iText API has since been overhauled. Now (version 5.4.x) the correct way to do this is through java.io.RandomAccessFile:
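import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import com.itextpdf.text.io.RandomAccessSourceFactory;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
int efficientPDFPageCount(File file) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    // Wrap the file so iText reads bytes on demand instead of loading it into memory
    RandomAccessFileOrArray pdfFile = new RandomAccessFileOrArray(
            new RandomAccessSourceFactory().createSource(raf));
    PdfReader reader = new PdfReader(pdfFile, new byte[0]); // partial read, empty owner password
    try {
        return reader.getNumberOfPages();
    } finally {
        reader.close(); // releases the underlying file
    }
}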