Extracting Text from PDF Files with PDFBox

seansaint

Recently, while building an index with Lucene, I needed PDFBox to extract the text of PDF files, but every run reported this error:

java.lang.Throwable: Warning: You did not close the PDF Document

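This is annoying because the message is raised from inside the third-party library, not from my own code.

I am recording here the solution I found online.

The original code: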

StringBuffer content = new StringBuffer(""); // contents of the file
FileInputStream fis = new FileInputStream(f);
PDFParser p = new PDFParser(fis);
p.parse();
PDFTextStripper ts = new PDFTextStripper();
// The PDDocument obtained here is never closed, which leads to the warning
content.append(ts.getText(p.getPDDocument()));

The code that no longer reports the error:

StringBuffer content = new StringBuffer(""); // contents of the file
PDDocument pdfDocument = null;
try {
    FileInputStream fis = new FileInputStream(f);
    PDFTextStripper stripper = new PDFTextStripper();
    pdfDocument = PDDocument.load(fis);
    StringWriter writer = new StringWriter();
    stripper.writeText(pdfDocument, writer);
    content.append(writer.getBuffer().toString());
    fis.close();
} catch (java.io.IOException e) {
    System.err.println("IOException =" + e);
    System.exit(1);
} finally {
    // Always close the document, whether or not extraction succeeded
    if (pdfDocument != null) {
        // System.err.println("Closing document " + f + "...");
        org.pdfbox.cos.COSDocument cos = pdfDocument.getDocument();
        cos.close();
        // System.err.println("Closed " + cos);
        pdfDocument.close();
    }
}
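
The fix above comes down to one rule: the document must be closed on every path, including when extraction throws. On Java 7 or later, try-with-resources expresses the same guarantee more compactly than a hand-written finally block. The following is only a minimal sketch of that pattern. `CloseDemo` and `FakeDocument` are hypothetical stand-ins invented for this illustration, not PDFBox classes, so the example runs with the JDK alone. With a PDFBox version whose `PDDocument` implements `java.io.Closeable`, the same shape should apply to the real class, but that is an assumption to check against the version in use.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;

public class CloseDemo {
    /** Hypothetical stand-in for a PDF document; NOT a PDFBox class. */
    static class FakeDocument implements Closeable {
        private final String text;
        private boolean closed = false;

        FakeDocument(String text) {
            this.text = text;
        }

        /** Writes the document text, or fails like an unreadable file would. */
        void writeText(StringWriter out) throws IOException {
            if (text == null) {
                throw new IOException("unreadable document");
            }
            out.write(text);
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Extracts the text; the document is closed even if writeText throws. */
    static String extract(FakeDocument doc) throws IOException {
        try (FakeDocument d = doc) {
            StringWriter writer = new StringWriter();
            d.writeText(writer);
            return writer.toString();
        }
    }

    public static void main(String[] args) {
        FakeDocument ok = new FakeDocument("hello");
        FakeDocument bad = new FakeDocument(null);
        try {
            System.out.println(extract(ok) + " closed=" + ok.isClosed());
            extract(bad); // throws, but the document is still closed
        } catch (IOException e) {
            System.out.println("failed, closed=" + bad.isClosed());
        }
    }
}
```

Running `main` prints `hello closed=true` followed by `failed, closed=true`: the resource is closed both on the normal path and on the failing one, which is the property the finally block above was written to enforce.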
