Demo: extracting text from a PDF in Java

jwwKngiht

Published 2021-05-21 15:24:05

Article link: https://blog.csdn.net/weixin_41945912/article/details/117123284

1. pom.xml dependency

        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.22</version>
        </dependency>

2. Code

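    import java.io.File;
    import org.apache.pdfbox.io.RandomAccessFile; // PDFBox's class, not java.io.RandomAccessFile
    import org.apache.pdfbox.io.RandomAccessRead;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    public static String getTextFromPDF(String pdfFilePath) throws Exception {
        // Open read-only ("r"); extracting text does not need write access
        try (RandomAccessRead accessRead = new RandomAccessFile(new File(pdfFilePath), "r")) {
            PDFParser parser = new PDFParser(accessRead); // create the PDF parser
            parser.parse(); // run the PDF parsing step
            try (PDDocument pdfdocument = parser.getPDDocument()) { // document object built by the parser
                PDFTextStripper pdfstripper = new PDFTextStripper(); // create the text stripper
                String contenttxt = pdfstripper.getText(pdfdocument); // extract the document text
                System.out.println(contenttxt);
                return contenttxt;
            }
        }
    }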
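
Before handing a path to getTextFromPDF, it can help to reject files that are obviously not PDFs. The following is a minimal sketch, assuming a hypothetical helper class `PdfPrecheck` with a `looksLikePdf` method (neither is part of PDFBox). It uses only the JDK and checks that the file is readable and starts with the `%PDF-` header bytes. Some real-world PDFs carry junk bytes before the header that lenient parsers tolerate, so a `false` result is a hint rather than proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfPrecheck {
    // The byte sequence a PDF file is expected to start with.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper (an assumption for illustration, not a PDFBox API):
    // returns true if the path is a readable regular file whose first bytes
    // are the "%PDF-" header.
    public static boolean looksLikePdf(Path path) throws IOException {
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int off = 0;
            // read() may return fewer bytes than requested, so loop until full
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    return false; // file is shorter than the header
                }
                off += n;
            }
            return Arrays.equals(head, PDF_MAGIC);
        }
    }
}
```

A caller could check `PdfPrecheck.looksLikePdf(Paths.get(pdfFilePath))` first and only call getTextFromPDF when it returns true, logging or skipping the file otherwise.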
