Error: java.lang.NoSuchMethodError: 'org.apache.poi.poifs.filesystem.DirectoryNode

落千

Last modified 2022-07-08 09:21:21

Categories: elasticSearch, java. Tags: tika, poi, version conflict, error, content extraction.

First published 2022-07-08 09:07:49

Article link: https://blog.csdn.net/zm_960911/article/details/125670997

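I recently built a full-text search feature and used Tika to extract file contents. Another part of the project already used POI, and when Tika read a .doc file it threw an error pointing to a POI version conflict: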

java.lang.NoSuchMethodError: 'org.apache.poi.poifs.filesystem.DirectoryNode

I looked into some references:

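Apache Tika is a toolkit for content extraction (a toolkit for text extracting). It integrates POI and PDFBox
and provides a unified interface for text extraction. Tika also offers a convenient extension API for adding support for third-party file formats.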

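Tika supports the following file formats:

PDF - via PDFBox; MS-* - via POI; HTML - uses nekohtml to tidy malformed HTML into XHTML; OpenOffice formats - provided by Tika; Archive - zip, tar, gzip, bzip, etc.; RTF - provided by Tika; Java class - class parsing is done by ASM; Image - only image metadata extraction is supported.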

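Because Tika bundles POI, the POI version Tika expects conflicted with the POI version the project already used, which produced the error above. (A NoSuchMethodError generally means that code was compiled against a different version of a library than the one loaded at runtime.) Some advice says that if the project already uses POI, the existing POI dependency should be removed first to avoid the conflict.
But I did not want to change the existing code, so I took a roundabout route: handle .doc files separately with POI (Tika reported no errors when processing docx, pdf, and other files).
Import the POI JARs; the pom.xml entries are as follows: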

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.8</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.8</version>
</dependency>
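When chasing a conflict like this, it can help to confirm which JAR a class is actually loaded from at runtime. The following is a minimal diagnostic sketch using only the JDK; the class name `ClassOrigin` and the method `locate` are hypothetical names introduced here for illustration and are not part of the original project.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /**
     * Returns the location (usually a JAR URL) the named class was loaded from,
     * or a short note when the class cannot be loaded or has no code source.
     * Hypothetical helper for illustration only.
     */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source, likely a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Class named in the error message, plus the POI class used for .doc files
        System.out.println(locate("org.apache.poi.poifs.filesystem.DirectoryNode"));
        System.out.println(locate("org.apache.poi.hwpf.HWPFDocument"));
    }
}
```

If the printed location is a POI JAR with a different version from the one Tika was built against, that would be consistent with the NoSuchMethodError described above.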

Extracting Word text content with POI looks like this:

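import java.io.File;
import java.io.InputStream;

import org.apache.commons.io.FileUtils;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

/* Use POI when the type is doc; otherwise (or if POI returns nothing) use Tika. */
public static String parseContent(File f, String fileType) {
    String content = null;
    try {
        // Null-safe, case-insensitive check of the file type
        if ("doc".equalsIgnoreCase(fileType)) {
            try (InputStream stream = FileUtils.openInputStream(f)) {
                content = parseDoc(stream);
            }
        }
        if (content == null) {
            // Open a fresh stream: a failed POI attempt may have consumed the first one
            try (InputStream stream = FileUtils.openInputStream(f)) {
                content = parseContent(stream);
            }
        }
    } catch (Exception e) {
        System.err.println("error: " + e);
    }
    return content;
}

/* Read a .doc file with POI. Returns null if parsing fails. */
public static String parseDoc(InputStream stream) {
    try {
        HWPFDocument doc = new HWPFDocument(stream);
        // The overall range covers the whole document
        Range range = doc.getRange();
        return range.text();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

/* Read a file with Tika. Returns null if parsing fails. */
public static String parseContent(InputStream stream) {
    String content = null;
    try {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(Integer.MAX_VALUE);
        Metadata metadata = new Metadata();
        parser.parse(stream, handler, metadata);
        content = handler.toString();
    } catch (Exception | LinkageError e) {
        // NoSuchMethodError is a LinkageError, not an Exception, so catch it explicitly
        System.err.println("tika parse error: " + e);
    }
    return content;
}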
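The `parseContent(File, String)` method above expects the caller to supply the file type. One way to derive it, sketched here with only the JDK, is to take the lower-cased extension of the file name. The class `FileTypes` and the method `extensionOf` are hypothetical names introduced for illustration; the original post does not show how `fileType` is obtained.

```java
import java.util.Locale;

public class FileTypes {
    /**
     * Returns the lower-cased extension of a file name without the dot,
     * or an empty string when the name has no extension.
     * Hypothetical helper for illustration only.
     */
    public static String extensionOf(String fileName) {
        // Ignore dots that belong to a directory part of the path
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int dot = fileName.lastIndexOf('.');
        if (dot <= slash || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("report.DOC"));
        System.out.println(extensionOf("slides.docx"));
    }
}
```

A call such as `parseContent(f, FileTypes.extensionOf(f.getName()))` would then route only files ending in `.doc` to POI, while `.docx`, `.pdf`, and everything else go to Tika. An extension is only a hint about the real format, so a renamed file can still end up in the wrong parser.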