Lucene File Search: A Hands-On Project

1. Requirements Analysis

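Suppose we have a batch of documents in the formats DOC, DOCX, PPT, PPTX, TXT and PDF, and we want to build a file search system similar to Baidu Wenku. The requirements are as follows.
(1) Search by file name.
(2) Search by file content.
(3) Download the files that a search returns.
(4) Highlight the matched keywords.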
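As a minimal sketch of the format restriction in the requirements, the helper below checks a file name's extension against the six listed formats. The class name SupportedFormats and its method are hypothetical and are not part of the original project; the check is only a pre-filter on the file name and says nothing about the actual file content.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: tells whether a file name ends in one of the six required formats. */
final class SupportedFormats {
    private static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList("doc", "docx", "ppt", "pptx", "txt", "pdf"));

    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No dot, or a trailing dot, means there is no usable extension
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("report.PDF"));
        System.out.println(isSupported("photo.png"));
    }
}
```

The comparison is case-insensitive, so names such as report.PDF are accepted as well.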

2. Architecture Design

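In summary: the file storage system holds files of different types. A back-end program extracts each file's name and content and indexes both with Lucene. The front end gives users a query interface; when a user submits a keyword, the back end searches the index and returns the matching documents to the front-end page.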
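The index-then-search flow can be illustrated with a toy in-memory inverted index. This is only a sketch of the idea, not how Lucene is implemented: the class TinyInvertedIndex is hypothetical, and its tokenizer simply splits on non-alphanumeric characters, which would not segment Chinese text (that is the job of the IK analyzer used later).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy index: maps each term to the names of the files that contain it. */
final class TinyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Index step: tokenize file name and content, then record the file under every term. */
    void add(String fileName, String content) {
        for (String term : tokenize(fileName + " " + content)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(fileName);
        }
    }

    /** Search step: look up one keyword and return the matching file names in sorted order. */
    List<String> search(String keyword) {
        Set<String> hits = postings.get(keyword.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(hits);
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("hadoop-notes.txt", "HDFS stores blocks");
        index.add("spark.pdf", "Spark reads HDFS");
        System.out.println(index.search("hdfs"));
    }
}
```

Because file name and content are indexed together here, a keyword matches on either one; the real project keeps them as two separate Lucene fields, title and content.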

3. File Extraction

See https://blog.csdn.net/yangang1223/article/details/101367870 for details; they are not repeated here.

4. Project Setup

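Make sure Java, an IDE (IntelliJ IDEA or Eclipse) and Tomcat are installed in your environment.
(1) Create a new Spring Boot project named filesearch.
(2) Create the directory layout used in the rest of this article: the packages controller, domain, ik, service (with impl) and util, plus the files, indexdir, static and templates directories under resources.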

(3) Start the project and request the hello endpoint to confirm that it responds.
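For reference, the search-related libraries could be declared in pom.xml roughly as follows. This is a sketch: the coordinates and versions are read off the launch classpath printed in section 5, but which of them the original project declared directly, and which arrived as transitive dependencies, is an assumption.

```xml
<!-- Versions as they appear on the launch classpath in section 5 -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>6.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>6.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>6.6.0</version>
</dependency>
<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.13</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.13</version>
</dependency>
```

These entries go inside the dependencies element of pom.xml, next to the Spring Boot starters.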

5. Indexing the Documents

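With the project set up, the first task is to build the index. The objects to search are files; to keep things simple we index only the document name and the document content. Create the entity class FileEntity under domain.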

package com.cnpc.domain;

/**
 * Created by grant on 2019/9/25.
 */
public class FileEntity {
    private String title;// file title (the file name)
    private String content;// file content

    public FileEntity(){}
    public FileEntity(String title, String content) {
        this.title = title;
        this.content = content;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }
}

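In the util directory, create the CreateIndex class. It parses the documents under the files directory, maps each extracted document to a FileEntity object, tokenizes the text with the IK analyzer, and builds the index. The IK analyzer needs some adaptation: here we create a package (com.cnpc.ik) that holds two rewritten classes, IKAnalyzer6x and IKTokenizer6x, and we instantiate IKAnalyzer6x wherever an IKAnalyzer object would otherwise be created.
Also place main2012.dic on the classpath; otherwise IK reports "Main Dictionary not found!"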
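A quick way to confirm that a dictionary file is visible on the classpath before indexing is to ask the class loader for it. The sketch below is a generic check, not part of IK: the class name DictionaryCheck is hypothetical, and the exact resource path that IK looks up depends on its configuration, so the name is passed in as an argument.

```java
import java.net.URL;

/** Hypothetical check: reports whether a named resource can be found on the classpath. */
final class DictionaryCheck {
    static boolean isOnClasspath(String resourceName) {
        // getResource returns null when no classpath entry contains the resource
        URL url = DictionaryCheck.class.getClassLoader().getResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // The default name is the one mentioned in the text; pass another path as the first argument if needed
        String name = args.length > 0 ? args[0] : "main2012.dic";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running it from the project's classpath prints whether the resource was found, which helps narrow down a "Main Dictionary not found!" error.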

package com.cnpc.util;

import com.cnpc.domain.FileEntity;
import com.cnpc.ik.IKAnalyzer6x;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

import org.xml.sax.SAXException;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Created by grant on 2019/9/25.
 */
public class CreateIndex {
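    public static List<FileEntity> extractFile(){
        ArrayList<FileEntity> list = new ArrayList<FileEntity>();
        File fileDir = new File("D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\files");
        File[] allFiles = fileDir.listFiles();
        // listFiles() returns null when the path does not exist or is not a directory
        if (allFiles == null) {
            return list;
        }
        for (File file : allFiles) {
            // Skip subdirectories; only regular files are parsed
            if (!file.isFile()) {
                continue;
            }
            FileEntity fileEntity = new FileEntity(file.getName(), ParserExtraction(file));
            list.add(fileEntity);
        }
        return list;
    }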

    private static String ParserExtraction(File file) {
        String fileContent = "";// holds the extracted text
        // -1 disables BodyContentHandler's default write limit of 100,000 characters,
        // which would otherwise abort the parse of large documents with a SAXException
        BodyContentHandler handler = new BodyContentHandler(-1);
        Parser parser = new AutoDetectParser();// detects the file type and delegates to a matching parser
        Metadata metadata = new Metadata();// receives the document metadata
        // try-with-resources closes the stream even when parsing fails
        try (FileInputStream inputStream = new FileInputStream(file)) {
            ParseContext context = new ParseContext();
            parser.parse(inputStream,handler,metadata,context);
            fileContent = handler.toString();
        } catch (IOException | SAXException | TikaException e) {
            // FileNotFoundException is an IOException, so it is covered here as well
            e.printStackTrace();
        }
        return fileContent;
    }

    public static void main(String[] args) throws IOException {
        // IK analyzer instance
        IKAnalyzer6x analyzer = new IKAnalyzer6x();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        // CREATE overwrites any index that already exists in the directory
        iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        Directory dir = null;
        IndexWriter inWriter = null;

        Path indexPath = Paths.get("D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\indexdir");
        System.out.println("indexdir :"+indexPath);
        // Store the field value and index it with positions, offsets and term vectors
        FieldType fieldType = new FieldType();
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        fieldType.setStored(true);
        fieldType.setTokenized(true);
        fieldType.setStoreTermVectors(true);
        fieldType.setStoreTermVectorPositions(true);
        fieldType.setStoreTermVectorOffsets(true);

        Date start = new Date();// start time
        if(!Files.isReadable(indexPath)){
            System.out.println(indexPath.toAbsolutePath() + " does not exist or is not readable, please check!");
            System.exit(1);
        }
        dir = FSDirectory.open(indexPath);
        inWriter = new IndexWriter(dir,iwc);

        List<FileEntity> fileList = extractFile();

        // Iterate over fileList and add one Lucene document per file
        for (FileEntity fileEntity : fileList) {
            Document doc = new Document();
            doc.add(new Field("title",fileEntity.getTitle(),fieldType));
            doc.add(new Field("content",fileEntity.getContent(),fieldType));
            inWriter.addDocument(doc);
        }
        inWriter.commit();
        inWriter.close();
        dir.close();
        Date end = new Date();// end time
        // Print the elapsed indexing time
        System.out.println("Indexing finished, elapsed time: "+(end.getTime() - start.getTime()) +" ms.");

    }
}
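Both extractFile and main hard-code absolute Windows paths, so the class only runs on the author's machine as written. One possible way to make the directories configurable is to read them from JVM system properties with a project-relative default. This is a sketch under assumptions: the class IndexPaths and the property names filesearch.filesDir and filesearch.indexDir are hypothetical and do not appear in the original project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: resolves a directory from a system property, with a fallback default. */
final class IndexPaths {
    static Path resolve(String propertyName, String defaultDir) {
        String configured = System.getProperty(propertyName);
        // Use the default when the property is missing or blank
        String chosen = (configured == null || configured.trim().isEmpty())
                ? defaultDir
                : configured.trim();
        return Paths.get(chosen).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Property names below are assumptions made for this sketch
        System.out.println("files dir: " + resolve("filesearch.filesDir", "src/main/resources/files"));
        System.out.println("index dir: " + resolve("filesearch.indexDir", "src/main/resources/indexdir"));
    }
}
```

With such a helper the indexer could be started with, for example, -Dfilesearch.indexDir=/data/indexdir, and the same value could be handed to the search side so that both read the same index.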


Run the main method: for every document under files, regardless of format, the file name and file content are written to the Lucene index.
Result:

C:\App\Java\jdk1.8.0_92\bin\java "-javaagent:E:\BigData\software\IntelliJ IDEA 2017.1.5\lib\idea_rt.jar=59109:E:\BigData\software\IntelliJ IDEA 2017.1.5\bin" -Dfile.encoding=UTF-8 -classpath <JDK 1.8.0_92 runtime jars>;D:\RF-WorkSpace\ChinaOil\newcode\filesearch\target\classes;<Maven repository jars, including spring-boot 2.1.8.RELEASE, lucene-* 6.6.0, ikanalyzer 2012_u6, tika-core and tika-parsers 1.13, pdfbox 2.0.1, poi 3.15-beta1> com.cnpc.util.CreateIndex
15:33:22.672 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
Indexing finished, elapsed time: 5511 ms.

Process finished with exit code 0

After the run, the index directory contains the index files that Lucene wrote.

6. Query Interface

In short, the query interface is where the back end receives the keyword the user searches for.
Create SearchController.java in the controller directory.

import com.cnpc.domain.FileEntity;
import com.cnpc.domain.Msg;
import com.cnpc.service.SearchService;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.ModelAndView;


import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.UnsupportedEncodingException;
import java.util.List;

/**
 * Created by grant on 2019/9/27.
 */
@RestController
public class SearchController{
    private final static org.slf4j.Logger logger= LoggerFactory.getLogger(SearchController.class);

    @Autowired
    private SearchService searchService;

    /**
     * Entry point that renders the initial page.
     * @return
     */
    @RequestMapping(value = "init",method = RequestMethod.GET)
    public ModelAndView init(){
        return new ModelAndView("index");
    }

    /**
     * Handles the query keyword sent by the front end and returns the search results.
     * @param request
     * @param response
     * @return
     * @throws UnsupportedEncodingException
     */
    @RequestMapping(value = "SearchFile",method = RequestMethod.GET)
    public Msg searchFile(HttpServletRequest request, HttpServletResponse response) throws UnsupportedEncodingException {

        String indexpathStr = "D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\indexdir";
        System.out.println("indexpathStr : "+indexpathStr);

        // Read the query string from the request
        String query = request.getParameter("query");
        System.out.println("Query string: "+query);
        // Encoding conversion, left disabled
       // query = new String(query.getBytes("iso8859-1"),"UTF-8");
        // Test for null first: calling equals on a null query would throw a NullPointerException
        if(null == query || query.equals("")){
            System.out.println("Invalid parameter");
            return Msg.fail();
        }else {
            List<FileEntity> hitsList = searchService.SearchFile(query, indexpathStr, 100);
            if(null == hitsList){
                System.out.println("No matching records");
                return Msg.fail();
            }
            System.out.println("Found "+hitsList.size()+" records");

            return Msg.success().add("resultList",hitsList);
        }
    }
}
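The parameter check in searchFile can be factored into a small, separately testable helper. The sketch below is one way to do it; the class QueryParams is hypothetical, and treating a whitespace-only keyword as missing is an assumption that goes slightly beyond the original check.

```java
import java.util.Optional;

/** Hypothetical helper: normalizes the raw "query" request parameter. */
final class QueryParams {
    static Optional<String> normalize(String raw) {
        // A missing parameter arrives as null
        if (raw == null) {
            return Optional.empty();
        }
        String trimmed = raw.trim();
        // Empty and whitespace-only keywords are treated as missing
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  hadoop ").isPresent());
        System.out.println(normalize("   ").isPresent());
    }
}
```

The controller could then return Msg.fail() when the Optional is empty and pass the trimmed keyword to the service otherwise.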

Under service, create the SearchService interface, which declares the search method.

import com.cnpc.domain.FileEntity;
import org.springframework.stereotype.Service;

import java.util.List;

/**
 * Created by grant on 2019/9/27.
 */
@Service
public interface SearchService{
    public List<FileEntity> SearchFile(String key,String indexpathStr,int n);
}

Create an impl directory under service, and in it create SearchServiceImp.java, which contains the search implementation.

import com.cnpc.domain.FileEntity;
import com.cnpc.ik.IKAnalyzer6x;
import com.cnpc.service.SearchService;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Created by grant on 2019/9/27.
 */
@Service
public class SearchServiceImp implements SearchService {

    @Override
    public List<FileEntity> SearchFile(String key, String indexpathStr, int n) {
        ArrayList<FileEntity> histList = new ArrayList<FileEntity>();
        // Fields to search
        String[] fields = {"title","content"};
        Path indexPath = Paths.get(indexpathStr);
        // Step 1: open the index. try-with-resources closes the reader and then the
        // directory on every exit path, including the early return below.
        try (Directory dir = FSDirectory.open(indexPath);
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            IKAnalyzer6x analyzer = new IKAnalyzer6x();
            // Step 2: build the query over both fields
            MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields, analyzer);

            System.out.println("keyword: "+key);
            Query query = queryParser.parse(key);
            TopDocs topDocs = searcher.search(query, n);
            if(topDocs.totalHits == 0){
                return null;
            }

            // Custom highlight tags
            SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span style=\"color:red;\">", "</span>");
            QueryScorer scoreTitle = new QueryScorer(query,fields[0]);
            Highlighter highlighterTitle = new Highlighter(formatter, scoreTitle);
            highlighterTitle.setTextFragmenter(new SimpleSpanFragmenter(scoreTitle));

            QueryScorer scoreContent = new QueryScorer(query, fields[1]);
            Highlighter highlighterContent = new Highlighter(formatter, scoreContent);
            highlighterContent.setTextFragmenter(new SimpleSpanFragmenter(scoreContent));

            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document doc = searcher.doc(scoreDoc.doc);

                String title = doc.get("title");
                String content = doc.get("content");
                // Highlight the title with the title highlighter
                TokenStream tokenStream = TokenSources.getAnyTokenStream(reader, scoreDoc.doc, fields[0], analyzer);
                String hl_title = highlighterTitle.getBestFragment(tokenStream, title);
                // Highlight the best-scoring fragment of the content
                tokenStream = TokenSources.getAnyTokenStream(reader, scoreDoc.doc, fields[1], analyzer);
                String hl_content = highlighterContent.getBestFragment(tokenStream, content);
                // Fall back to the raw text when a field has no highlighted fragment
                FileEntity fileEntity = new FileEntity(hl_title != null ? hl_title : title, hl_content != null ? hl_content : content);
                histList.add(fileEntity);
            }
        }catch (IOException e){
            e.printStackTrace();
        }catch (ParseException e){
            e.printStackTrace();
        }catch (InvalidTokenOffsetsException e){
            e.printStackTrace();
        }
        return histList;
    }

}
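
To make the effect of the formatter concrete, here is a minimal plain-Java sketch that wraps every case-insensitive occurrence of a keyword in the same span tag that SimpleHTMLFormatter is configured with above. It is only a stand-in for illustration: it does no analysis, scoring, or fragment selection as Lucene's Highlighter does, and the names SimpleHighlightSketch and wrap are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in, NOT Lucene's Highlighter: plain keyword wrapping only.
final class SimpleHighlightSketch {
    // Same pre/post tags that the listing passes to SimpleHTMLFormatter
    static final String PRE = "<span style= \"color:red;\">";
    static final String POST = "</span>";

    /** Wraps every case-insensitive occurrence of keyword in PRE/POST. */
    static String wrap(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the keyword as literal text, not as a regex
        Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Keep the original casing of the matched text
            String replacement = PRE + matcher.group() + POST;
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("Spark SQL.pdf", "spark"));
    }
}
```

The result has the same shape as the title values the search service returns, which is why the download controller later has to strip the tags before using the title as a file name.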

Add the following to application.properties under resources:

server.port=8888

Put five files about learning big data into the files directory, in doc, docx, pdf, and txt formats, for the program to parse and extract content from.
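Import the Bootstrap and jQuery files under static, and put the main search page under templates. Here it is index.html, a lightly modified copy of an example page from the official Bootstrap site.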
The code of index.html (the data binding is already done with ajax):

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- The 3 meta tags above *must* come first in the head; any other head content *must* come after them -->
    <meta name="description" content="">
    <meta name="author" content="">
    <link rel="icon" href="../../favicon.ico">

    <title>Document Search</title>

    <!-- Bootstrap core CSS -->
    <link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">

    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
    <link href="../../assets/css/ie10-viewport-bug-workaround.css" rel="stylesheet">

    <!-- Custom styles for this template -->
    <link href="jumbotron.css" rel="stylesheet">

    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
    <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
    <script src="../../assets/js/ie-emulation-modes-warning.js"></script>

    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!--[if lt IE 9]>
    <script src="https://cdn.bootcss.com/html5shiv/3.7.3/html5shiv.min.js"></script>
    <script src="https://cdn.bootcss.com/respond.js/1.4.2/respond.min.js"></script>


    <![endif]-->


    <!-- Bootstrap core JavaScript
    ================================================== -->
    <!-- Placed at the end of the document so the pages load faster -->
    <script src="https://cdn.bootcss.com/jquery/1.12.4/jquery.min.js"></script>
    <script>window.jQuery || document.write('<script src="../../assets/js/vendor/jquery.min.js"><\/script>')</script>
    <script src="https://cdn.bootcss.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>

</head>

<body>

<nav class="navbar navbar-inverse navbar-fixed-top">
    <div class="container">
        <div class="navbar-header">
            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
                <span class="sr-only">Toggle navigation</span>
                <span class="icon-bar"></span>
                <span class="icon-bar"></span>
                <span class="icon-bar"></span>
            </button>
            <a class="navbar-brand" href="#">Lucene-based File Search System</a>
        </div>
        <div id="navbar" class="navbar-collapse collapse">
            <form class="navbar-form navbar-right"><!-- action="SearchFile" method="get"-->
                <div class="form-group">
                    <input type="text" placeholder="keyword" class="form-control" name="query" id="query">
                </div>
               <!-- <div class="form-group">
                    <input type="password" placeholder="Password" class="form-control">
                </div>-->
                <button type="button" class="btn btn-success" id="search_btn">Search</button>
            </form>
        </div><!--/.navbar-collapse -->
    </div>
</nav>

<div class="jumbotron">
</div>
<div class="container">
    <!-- Heading row -->
    <div class="row">
        <div class="col-md-12">
            <h1 id="h1-result">Search results</h1>
        </div>
    </div>
    <!-- Buttons -->
    <div class="row">
        <div class="col-md-4 col-md-offset-8">
            <!--<button class="btn btn-primary" id="emp_add_modal_btn">Add</button>
            <button class="btn btn-danger" id="emp_delete_all_btn">Delete selected</button>-->
        </div>
    </div>
    <!-- Result table -->
    <div class="row">
        <div class="col-md-12">
            <table class="table table-hover" id="result_table">
                <thead>
                <tr>
                    <th>
                        <input type="checkbox" id="check_all"/>
                    </th>
                    <th>File name</th>
                    <th>File content</th>
                    <th>Action</th>
                </tr>
                </thead>
                <tbody>

                </tbody>

            </table>
        </div>
    </div>
    <!-- Pagination -->
    <div class="row">
        <div class="col-md-6" id="page_info_area">
        </div>
        <div class="col-md-6" id="page_nav_area">
        </div>
    </div>
</div>




<!-- Main jumbotron for a primary marketing message or call to action: <hr> -->

<!--<footer>
    <p>&copy; 2019 Company, Inc.</p>
</footer>-->


<script type="text/javascript">
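    $(function(){
        // Runs once the page has loaded: register the search button handler
        search();
    });
   function search(){
       // Send the keyword to the back end when the search button is clicked
       $("#search_btn").click(function(){
           var query = $("#query").val();
           $.ajax({
               url:"/SearchFile",
               // Passing an object lets jQuery URL-encode the keyword
               data:{query: query},
               type:"GET",
               success:function(result){
                   // Parse and display the search results
                   build_result_table(result);
               }
           });
       });
   }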
    function build_result_table(result){
        // Clear the previous results first
        $("#result_table tbody").empty();
       if(result.code == 200){
           $("#h1-result").html("No matching records");
       }else{
            $("#h1-result").html("Search results");
            var resultList=result.extend.resultList;
            $.each(resultList,function(index,item){
                var checkBoxTd = $("<td><input type ='checkbox' class='check_item'/></td>");
                var titleTd=$("<td></td>").append(item.title);
                var contentTd=$("<td></td>").append(item.content.length > 180 ? item.content.substring(0,80) +".....":item.content);
                var editBtn=$("<button></button>").addClass("btn btn-primary btn-sm edit_btn")
                    .append($("<span></span>").addClass("glyphicon glyphicon-download-alt")).append("Download");
                // The title still carries the highlight markup; the back end strips it
                editBtn.attr("fileName",item.title);
                // A button has to sit inside a table cell, not directly inside the row
                var btnTd=$("<td></td>").append(editBtn);
                $("<tr></tr>")
                    .append(checkBoxTd)
                    .append(titleTd)
                    .append(contentTd)
                    .append(btnTd)
                    .appendTo("#result_table tbody");
            });
       }
    }
    $(document).on("click",".edit_btn",function(){
        var fileName = $(this).attr("fileName");
        // An ajax call cannot hand its response to the browser's download dialog,
        // so the download is triggered with window.open() instead.
        // encodeURIComponent keeps the highlight markup, spaces, and non-ASCII characters valid in the URL.
        window.open("http://" + window.location.host + "/FileDownload?fileName=" + encodeURIComponent(fileName), '_blank');
    });
</script>
</body>
</html>
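
One detail of the page above deserves care: the content cell is shortened with substring, and because the content already contains highlight markup, a cut can land in the middle of a span tag. A possible alternative, sketched below under the assumption that the only markup in the text is the span pair produced by the highlighter, is to shorten the snippet on the server by counting visible characters only. SnippetTruncator and truncate are hypothetical names, not part of the project.

```java
// Hypothetical helper: shortens highlighted text without cutting through a tag.
// Assumes the only markup present is the <span ...>...</span> pair added by the highlighter.
final class SnippetTruncator {

    /** Keeps at most maxVisible visible characters and closes any span left open. */
    static String truncate(String html, int maxVisible) {
        StringBuilder out = new StringBuilder();
        int visible = 0;    // characters outside of tags copied so far
        int openSpans = 0;  // span tags opened but not yet closed
        int i = 0;
        while (i < html.length() && visible < maxVisible) {
            char c = html.charAt(i);
            int close = (c == '<') ? html.indexOf('>', i) : -1;
            if (close != -1) {
                // Copy the whole tag; it does not count as visible text
                String tag = html.substring(i, close + 1);
                if (tag.startsWith("</")) {
                    openSpans = Math.max(0, openSpans - 1);
                } else {
                    openSpans++;
                }
                out.append(tag);
                i = close + 1;
            } else {
                out.append(c);
                visible++;
                i++;
            }
        }
        for (; openSpans > 0; openSpans--) {
            out.append("</span>");
        }
        if (i < html.length()) {
            out.append(".....");  // same ellipsis the page uses
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(truncate("<span style= \"color:red;\">Spark</span> SQL", 3));
    }
}
```

With a helper like this the service could return a ready-to-display snippet, and the page would not need to call substring on markup at all.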

7. File Search

Start the Spring Boot application and open localhost:8888 to see the search page.
Enter the search keyword spark and click the search button to list the matching files.

8. File Download

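Once the search returns results, the files need to be downloadable. Create FileDownloadController.java under controller. When the user clicks the download button, the front end passes the file name to the back end, which locates the file in the files directory, reads it as a byte stream, and writes it to the HTTP response so that the browser saves it to the user's disk.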

package com.cnpc.controller;

import com.cnpc.domain.Msg;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.*;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Created by grant on 2019/9/27.
 */
@RestController
public class FileDownloadController {
    private static final Logger logger = LoggerFactory.getLogger(FileDownloadController.class);

    // @RestController already implies @ResponseBody, so only the mapping is needed
    @RequestMapping(value = "FileDownload", method = RequestMethod.GET)
    public Msg downloadFile(HttpServletRequest request, HttpServletResponse response) throws IOException {

        // Directory that holds the downloadable files (hard-coded for this demo)
        String fileDir = "D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\files\\";

        // Check for null before touching the parameter
        String rawName = request.getParameter("fileName");
        if (rawName == null) {
            logger.warn("Missing fileName parameter");
            return Msg.fail();
        }
        // The front end sends the highlighted title, so strip the HTML tags, then keep only
        // the last path segment so that "../" cannot point outside the files directory.
        // No ISO-8859-1 to UTF-8 re-decoding is done here: Chinese names arrived intact in testing,
        // presumably because the embedded Tomcat (8+) decodes query strings as UTF-8 by default.
        String fileName = new File(delHtmlTag(rawName)).getName();
        logger.info("Requested file: {}", fileName);
        if (fileName.isEmpty()) {
            return Msg.fail();
        }

        File file = new File(fileDir, fileName);
        if (!file.isFile()) {
            logger.warn("File not found: {}", file.getPath());
            return Msg.fail();
        }

        // Mark the response as a binary attachment; the Content-Disposition header tells
        // the browser the file name, URL-encoded so that Chinese names are preserved
        response.setContentType("application/octet-stream");
        response.setHeader("Content-Disposition", "attachment;filename=" + URLEncoder.encode(fileName, "UTF-8"));

        byte[] buffer = new byte[1024]; // copy buffer
        // try-with-resources closes the input stream even if the copy fails
        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
            OutputStream os = response.getOutputStream();
            int len; // number of bytes read in this round
            while ((len = bis.read(buffer)) != -1) {
                os.write(buffer, 0, len);
            }
            os.flush();
            logger.info("File downloaded successfully: {}", fileName);
        } catch (IOException e) {
            logger.error("File download failed: " + fileName, e);
        }
        // The file has been written straight to the response, so there is no JSON body to return
        return null;
    }
    /**
     * The search results carry highlight markup, so the file name sent back by the
     * front end still contains HTML tags. This method strips them. Example payload:
     "resultList": [{
     "title": "<span style= \"color:red;\">Spark</span> SQL.pdf",
     "content": "\n<span style= \"color:red;\">Spark</span> SQL\n\n一起进步!\n\n\n\nShark\n\n•        Shark是基于<span style= \"color:red;\">Spark</span>计算框架之上且兼容Hive语法的SQL执行引擎,由\n\n于底层的计算采用了<span style= \"color:red;\">Spark</span>"
     }]
     * @param line text that may contain HTML tags
     * @return the text with every tag removed
     */
    public String delHtmlTag(String line){
        // Matches any HTML tag, such as <span style= "color:red;"> or </span>
        String regEx_html = "<[^>]+>";
        Pattern pattern = Pattern.compile(regEx_html);
        Matcher matcher = pattern.matcher(line);
        // Replace every tag with the empty string
        return matcher.replaceAll("");
    }
}
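
Because the file name comes straight from the request, it is worth checking that the resolved path stays inside the files directory before opening it. The following is a minimal sketch of such a check using java.nio.file. It is an illustration rather than part of the project: SafeFileResolver and resolve are hypothetical names, and the base directory is whatever directory holds the downloadable files.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves a client-supplied name against a base directory
// and rejects anything that would end up outside of it.
final class SafeFileResolver {

    /** Returns the normalized path of fileName inside baseDir, or null if it is unsafe. */
    static Path resolve(Path baseDir, String fileName) {
        if (fileName == null || fileName.isEmpty()) {
            return null;
        }
        try {
            Path base = baseDir.toAbsolutePath().normalize();
            // normalize() collapses "." and ".." segments before the containment check
            Path target = base.resolve(fileName).normalize();
            if (!target.startsWith(base) || target.equals(base)) {
                return null;
            }
            return target;
        } catch (InvalidPathException e) {
            // Characters that the file system does not allow in a path
            return null;
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get("files");
        System.out.println(resolve(base, "Spark SQL.pdf"));
        System.out.println(resolve(base, "../application.properties"));
    }
}
```

A controller could call a check like this after stripping the highlight tags and answer with Msg.fail() whenever the result is null.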

Restart the project and click the download button on the page to test the download.
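
If a downloaded file arrives with plus signs in its name, the likely cause is that URLEncoder.encode writes a space as "+", since it implements form encoding rather than header encoding. A common alternative is the filename*=UTF-8'' form of the Content-Disposition header with percent-encoded spaces. The sketch below shows one way to build such a value; ContentDispositionSketch and attachment are hypothetical names, and this is a suggestion, not what the controller above does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds an attachment header value for a UTF-8 file name.
final class ContentDispositionSketch {

    /** Returns a Content-Disposition value using the filename*=UTF-8'' form. */
    static String attachment(String fileName) {
        try {
            String encoded = URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")   // URLEncoder writes a space as '+'
                    .replace("*", "%2A");  // URLEncoder leaves '*' as-is; escape it for the header
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attachment("Spark SQL.pdf"));
    }
}
```

The returned string would be passed to response.setHeader("Content-Disposition", ...) in place of the hand-built value.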
This completes the hands-on example of a file search project built with Lucene.
