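1. Requirements Analysis
Suppose we have a batch of documents in DOC, DOCX, PPT, PPTX, TXT, and PDF formats, and we want to build a file retrieval system similar to Baidu Wenku. The requirements are:
(1) Search by file name.
(2) Search by file content.
(3) Download the files found by a search.
(4) Highlight the matched keywords.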
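2. Architecture Design
In outline: a file storage system holds files of different types. A back-end program extracts each file's name and content and indexes both with Lucene. The front end gives users a query interface; when a user submits a keyword, the back end searches the index and returns the matching documents to the front-end page.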
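To make the extract, index, and search flow concrete before bringing in Lucene, here is a minimal sketch using only the JDK. It is a toy stand-in, not the Lucene implementation used in this article: the class name TinyFileIndex and its naive tokenizer are assumptions made for illustration, and it matches whole lowercase words only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for the Lucene index: maps each lowercase word to the files containing it. */
final class TinyFileIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes both the file name and the extracted text, as the real system does. */
    void add(String fileName, String content) {
        for (String token : tokenize(fileName + " " + content)) {
            postings.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(fileName);
        }
    }

    /** Returns the names of files whose name or content contains the keyword as a whole word. */
    List<String> search(String keyword) {
        Set<String> hits = postings.get(keyword.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    // Naive tokenizer that splits on anything that is not a letter or digit.
    // The article uses the IK analyzer instead, which can segment Chinese text.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

Lucene plays the role of the postings map here, and adds persistence, relevance scoring, and pluggable analyzers on top.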
3. File Extraction
See https://blog.csdn.net/yangang1223/article/details/101367870; the details are not repeated here.
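4. Project Setup
Make sure Java, IDEA or Eclipse, and Tomcat are installed in your environment.
(1) Create a new Spring Boot project named filesearch.
(2) Create the directory structure shown in the figure (the code below uses the packages domain, util, ik, controller, service, and service/impl, plus the files, indexdir, static, and templates directories under resources).
(3) Start the project and access the hello endpoint.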
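5. Indexing Documents
With the project set up, the first step is to build the index. The objects to be searched are files; to keep things simple, we index only the file name and the file content. Create the entity class FileEntity in the domain package.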
package com.cnpc.domain;
/**
* Created by grant on 2019/9/25.
*/
public class FileEntity {
private String title;// file title (the file name)
private String content;// file content
public FileEntity(){}
public FileEntity(String title, String content) {
this.title = title;
this.content = content;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
}
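In the util directory, create the CreateIndex class. It parses the documents under the files directory, maps each extracted document to a FileEntity object, tokenizes the text with the IK analyzer, and builds the index. The IK analyzer needs some adaptation: here we create a package holding two overriding classes, IKAnalyzer6x and IKTokenzier6x, and instantiate IKAnalyzer6x wherever an IKAnalyzer object would otherwise be created.
Also place main2012.dic on the classpath; otherwise IK reports "Main Dictionary not found!".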
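A quick way to catch the missing-dictionary problem before indexing starts is to ask the class loader whether the file is visible. The sketch below uses only the JDK; the class name DictionaryCheck is hypothetical, and it assumes the dictionary is expected at the classpath root (for example under src/main/resources), which may differ from how your IK build actually looks it up.

```java
import java.net.URL;

final class DictionaryCheck {
    /** Returns true if the named resource can be found from the classpath root. */
    static boolean onClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = DictionaryCheck.class.getClassLoader();
        }
        URL url = loader.getResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        String dic = "main2012.dic";
        System.out.println(dic + (onClasspath(dic) ? " found" : " NOT found on the classpath"));
    }
}
```

Running it from the same module as CreateIndex tells you whether the file was actually copied into target/classes.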
package com.cnpc.util;
import com.cnpc.domain.FileEntity;
import com.cnpc.ik.IKAnalyzer6x;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
/**
* Created by grant on 2019/9/25.
*/
public class CreateIndex {
public static List<FileEntity> extractFile(){
ArrayList<FileEntity> list = new ArrayList<FileEntity>();
File fileDir = new File("D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\files");
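File[] allFiles = fileDir.listFiles();
if (allFiles == null) {// listFiles() returns null when the path is not a readable directory
return list;
}
for (File file : allFiles) {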
FileEntity fileEntity = new FileEntity(file.getName(), ParserExtraction(file));
list.add(fileEntity);
}
return list;
}
private static String ParserExtraction(File file) {
String fileContent = "";// holds the extracted text
// Collects the document body as plain text. The default write limit is 100,000 characters;
// use new BodyContentHandler(-1) to remove it.
BodyContentHandler handler = new BodyContentHandler();
Parser parser = new AutoDetectParser();// detects the file type and picks a matching parser
Metadata metadata = new Metadata();// receives the document metadata
// try-with-resources closes the stream even if parsing fails
try (FileInputStream inputStream = new FileInputStream(file)) {
ParseContext context = new ParseContext();
parser.parse(inputStream,handler,metadata,context);
fileContent = handler.toString();
} catch (IOException | SAXException | TikaException e) {
// FileNotFoundException is an IOException, so it is handled here too
e.printStackTrace();
}
return fileContent;
}
public static void main(String[] args) throws IOException {
// IK analyzer object
IKAnalyzer6x analyzer = new IKAnalyzer6x();
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
Directory dir = null;
IndexWriter inWriter = null;
Path indexPath = Paths.get("D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\indexdir");
System.out.println("indexdir :"+indexPath);
FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStored(true);
fieldType.setTokenized(true);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorOffsets(true);
Date start = new Date();// start time
if(!Files.isReadable(indexPath)){
System.out.println(indexPath.toAbsolutePath() + " does not exist or is not readable, please check!");
System.exit(1);
}
dir = FSDirectory.open(indexPath);
inWriter = new IndexWriter(dir,iwc);
List<FileEntity> fileList = extractFile();
// iterate over fileList and add each file to the index
for (FileEntity fileEntity : fileList) {
Document doc = new Document();
doc.add(new Field("title",fileEntity.getTitle(),fieldType));
doc.add(new Field("content",fileEntity.getContent(),fieldType));
inWriter.addDocument(doc);
}
inWriter.commit();
inWriter.close();
dir.close();
Date end = new Date();// end time
// print the elapsed indexing time
System.out.println("Indexing finished, elapsed: "+(end.getTime() - start.getTime()) +" ms.");
}
}
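extractFile() hands every entry in the files directory to Tika. If the directory may contain other file types, a small filter can restrict indexing to the six formats listed in the requirements. This is a sketch using only the JDK; the class name SupportedFiles and both of its methods are hypothetical helpers, not part of the original project.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SupportedFiles {
    // The six formats listed in the requirements section.
    private static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList("doc", "docx", "ppt", "pptx", "txt", "pdf"));

    /** True if the file name ends with one of the supported extensions (case-insensitive). */
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        return EXTENSIONS.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Lists supported regular files in a directory; returns an empty array if it cannot be read. */
    static File[] list(File dir) {
        File[] files = dir.listFiles(f -> f.isFile() && isSupported(f.getName()));
        return files == null ? new File[0] : files;
    }
}
```

In extractFile(), the call fileDir.listFiles() could then be replaced with SupportedFiles.list(fileDir), which also skips subdirectories.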
Run the main method. For every document under files, whatever its format, the file name and content are written to the Lucene index.
Output:
C:\App\Java\jdk1.8.0_92\bin\java ... -Dfile.encoding=UTF-8 -classpath ... com.cnpc.util.CreateIndex
(The -javaagent option and the full classpath are omitted here. The classpath lists the JDK 1.8.0_92 runtime jars, target\classes, and the Maven dependencies, including spring-boot 2.1.8.RELEASE, lucene 6.6.0, ikanalyzer 2012_u6, tika 1.13, pdfbox 2.0.1, and poi 3.15-beta1.)
15:33:22.672 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
15:33:22.676 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer - ScratchFileBuffer not closed!
(The same DEBUG line is logged 12 times in total.)
Indexing finished, elapsed: 5511 ms.
Process finished with exit code 0
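Inspect the index directory:
6. Query Interface
In short, the query interface is where the back end receives the keywords the user searches for.
Create SearchController.java in the controller directory.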
import com.cnpc.domain.FileEntity;
import com.cnpc.domain.Msg;
import com.cnpc.service.SearchService;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.ModelAndView;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.UnsupportedEncodingException;
import java.util.List;
/**
* Created by grant on 2019/9/27.
*/
@RestController
public class SearchController{
private final static org.slf4j.Logger logger= LoggerFactory.getLogger(SearchController.class);
@Autowired
private SearchService searchService;
/**
* Entry point that renders the initial page.
* @return the index view
*/
@RequestMapping(value = "init",method = RequestMethod.GET)
public ModelAndView init(){
return new ModelAndView("index");
}
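/**
* Handles the query keyword sent by the front end and returns the search results.
* @param request the HTTP request carrying the "query" parameter
* @param response the HTTP response (unused)
* @return a Msg carrying the hit list under "resultList", or a failure Msg
* @throws UnsupportedEncodingException declared for the commented-out re-encoding below
*/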
@RequestMapping(value = "SearchFile",method = RequestMethod.GET)
public Msg searchFile(HttpServletRequest request, HttpServletResponse response) throws UnsupportedEncodingException {
String indexpathStr = "D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\indexdir";
// read the query string
System.out.println("indexpathStr : "+indexpathStr);
String query = request.getParameter("query");
System.out.println("Query string: "+query);
// character-encoding conversion (disabled)
// query = new String(query.getBytes("iso8859-1"),"UTF-8");
// check for null first; calling equals on a null query would throw a NullPointerException
if(null == query || query.equals("")){
System.out.println("Invalid parameter");
return Msg.fail();
}else {
List<FileEntity> hitsList = searchService.SearchFile(query, indexpathStr, 100);
if(null == hitsList){
System.out.println("No matching records");
return Msg.fail();
}
System.out.println("Found "+hitsList.size()+" records");
return Msg.success().add("resultList",hitsList);
}
}
}
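The controller has to reject a missing or empty query before it reaches the query parser. One way to keep that check in a single place is a small helper like the sketch below; QueryParams and normalize are hypothetical names, and treating a whitespace-only query as missing is an assumption that goes slightly beyond the check in the controller.

```java
final class QueryParams {
    /** Returns the trimmed query, or null if it is missing or blank. */
    static String normalize(String raw) {
        if (raw == null) {
            return null; // parameter absent from the request
        }
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }
}
```

With it, the controller could call QueryParams.normalize(request.getParameter("query")) and return Msg.fail() when the result is null.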
Create the SearchService interface under service to declare the query method.
import com.cnpc.domain.FileEntity;
import org.springframework.stereotype.Service;
import java.util.List;
/**
* Created by grant on 2019/9/27.
*/
@Service
public interface SearchService{
public List<FileEntity> SearchFile(String key,String indexpathStr,int n);
}
Create an impl directory under service, and in it create SearchServiceImp.java with the query implementation.
import com.cnpc.domain.FileEntity;
import com.cnpc.ik.IKAnalyzer6x;
import com.cnpc.service.SearchService;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Created by grant on 2019/9/27.
*/
@Service
public class SearchServiceImp implements SearchService {
@Override
public List<FileEntity> SearchFile(String key, String indexpathStr, int n) {
ArrayList<FileEntity> histList = new ArrayList<FileEntity>();
// fields to search
String[] fields = {"title","content"};
Path indexPath = Paths.get(indexpathStr);
Directory dir;
try {
// step 1: prepare for the query by opening the index Directory
dir = FSDirectory.open(indexPath);
// create the IndexReader
DirectoryReader reader = DirectoryReader.open(dir);
// create the IndexSearcher
IndexSearcher searcher = new IndexSearcher(reader);
IKAnalyzer6x analyzer = new IKAnalyzer6x();
// step 2: create the query object
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields, analyzer);
// the query string
System.out.println("keyword: "+key);
Query query = queryParser.parse(key);
TopDocs topDocs = searcher.search(query, n);
if(topDocs.totalHits == 0){
// release the index before the early return
reader.close();
dir.close();
return null;
}
// custom highlight tags
SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span style= \"color:red;\">", "</span>");
QueryScorer scoreTitle = new QueryScorer(query,fields[0]);
Highlighter highlighterTitle = new Highlighter(formatter, scoreTitle);
QueryScorer scoreContent = new QueryScorer(query, fields[1]);
Highlighter highlighterContent = new Highlighter(formatter, scoreContent);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
String title = doc.get("title");
String content = doc.get("content");
// get the TokenStream for the title field
TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
scoreDoc.doc, fields[0], analyzer);
SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(scoreTitle);
highlighterTitle.setTextFragmenter(fragmenter);
// use the title highlighter for the title field
String hl_title = highlighterTitle.getBestFragment(tokenStream, title);
// get the TokenStream for the content field
tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), scoreDoc.doc, fields[1], analyzer);
fragmenter = new SimpleSpanFragmenter(scoreContent);
highlighterContent.setTextFragmenter(fragmenter);
// get the best highlighted fragment; the number of fragments can be limited
String hl_content = highlighterContent.getBestFragment(tokenStream, content);
FileEntity fileEntity = new FileEntity(hl_title != null ? hl_title : title, hl_content != null ? hl_content : content);
histList.add(fileEntity);
}
reader.close();
dir.close();
}catch (IOException e){
e.printStackTrace();
}catch (ParseException e){
e.printStackTrace();
}catch (InvalidTokenOffsetsException e){
e.printStackTrace();
}
return histList;
}
}
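To see what the highlighting step produces, here is a simplified illustration using only the JDK. It is not Lucene's Highlighter: it wraps plain substring matches in the same span tags and does no tokenization, scoring, or fragment selection. The class name SimpleHighlight is hypothetical. It also escapes HTML in the surrounding text, which matters once document content is inserted into a page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleHighlight {
    private static final String PRE = "<span style=\"color:red;\">";
    private static final String POST = "</span>";

    /** Escapes the characters that are significant in HTML text. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps every case-insensitive occurrence of keyword in text with the highlight tags. */
    static String highlight(String text, String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            return escapeHtml(text);
        }
        Pattern p = Pattern.compile(Pattern.quote(keyword),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // copy the unmatched text before this hit, then the wrapped hit itself
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append(PRE).append(escapeHtml(m.group())).append(POST);
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }
}
```

In the real service, Lucene's Highlighter can be given an Encoder such as SimpleHTMLEncoder to get comparable escaping; whether that is needed depends on how the front end inserts the result.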
Add the following to application.properties under resources:
server.port=8888
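Put five files about learning big data under files, in doc, docx, pdf, and txt formats, for the program to parse and extract.
Import the Bootstrap and jQuery files under static, and put the main query page under templates; here it is index.html, a lightly modified copy of an example page from the official Bootstrap site.
Code of index.html (the data binding is already done with ajax):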
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="">
<meta name="author" content="">
<link rel="icon" href="../../favicon.ico">
<title>Document Search</title>
<!-- Bootstrap core CSS -->
<link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link href="../../assets/css/ie10-viewport-bug-workaround.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="jumbotron.css" rel="stylesheet">
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<script src="../../assets/js/ie-emulation-modes-warning.js"></script>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://cdn.bootcss.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://cdn.bootcss.com/respond.js/1.4.2/respond.min.js"></script>
<![endif]-->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://cdn.bootcss.com/jquery/1.12.4/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="../../assets/js/vendor/jquery.min.js"><\/script>')</script>
<script src="https://cdn.bootcss.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="#">Lucene-based File Search System</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<form class="navbar-form navbar-right"><!-- action="SearchFile" method="get"-->
<div class="form-group">
<input type="text" placeholder="keyword" class="form-control" name="query" id="query">
</div>
<!-- <div class="form-group">
<input type="password" placeholder="Password" class="form-control">
</div>-->
<button type="button" class="btn btn-success" id="search_btn">Search</button>
</form>
</div><!--/.navbar-collapse -->
</div>
</nav>
<div class="jumbotron">
</div>
<div class="container">
<!-- Title row -->
<div class="row">
<div class="col-md-12">
<h1 id="h1-result">Search results</h1>
</div>
</div>
<!-- Buttons -->
<div class="row">
<div class="col-md-4 col-md-offset-8">
<!--<button class="btn btn-primary" id="emp_add_modal_btn">Add</button>
<button class="btn btn-danger" id="emp_delete_all_btn">Delete selected</button>-->
</div>
</div>
<!-- Result table -->
<div class="row">
<div class="col-md-12">
<table class="table table-hover" id="result_table">
<thead>
<tr>
<th>
<input type="checkbox" id="check_all"/>
</th>
<th>File name</th>
<th>File content</th>
<th>Action</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
</div>
<!-- Pagination -->
<div class="row">
<div class="col-md-6" id="page_info_area">
</div>
<div class="col-md-6" id="page_nav_area">
</div>
</div>
</div> <!-- /container -->
<!-- Main jumbotron for a primary marketing message or call to action -->
<!-- <hr> -->
<!--<footer>
<p>© 2019 Company, Inc.</p>
</footer>-->
<script type="text/javascript">
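$(function(){
//Runs once when the page has loaded: bind the search button handler
search();
});
function search(){
//Handle clicks on the search button
$("#search_btn").click(function(){
var query = $("#query").val();
$.ajax({
url:"/SearchFile",
//Pass an object so that jQuery URL-encodes the keyword
data:{query:query},
type:"GET",
success:function(result){
//Parse and display the search results
build_result_table(result);
}
});
});
}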
function build_result_table(result){
//Clear the previous results first
$("#result_table tbody").empty();
//Code 200 is treated here as "no match"
if(result.code == 200){
$("#h1-result").html("No matching records");
}else{
$("#h1-result").html("Search results");
var resultList=result.extend.resultList;
$.each(resultList,function(index,item){
var checkBoxTd = $("<td><input type ='checkbox' class='check_item'/></td>");
var titleTd=$("<td></td>").append(item.title);
var contentTd=$("<td></td>").append(item.content.length > 180 ? item.content.substring(0,80) +".....":item.content);
var editBtn=$("<button></button>").addClass("btn btn-primary btn-sm edit_btn")
.append($("<span></span>").addClass("glyphicon glyphicon-download-alt")).append("Download");
editBtn.attr("fileName",item.title);
//Wrap the button in its own cell so that the row contains only td elements
var downloadTd=$("<td></td>").append(editBtn);
$("<tr></tr>")
.append(checkBoxTd)
.append(titleTd)
.append(contentTd)
.append(downloadTd)
.appendTo("#result_table tbody");
});
}
}
$(document).on("click",".edit_btn",function(){
var fileName = $(this).attr("fileName");
//Trigger the download with window.open(); an ajax request cannot directly hand the response to the browser's save dialog.
/* $.ajax({
url:"/FileDownload",
data:"fileName="+fileName,
type:"GET"/!*,
success:function(result){
alert(result);
}*!/
});*/
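//The file name may contain spaces and highlight markup, so encode it before putting it in the URL
window.open("http://" + window.location.host + "/FileDownload?fileName=" + encodeURIComponent(fileName), '_blank');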
});
</script>
</body>
</html>
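7. File Retrieval
Start the Spring Boot application and visit localhost:8888; the result is shown in the figure below.
Enter the search keyword spark; the result is as follows.
8. File Download
Once results are found, the user needs to be able to download the files. Create FileDownloadController.java under controller: clicking the download button in the front end passes the file name to the back end, which locates the file in the files directory, reads it as a byte stream, and writes it to the HTTP response so that the browser saves it to the user's disk.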
package com.cnpc.controller;
import com.cnpc.domain.FileEntity;
import com.cnpc.domain.Msg;
import com.cnpc.service.SearchService;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.ModelAndView;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.*;
import java.net.URLEncoder;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Created by grant on 2019/9/27.
*/
@RestController
public class FileDownloadController {
private final static org.slf4j.Logger logger= LoggerFactory.getLogger(FileDownloadController.class);
@Autowired
private SearchService searchService;
@RequestMapping(value = "FileDownload",method = RequestMethod.GET)
@ResponseBody
public Msg searchFile(HttpServletRequest request, HttpServletResponse response) throws IOException {
String indexpathStr = "D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\indexdir";
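//Read the file name from the request; check for null before stripping tags to avoid a NullPointerException
String rawName = request.getParameter("fileName");
String fileName = rawName == null ? "" : delHtmlTag(rawName);
System.out.println("Requested file name: "+fileName);
//Encoding note: older setups decoded request parameters as iso8859-1, so Chinese text had to be re-decoded as UTF-8 with the line below.
//The Chinese file names arrive intact here, most likely because the embedded Tomcat already decodes the URI as UTF-8 by default.
//fileName = new String(fileName.getBytes("iso8859-1"),"UTF-8");
if(fileName.isEmpty()){
System.out.println("Invalid parameter");
return Msg.fail();
}else {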
File file = new File("D:\\RF-WorkSpace\\ChinaOil\\newcode\\filesearch\\src\\main\\resources\\files\\"+fileName);
System.out.println(file.getPath());
/* //Set the response content type
response.setContentType("application/octet-stream");
//Declare the size of the file to download
response.setContentLength((int)file.length());
//Avoid garbled Chinese file names by sending a header to the client
//1. Content-Disposition is an extension of the MIME protocol
//2. attachment means the content is downloaded as an attachment
//3. The client pops up a download dialog
//4. This is the key line for downloading the document
response.setHeader("Content-Disposition","attachment;" +
"fileName="+new String(fileName.getBytes("UTF-8"),"iso8859-1"));
//Open the input stream
FileInputStream fis = new FileInputStream(file);
BufferedInputStream buff = new BufferedInputStream(fis);
byte[] b = new byte[1024];//read buffer
int k = 0;//number of bytes read in the current pass
//Get the output stream from the response object
ServletOutputStream myout = response.getOutputStream();
//Copy the file in a loop
while (-1 != (k = fis.read(b,0,b.length))){
//Write the bytes in b to the response
myout.write(b,0,k);
}
//Flush the buffered data to the client
myout.flush();
fis.close();
buff.close();
System.out.println("Download succeeded");*/
// If the file exists, start the download
if (file.exists()) {
// Configure the download: set the response content type
response.setContentType("application/octet-stream");
// Content-Disposition tells the browser to save the response as an attachment and which file name to use; URL-encoding keeps Chinese file names readable.
response.setHeader("Content-Disposition", "attachment;filename=" + URLEncoder.encode(fileName, "UTF-8"));
// Stream the file to the client
byte[] buffer = new byte[1024];//read buffer
FileInputStream fis = null;
BufferedInputStream bis = null;
try {
//Open the input stream
fis = new FileInputStream(file);
bis = new BufferedInputStream(fis);
OutputStream os = response.getOutputStream();
int i = bis.read(buffer);//number of bytes read into the buffer in this pass
while (i != -1) {
//Write the bytes just read to the response output stream
os.write(buffer, 0, i);
i = bis.read(buffer);
}
os.flush();
System.out.println("File downloaded successfully!");
}
catch (Exception e) {
System.out.println("File download failed!");
e.printStackTrace();
}
finally {
if (bis != null) {
try {
bis.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if (fis != null) {
try {
fis.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
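} else {
// The requested file does not exist in the files directory
return Msg.fail();
}
}
// The file has already been written to the response, so there is no body to return
return null;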
}
/**
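* Highlighted search results carry HTML tags, so the file name passed back from the page may contain them; this method removes the HTML tags from the file name. Sample result: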
"resultList": [{
"title": "<span style= \"color:red;\">Spark</span> SQL.pdf",
"content": "\n<span style= \"color:red;\">Spark</span> SQL\n\n一起进步!\n\n\n\nShark\n\n• Shark是基于<span style= \"color:red;\">Spark</span>计算框架之上且兼容Hive语法的SQL执行引擎,由\n\n于底层的计算采用了<span style= \"color:red;\">Spark</span>"
}]
* @param line the file name, possibly containing HTML tags
* @return the file name with all HTML tags removed
*/
public String delHtmlTag(String line){
String regEx_html = "<[^>]+>";
//Compile the pattern that matches any HTML tag
Pattern pattern = Pattern.compile(regEx_html);
//Match it against the input
Matcher matcher = pattern.matcher(line);
//Replace every tag with an empty string
line= matcher.replaceAll("");
return line;
}
}
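The controller above appends the request parameter directly to the files directory, so a name such as ../application.properties would resolve outside that directory. The following is a minimal sketch of one way to guard against this, not the author's implementation. The class DownloadNameSketch and the method resolveInside are hypothetical names introduced for illustration; stripHtmlTags mirrors delHtmlTag, and the relative directory files stands in for the absolute path used in the article.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original project.
class DownloadNameSketch {
    // Same regular expression as delHtmlTag in the controller
    private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");

    // Remove highlight markup from a file name; a null input becomes an empty string
    static String stripHtmlTags(String raw) {
        return raw == null ? "" : HTML_TAG.matcher(raw).replaceAll("").trim();
    }

    // Resolve fileName under baseDir and reject anything that ends up outside it
    static Path resolveInside(Path baseDir, String fileName) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path target = base.resolve(fileName).normalize();
        if (fileName.isEmpty() || target.equals(base) || !target.startsWith(base)) {
            throw new IllegalArgumentException("Illegal file name: " + fileName);
        }
        return target;
    }

    public static void main(String[] args) {
        String title = "<span style= \"color:red;\">Spark</span> SQL.pdf";
        String clean = stripHtmlTags(title);
        System.out.println(clean); // prints "Spark SQL.pdf"
        // Prints the absolute path of files/Spark SQL.pdf under the working directory
        System.out.println(resolveInside(Paths.get("files"), clean));
    }
}
```

In the controller, the result of resolveInside could be converted with toFile() in place of the string concatenation, with the IllegalArgumentException mapped to Msg.fail().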
Restart the project and click the download button in the front end to test the download.
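When testing with a file such as Spark SQL.pdf, note that URLEncoder.encode turns the space into a plus sign, so some browsers may save the file as Spark+SQL.pdf. One possible alternative, sketched below under the assumption that the target browsers support the filename* parameter defined in RFC 6266, is to percent-encode the name and send it as filename*=UTF-8''. The class ContentDispositionSketch and the method attachmentHeader are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original project.
class ContentDispositionSketch {
    // Build a Content-Disposition value with a percent-encoded UTF-8 file name
    static String attachmentHeader(String fileName) {
        try {
            String encoded = URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")   // URLEncoder writes spaces as '+'
                    .replace("*", "%2A");  // '*' is left as-is by URLEncoder but is not allowed here
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "attachment; filename*=UTF-8''Spark%20SQL.pdf"
        System.out.println(attachmentHeader("Spark SQL.pdf"));
    }
}
```

In the controller this value would be passed to response.setHeader("Content-Disposition", ...) instead of the concatenated string.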
This completes the hands-on example of a file retrieval project built with Lucene.