The program below extracts the content and metadata of a text document:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TextParser {

    public static void main(final String[] args) throws IOException, SAXException, TikaException {

        // Handler that collects the body text, plus holders for the metadata and the parse context
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();

        // Parse the file with the plain-text parser; try-with-resources closes the stream
        TXTParser txtParser = new TXTParser();
        try (FileInputStream inputstream = new FileInputStream(new File("example.txt"))) {
            txtParser.parse(inputstream, handler, metadata, pcontext);
        }

        System.out.println("Contents of the document:" + handler.toString());
        System.out.println("Metadata of the document:");

        // Print every metadata entry the parser recorded
        for (String name : metadata.names()) {
            System.out.println(name + " : " + metadata.get(name));
        }
    }
}
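Save the above code as TextParser.java, then, with the Tika JARs on the classpath, compile and run it from the command prompt using the following commands:
javac TextParser.java
java TextParser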
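Given below is a snapshot of the example.txt file:
The text file has the following properties:
After executing the above program, you will get the following output.
Output: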
Contents of the document:
At Yiibai.com, we strive hard to provide quality tutorials for self-learning purpose in the domains of Academics, Information Technology, Management and Computer Programming Languages.
The endeavour started by Hema su, who is the founder and the managing director of Yiibai Pvt. Ltd. He came up with the website yiibai.com in year 2014 with the help of handpicked freelancers, with an array of tutorials for computer programming languages.
Metadata of the document:
Content-Encoding: windows-1252
Content-Type: text/plain; charset=windows-1252
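The Content-Type value in the output carries the character set detected for the file. As a rough sketch of how that value could be put to use, the snippet below pulls the charset parameter out of such a string and decodes raw bytes with it, using only the JDK. The class name CharsetFromContentType and the helper charsetOf are hypothetical names chosen for this illustration and are not part of Tika; the sample bytes are made up as well.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFromContentType {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the given fallback when no usable charset parameter is present.
    public static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The Content-Type string is the one shown in the output above
        Charset cs = charsetOf("text/plain; charset=windows-1252", StandardCharsets.UTF_8);

        // Illustrative bytes: 0x93 and 0x94 are curly quotes in windows-1252
        byte[] raw = {(byte) 0x93, 'h', 'i', (byte) 0x94};
        System.out.println(cs.name() + " -> " + new String(raw, cs));
    }
}
```

In a real program the string passed to charsetOf would presumably come from metadata.get("Content-Type") after parsing, and the fallback charset is a choice left to the caller.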