java读取word-excel-ppt文件代码
2019-05-08
编程之家收集整理的这篇文章主要介绍了java读取word-excel-ppt文件代码,编程之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
WORD:
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.poi.hwpf.extractor.WordExtractor;
import java.io.File;
import java.io.InputStream;
import java.io.FileInputStream;
import com.search.code.Index;
public Document getDocument(Index index,String url,String title,InputStream is) throws DocCenterException {
String bodyText = null;
try {
WordExtractor ex = new WordExtractor(is);//is是WORD文件的InputStream
bodyText = ex.getText();
if(!bodyText.equals("")){
index.AddIndex(url,title,bodyText);
}
}catch (DocCenterException e) {
throw new DocCenterException("无法从该Mocriosoft Word文档中提取内容",e);
}catch(Exception e){
e.printStackTrace();
}
}
return null;
}
Excel:
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;