About POI: Apache POI - the Java API for Microsoft Documents
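Sometimes we need to preview a Word file in a web page. The preview may not need many features: as long as all the information in the Word file is visible and the layout is roughly 90% similar to what Office shows, that is good enough. In that case you can use POI to convert the Word file to another format.
The biggest advantage of POI is that you do not need to install or configure any other third-party software on the server. All you have to do is add the POI jars to your project.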
The method below needs these jars: poi-3.9-20121203.jar and poi-scratchpad-3.9-20121203.jar
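import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;

// The class name is arbitrary; the method can live in any utility class.
public class Doc2HtmlConverter {

    public static void convertDoc2Html(String docFilePath, String htmlFilePath)
            throws IOException, TransformerException, ParserConfigurationException {
        File docFile = new File(docFilePath);
        if (!docFile.exists()) {
            return; // nothing to convert
        }

        // Resolve to an absolute path so that getParentFile() is not null
        // when htmlFilePath is a bare file name.
        File htmlFile = new File(htmlFilePath).getAbsoluteFile();
        File htmlFileParent = htmlFile.getParentFile();
        if (!htmlFileParent.exists()) { // create the parent directory if it is missing
            htmlFileParent.mkdirs();
        }

        // Load the .doc file; the input stream is closed once the document is in memory.
        HWPFDocument wordDocument;
        try (InputStream input = new FileInputStream(docFile)) {
            wordDocument = new HWPFDocument(input);
        }
        WordToHtmlConverter converter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());

        // Prefix for image file names, to avoid name collisions. The original code
        // called a custom helper, DateTools.getAllInfo(new Date()), which returned
        // "year month day hour minute second millisecond"; the pattern below is an
        // assumed equivalent.
        final String prefix = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date());

        // Rename the images referenced from the generated HTML.
        converter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType,
                    String suggestedName, float widthInches, float heightInches) {
                return prefix + "_" + suggestedName;
            }
        });
        converter.processDocument(wordDocument);

        // Write the images from the .doc into the same directory as the HTML file.
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                File picFile = new File(htmlFileParent, prefix + "_" + pic.suggestFullFileName());
                try (OutputStream picOut = new FileOutputStream(picFile)) {
                    pic.writeImageContent(picOut);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");

        // Serialize the converted document straight into the HTML file as UTF-8.
        try (OutputStream output = new FileOutputStream(htmlFile)) {
            serializer.transform(
                    new DOMSource(converter.getDocument()),
                    new StreamResult(output));
        }
    }
}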
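The method above only works for .doc files, because HWPFDocument reads only the binary Word 97-2003 format; it cannot be used for .docx. If anyone knows how to convert .docx to HTML, please share.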
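Since the converter only handles the binary format, it can help to check what kind of file you actually have before calling it, because a file's extension is not always reliable. The following is a minimal sketch of one way to do that, by looking at the first few bytes of the file: binary Office files start with the OLE2 signature, while .docx files are ZIP packages and start with "PK". The class WordFormatSniffer and its Format enum are hypothetical names made up for this example; they are not part of POI. Note that the check only identifies the container, so an .xls file would also be reported as OLE2 and any ZIP archive as ZIP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the container format from the leading bytes. */
public class WordFormatSniffer {

    /** OLE2 = binary Office container (e.g. .doc); ZIP = ZIP package (e.g. .docx). */
    public enum Format { OLE2, ZIP, UNKNOWN }

    // Signature of an OLE2 compound file, the container used by binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Signature of a ZIP local file header; a .docx file is a ZIP package.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a buffer holding the first bytes of a file. */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP;
        }
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    public static Format detect(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) != -1) {
                total += n;
            }
        }
        // Trim the buffer in case the file is shorter than 8 bytes.
        return detect(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this in place, you could call convertDoc2Html only when detect(...) returns Format.OLE2, and show a "format not supported" message for anything else instead of letting the conversion fail.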