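This implements converting Word documents in both formats (.doc and .docx) to HTML.
Pros: it handles .doc and .docx files that contain multiple images. The images render correctly in the output and never need to be saved on the local server, because they are uploaded straight to the file server. The document formatting is preserved as well.
Cons: the page-margin formatting is preserved too. Because the generated HTML is fairly complex, it is best to strip the margins on the front end if you do not want them.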
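If the margins have to go before the HTML reaches the browser, filtering the inline styles on the server is one option. The block below is only a minimal regex-based sketch: `PageBoxStripper` and `stripPageBox` are hypothetical names, and it assumes the page box shows up as `width` and `margin-*` declarations inside an inline `style` value, which should be checked against the HTML your converter actually produces.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI or XDocReport.
class PageBoxStripper {
    // One "margin", "margin-xxx" or "width" declaration plus its optional trailing semicolon.
    // The lookbehind leaves properties such as "max-width" or "border-width" untouched.
    private static final Pattern PAGE_BOX_DECLARATION =
            Pattern.compile("(?i)(?<![\\w-])(?:margin(?:-[a-z]+)?|width)\\s*:[^;]*;?\\s*");

    // Takes the value of a style attribute and returns it without the page-box declarations.
    static String stripPageBox(String inlineStyle) {
        return PAGE_BOX_DECLARATION.matcher(inlineStyle).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String style = "width:595.0pt;margin-top:72.0pt;margin-left:90.0pt;color:red;";
        System.out.println(stripPageBox(style)); // prints "color:red;"
    }
}
```

Applying it to every element would also remove paragraph-level margins, so in practice it probably makes sense to restrict it to the outermost wrapper element.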
Controller layer
@PostMapping("/uploadAndConvert")
public ApiResult<DocVO> uploadAndConvert(@RequestParam("file") MultipartFile request){
    String fileName = request.getOriginalFilename();
    if (fileName == null){
        return ApiResult.fail("File name is empty");
    }
    // try-with-resources closes the upload stream once the conversion has finished
    try (InputStream inputStream = request.getInputStream()) {
        return newsService.uploadAndConvert(inputStream,fileName);
    }catch (IOException e){
        return ApiResult.fail("Upload or conversion failed");
    }
}
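Service implementation
@Override
public ApiResult<DocVO> uploadAndConvert(InputStream inputStream,String fileName) throws IOException {
    byte[] bytes = IOUtils.toByteArray(inputStream);
    // Call your own file upload/download service here
    String uploadUrl = FileSystemClient.upload(bytes, 30);
    if (uploadUrl == null){
        return ApiResult.fail("File upload failed");
    }
    // Build docSource only after the null check, so it can never start with "null"
    String docSource = uploadUrl+"&filename="+fileName;
    DocVO docVO = new DocVO();
    // Compare the extension case-insensitively (needs import java.util.Locale)
    String lowerName = fileName.toLowerCase(Locale.ROOT);
    if (lowerName.endsWith(".doc")){
        return docToHtml(bytes, docSource, docVO);
    }else if(lowerName.endsWith(".docx")){
        // Call your own file upload/download service here
        byte[] download = FileSystemClient.download(uploadUrl, 30);
        return docxToHtml(docSource, docVO, download);
    }
    return ApiResult.fail("Unsupported file type, expected .doc or .docx");
}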
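The choice between the two converters depends only on the file extension. As a rough illustration of that dispatch in isolation, here is a small sketch; `WordFormat`, `Kind` and `detect` are hypothetical names that do not appear in the original service.

```java
import java.util.Locale;

// Hypothetical helper illustrating the dispatch on the file extension.
class WordFormat {
    enum Kind { DOC, DOCX, UNSUPPORTED }

    // Classifies a file name by its extension, ignoring case.
    static Kind detect(String fileName) {
        if (fileName == null) {
            return Kind.UNSUPPORTED;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return Kind.DOCX;
        }
        if (lower.endsWith(".doc")) {
            return Kind.DOC;
        }
        return Kind.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println(detect("Report.DOCX")); // prints "DOCX"
        System.out.println(detect("notes.doc"));   // prints "DOC"
        System.out.println(detect("slides.pdf"));  // prints "UNSUPPORTED"
    }
}
```

The extension says nothing about the actual content, so a renamed file can still fail inside the converter; the service should be prepared for that exception.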
Converting .doc to HTML
HWPFDocument hwpfDocument = new HWPFDocument(new ByteArrayInputStream(bytes));
org.w3c.dom.Document newDocument = XMLHelper.getDocumentBuilderFactory().newDocumentBuilder().newDocument();
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
wordToHtmlConverter.setPicturesManager( new PicturesManager()
{
    // Override savePicture to upload each image to the file server and return its URL;
    // the converter then places that URL at the image's position in the HTML
    @Override
    public String savePicture(byte[] content,
            PictureType pictureType, String suggestedName,
            float widthInches, float heightInches )
    {
        // Call your own file upload service here. content is the image's byte array; the converter
        // passes it in when it calls savePicture, so nothing extra has to be supplied
        return FileSystemClient.upload(content,30);
    }
} );
wordToHtmlConverter.processDocument(hwpfDocument);
// No second pass over hwpfDocument.getPicturesTable().getAllPictures() is needed here:
// savePicture above has already uploaded every picture the converter emitted, and
// uploading them again would only create duplicates whose URLs are never used.
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT,"no");
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
transformer.setOutputProperty(OutputKeys.METHOD, "html");
StringWriter stringWriter = new StringWriter();
transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()), new StreamResult(stringWriter));
// The final HTML; return it to the front end for rendering
String html = stringWriter.toString();
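The serialization step at the end uses only the JDK's `javax.xml.transform` API, so it can be tried without POI. The sketch below builds a tiny DOM by hand in place of `wordToHtmlConverter.getDocument()` and serializes it with the same three output properties; `DomToHtmlDemo` and its methods are hypothetical names used for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; only JDK classes are used.
class DomToHtmlDemo {
    // Serializes a DOM tree to an HTML string with the same settings as the conversion code.
    static String serialize(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    // Builds <html><body><p>hello</p></body></html> as a stand-in for the converter's DOM.
    static Document sampleDocument() {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = document.createElement("html");
            Element body = document.createElement("body");
            Element paragraph = document.createElement("p");
            paragraph.setTextContent("hello");
            body.appendChild(paragraph);
            html.appendChild(body);
            document.appendChild(html);
            return document;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

With the `html` output method the serializer does not emit an XML declaration, which is why the result can be handed to the front end as-is.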
Converting .docx to HTML
// Largely the same as the .doc conversion; the difference is the converter classes used
XWPFDocument document = new XWPFDocument(new ByteArrayInputStream(download));
StringWriter stringWriter = new StringWriter();
XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
XHTMLOptions options = XHTMLOptions.create();
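// As with .doc, override the image-handling methods of the converter's helper class
options.setImageManager( new ImageManager(new File(""), "") {
    // URL of the most recently uploaded image. This relies on the converter calling
    // extract and then resolve for each image in turn
    String upload = null;
    @Override
    public void extract(String imagePath, byte[] imageData) throws IOException {
        // Call your own file upload service; imageData is the image's byte array, supplied by the converter
        upload = FileSystemClient.upload(imageData, 30);
    }
    @Override
    public String resolve(String uri) {
        // Return the image URL that the upload service handed back
        return upload;
    }
});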
xhtmlConverter.convert(document, stringWriter, options);
// The final HTML; return it to the front end for rendering.
// StringWriter already yields a Java String, so no byte round trip is needed
String html = stringWriter.toString();
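The single `upload` field in the `ImageManager` above only holds the most recent URL, so it depends on the order in which the converter calls `extract` and `resolve`. A possible alternative is to remember one URL per image path. The sketch below shows that idea with JDK classes only; `ImageUrlRegistry` is a hypothetical helper, the uploader function stands in for `FileSystemClient.upload`, and it assumes the `uri` passed to `resolve` is the same string as the `imagePath` passed to `extract`, which should be verified against the XDocReport version in use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, independent of XDocReport: remembers one URL per image path.
class ImageUrlRegistry {
    private final Map<String, String> urlsByPath = new HashMap<>();
    private final Function<byte[], String> uploader;

    // uploader stands in for your own upload call and returns the public URL of the image.
    ImageUrlRegistry(Function<byte[], String> uploader) {
        this.uploader = uploader;
    }

    // Meant to be called from ImageManager.extract: uploads the image and records its URL.
    void extract(String imagePath, byte[] imageData) {
        urlsByPath.put(imagePath, uploader.apply(imageData));
    }

    // Meant to be called from ImageManager.resolve: returns the recorded URL,
    // or the unchanged uri when no image was recorded under that path.
    String resolve(String uri) {
        return urlsByPath.getOrDefault(uri, uri);
    }

    public static void main(String[] args) {
        // Fake uploader for the demo: derives a URL from the payload size.
        ImageUrlRegistry registry =
                new ImageUrlRegistry(data -> "https://files.example.com/" + data.length);
        registry.extract("word/media/image1.png", new byte[3]);
        registry.extract("word/media/image2.png", new byte[5]);
        System.out.println(registry.resolve("word/media/image1.png")); // prints "https://files.example.com/3"
        System.out.println(registry.resolve("word/media/image2.png")); // prints "https://files.example.com/5"
    }
}
```

Inside the anonymous `ImageManager`, `extract` and `resolve` would then simply delegate to an instance of this class.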
Finally, special thanks to the authors of these two articles, which helped me a great deal:
https://www.jianshu.com/p/272b0ac2f06e
https://www.cnblogs.com/jameslif/p/3356588.html