Converting between Word, PDF, and HTML with aspose-words

Latest recommended article published 2024-10-19 15:56:45

请叫我张大胆

Categories: Word online preview, Word and HTML/PDF conversion. Tags: java, poi, msword

Article link: https://blog.csdn.net/weixin_43671737/article/details/115601414

Implementing conversion between Word, PDF, and HTML using aspose-words

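At work I needed to convert Word to PDF for online preview, load Word content into a rich-text editor for online editing, and export rich-text content back to a Word document. That means converting back and forth along the path Word -> HTML/PDF -> Word. I went through a lot of material. Some approaches use POI to convert between Word and HTML, but I ran into problems with that: a Word file generated by POI failed to parse when read back in. Others use OpenOffice; take a look if you are interested. Enough talk. As usual, here is the code. One last note: this is for personal study only, please do not use it commercially.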

First, add the aspose-words dependency to the pom file.

  <!--aspose-words-->
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>15.8.0</version>
    </dependency>

Convert a Word file to PDF

 // Convert a Word file to a PDF file
 // Needs: java.io.File, java.io.FileOutputStream, com.aspose.words.Document, com.aspose.words.SaveFormat
 public static boolean wordToPdf(String inPath, String outPath) {
    // try-with-resources flushes and closes the output file even if save() fails
    try (FileOutputStream os = new FileOutputStream(new File(outPath))) {
        // Load the Word document and write it out as PDF
        Document doc = new Document(inPath);
        doc.save(os, SaveFormat.PDF);
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
 }
 // In practice the more common case is streaming the PDF to the front end for online preview.
 // Reads a Word document from an InputStream and writes the PDF to the response's OutputStream.
 public static boolean wordToPdf(InputStream in, HttpServletResponse response) {
    try {
        Document doc = new Document(in);
        doc.save(response.getOutputStream(), SaveFormat.PDF);
        return true;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return false;
 }
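
A return value of true from wordToPdf only means that no exception was thrown. As a quick sanity check on the file version, one can confirm that the output really starts with the PDF signature `%PDF-`. The sketch below is a hypothetical helper (`PdfSignatureCheck` is my own name, not part of aspose-words) and uses only the JDK:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of aspose-words: checks the PDF file signature.
final class PdfSignatureCheck {
    // A PDF file begins with the ASCII bytes "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given bytes start with the PDF signature.
    static boolean startsWithPdfSignature(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Reads only the first few bytes of the file instead of loading all of it.
    static boolean isPdfFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
            return read == head.length && startsWithPdfSignature(head);
        }
    }
}
```

For example, `PdfSignatureCheck.isPdfFile(Paths.get(outPath))` could be called right after `wordToPdf(inPath, outPath)` in a test. It does not validate the whole document, only the header.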

Read the Word content and return it as an HTML string. All images are converted to Base64.

  // Convert the Word content to HTML and return it as a string; all images are embedded as Base64.
  public static String wordToHtml(InputStream in) {
    String htmlText = "";
    // The HTML is collected in memory, so no intermediate file is needed
    try (ByteArrayOutputStream htmlStream = new ByteArrayOutputStream()) {
        Document doc = new Document(in);
        HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
        // Emit XHTML 1.0 Transitional markup
        opts.setExportXhtmlTransitional(true);
        // Embed images in the HTML instead of writing separate image files
        opts.setExportImagesAsBase64(true);
        // Include page setup information in the output
        opts.setExportPageSetup(true);
        doc.save(htmlStream, opts);
        htmlText = new String(htmlStream.toByteArray(), StandardCharsets.UTF_8);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return htmlText;
  }
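
With setExportImagesAsBase64(true), the images travel inside the HTML string itself instead of as separate files, which is what makes the string easy to hand to a rich-text editor. As far as I understand it, each image ends up as a Base64 data URI in the `src` attribute. The sketch below only illustrates what such a data URI looks like using the JDK's `java.util.Base64`; `DataUriSketch` and its methods are hypothetical names, not Aspose API:

```java
import java.util.Base64;

// Hypothetical illustration of the "data:<mime>;base64,<payload>" form
// used for inline images; this is not Aspose code.
final class DataUriSketch {
    // Builds a data URI from raw bytes, e.g. the bytes of a PNG image.
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    // Recovers the raw bytes from a Base64 data URI.
    static byte[] fromDataUri(String dataUri) {
        int comma = dataUri.indexOf(',');
        if (!dataUri.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("not a data URI");
        }
        return Base64.getDecoder().decode(dataUri.substring(comma + 1));
    }
}
```

Base64 makes the payload roughly one third larger than the raw image, so documents with many large pictures produce a correspondingly large HTML string.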

Convert HTML content to Word. Both doc and docx formats are supported.

 public static boolean htmlToWord(HttpServletResponse response, String html) {
    File tempFile = null;
    try {
        // Create a temporary file; use ".doc" or ".docx" depending on the format you need
        tempFile = File.createTempFile("basis", ".docx");
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.insertHtml(html);
        // Write the document to the temporary file as DOCX; the stream is closed afterwards
        try (FileOutputStream out = new FileOutputStream(tempFile)) {
            doc.save(out, SaveOptions.createSaveOptions(SaveFormat.DOCX));
        }
        // Copy the generated file to the response
        try (InputStream inputStream = new FileInputStream(tempFile)) {
            IOUtils.copy(inputStream, response.getOutputStream());
        }
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    } finally {
        // Delete the temporary file whether or not the conversion succeeded
        if (tempFile != null && tempFile.delete()) {
            log.info("Temporary file {} deleted", tempFile.getName());
        }
    }
 }
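
The temporary-file pattern used by htmlToWord (write to a temp file, copy it to the target stream, always delete it) can be isolated from Aspose entirely. The following is a minimal sketch under that assumption: `TempFileRelay` and `StreamWriter` are hypothetical names, and the `producer` argument stands in for the `doc.save(out, ...)` call.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: produce content into a temp file, relay it to a
// target stream, and delete the temp file even when something fails.
final class TempFileRelay {
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Returns the number of bytes copied to the target stream.
    static long relay(String suffix, StreamWriter producer, OutputStream target) {
        Path temp = null;
        try {
            temp = Files.createTempFile("basis", suffix);
            // Close the temp file's stream before reading the file back
            try (OutputStream out = Files.newOutputStream(temp)) {
                producer.writeTo(out);
            }
            return Files.copy(temp, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (temp != null) {
                try {
                    Files.deleteIfExists(temp); // runs on success and on failure
                } catch (IOException ignored) {
                    // Best effort: a leftover temp file is not fatal
                }
            }
        }
    }
}
```

Since wordToPdf already saves directly into `response.getOutputStream()`, the same direct approach may work for DOCX as well and would avoid the temp file altogether; I have not verified that for every output format.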

The utility class is done, so let's look at the result.
First create a test Word file named test2.docx that contains images and styled text.

    @ApiOperation(value = "Word to PDF")
    @GetMapping("/wordToPDF")
    public Result wordToPDF(HttpServletResponse response) throws IOException {
        // Close the source file once the PDF has been written to the response
        try (InputStream in = new FileInputStream("F:\\test2.docx")) {
            return Result.success(WordAsposeUtil.wordToPdf(in, response));
        }
    }

The result is pretty good.

To keep the test simple, parsing Word into HTML and writing that HTML back to Word are combined in a single method.

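@ApiOperation(value = "Generate Word")
@GetMapping("writeWord")
public Result writeWord(HttpServletResponse response) {
    // try-with-resources closes the source file when the request is done
    try (InputStream in = new FileInputStream("F:\\test2.docx")) {
        String html = WordAsposeUtil.wordToHtml(in);
        // Content type for a .docx download
        response.setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        response.setCharacterEncoding("UTF-8");
        // Set the name and extension of the returned file. "测试" means "test";
        // the re-encoding is there so the non-ASCII name survives the header.
        response.addHeader("Content-Disposition", "attachment;filename=" +
                new String("测试".getBytes("GB2312"), "iso8859-1") + ".docx");
        if (WordAsposeUtil.htmlToWord(response, html)) {
            return Result.success();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}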
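
The GB2312 to ISO-8859-1 re-encoding of the file name relies on the browser guessing the right charset. An alternative, which is not what the code above does, is the `filename*` parameter defined in RFC 6266 and RFC 5987, where the name is percent-encoded as UTF-8. The sketch below shows how such a header value could be built with the JDK only; `ContentDispositionSketch` is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Content-Disposition value that uses the
// RFC 5987 "filename*" form for non-ASCII file names.
final class ContentDispositionSketch {
    static String attachment(String fileName) {
        try {
            // URLEncoder targets HTML forms: it turns spaces into "+" and
            // leaves "*" as-is, so both are adjusted for use in a header.
            String encoded = URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A");
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

It would be used as `response.setHeader("Content-Disposition", ContentDispositionSketch.attachment("测试.docx"))`. Very old browsers may not understand `filename*`, which is one reason the re-encoding trick is still seen.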