java - Docx4j convert HTML to docx - Stack Overflow

I have a new problem: when I convert HTML to docx, docx4j throws an exception:

org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 73; The entity "nbsp" was referenced, but not declared

As I understand it, this happens because docx4j parses my input as XML, and XML predefines only five entities (amp, lt, gt, quot, apos); HTML entities such as nbsp are not among them. How can I make docx4j convert the HTML to docx without declaring the nbsp entity in the DOCTYPE?

Is this a bug in docx4j or a limitation?
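The parser behaviour described here can be reproduced without docx4j. The following is a minimal sketch using only the JDK's built-in XML parser; the class name `EntityParseDemo` and the helper `parseError` are made up for illustration and are not part of docx4j. It only shows that a plain XML parser accepts the predefined entities and numeric character references but rejects `&nbsp;` when no DTD declares it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of docx4j.
class EntityParseDemo {

    /** Returns null if the markup parses as XML, otherwise the parser's error message. */
    static String parseError(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler (which prints to stderr) with one that just rethrows.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignore warnings */ }
            @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Predefined XML entity: parses.
        System.out.println(parseError("<p>a&amp;b</p>"));
        // HTML-only named entity with no DTD: rejected as undeclared.
        System.out.println(parseError("<p>a&nbsp;b</p>"));
        // Numeric character reference for the same character: parses.
        System.out.println(parseError("<p>a&#160;b</p>"));
    }
}
```

The exact wording of the error message depends on the parser implementation and locale, so the sketch prints it rather than assuming it.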

Here is my code:

package ru.simplexsoftware.constructorOfDocuments.web.rest;

import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.HttpRequestHandler;
import ru.simplexsoftware.constructorOfDocuments.dao.TemplateDao;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.bind.JAXBException;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class DocxFileDownloadServlet implements HttpRequestHandler {

    @Autowired
    TemplateDao templateDao;

    @Override
    public void handleRequest(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String parameter = request.getParameter("documentId");
        Long documentId = Long.parseLong(parameter);

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        try {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

            // Attach a numbering part with the default numbering definitions
            NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
            wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
            ndp.unmarshalDefaultNumbering();

            XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
            xHTMLImporter.setHyperlinkStyle("Hyperlink");

            String htmlString = templateDao.get(documentId).html;
            // Self-close <br> tags so the markup is well-formed XML
            htmlString = htmlString.replaceAll("<br>", "<br/>");

            // Convert the XHTML, and add it into the empty docx we made.
            // convert() parses the input as XML, which is where the undeclared nbsp entity fails.
            wordMLPackage.getMainDocumentPart().getContent().addAll(
                    xHTMLImporter.convert(htmlString, null));

            wordMLPackage.save(outputStream);
        } catch (Docx4JException | JAXBException e) {
            // Fail the request instead of continuing with a half-built package
            throw new ServletException("Could not convert HTML to docx", e);
        }

        // MIME type for .docx; "application/msword" is the legacy .doc type
        response.setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        // Write the raw bytes; toString() would corrupt the binary zip content
        response.getOutputStream().write(outputStream.toByteArray());
        response.flushBuffer();
    }
}
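One possible workaround, sketched here under assumptions rather than taken from docx4j itself, is to rewrite named HTML entities as numeric character references before handing the string to the importer, since an XML parser accepts numeric references without any DOCTYPE. The class `EntityNormalizer`, the method `toNumericEntities`, and the deliberately small entity table are all hypothetical; a real table would need every entity the stored HTML can contain.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
class EntityNormalizer {

    // Small illustrative table: HTML entity name -> Unicode code point.
    // The five predefined XML entities are left out on purpose so they pass through unchanged.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "nbsp", 160,
            "copy", 169,
            "laquo", 171,
            "raquo", 187,
            "ndash", 8211,
            "mdash", 8212);

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities with numeric references; unknown names are kept as-is. */
    static String toNumericEntities(String html) {
        Matcher m = NAMED_ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("<p>a&nbsp;b &amp; c</p>"));
    }
}
```

In the handler above this would be applied just before the conversion, for example `xHTMLImporter.convert(EntityNormalizer.toNumericEntities(htmlString), null)`. The regex works on the raw string, so it would also rewrite entity-like text inside comments or CDATA sections; that is acceptable for a sketch but worth checking against the actual templates.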
