Parsing DOC Files with POI and Converting Them to HTML

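This article presents a method for converting a Word document to HTML. The document is read with the HWPFDocument class from the POI library and converted with WordToHtmlConverter. The process also manages and saves the embedded pictures so that the generated HTML can reference them.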

Environment

poi-3.9 (the HWPF classes used in the code ship in the poi-scratchpad jar)

Code

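import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.PictureType;

public class WordToHtml {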
	
	private static final String encoding = "UTF-8";

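	// Converts the .doc file at wordPath to an HTML string.
	// Returns "" when the path is empty or does not point to a regular file.
	public static String convert2Html(String wordPath)
			throws FileNotFoundException, TransformerException, IOException,
			ParserConfigurationException {
		if (wordPath == null || "".equals(wordPath)) return "";
		File file = new File(wordPath);
		if (!file.exists() || !file.isFile()) return "";
		// Close the file stream once the conversion finishes
		try (InputStream is = new FileInputStream(file)) {
			return convert2Html(is);
		}
	}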
  
	public static String convert2Html(InputStream is)
			throws TransformerException, IOException,
			ParserConfigurationException {
		HWPFDocument wordDocument = new HWPFDocument(is);
		WordToHtmlConverter converter = new WordToHtmlConverter(
				DocumentBuilderFactory.newInstance().newDocumentBuilder()
						.newDocument());
		
		// Prefix picture names with a timestamp so that pictures from different runs do not overwrite each other
		SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
		final String prefix = sdf.format(new Date());
		
		converter.setPicturesManager(new PicturesManager() {
				public String savePicture(byte[] content, PictureType pictureType,
						String suggestedName, float widthInches, float heightInches) {
					return prefix + "_" + suggestedName;
				}
		});
		converter.processDocument(wordDocument);
		
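		// Write each embedded picture to disk under its timestamp-prefixed name
		List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
		if (pics != null) {
			for (Picture pic : pics) {
				// "/" targets the filesystem root; point this at the directory that serves the HTML
				try (FileOutputStream out = new FileOutputStream(
						"/" + prefix + "_" + pic.suggestFullFileName())) {
					pic.writeImageContent(out);
				} catch (FileNotFoundException e) {
					e.printStackTrace();
				}
			}
		}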
		
		StringWriter writer = new StringWriter();
		
		Transformer serializer = TransformerFactory.newInstance().newTransformer();
		serializer.setOutputProperty(OutputKeys.ENCODING, encoding);
		serializer.setOutputProperty(OutputKeys.INDENT, "yes");
		serializer.setOutputProperty(OutputKeys.METHOD, "html");
		serializer.transform(
				new DOMSource(converter.getDocument()),
				new StreamResult(writer) );
		writer.close();
		return writer.toString();
	}
}  
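
The last step of convert2Html, serializing the converter's DOM tree to a string, relies only on the JDK's JAXP classes, so it can be tried without POI on the classpath. The following is a minimal sketch under that assumption: the class name DomToHtmlSketch and the helpers toHtml and sampleDocument are hypothetical, and the hand-built document merely stands in for what converter.getDocument() would return.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original article or of POI.
public class DomToHtmlSketch {

    // Serializes a DOM tree to a string with the same output
    // properties that convert2Html sets on its Transformer.
    public static String toHtml(Document doc) throws Exception {
        StringWriter writer = new StringWriter();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    // Builds a tiny html/body/p tree by hand (an assumption for
    // illustration; the real tree comes from WordToHtmlConverter).
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello from a DOM tree");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

Because the output method is "html", the serializer does not emit an XML declaration, which is why the string returned by convert2Html can be written straight to an .html file.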

