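I rarely blog; my mind tends to wander, and I seldom sit down to write things up. Still, it makes me happy when an old post of mine turns out to help someone. A few years ago I wrote a post about the problems I ran into converting rich text editor output to pdf with itext, and it has collected quite a few comments. I recently had some free time, and a few related problems to solve, so I worked through it all again. I'm posting the result here, both for my own reference and for anyone who needs it. The example below converts html to png. In practice, whether the target is pdf or png, the real work is the same: the renderer expects well-formed markup, so you have to deal with all the problems that loosely written html causes during rendering.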
<dependency>
<groupId>net.sf.jtidy</groupId>
<artifactId>jtidy</artifactId>
<version>r938</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>core-renderer</artifactId>
<version>R8</version>
</dependency>
The dependencies above are what this example uses.
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;
import org.xhtmlrenderer.swing.Java2DRenderer;
import org.xhtmlrenderer.util.FSImageWriter;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.imageio.ImageWriteParam;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.Date;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HtmlToImageUtils {
public static String convertToImage(String html, String path, String name, int width) throws IOException {
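// Normalize the input to well-formed markup with JTidy
html = htmlCovertTohtml(html);
// Replace newlines and &nbsp; entities (non-standard html) with plain spaces
html = fixSpace(html);
// Remove stray whitespace inside closing tags
html = fixEndTag(html);
// Strip unicode characters that are not valid in XML
html = stripNonValidXMLCharacters(html);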
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document document;
try {
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
StringReader reader = new StringReader(html);
InputSource inputSource = new InputSource(reader);
document = documentBuilder.parse(inputSource);
} catch (ParserConfigurationException | SAXException e) {
// Fail here rather than handing a null document to the renderer
throw new IOException("Failed to parse the cleaned html as XML", e);
}
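// The original used a project-specific DateUtil helper; the JDK formatter gives the same yyyy/MM/dd string
String date = new java.text.SimpleDateFormat("yyyy/MM/dd").format(new Date());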
String fullDirPath = path.concat("/").concat(date);
File dir = new File(fullDirPath);
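// mkdirs() already creates missing parent directories, so no mkdir() fallback is needed
if (!dir.isDirectory() && !dir.mkdirs()) {
throw new IOException("Could not create output directory: " + fullDirPath);
}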
String filePath = fullDirPath.concat("/").concat(name).concat(".png");
final Java2DRenderer renderer = new Java2DRenderer(document, width, -1);
renderer.setBufferedImageType(BufferedImage.TYPE_INT_RGB);
final BufferedImage img = renderer.getImage();
final FSImageWriter imageWriter = new FSImageWriter();
imageWriter.setWriteCompressionQuality(1f);
imageWriter.setWriteCompressionMode(ImageWriteParam.MODE_COPY_FROM_METADATA);
// imageWriter.setWriteCompressionType();
imageWriter.write(img, filePath);
return filePath;
}
private static String htmlCovertTohtml(String str) {
Tidy tidy = new Tidy();
tidy.setXmlOut(true);
tidy.setQuiet(true);
tidy.setShowWarnings(false);
tidy.setShowErrors(0);
tidy.setEncloseBlockText(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
Properties prop = new Properties();
prop.put("new-blocklevel-tags", "canvas");
tidy.getConfiguration().addProps(prop);
// StringReader and StringWriter hold no system resources, so no close() or try/catch is required
StringWriter writer = new StringWriter();
// Parse the input and write the cleaned-up markup to the writer
tidy.parse(new StringReader(str), writer);
return writer.toString();
}
// Strip characters that are not allowed in XML 1.0
public static String stripNonValidXMLCharacters(String in) {
if (in == null || in.isEmpty()) {
return ""; // vacancy test.
}
StringBuilder out = new StringBuilder(in.length()); // Used to hold the output.
// Walk by code point, not by char: a char never exceeds 0xFFFF, so a
// char-based check would drop the surrogate pairs of characters above it.
for (int i = 0; i < in.length(); ) {
int current = in.codePointAt(i);
if ((current == 0x9) || (current == 0xA) || (current == 0xD)
|| ((current >= 0x20) && (current <= 0xD7FF))
|| ((current >= 0xE000) && (current <= 0xFFFD))
|| ((current >= 0x10000) && (current <= 0x10FFFF))) {
out.appendCodePoint(current);
}
i += Character.charCount(current);
}
return out.toString();
}
public static String fixEndTag(String html) {
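// Match closing tags with whitespace on either side of the slash or before '>',
// e.g. "< /div>" and "</ div>"; digits are allowed so that "</ h1>" is covered too
Pattern pattern = Pattern.compile("<\\s*/\\s*[a-zA-Z][a-zA-Z0-9]*\\s*>");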
Matcher matcher = pattern.matcher(html);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String group = matcher.group(0);
String result = group.replaceAll("\\s", "");
matcher.appendReplacement(sb, result);
}
matcher.appendTail(sb);
return sb.toString();
}
public static String fixSpace(String html) {
final String pattern = "&(\\s*)nbsp;|\\n";
return html.replaceAll(pattern, " ");
}
}
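To see why the preprocessing steps matter, here is a minimal sketch that uses only the JDK's XML parser (no JTidy, no renderer). The class name XmlPreprocessDemo and the helper names parses and stripInvalidXmlChars are made up for this illustration and are not part of the utility above; fixSpace uses the same regular expression as the class. It shows that a strict XML parser rejects both `&nbsp;` (not a predefined XML entity) and control characters, and accepts the markup once they are cleaned up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlPreprocessDemo {

    // Returns true if the JDK's XML parser accepts the markup as well-formed.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the default error handler, which prints to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same replacement as fixSpace above: &nbsp; and newlines become a space.
    static String fixSpace(String html) {
        return html.replaceAll("&(\\s*)nbsp;|\\n", " ");
    }

    // Keeps only code points allowed by the XML 1.0 Char production.
    static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints()
                .filter(cp -> cp == 0x9 || cp == 0xA || cp == 0xD
                        || (cp >= 0x20 && cp <= 0xD7FF)
                        || (cp >= 0xE000 && cp <= 0xFFFD)
                        || (cp >= 0x10000 && cp <= 0x10FFFF))
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String withEntity = "<p>a&nbsp;b</p>";
        String withControlChar = "<p>a\u0001b</p>";

        System.out.println(parses(withEntity));                            // false
        System.out.println(parses(fixSpace(withEntity)));                  // true
        System.out.println(parses(withControlChar));                       // false
        System.out.println(parses(stripInvalidXmlChars(withControlChar))); // true
    }
}
```

The order used in convertToImage still applies: tidy the markup first, then patch up whatever the strict parser would still reject.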
html = fixEndTag(html);
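The "close tags" step on this line does not close tags that were left unclosed. It cleans up after the previous line, where the whitespace filtering can leave closing tags such as </ div>. After processing the html I found that some closing tags had been wrapped across lines, so when the lines were merged a space ended up inside the tag. "Closing tags" here therefore really means removing the stray whitespace inside closing tags.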
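A minimal sketch of that cleanup, assuming the stray whitespace can show up on either side of the slash or just before the closing angle bracket. The class ClosingTagDemo and the method tightenClosingTags are illustrative names, not part of the utility class, and the pattern is one possible choice rather than the only correct one.

```java
import java.util.regex.Pattern;

final class ClosingTagDemo {

    // Closing tag with optional whitespace around the slash and before '>'.
    // Group 1 captures the tag name (letters, then letters or digits).
    private static final Pattern LOOSE_CLOSING_TAG =
            Pattern.compile("<\\s*/\\s*([a-zA-Z][a-zA-Z0-9]*)\\s*>");

    // Rewrites every loose closing tag as a compact "</name>".
    static String tightenClosingTags(String html) {
        return LOOSE_CLOSING_TAG.matcher(html).replaceAll("</$1>");
    }

    public static void main(String[] args) {
        String merged = "<div>text</ div><h1>title< /h1 >";
        // prints: <div>text</div><h1>title</h1>
        System.out.println(tightenClosingTags(merged));
    }
}
```

Opening tags are left alone because the pattern requires a slash right after the opening angle bracket (ignoring whitespace).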
If you have questions, feel free to leave a comment or get in touch~