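I had a requirement to generate PDF from HTML. After some research I decided to use iText. The official site already offered iText 7, but it was not yet mature, so I stayed with iText 5 and imported the following Maven packages: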
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
</dependency>
<dependency>
    <groupId>com.itextpdf.tool</groupId>
    <artifactId>xmlworker</artifactId>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
</dependency>
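XMLWorker is iText's HTML-parsing add-on, so it needs no introduction. jsoup can clean up HTML into well-formed HTML, but its output is still not proper XHTML, so I patched jsoup myself:
Open jsoup's Element source, search for img (line 1165), and comment out the 3 lines shown below (you could also add your own flag to control this):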
// selfclosing includes unknown tags, isEmpty defines tags that are always empty
if (childNodes.isEmpty() && tag.isSelfClosing()) {
    //if (out.syntax() == Document.OutputSettings.Syntax.html && tag.isEmpty())
    //    accum.append('>');
    //else
    accum.append(" />"); // <img> in html, <img /> in xml
}
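Why the patch matters: XMLWorker parses its input as XML, and in XML an HTML void element such as <img> written without a closing tag is not well-formed. The sketch below uses the JDK's built-in DOM parser as a stand-in (an assumption: XMLWorker's own parser is not necessarily identical) to show that the same fragment is rejected with <img> and accepted with <img />. The class name XhtmlCheck is hypothetical. Judging from the snippet above, calling `doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml)` on the jsoup Document should reach the same `" />"` branch without patching jsoup, though that route is untested here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML with the JDK's DOM parser. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML-style void element: <img> is never closed, so </div> does not match it
        System.out.println(isWellFormed("<div><img src='a.jpg'></div>"));   // false
        // XHTML-style self-closing element, as emitted by the patched jsoup
        System.out.println(isWellFormed("<div><img src='a.jpg' /></div>")); // true
    }
}
```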
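Problems I ran into while writing the code:
1. Chinese text was not displayed.
2. Multi-line content could not be converted.
3. Images were not displayed. (In my tests, an image src works as an HTTP URL, a relative path, or an absolute path; Base64 data embedded directly in src is not recognized, so avoid it for now.)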
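A possible workaround for item 3, assuming (as reported above) that absolute file paths work: decode each Base64 data URI into a temporary file and point src at that file before the HTML reaches XMLWorker. This is a sketch, not part of the original code; the class name DataUriToFiles and its regex are hypothetical, and only letter-only image subtypes such as png, jpeg or gif are matched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataUriToFiles {

    // Hypothetical pattern: src='data:image/<subtype>;base64,<payload>' with single or double quotes
    private static final Pattern DATA_URI = Pattern.compile(
            "src=(['\"])data:image/([a-zA-Z]+);base64,([A-Za-z0-9+/=\\s]+)\\1");

    /** Decodes each Base64 image into a temp file and rewrites its src to that file's absolute path. */
    public static String replaceDataUris(String html) {
        Matcher m = DATA_URI.matcher(html);
        StringBuffer out = new StringBuffer();
        try {
            while (m.find()) {
                String quote = m.group(1);
                // The MIME decoder tolerates line breaks inside the payload
                byte[] bytes = Base64.getMimeDecoder().decode(m.group(3));
                String suffix = "." + m.group(2).toLowerCase(Locale.ROOT);
                Path tmp = Files.createTempFile("img-", suffix);
                Files.write(tmp, bytes);
                tmp.toFile().deleteOnExit(); // clean up when the JVM exits
                String src = "src=" + quote + tmp.toAbsolutePath() + quote;
                m.appendReplacement(out, Matcher.quoteReplacement(src));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img src='data:image/png;base64,AQID' />";
        // Prints the same tag with src pointing at a temporary .png file
        System.out.println(replaceDataUris(html));
    }
}
```

Tags whose src is already an HTTP URL or a path are left untouched, so the method can run on every page before it is passed to the converter.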
Below is the code, with these problems handled, for reference:
package com.junziqian.common;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileUtils;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.Pipeline;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

/**
 * Converts HTML to PDF.
 * @author yfx
 * 2016-05-18 17:52:13
 */
public class HtmlToPdfUtil {

    /** Renders each HTML string to PDF and writes the merged result to destFile. */
    public static void buildPdf(List<String> contexts, String destFile) throws DocumentException, IOException {
        byte[] result = buildPdf(contexts);
        FileUtils.writeByteArrayToFile(new File(destFile), result);
    }

    /**
     * Builds a multi-page PDF: each HTML string is rendered on its own, then the results are concatenated.
     * @param contexts XHTML strings, one per rendered document
     * @return the merged PDF bytes
     * @throws DocumentException
     * @throws IOException
     */
    public static byte[] buildPdf(List<String> contexts) throws DocumentException, IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, baos);
        document.open();
        for (String ctx : contexts) {
            PdfReader reader = new PdfReader(buildPdf(ctx));
            copy.addDocument(reader);
            reader.close();
        }
        document.close();
        return baos.toByteArray();
    }

    /**
     * Builds a PDF from a single HTML string.
     * @param ctx XHTML string
     * @return the PDF bytes
     * @throws DocumentException
     * @throws IOException
     */
    public static byte[] buildPdf(String ctx) throws DocumentException, IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, baos);
        writer.setInitialLeading(12); // initial leading (line spacing) in points, set before open()
        document.open();
        MyFontsProvider fontProvider = new MyFontsProvider();
        fontProvider.addFontSubstitute("lowagie", "garamond");
        fontProvider.setUseUnicode(true);
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
        Pipeline<?> pipeline = new CssResolverPipeline(cssResolver,
                new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
        XMLWorker worker = new XMLWorker(pipeline, true);
        XMLParser p = new XMLParser(worker);
        // Read the String directly: ctx.getBytes() + InputStreamReader would round-trip
        // through the platform default charset, which can mangle Chinese text
        p.parse(new StringReader(ctx));
        p.flush();
        document.close();
        return baos.toByteArray();
    }

    /**
     * Falls back to the font 宋体 (SimSun) when the HTML specifies no font-family,
     * so Chinese glyphs can render; assumes that font is available on the machine.
     */
    public static class MyFontsProvider extends XMLWorkerFontProvider {
        public MyFontsProvider() {
            super(null, null);
        }

        @Override
        public Font getFont(final String fontname, String encoding, float size, final int style) {
            String fntname = fontname;
            if (fntname == null) {
                fntname = "宋体";
            }
            return super.getFont(fntname, encoding, size, style);
        }
    }

    public static void main(String[] args) throws IOException, DocumentException {
        String DEST = "./test2015-11.pdf";
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        ArrayList<String> str = new ArrayList<String>();
        str.add(JsoupUtil.getXhtml("<div>中文hello</div><br><img style='width:500px;' src='http://static.ebaoquan.org/themisGroup/M00/F6/79/ChMzJlcwX3mAa6DQAAfMg7vk0cU876.jpg'>"));
        str.add(JsoupUtil.getXhtml("<div>中文hello111</div><br>"));
        HtmlToPdfUtil.buildPdf(str, DEST);
    }
}
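A portability note on feeding the HTML to the parser: converting the string to bytes with `getBytes()` and reading it back through `InputStreamReader` without naming a charset uses the platform default charset for both steps. On a machine whose default cannot represent Chinese (a plain ISO-8859-1 or ASCII locale, which can happen on Linux servers), the characters are replaced with '?' before XMLWorker ever sees them; a `StringReader` skips the byte conversion entirely. The sketch below makes the default charset an explicit parameter to show this; the class name CharsetRoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    /** Same steps as getBytes() plus InputStreamReader, with the default charset made explicit. */
    public static String viaBytes(String s, Charset defaultCharset) {
        byte[] bytes = s.getBytes(defaultCharset); // unmappable characters become '?'
        return readAll(new InputStreamReader(new ByteArrayInputStream(bytes), defaultCharset));
    }

    /** No byte conversion at all, so every character survives. */
    public static String viaStringReader(String s) {
        return readAll(new StringReader(s));
    }

    private static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by "hello", written as Unicode escapes
        String html = "<div>\u4e2d\u6587hello</div>";
        System.out.println(viaBytes(html, StandardCharsets.ISO_8859_1));        // <div>??hello</div>
        System.out.println(viaBytes(html, StandardCharsets.UTF_8).equals(html)); // true
        System.out.println(viaStringReader(html).equals(html));                 // true
    }
}
```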
Pixels and resolution of A4 paper
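A4 paper measures 210 mm × 297 mm, and 1 inch = 2.54 cm, so the pixel size of an A4 page at a given resolution is millimetres ÷ 25.4 × resolution. Commonly used sizes (height × width):
At 72 pixels/inch, A4 is 842 × 595 pixels;
At 120 pixels/inch, A4 is 1403 × 992 pixels;
At 150 pixels/inch, A4 is 1754 × 1240 pixels;
At 300 pixels/inch, A4 is 3508 × 2480 pixels.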
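The conversion above, pixels = mm / 25.4 × ppi rounded to the nearest pixel, as a small Java sketch (the class name A4Pixels is hypothetical). At 72 ppi one pixel equals one PDF point (1/72 inch), which is why that row matches the 595 × 842 point A4 page that iText's `new Document()` uses by default.

```java
public class A4Pixels {

    static final double MM_PER_INCH = 25.4;
    static final double A4_WIDTH_MM = 210;
    static final double A4_HEIGHT_MM = 297;

    /** Converts a length in millimetres to pixels at the given resolution, rounded to the nearest pixel. */
    public static long toPixels(double mm, int ppi) {
        return Math.round(mm / MM_PER_INCH * ppi);
    }

    public static void main(String[] args) {
        for (int ppi : new int[] {72, 120, 150, 300}) {
            System.out.printf("%d ppi: %d x %d%n",
                    ppi, toPixels(A4_HEIGHT_MM, ppi), toPixels(A4_WIDTH_MM, ppi));
        }
        // 72 ppi: 842 x 595
        // 120 ppi: 1403 x 992
        // 150 ppi: 1754 x 1240
        // 300 ppi: 3508 x 2480
    }
}
```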
Reference: http://gridmix.blog.51cto.com/4764051/1229585
I have not tried this on Linux yet; I will revisit it if problems come up there.