Converting HTML to PDF using iText

Tags: itext, html, pdf, css

I am posting this question because many developers ask more or less the same question in different forms. I will answer this question myself (I am the Founder/CTO of iText Group), so that it can be a "Wiki-answer." If the Stack Overflow "documentation" feature still existed, this would have been a good candidate for a documentation topic.

The source file:

I am trying to convert the following HTML file to PDF:

<html>
    <head>
        <title>Colossal (movie)</title>
        <style>
            .poster { width: 120px; float: right; }
            .director { font-style: italic; }
            .description { font-family: serif; }
            .imdb { font-size: 0.8em; }
            a { color: red; }
        </style>
    </head>
    <body>
        <img src="img/colossal.jpg" class="poster" />
        <h1>Colossal (2016)</h1>
        <div class="director">Directed by Nacho Vigalondo</div>
        <div class="description">Gloria is an out-of-work party girl
            forced to leave her life in New York City, and move back home.
            When reports surface that a giant creature is destroying Seoul,
            she gradually comes to the realization that she is somehow connected
            to this phenomenon.
        </div>
        <div class="imdb">Read more about this movie on
            <a href="...">IMDB</a>
        </div>
    </body>
</html>

In a browser, this HTML looks like this:

[Screenshot: the HTML page rendered in a browser]

The problems I encountered:

HTMLWorker doesn't take CSS into account at all

When I use HTMLWorker, I need to create an ImageProvider to avoid an error telling me that the image can't be found. I also need to create a StyleSheet instance to change some of the styles:

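// iText 5 classes used below: com.itextpdf.text.*, com.itextpdf.text.pdf.PdfWriter,
// com.itextpdf.text.html.simpleparser.* (plus java.io.* and java.util.*)

// Maps every image src to a file under resources/html/img/.
public static class MyImageFactory implements ImageProvider {
    public Image getImage(String src, Map<String, String> h,
            ChainedProperties cprops, DocListener doc) {
        try {
            // Keep only the file name, dropping any folder prefix in src.
            return Image.getInstance(
                String.format("resources/html/img/%s",
                    src.substring(src.lastIndexOf("/") + 1)));
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Reached only when the image could not be loaded.
        return null;
    }
}

public static void main(String[] args) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream("results/htmlworker.pdf"));
    document.open();
    // Styles have to be declared in code; "imdb" is the class name used in the HTML.
    StyleSheet styles = new StyleSheet();
    styles.loadStyle("imdb", "size", "-3");
    HTMLWorker htmlWorker = new HTMLWorker(document, null, styles);
    // Register the image provider so that the image can be located.
    HashMap<String, Object> providers = new HashMap<String, Object>();
    providers.put(HTMLWorker.IMG_PROVIDER, new MyImageFactory());
    htmlWorker.setProviders(providers);
    htmlWorker.parse(new FileReader("resources/html/sample.html"));
    document.close();
}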
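The only non-obvious line in MyImageFactory is the path mapping: everything up to the last slash in the src attribute is dropped, and the bare file name is looked up under resources/html/img/. A minimal, stdlib-only sketch of that mapping follows. The class name ImagePathDemo and the method toLocalPath are hypothetical, introduced here for illustration only; no iText classes are involved.

```java
class ImagePathDemo {
    // Hypothetical helper: mirrors the String.format(...) expression used in MyImageFactory.
    static String toLocalPath(String src) {
        // lastIndexOf returns -1 when src has no slash, so substring(0) keeps the whole name.
        return String.format("resources/html/img/%s",
                src.substring(src.lastIndexOf("/") + 1));
    }

    public static void main(String[] args) {
        System.out.println(toLocalPath("img/colossal.jpg"));   // resources/html/img/colossal.jpg
        System.out.println(toLocalPath("colossal.jpg"));       // resources/html/img/colossal.jpg
        System.out.println(toLocalPath("a/b/c/poster.png"));   // resources/html/img/poster.png
    }
}
```

Note that this mapping discards any folder in src, so two images with the same file name in different folders would resolve to the same local file.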

The resulting PDF looks like this:

[Screenshot: PDF produced by HTMLWorker]

For some reason, HTMLWorker also shows the content of the <title> tag. I don't know how to avoid this. The CSS in the header isn't parsed at all; I have to define all the styles in my code, using the StyleSheet object.

When I look at my code, I see that plenty of objects and methods I'm using are deprecated:

[Screenshot: the code above with its deprecation warnings]

So I decided to upgrade to using XML Worker.

Images aren't found when using XML Worker

I tried the following code:

// iText 5 + XML Worker classes used below: com.itextpdf.text.*,
// com.itextpdf.text.pdf.PdfWriter, com.itextpdf.tool.xml.XMLWorkerHelper

public static final String DEST = "results/xmlworker1.pdf";
public static final String HTML = "resources/html/sample.html";

public void createPdf(String file) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    document.open();
    // Parse the XHTML with the default XML Worker configuration.
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
        new FileInputStream(HTML));
    document.close();
}

This resulted in the following PDF:

[Screenshot: PDF produced by the first XML Worker attempt, without the image]

Instead of Times-Roman, the default font Helvetica is used; this is typical for iText (I should have defined a font explicitly in my HTML). Otherwise, the CSS seems to be respected, but the image is missing, and I didn't get an error message.

With HTMLWorker, an exception was thrown, and I was able to fix the problem by introducing an ImageProvider. Let's see if this works for XML Worker.
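Because XML Worker skipped the image without reporting anything, a pre-flight check can make the problem visible before conversion. The sketch below is not part of iText. It assumes double-quoted src attributes that hold local relative paths, it uses a regular expression rather than a real HTML parser, and the names MissingImageCheck, imageSources, and missingImages are hypothetical. It resolves each src against a base folder and lists the files that do not exist there:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MissingImageCheck {
    // Matches <img ... src="..."> with a double-quoted src (an assumption of this sketch).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Collects the src value of every img tag found in the markup.
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    // Resolves each src against baseDir and returns the paths that do not exist.
    static List<Path> missingImages(String html, Path baseDir) {
        List<Path> missing = new ArrayList<>();
        for (String src : imageSources(html)) {
            Path candidate = baseDir.resolve(src).normalize();
            if (!Files.exists(candidate)) {
                missing.add(candidate);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String html = "<img src=\"img/colossal.jpg\" class=\"poster\" />";
        // Compare the working directory with the folder that holds the HTML file.
        for (String base : new String[] {"", "resources/html"}) {
            System.out.println("base '" + base + "': missing "
                    + missingImages(html, Paths.get(base)));
        }
    }
}
```

Running the check against both the working directory and resources/html/ shows in which of the two a relative src such as img/colossal.jpg actually exists.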

Not all CSS styles are supported in XML Worker

I adapted my code like this:

// XML Worker classes used below: com.itextpdf.tool.xml.XMLWorker, XMLWorkerHelper,
// com.itextpdf.tool.xml.html.Tags, com.itextpdf.tool.xml.parser.XMLParser,
// com.itextpdf.tool.xml.pipeline.css.*, .end.PdfWriterPipeline, .html.*

public static final String DEST = "results/xmlworker2.pdf";
public static final String HTML = "resources/html/sample.html";
public static final String IMG_PATH = "resources/html/";

public void createPdf(String file) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    document.open();

    // CSS: use the default resolver.
    CSSResolver cssResolver =
        XMLWorkerHelper.getInstance().getDefaultCssResolver(true);

    // HTML: default tag processors, plus an image provider rooted at IMG_PATH.
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
    htmlContext.setImageProvider(new AbstractImageProvider() {
        public String getImageRootPath() {
            return IMG_PATH;
        }
    });

    // Pipelines are chained back to front: CSS -> HTML -> PDF writer.
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    p.parse(new FileInputStream(HTML));
    document.close();
}

My code is much longer, but now the image is rendered:

[Screenshot: PDF produced by XML Worker, now with the image rendered]

The image is larger than when I rendered it using HTMLWorker, which tells me that the CSS attribute width for the poster class is taken into account, but the float attribute is ignored. How do I fix this?
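The .poster rule carries two declarations, and only one of them had a visible effect. To spot such cases up front, a stylesheet can be scanned for properties on a watch list. The sketch below is plain Java, not an iText API. The names CssDeclarationScan, parseDeclarations, and flagged are hypothetical, and the watch list contains only float because that is the one property this question shows being ignored:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CssDeclarationScan {
    // Splits a single rule such as ".poster { width: 120px;float: right; }"
    // into property/value pairs, keeping the order in which they appear.
    static Map<String, String> parseDeclarations(String rule) {
        Map<String, String> declarations = new LinkedHashMap<>();
        int open = rule.indexOf('{');
        int close = rule.lastIndexOf('}');
        if (open < 0 || close < open) {
            return declarations; // not a rule with a declaration block
        }
        for (String declaration : rule.substring(open + 1, close).split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // skips empty fragments, e.g. after the last semicolon
            }
            declarations.put(declaration.substring(0, colon).trim().toLowerCase(),
                    declaration.substring(colon + 1).trim());
        }
        return declarations;
    }

    // Returns the declared properties that appear on the watch list.
    static List<String> flagged(Map<String, String> declarations, Set<String> watchList) {
        List<String> hits = new ArrayList<>();
        for (String property : declarations.keySet()) {
            if (watchList.contains(property)) {
                hits.add(property);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> poster =
                parseDeclarations(".poster { width: 120px;float: right; }");
        System.out.println(poster);                                         // {width=120px, float=right}
        System.out.println(flagged(poster, Collections.singleton("float"))); // [float]
    }
}
```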

The remaining question:

So the question boils down to this: I have a specific HTML file that I try to convert to PDF. I have gone through a lot of work, fixing one problem after the other, but there is one specific problem that I can't solve: how do I make iText respect CSS that defines the position of an element, such as float: right?

Additional question:

When my HTML contains form elements (such as <input>), those form elements are ignored.
