iText garbled Chinese text on Linux: encoding problem when generating a PDF file from HTML with ITextRenderer

I am trying to generate a PDF document with ITextRenderer that contains non-Latin characters, in my case Bulgarian.

Before calling ITextRenderer, I have a String content that, after some processing (such as parsing with Tidy), looks like this (I can see the value in the debugger):

String content:

<td class="description">Вид на потока</td>

<td class="description">Статус на потока</td>

The above is just part of my String. The content is valid HTML; I am showing only a small piece of it to make clear that, up to this point, the encoding is correct, since I can read the Bulgarian characters.

After that, the following code creates a document, passes it to ITextRenderer, and generates the PDF file. This code is tested and works for content with Latin characters: I was able to generate a PDF file for English successfully.

The problem appears when I switch to another language (Bulgarian) with non-Latin characters. The generated PDF drops all the Bulgarian characters, and the final result is a PDF with a lot of empty lines. This is the part of the code that generates the PDF:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setNamespaceAware(false);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

DocumentBuilder builder = dbf.newDocumentBuilder();
Document doc = builder.parse(new ByteArrayInputStream(content.getBytes("UTF-8")));

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
InputStream is = null;

ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(outputStream);
outputStream.close();

byte[] outputBytes = outputStream.toByteArray();
is = new ByteArrayInputStream(outputBytes);

response.setContentType("application");
response.addHeader("Content-Disposition", "attachment; filename=\"" + "exported.pdf" + "\"");
response.setContentLength(outputBytes.length);
response.getOutputStream().write(inputStreamToBytes(is));

I have tried several things (mainly related to encoding), but unfortunately I haven't found a solution yet. I am probably missing something obvious here :)

I am not sure whether this matters, but I am using Spring and this code runs inside a Controller.

Any help will be appreciated.

Thanks

Solution

Is your HTML specifying the UTF-8 encoding? Are your font files being found in that path?
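
To make the first question concrete, here is a minimal sketch that wraps a fragment in XHTML declaring UTF-8 and then checks that the Cyrillic text survives the same DocumentBuilder parse used in the question. The class Utf8HtmlCheck and its methods are hypothetical helpers, not part of any library, and the font-family value "Times New Roman" is an assumption: as far as I know the renderer picks fonts through the CSS font-family, so the name should match the family of a font you registered with addFont. The sample text is written with Unicode escapes so that the source file encoding cannot interfere.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class Utf8HtmlCheck {

    // Hypothetical helper: wraps a body fragment in XHTML that declares UTF-8
    // and sets a font-family. The family name is an assumption; it should
    // match the family name of a font registered with the renderer.
    static String wrap(String bodyFragment, String fontFamily) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<html><head>"
             + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/>"
             + "<style type=\"text/css\">body { font-family: '" + fontFamily + "'; }</style>"
             + "</head><body>" + bodyFragment + "</body></html>";
    }

    // Parses the XHTML from UTF-8 bytes, as the code in the question does.
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("XHTML could not be parsed", e);
        }
    }

    // Returns the text of the first <td> element in the document.
    static String firstCellText(Document doc) {
        return doc.getElementsByTagName("td").item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Sample Bulgarian text from the question, as Unicode escapes.
        String text = "\u0412\u0438\u0434 \u043D\u0430 \u043F\u043E\u0442\u043E\u043A\u0430";
        String fragment = "<table><tr><td class=\"description\">" + text + "</td></tr></table>";
        Document doc = parse(wrap(fragment, "Times New Roman"));
        // Reports whether the Cyrillic text came through the parse unchanged.
        System.out.println("text preserved: " + text.equals(firstCellText(doc)));
    }
}
```

If the text is preserved here, the encoding of the String is probably not the culprit, and the font side (file path and font-family match) is the more likely place to look.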

Take a look at this gist, which reports that it works for Chinese characters on Linux when given the path to the system's default font location.
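
As a rough way to check that font files really are found, the sketch below lists the .ttf files under a directory. The class FontFileFinder is a hypothetical helper, and /usr/share/fonts is only an assumed default location; your distribution may keep fonts elsewhere. Keep in mind that file names are case-sensitive on Linux, so "fonts/TIMES.TTF" and "fonts/times.ttf" are different files, and that a relative path may not resolve where you expect inside a web application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FontFileFinder {

    // Recursively collects regular files ending in ".ttf" (any letter case)
    // under dir. Returns an empty list if dir is not a directory.
    static List<Path> findTrueTypeFonts(Path dir) {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString()
                              .toLowerCase(Locale.ROOT).endsWith(".ttf"))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass another directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/share/fonts");
        for (Path font : findTrueTypeFonts(dir)) {
            System.out.println(font.toAbsolutePath());
        }
    }
}
```

Each absolute path it prints could then be passed to renderer.getFontResolver().addFont(path, BaseFont.IDENTITY_H, BaseFont.EMBEDDED), as in the question, instead of a relative path.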
