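Background
I need to convert HTML files to PDF. The conversion works fine in my Windows test environment, but I ran into several problems when deploying to Ubuntu.
The conversion is done in Python with the pdfkit library. pdfkit is essentially a wrapper around wkhtmltopdf, so wkhtmltopdf has to be installed first; after that the conversion can be called from code.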
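Problems
- wkhtmltopdf reports an error after being installed on Ubuntu
- When the HTML file contains Chinese text, the Chinese characters are not rendered in the PDF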
Solution
Error on Ubuntu after installation
- Install wkhtmltopdf
apt-get install wkhtmltopdf
- Testing after installation reveals the following problem
root@iZ2ze4dhkm7u47lku6qixxZ:~# wkhtmltopdf https://www.baidu.com baid.pdf
QXcbConnection: Could not connect to display
Aborted
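According to the article "Linux上使用 wkhtmltopdf 将网页转成pdf" (using wkhtmltopdf on Linux to convert web pages to PDF), the simplest fix is to install xvfb, which provides a virtual display for wkhtmltopdf to connect to.
- Install xvfb and test again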
apt-get install xvfb
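Installing xvfb alone does not change how wkhtmltopdf starts; the command still has to be launched through xvfb-run. A minimal sketch of that invocation from Python follows. The helper names `build_xvfb_command` and `convert_with_xvfb` are made up for illustration, and the sketch assumes both `xvfb-run` and `wkhtmltopdf` are on PATH.

```python
import shutil
import subprocess


def build_xvfb_command(source, output_path,
                       xvfb="xvfb-run", wkhtmltopdf="wkhtmltopdf"):
    """Return the argument list that runs wkhtmltopdf under a virtual X server."""
    # xvfb-run starts a temporary Xvfb display, runs the given command
    # inside it, and shuts the display down afterwards.
    return [xvfb, wkhtmltopdf, source, output_path]


def convert_with_xvfb(source, output_path):
    """Convert a URL or local HTML file to PDF (hypothetical helper)."""
    for tool in ("xvfb-run", "wkhtmltopdf"):
        if shutil.which(tool) is None:
            raise FileNotFoundError("%s not found on PATH" % tool)
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_xvfb_command(source, output_path), check=True)


# Example call, repeating the earlier test through xvfb-run:
#   convert_with_xvfb("https://www.baidu.com", "baidu.pdf")
```

This is the same idea the pdfkit subclass later in the post relies on: the only difference from a direct call is the extra leading `xvfb-run` argument.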
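Chinese text is garbled after conversion to PDF
So a YaHei font needs to be installed on Ubuntu (see the reference material).
- Install the font and test again
Download the 微软雅黑.ttf (Microsoft YaHei) file and upload it to /usr/share/fonts/winFonts on the server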
root@iZ2zedu2sj65xaim6aez5hZ:/usr/share/fonts# mkdir winFonts
root@iZ2zedu2sj65xaim6aez5hZ:/usr/share/fonts# cd winFonts/
root@iZ2zedu2sj65xaim6aez5hZ:/usr/share/fonts/winFonts# chmod 644 /usr/share/fonts/winFonts/*.ttf
root@iZ2zedu2sj65xaim6aez5hZ:/usr/share/fonts/winFonts# mkfontscale
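To confirm that the system actually sees the new font before retrying the conversion, the installed fonts can be queried through fontconfig. The sketch below assumes the `fc-list` tool is available; the function names are illustrative only. If the font does not show up, refreshing the fontconfig cache with `fc-cache` may help.

```python
import subprocess


def parse_families(fc_list_output):
    """Split `fc-list ... family` output into a sorted list of unique names."""
    families = set()
    for line in fc_list_output.splitlines():
        # A font may report several names separated by commas,
        # for example an English name and a localized one.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def installed_chinese_fonts():
    """Ask fontconfig which installed fonts declare Chinese language support."""
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return parse_families(result.stdout)


# Example call on the server:
#   print(installed_chinese_fonts())
```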
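At this point the problems with running wkhtmltopdf on Ubuntu are solved. What remains is getting Python's pdfkit to work with this setup.
pdfkit invokes the wkhtmltopdf command directly, but on Ubuntu the installed wkhtmltopdf cannot be run on its own and has to go through xvfb-run. The pdfkit classes therefore need to be extended: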
import subprocess
import sys

try:
    # Python 2.x and 3.x support for checking string types
    assert basestring
except NameError:
    basestring = str

from pdfkit import PDFKit
class UbuntuConfiguration(object):
    def __init__(self, wkhtmltopdf='', xvfb='', meta_tag_prefix='pdfkit-'):
        self.meta_tag_prefix = meta_tag_prefix
        self.wkhtmltopdf = wkhtmltopdf
        self.xvfb = xvfb

        if not self.wkhtmltopdf:
            if sys.platform == 'win32':
                self.wkhtmltopdf = subprocess.Popen(
                    ['where', 'wkhtmltopdf'], stdout=subprocess.PIPE).communicate()[0].strip()
            else:
                self.wkhtmltopdf = subprocess.Popen(
                    ['which', 'wkhtmltopdf'], stdout=subprocess.PIPE).communicate()[0].strip()

        # Ubuntu only: when installing wkhtmltopdf alone is not enough,
        # the command has to be run through xvfb-run. Windows does not need it.
        if sys.platform != 'win32':
            if not self.xvfb:
                self.xvfb = subprocess.Popen(
                    ['which', 'xvfb-run'], stdout=subprocess.PIPE).communicate()[0].strip()
            # Make sure xvfb-run is actually installed
            try:
                with open(self.xvfb) as f:
                    pass
            except IOError:
                raise IOError('No xvfb executable found: "%s"\n' % self.xvfb)

        try:
            with open(self.wkhtmltopdf) as f:
                pass
        except IOError:
            raise IOError('No wkhtmltopdf executable found: "%s"\n'
                          'If this file exists please check that this process can '
                          'read it. Otherwise please install wkhtmltopdf - '
                          'https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf' % self.wkhtmltopdf)
class UbuntuPDFKit(PDFKit):
    def __init__(self, url_or_file, type_, options=None, toc=None, cover=None,
                 css=None, cover_first=False):
        configuration = UbuntuConfiguration()
        super(UbuntuPDFKit, self).__init__(url_or_file, type_, options, toc=toc, cover=cover, css=css,
                                           cover_first=cover_first, configuration=configuration)
        self.xvfb = self.configuration.xvfb

    def _command(self, path=None):
        """
        Generator of all command parts
        """
        if self.css:
            self._prepend_css(self.css)

        # Run wkhtmltopdf through xvfb-run; skip it when no xvfb path is set
        if self.xvfb:
            yield self.xvfb
        yield self.wkhtmltopdf

        for argpart in self._genargs(self.options):
            if argpart:
                yield argpart

        if self.cover and self.cover_first:
            yield 'cover'
            yield self.cover

        if self.toc:
            yield 'toc'
            for argpart in self._genargs(self.toc):
                if argpart:
                    yield argpart

        if self.cover and not self.cover_first:
            yield 'cover'
            yield self.cover

        # If the source is a string then we will pipe it into wkhtmltopdf
        # If the source is file-like then we will read from it and pipe it in
        if self.source.isString() or self.source.isFileObj():
            yield '-'
        else:
            if isinstance(self.source.source, basestring):
                yield self.source.to_s()
            else:
                for s in self.source.source:
                    yield s

        # If output_path evaluates to False append '-' to end of args
        # and wkhtmltopdf will pass generated PDF to stdout
        if path:
            yield path
        else:
            yield '-'
def xvfb_from_file(input, output_path, options=None, toc=None, cover=None, css=None, cover_first=False):
    r = UbuntuPDFKit(input, 'file', options=options, toc=toc, cover=cover, css=css, cover_first=cover_first)
    return r.to_pdf(output_path)
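The key change is that xvfb-run is prepended to the generated command.
The code only defines a helper for local files. If needed, a counterpart for URLs can be defined in the same way.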
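One possible way to add such counterparts without repeating the function body is a small factory. This is only a sketch: `make_xvfb_converter` is a hypothetical name, and the PDFKit subclass is passed in as an argument so the snippet does not need to import pdfkit itself. The source types 'url', 'file' and 'string' are the ones pdfkit's own `from_url`, `from_file` and `from_string` pass to PDFKit.

```python
def make_xvfb_converter(kit_class, source_type):
    """Build a pdfkit-style helper bound to one source type.

    kit_class   -- a PDFKit subclass such as UbuntuPDFKit
    source_type -- 'url', 'file' or 'string'
    """
    def convert(source, output_path, options=None, toc=None, cover=None,
                css=None, cover_first=False):
        kit = kit_class(source, source_type, options=options, toc=toc,
                        cover=cover, css=css, cover_first=cover_first)
        return kit.to_pdf(output_path)
    return convert


# Intended use next to the classes defined above:
#   xvfb_from_url = make_xvfb_converter(UbuntuPDFKit, 'url')
#   xvfb_from_url('https://www.baidu.com', 'baidu.pdf')
```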
With this in place, the conversion basically works under Python 3.6.