[Pitfall Diary] Generating PDF files with PDFKit on a Linux server

Author: 零妖大盗

Last modified 2023-01-28 14:54:34

Tags: linux, pdf, python

First published 2021-03-03 09:21:23

This is an original article by the author. Do not repost it without the author's permission; reposts must cite the original URL.

Article link: https://blog.csdn.net/tsoTeo/article/details/114280658

Python-PDFKit: HTML to PDF wrapper

GitHub: https://github.com/JazzCore/python-pdfkit

This third-party library is an adaptation of the Ruby PDFKit library and is essentially a wrapper around wkhtmltopdf. To use it, you therefore also need to install wkhtmltopdf.

Installation

1. Install python-pdfkit:

$ pip install pdfkit  (or pip3 for python3)

2. Install wkhtmltopdf:

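On Debian/Ubuntu it can be installed as follows:

$ sudo apt-get install wkhtmltopdf

On Redhat/CentOS and other operating systems it can also be installed with the system package manager as shown below (the exact command differs slightly between operating systems), although I no longer recommend this.

$ sudo yum install wkhtmltopdf

The wkhtmltopdf package in the yum repositories is too old. Newer releases no longer depend on an X server, but the old packaged one still does, so it fails with wkhtmltopdf: cannot connect to X server. I installed it this way and fell right into this pit; the fix is at the very end of the article.

It is better to follow the installation method recommended by the official wkhtmltopdf project.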
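Whichever install route you take, it helps to confirm which binary is actually on the PATH and what version it reports before calling pdfkit. A minimal sketch, assuming the binary is named `wkhtmltopdf` and accepts `--version`; the helper name `wkhtmltopdf_version` is my own and not part of pdfkit:

```python
import shutil
import subprocess


def wkhtmltopdf_version(binary="wkhtmltopdf"):
    """Return (path, version_text), or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # `--version` typically prints one line naming the release,
    # for example "wkhtmltopdf 0.12.6 (with patched qt)".
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print(wkhtmltopdf_version() or "wkhtmltopdf not found on PATH")
```

If the reported version is an old distribution build, that is a hint that the X server problem described at the end of this article may apply.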

Usage

Basic usage:

import pdfkit

pdfkit.from_url('http://www.baidu.com', 'output.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

You can pass a list containing multiple URLs or files:

pdfkit.from_url(['www.baidu.com', 'www.alibaba.com', 'www.tencent.com'], 'bat.pdf')
pdfkit.from_file(['file1.html', 'file2.html'], 'output.pdf')

You can also pass an opened file:

with open('file.html') as f:
    pdfkit.from_file(f, 'out.pdf')

If you want to process the generated PDF further, you can read it into a variable:

# Pass False instead of an output path to get the PDF back as a variable
pdf = pdfkit.from_url('http://www.baidu.com', False)
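When `False` is passed as the output path, the call returns the PDF as bytes. Below is a small sketch of post-processing those bytes. The helper `save_pdf_bytes` and the file name are hypothetical, and the sanity check relies only on the fact that PDF files begin with `%PDF-`:

```python
from pathlib import Path


def save_pdf_bytes(pdf, path):
    """Write PDF bytes to `path` after a basic sanity check; return bytes written."""
    if not isinstance(pdf, (bytes, bytearray)) or not pdf.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF")
    return Path(path).write_bytes(pdf)


# Typical use (requires pdfkit and wkhtmltopdf to be installed):
#   import pdfkit
#   pdf = pdfkit.from_url('http://www.baidu.com', False)
#   save_pdf_bytes(pdf, 'baidu.pdf')
```

The same bytes could just as well be returned from a web handler or attached to an e-mail instead of being written to disk.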

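You can specify any of wkhtmltopdf's options (in its documentation some options have no value type listed, which is a headache, and apparently not every option works either). The leading "--" in wkhtmltopdf option names can be dropped.
If an option takes no value, use None, False or '' as its value in the dict.
For repeatable options (allow, cookie, custom-header, post, postfile, run-script, replace) you can use a list or a tuple.
For options that need multiple values (for example --custom-header), use a tuple (see the example below).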

options = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header' : [
        ('Accept-Encoding', 'gzip')
    ],
    'cookie': [
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ],
    'no-outline': None
}

pdfkit.from_url('http://google.com', 'out.pdf', options=options)
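To make the rules above concrete, here is a rough sketch of how such a dict could map onto wkhtmltopdf command-line arguments. It is an illustration written for this article, not pdfkit's actual implementation; `options_to_args` is a hypothetical name, and for simplicity it treats only lists (not tuples) as repeated options:

```python
def options_to_args(options):
    """Flatten an options dict into a list of wkhtmltopdf-style arguments."""
    args = []
    for name, value in options.items():
        flag = name if name.startswith("--") else "--" + name
        # Repeatable options arrive as a list; everything else is a single value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            args.append(flag)
            if isinstance(item, tuple):
                # Multi-value option, e.g. ('Accept-Encoding', 'gzip')
                args.extend(str(part) for part in item)
            elif item not in (None, False, ""):
                args.append(str(item))
    return args


print(options_to_args({
    "page-size": "Letter",
    "custom-header": [("Accept-Encoding", "gzip")],
    "no-outline": None,
}))
```

Each list entry repeats the flag, each tuple contributes its parts as separate arguments, and value-less options such as `no-outline` become a bare flag.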

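By default, PDFKit shows all of wkhtmltopdf's console output. If you do not want that output displayed, or it gets in the way on a headless server, pass the quiet option. This is what is meant by running silently.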

options = {
    'quiet': ''
}

pdfkit.from_url('google.com', 'out.pdf', options=options)

Because of wkhtmltopdf's command syntax, the TOC and cover options must be specified separately. If you need the cover before the TOC, use the cover_first option:

toc = {
    'xsl-style-sheet': 'toc.xsl'
}

cover = 'cover.html'

# An explicit output path is given because older pdfkit releases require it
pdfkit.from_file('file.html', 'out.pdf', options=options, toc=toc, cover=cover)
pdfkit.from_file('file.html', 'out.pdf', options=options, toc=toc, cover=cover, cover_first=True)

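When converting a file or a string, you can specify external CSS files with the css option.

Warning: this is a workaround for a bug in wkhtmltopdf. You should try the --user-style-sheet option first.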

# Single CSS file
css = 'example.css'
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)

# Multiple CSS files
css = ['example.css', 'example2.css']
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)

You can also pass any option through meta tags in your HTML:

body = """
    <html>
      <head>
        <meta name="pdfkit-page-size" content="Legal"/>
        <meta name="pdfkit-orientation" content="Landscape"/>
      </head>
      <body>
          Hello World!
      </body>
    </html>
"""

pdfkit.from_string(body, 'out.pdf') #with --page-size=Legal and --orientation=Landscape
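If the options are only known at run time, the meta tags can be generated rather than typed by hand. A minimal sketch, assuming the default `pdfkit-` prefix; `with_pdfkit_meta` is a hypothetical helper, not a pdfkit function:

```python
from html import escape


def with_pdfkit_meta(body_html, options, prefix="pdfkit-"):
    """Wrap body_html in a page whose <meta> tags carry the given options."""
    metas = "\n".join(
        '    <meta name="{}{}" content="{}"/>'.format(
            prefix, escape(name, quote=True), escape(str(value), quote=True)
        )
        for name, value in options.items()
    )
    return "<html>\n  <head>\n{}\n  </head>\n  <body>{}</body>\n</html>".format(
        metas, body_html
    )


page = with_pdfkit_meta("Hello World!", {"page-size": "Legal", "orientation": "Landscape"})
print(page)
# The result can then be passed to pdfkit.from_string(page, 'out.pdf').
```

Note that `body_html` is inserted as-is, so it should already be valid HTML.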

Configuration

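Every API call takes an optional configuration parameter. It should be an instance returned by pdfkit.configuration(), which takes the configuration options as its initial parameters. The available options are:

wkhtmltopdf - the location of the wkhtmltopdf binary. By default, pdfkit tries to locate it using which (on UNIX-like systems) or where (on Windows).
meta_tag_prefix - the prefix for pdfkit-specific meta tags; the default is pdfkit-

For example, when wkhtmltopdf is not on $PATH, set it like this: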

config = pdfkit.configuration(wkhtmltopdf='/opt/bin/wkhtmltopdf')
pdfkit.from_string(html_string, output_file, configuration=config)

Troubleshooting

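IOError: 'No wkhtmltopdf executable found':
Make sure wkhtmltopdf is on your $PATH, or set its location through a custom configuration (see Configuration). Running where wkhtmltopdf on Windows or which wkhtmltopdf on Linux should return the actual path to the binary.

IOError: 'Command Failed'
This error means that PDFKit was unable to process the input. You can try running the command from the error message directly to see what caused the failure (with some wkhtmltopdf versions this can be caused by a segmentation fault).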
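One way to follow that advice from Python is to re-run the failing command yourself and look at its exit code and stderr. A minimal sketch, assuming you copy the command from the error message into a list of arguments; `run_and_report` is a hypothetical helper:

```python
import subprocess
import sys


def run_and_report(cmd, timeout=60):
    """Run cmd (a list of arguments) and return (exit_code, stderr_text)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    # On POSIX a negative exit code -N means the process was killed by
    # signal N, so -11 would point at a segmentation fault.
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # wkhtmltopdf command line taken from the pdfkit error message.
    code, err = run_and_report([sys.executable, "-c", "import sys; sys.exit('boom')"])
    print(code, err)
```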

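Pitfalls

`Garbled Chinese characters`

The HTML I needed to convert to PDF uses the KaiTi (楷体) typeface. The Redhat system on my server did not have that font installed, so converting directly produced garbled text. The font has to be installed first.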

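1 Install the font library (the command may differ on other operating systems):
yum -y install fontconfig

2 Create a folder for the Chinese fonts and give it the right permissions by running these commands in order:
cd /usr/share/fonts
mkdir chinese
chmod -R 755 /usr/share/fonts/chinese

3 Use XFTP or a similar tool to upload the Chinese fonts you have downloaded into the folder created above.

4 Install ttmkfdir, which scans a directory for font information:
yum -y install ttmkfdir

5 Run ttmkfdir to generate the fonts.scale file:
cd /usr/share/X11/fonts/encodings
ttmkfdir -e /usr/share/X11/fonts/encodings/encodings.dir

6 Edit the font configuration file:
vim /etc/fonts/fonts.conf
Find the "Font directory list" section and add the line <dir>/usr/share/fonts/chinese</dir> below it.

Press ESC and type :wq to save and exit. Then refresh the font cache with fc-cache. That is all; you can run fc-list to see the list of installed fonts.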
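To check from Python that the new fonts were picked up, the output of fc-list can be filtered by language. A minimal sketch, assuming fontconfig's `fc-list` is installed; the helper names are my own:

```python
import shutil
import subprocess


def parse_families(fc_list_output):
    """Turn `fc-list ... family` output into a sorted list of family names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font may report several comma-separated names, e.g. "KaiTi,楷体"
        families.update(part.strip() for part in line.split(",") if part.strip())
    return sorted(families)


def chinese_font_families():
    """Return the Chinese-capable font families that fontconfig knows about."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"], capture_output=True, text=True
    )
    return parse_families(result.stdout)


if __name__ == "__main__":
    print(chinese_font_families() or "no Chinese fonts found")
```

If the typeface used in the HTML does not show up in this list, the PDF will most likely still come out garbled.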

`wkhtmltopdf: cannot connect to X server`

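One cause of this problem is that the version installed through yum is too old and still depends on an X server by default. Most of the material I found recommends running commands as root to install xvfb and configuring wkhtmltopdf to use it.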

 yum install xvfb

Write a small shell script that wraps wkhtmltopdf in xvfb. Create a file named wkhtmltopdf.sh:

 echo -e '#!/bin/bash\nxvfb-run -a --server-args="-screen 0, 1024x768x24" /usr/bin/wkhtmltopdf -q $*' > /usr/bin/wkhtmltopdf.sh
 chmod a+x /usr/bin/wkhtmltopdf.sh
 ln -s /usr/bin/wkhtmltopdf.sh /usr/local/bin/wkhtmltopdf

Once that is set up, run a test command:

 wkhtmltopdf http://www.baidu.com output.pdf

If it still fails, try running:

/usr/bin/wkhtmltopdf.sh http://www.baidu.com output.pdf

The wkhtmltopdf.sh script used here wraps wkhtmltopdf in xvfb, so the cannot connect to X server error should no longer appear.

`Pdfkit Command Error: wkhtmltopdf: cannot connect to X server`

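This error appears when generating a PDF through Pdfkit. Pdfkit is a wrapper around wkhtmltopdf, so when it calls wkhtmltopdf the unresolved X server dependency surfaces again. The fix is to put the path of the wkhtmltopdf.sh file created above into Pdfkit's Configuration: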

import pdfkit
configuration = pdfkit.configuration(wkhtmltopdf='/usr/bin/wkhtmltopdf.sh')
pdf = pdfkit.from_string(self.html_code, False, options=self.options, configuration=configuration)
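If the same code has to run both on a desktop and on a headless server, the wrapper can be selected only when it is needed. A minimal sketch, assuming the wrapper lives at /usr/bin/wkhtmltopdf.sh as above and that a missing DISPLAY variable is a good enough sign of a headless machine; `pick_wkhtmltopdf` is a hypothetical helper:

```python
import os


def pick_wkhtmltopdf(wrapper="/usr/bin/wkhtmltopdf.sh", environ=None):
    """Return the wrapper path when no X display is available, else None."""
    environ = os.environ if environ is None else environ
    usable = os.path.isfile(wrapper) and os.access(wrapper, os.X_OK)
    if not environ.get("DISPLAY") and usable:
        return wrapper
    return None  # let pdfkit locate wkhtmltopdf on its own


# Usage (requires pdfkit):
#   import pdfkit
#   path = pick_wkhtmltopdf()
#   config = pdfkit.configuration(wkhtmltopdf=path) if path else None
#   pdfkit.from_string('<p>hi</p>', 'out.pdf', configuration=config)
```

With a newer wkhtmltopdf build that no longer needs an X server, the wrapper is unnecessary and the default lookup is enough.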