pdfkit || A summary of ways to convert web pages to PDF with Python

Latest recommended article published on 2023-11-20 20:11:10

風の住む街~

Article link: https://blog.csdn.net/weixin_38924500/article/details/104834515

pdfkit, a third-party Python library, can generate PDF files from web pages, HTML files, and strings.

Installing the dependencies

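1. Python 3.x: run pip install pdfkit on the command line.
2. Install wkhtmltopdf (on Windows, this provides wkhtmltopdf.exe).
Download: wkhtmltopdf
Choose the build that matches your computer.
Once the download finishes, click Next through the installer to install wkhtmltopdf. Be sure to note the install location and find the absolute path of wkhtmltopdf.exe, because it is needed later.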
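
Since the absolute path is needed later, it can help to check it from Python before calling pdfkit. The following is a minimal sketch and not part of pdfkit: the helper name find_wkhtmltopdf and the candidate paths are assumptions for illustration. It tries the paths you pass in first, then falls back to searching PATH with the standard library's shutil.which.

```python
import os
import shutil


def find_wkhtmltopdf(candidates=()):
    """Return a path to the wkhtmltopdf executable, or None if none is found.

    candidates is an iterable of paths to try first (hypothetical examples
    are shown below); if none exists, PATH is searched instead.
    """
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # shutil.which returns None when the program is not on PATH.
    return shutil.which('wkhtmltopdf')


# Assumed default install location on Windows; adjust to your machine.
path = find_wkhtmltopdf([r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'])
print(path if path else 'wkhtmltopdf not found')
```

If a path is found, it can be passed to pdfkit.configuration(wkhtmltopdf=path) as in the examples that follow.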

Generating PDF files with pdfkit

As mentioned above, pdfkit can generate PDF files from web pages, HTML files, and strings.

Generating a PDF from a web page URL [the pdfkit.from_url() function]

# Import the library
import pdfkit

def url_to_pdf(url, to_file):
    """Generate a PDF file from a web page URL."""
    # Pass the absolute path of wkhtmltopdf.exe to the configuration object
    path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
    # Generate the PDF; to_file is the output file path
    pdfkit.from_url(url, to_file, configuration=config)
    print('Done')

# Pass in the URL of a blog article and convert it to PDF.
# The second argument is the output location and file name, e.g. ./PDF/filename.pdf
url_to_pdf('https://blog.csdn.net/weixin_38924500/article/details/104767891', 'name.pdf')

Generating a PDF from an HTML file [the pdfkit.from_file() function]

# Import the library
import pdfkit

def html_to_pdf(html, to_file):
    """Generate a PDF file from an HTML file."""
    # Pass the absolute path of wkhtmltopdf.exe to the configuration object
    path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
    # Generate the PDF; to_file is the output file path
    pdfkit.from_file(html, to_file, configuration=config)
    print('Done')

html_to_pdf('sample.html', 'name.pdf')

Generating a PDF from a string [the pdfkit.from_string() function]

# Import the library
import pdfkit

def str_to_pdf(string, to_file):
    """Generate a PDF file from a string."""
    # Pass the absolute path of wkhtmltopdf.exe to the configuration object
    path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
    # Generate the PDF; to_file is the output file path
    pdfkit.from_string(string, to_file, configuration=config)
    print('Done')

str_to_pdf('This is test!', 'out_3.pdf')
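
The three functions above differ only in which pdfkit function they call, so they could share one configuration and one dispatcher. The following is a sketch under that observation, not a required pattern. The names to_pdf, kind, and WKHTMLTOPDF_PATH are hypothetical; pdfkit.from_url, pdfkit.from_file, pdfkit.from_string, and pdfkit.configuration are the same calls used above.

```python
# Assumed install location; replace with the path on your machine.
WKHTMLTOPDF_PATH = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'


def to_pdf(source, to_file, kind='url'):
    """Convert a URL, an HTML file, or a string to a PDF file.

    kind selects the pdfkit function to call: 'url', 'file', or 'string'.
    """
    # Imported inside the function so this module can be loaded
    # even on a machine where pdfkit is not installed yet.
    import pdfkit

    converters = {
        'url': pdfkit.from_url,
        'file': pdfkit.from_file,
        'string': pdfkit.from_string,
    }
    if kind not in converters:
        raise ValueError(f'kind must be one of {sorted(converters)}, got {kind!r}')
    config = pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_PATH)
    return converters[kind](source, to_file, configuration=config)
```

For example, to_pdf('sample.html', 'name.pdf', kind='file') would do the same work as html_to_pdf above, and the executable path then only needs to be changed in one place.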

Using pdfkit flexibly in a project

import pdfkit

url = "https://www.baidu.com"
title = "mypdf"
# print(url)
# Store the file under a relative path. Use forward slashes, otherwise backslash escapes cause problems.
file_name = './PDF/' + title + '.pdf'
# print(file_name)
myconfig = pdfkit.configuration(wkhtmltopdf='C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe')
# Settings for the generated PDF; add further options as needed
options = {
    'page-size': 'A4',
    'margin-top': '0mm',
    'margin-right': '0mm',
    'margin-bottom': '0mm',
    'margin-left': '0mm',
    # 'orientation': 'Landscape',  # landscape layout
    'encoding': 'UTF-8',
    'no-outline': None,
    'footer-right': '[page]',  # page number
}

pdfkit.from_url(url, file_name, options=options, configuration=myconfig)
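
To make the options dictionary less opaque: pdfkit turns each entry into a wkhtmltopdf command-line switch, so 'page-size': 'A4' becomes --page-size A4, and an entry whose value is None, such as 'no-outline', becomes a bare switch with no argument. The function below is a simplified illustration of that mapping, not pdfkit's actual code. The name options_to_args is hypothetical, and the sketch ignores cases such as options that may be repeated.

```python
def options_to_args(options):
    """Flatten an options dict into a list of wkhtmltopdf-style arguments."""
    args = []
    for name, value in options.items():
        # Option names gain a leading "--" unless they already have one.
        switch = name if name.startswith('--') else '--' + name
        args.append(switch)
        # A value of None marks a switch that takes no argument.
        if value is not None:
            args.append(str(value))
    return args


options = {
    'page-size': 'A4',
    'encoding': 'UTF-8',
    'no-outline': None,
    'footer-right': '[page]',
}
print(options_to_args(options))
```

Reading the dictionary this way also shows where to look up further settings: any switch listed in the wkhtmltopdf command-line help can be written as a key without the leading dashes.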

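Request congestion when converting multiple links

We often need to convert many links to PDF in one run, and each conversion takes some time. If the program starts requesting the next link before the previous request has finished printing, problems can occur.
So we can catch the failure and pause before continuing: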

import time
try:
    # Request the URL and run the conversion here
    ...
except Exception:
    print('-------------xxxxx-----------')
    time.sleep(20)  # Let the process sleep for a while to avoid request congestion
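
Putting this idea into a loop, here is a minimal sketch of converting several links one after another and pausing after a failure. The names convert_all, convert, and pause_seconds, as well as the page_N.pdf naming scheme, are hypothetical. convert stands for whatever call does the conversion, for example a small function wrapping pdfkit.from_url. The 20-second default mirrors the sleep used above.

```python
import time


def convert_all(urls, convert, pause_seconds=20, sleep=time.sleep):
    """Convert each URL in turn; after a failure, pause before moving on.

    convert(url, to_file) performs one conversion and raises on failure.
    Returns the list of URLs that could not be converted.
    """
    failed = []
    for index, url in enumerate(urls, start=1):
        to_file = f'page_{index}.pdf'  # hypothetical naming scheme
        try:
            convert(url, to_file)
        except Exception as exc:
            print(f'Failed to convert {url}: {exc}')
            failed.append(url)
            sleep(pause_seconds)  # give pending requests time to finish
    return failed
```

Returning the failed links makes it possible to run them through convert_all a second time instead of losing them silently.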
