Generating web-page PDFs with pdfkit: a roundup of errors

tunan666

Article link: https://blog.csdn.net/tunan666/article/details/111874855

Series: Generating PDFs with pdfkit

1. Generating web-page PDFs with Selenium, ChromeDriver and pdfkit

2. Generating web-page PDFs with pdfkit: a roundup of errors (this article)

1. OSError: wkhtmltopdf reported an error

Code example:

import time

import pdfkit
from selenium import webdriver

options_chrome = webdriver.ChromeOptions()
# Disable Chrome's sandbox (commonly required when Chrome runs as root)
options_chrome.add_argument('--no-sandbox')
# Run without a visible window; on Linux systems without a display, Chrome fails to start without this flag
options_chrome.add_argument('--headless')
# executable_path is the location of chromedriver (Selenium 3 style; on Selenium 4, pass service=Service(path) instead)
driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver', options=options_chrome)
# Make the browser window fullscreen
driver.fullscreen_window()

url = 'http://www.tn666.com?type=1'
driver.get(url)
# Wait 1 second for the page to render
time.sleep(1)
source_text = driver.page_source

options_pdf = {
    'page-size': 'A4'
}
result = pdfkit.from_string(source_text, 'test.pdf', options=options_pdf)

driver.quit()

Full error output:

Traceback (most recent call last):
  File "pdfkit_selenium_test.py", line 25, in <module>
    result = pdfkit.from_string(source_text, '/home/tn/code/python/test.pdf', options=options_pdf)
  File "/home/tn/anaconda3/envs/python3.6.6/lib/python3.6/site-packages/pdfkit/api.py", line 72, in from_string
    return r.to_pdf(output_path)
  File "/home/tn/anaconda3/envs/python3.6.6/lib/python3.6/site-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:

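Cause: external resources referenced by the page cannot be loaded. The page source is handed to wkhtmltopdf as a plain string, so root-relative paths such as /static/js/index.js no longer point at the original site.

Fix: after grabbing the page source, prefix external resource paths with the site's host so they can be loaded, then generate the PDF. For example: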

source_text = source_text.replace('/static/js/index.js', 'http://www.tn666.com/static/js/index.js')
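Replacing paths one at a time gets tedious when a page references many assets. Below is a minimal sketch that rewrites every root-relative src/href attribute in the page source in one pass, assuming all assets live on the same host as the page. absolutize_links is a hypothetical helper name, and the regex only handles quoted, root-relative paths (not relative paths like js/app.js, unquoted attributes or srcset lists).

```python
import re
from urllib.parse import urljoin

# src="/..." or href='/...'; the (?!/) lookahead skips protocol-relative URLs such as //cdn.example.com/x.js
ROOT_RELATIVE_ATTR = re.compile(r"""\b(src|href)=(["'])(/(?!/)[^"']*)\2""")


def absolutize_links(html, base_url):
    """Prefix root-relative src/href values with the scheme and host of base_url."""
    def to_absolute(match):
        attr, quote, path = match.groups()
        return f"{attr}={quote}{urljoin(base_url, path)}{quote}"

    return ROOT_RELATIVE_ATTR.sub(to_absolute, html)
```

Used in the script above, this would be `source_text = absolutize_links(source_text, url)` right before `pdfkit.from_string(...)`. Absolute URLs and protocol-relative URLs are left untouched.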

2. Error during PDF generation: unknown error: DevToolsActivePort file doesn't exist

Full error output:

(unknown error: DevToolsActivePort file doesn't exist)(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

Symptom:

Running ps aux | grep google-chrome showed dozens of processes.

Temporary workaround:

After killing every google-chrome process except one, the program ran without problems.

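Q&A:

Every run of the program starts a new process. How can that process be killed automatically when the script finishes?

Solution:

The code uses driver.quit(). First, here is how driver.quit() differs from driver.close():

driver.close(): closes only the current window (tab)

driver.quit(): quits the driver and closes every window associated with the session

Use the Service class from selenium.webdriver.chrome.service to control when the ChromeDriver process starts and stops.

Code example: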

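import time

import pdfkit
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Start ChromeDriver ourselves so this script controls when its process starts and stops
service_driver = Service('/usr/local/bin/chromedriver')
service_driver.start()

options_chrome = webdriver.ChromeOptions()
# Disable Chrome's sandbox (commonly required when Chrome runs as root)
options_chrome.add_argument('--no-sandbox')
# Run without a visible window; on Linux systems without a display, Chrome fails to start without this flag
options_chrome.add_argument('--headless')

# Connect to the ChromeDriver started above;
# webdriver.Chrome(...) would launch its own, separate ChromeDriver instead of using service_driver
driver = webdriver.Remote(service_driver.service_url, options=options_chrome)
# Make the browser window fullscreen
driver.fullscreen_window()

url = 'http://www.tn666.com?type=1'
driver.get(url)
# Wait 1 second for the page to render
time.sleep(1)
source_text = driver.page_source

options_pdf = {
    'page-size': 'A4'
}
result = pdfkit.from_string(source_text, 'test.pdf', options=options_pdf)

# End the browser session, then stop the ChromeDriver process started above
driver.quit()
service_driver.stop()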
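In the code above, driver.quit() and service_driver.stop() only run if every line before them succeeds. If pdfkit.from_string raises (for example the OSError from section 1), the script exits with Chrome and ChromeDriver still running, which may be one way processes pile up across runs. Below is a minimal exception-safe sketch using contextlib.ExitStack. run_with_cleanup and its three callables are hypothetical names, not Selenium API.

```python
from contextlib import ExitStack


def run_with_cleanup(start_service, make_driver, job):
    """Run job(driver), then clean up whatever was started, even if something raises.

    ExitStack calls its callbacks in reverse order of registration, so
    driver.quit() runs before service.stop(). If make_driver() itself fails,
    only service.stop() is called, because no driver was registered yet.
    """
    with ExitStack() as stack:
        service = start_service()
        stack.callback(service.stop)
        driver = make_driver(service)
        stack.callback(driver.quit)
        return job(driver)
```

With Selenium, start_service could be a small function that creates Service('/usr/local/bin/chromedriver'), calls start() on it and returns it; make_driver could be `lambda s: webdriver.Remote(s.service_url, options=options_chrome)`; and job could load the URL, read driver.page_source and call pdfkit.from_string.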

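3. Blank PDF showing a loading state

Symptom: the generated PDF is a blank page with a loading indicator.

Cause: the PDF was generated before the page had finished loading.

Fix: speed up page loading, or lengthen the time.sleep after driver.get before reading driver.page_source.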
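A fixed time.sleep is either wasteful or still too short. An alternative (a swapped-in technique, not the author's method) is to poll until the content you need is actually present. Selenium ships this as WebDriverWait; the dependency-free sketch below shows the same idea. wait_until is a hypothetical helper name.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.2):
    """Call condition() until it returns something truthy; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

After driver.get(url), one might call `wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, '#app .loaded-content'))` (with `from selenium.webdriver.common.by import By`) before reading driver.page_source. The selector is a placeholder for an element that only appears once the data has loaded. Selenium's built-in equivalent is `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '...')))`, imported from selenium.webdriver.support.ui and selenium.webdriver.support.expected_conditions.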

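4. Completely blank PDF

Symptom: the generated PDF is a blank page. Printing driver.page_source, saving it as an HTML file and opening that file also shows a blank page.

Cause: some of the page's JS may reset the page and wipe out its content.

Fix: replace the JS files with empty ones one at a time and check whether the page then renders.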
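To work through the scripts one at a time, you could generate one copy of the captured page_source per script element, each with that single element removed. Then open each copy in a browser (or pass it to pdfkit) and see which one renders. This is a rough sketch: the regex matching is an assumption that handles ordinary script open/close pairs but is not a full HTML parser, and without_each_script is a hypothetical name.

```python
import re

# Non-greedy match of one <script ...>...</script> element, spanning newlines, case-insensitive
SCRIPT_ELEMENT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def without_each_script(html):
    """Yield (removed_tag, html_without_it) for every <script> element in html."""
    for match in SCRIPT_ELEMENT.finditer(html):
        yield match.group(0), html[: match.start()] + html[match.end():]
```

For example, loop over `enumerate(without_each_script(source_text))`, write each variant to its own file such as variant_0.html, and note which removed tag turns the blank page back into a rendered one.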

Replace the url in the code with the URL of the page you want to convert to PDF.
