How do you keep the images when saving an article as a TXT file? Python can help

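Preface

If you use Python to scrape your favorite original CSDN articles and save them as TXT files, they are inconvenient to read, and the code and images in the articles cannot be saved.

Today I'll show you how to turn them into PDFs that you can read at your own pace. If the author suddenly deletes their account, you will still have a backup.

Video tutorial for this article: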

https://www.bilibili.com/video/BV1A54y1U78U/

Key topics:

requests
CSS selectors

Third-party libraries:

requests
parsel
pdfkit

Development environment:

Version: Anaconda 5.2.0 (Python 3.6.5)
Editor: PyCharm

The code is as follows:

1. Import the tools

import parsel
import pdfkit
import requests

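2. Request headers

headers = {}  # the header values were cut off in the source; add your own (for example a browser User-Agent)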

3. The HTML template string

html_str = """
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
{article}
</body>
</html>
"""
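
The template is filled in with str.format, which treats every single brace in the template as a placeholder. A minimal sketch of that step follows. The shortened template and its inline style rule are assumptions added for illustration, not part of the original script.

```python
# A shortened template. The doubled braces around the CSS rule are escapes:
# str.format turns '{{' and '}}' into literal '{' and '}'.
template = """<html>
<head><style>body {{ font-size: 16px; }}</style></head>
<body>
{article}
</body>
</html>"""

# Braces inside the substituted value are left alone, because format()
# only parses the template, not the inserted text.
page = template.format(article="<article><p>Hello {world}</p></article>")

print(page)
```

If you later add inline CSS or JavaScript to the template, remember to double its braces, otherwise format() raises an error or substitutes the wrong thing.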

4. User information

cookie = {}  # the cookie values were cut off in the source; add your own login cookies if needed

5. Scrape the article and convert it to PDF

def get_html(url):
    # Send a GET request to the URL and keep the response
    response = requests.get(url, headers=headers, cookies=cookie)
    # response.text is the response body as a string.
    # We ran into anti-scraping measures here; print the text to inspect what came back:
    # print(response.text)

    # How to turn the HTML into a PDF:
    # extract the article part of the page
    sel = parsel.Selector(response.text)
    # CSS selectors
    article = sel.css('article').get()
    title = sel.css('h1::text').get()
    print(title)
    print(article)

    # Fill the template and save it as an HTML file
    html = html_str.format(article=article)
    with open(f'{title}.html', mode='w', encoding='utf-8') as f:
        f.write(html)

    # Path to the wkhtmltopdf executable
    config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')
    # Convert the HTML file to a PDF file with pdfkit
    pdfkit.from_file(f'{title}.html', f'{title}.pdf', configuration=config)


get_html('https://blog.csdn.net/nosprings/article/details/102609296')
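
The script uses the article title directly as a file name. That can fail when the title contains characters such as ? or :, which Windows does not allow in file names, or when no h1 is found and the title is None. Here is a minimal sketch of a helper that cleans the title first. The name safe_filename and the "article" fallback are hypothetical and not part of the original script.

```python
import re


def safe_filename(title, fallback="article"):
    """Return a version of title that is safe to use as a file name."""
    if not title:
        # Covers None (no <h1> found) and an empty string.
        return fallback
    # Replace characters that Windows forbids in file names, plus control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title)
    # Trim surrounding whitespace and the trailing dots or spaces Windows rejects.
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or fallback


print(safe_filename("a/b?c"))
print(safe_filename(None))
```

Inside get_html, you could then write title = safe_filename(sel.css('h1::text').get()) before building the .html and .pdf file names.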