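Downloading Paid Documents from 原力创 (Full-Screen Reading Mode)
I. Requirements:
Download a paid document from the target URL and save it as a Word file.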
Target URL: click here
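II. Approach
- 1. Use selenium to drive the browser so the page's asynchronously loaded content appears, then collect the image URLs
- 2. Download the images
- 3. Write the images into a Word document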
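Step 1 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from section V: it assumes the page images are ordinary `<img>` elements that appear as the page is scrolled, and the names `collect_image_urls`, `pause` and `max_rounds` are hypothetical. The real page may need different selectors or a "next page" click instead of scrolling.

```python
import time

# Same value as selenium's By.TAG_NAME; written as a literal so this
# sketch has no import-time dependency on selenium.
TAG_NAME = "tag name"


def collect_image_urls(driver, pause=1.0, max_rounds=50):
    """Scroll until the page stops growing and return the <img> src values in page order.

    `driver` is assumed to be a selenium WebDriver that has already
    opened the document page.
    """
    seen = set()
    urls = []
    last_height = 0
    for _ in range(max_rounds):
        # Record every image that is currently present in the DOM.
        for img in driver.find_elements(TAG_NAME, "img"):
            src = img.get_attribute("src")
            if src and src not in seen:
                seen.add(src)
                urls.append(src)
        # Scroll to the bottom to trigger the next batch of lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            break  # nothing new was loaded, so stop
        last_height = height
    return urls
```

A typical call would be `driver = webdriver.Chrome(); driver.get(url); urls = collect_image_urls(driver)`. The fixed `pause` is the simplest way to wait for loading; an explicit wait on a page-specific element would be more robust.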
III. Key techniques
- 1. Python + selenium browser automation
- 2. Python + docx (the python-docx package)
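As a rough illustration of the docx side (together with the image download that feeds it), the sketch below fetches each image with requests and places it in a new Word document with python-docx. The function names, the 6-inch width and the de-duplication step are assumptions made for this example, not part of the original code.

```python
import io


def dedupe_keep_order(urls):
    """Drop empty and repeated URLs while keeping the page order."""
    seen = set()
    ordered = []
    for url in urls:
        if url and url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


def download_images(urls, session=None, timeout=30):
    """Fetch each image and return its raw bytes, in page order."""
    import requests  # third-party; imported here so the helper above works without it

    session = session or requests.Session()
    images = []
    for url in dedupe_keep_order(urls):
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # fail loudly on HTTP errors
        images.append(response.content)
    return images


def images_to_docx(images, filename, width_inches=6.0):
    """Add each image to a new Word document, in order, and save it."""
    from docx import Document  # provided by the python-docx package
    from docx.shared import Inches

    document = Document()
    for data in images:
        # add_picture accepts a file-like object, so no temporary files are needed
        document.add_picture(io.BytesIO(data), width=Inches(width_inches))
    document.save(filename)
```

For example, `images_to_docx(download_images(urls), "output.docx")` would write every downloaded page image into `output.docx`. If the site requires the login cookie, a `requests.Session` carrying the same headers can be passed in as `session`.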
IV. Environment
Python 3.6 + selenium + python-docx
Installation (the Tsinghua mirror is recommended):
pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install python-docx -i https://pypi.tuna.tsinghua.edu.cn/simple/
V. Code
import time
from selenium import webdriver
from selenium.webdriver.common import keys
import requests
from docx import Document
from docx.shared import Inches
class YuanLC:
    def __init__(self, url, filename):
        # Create the session object used for network requests
        headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,"
                      "application/signed-exchange;v=b3;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "Cookie": "CLIENT_SYS_UN_ID=3rvgCl