Preface
A quick note: the target site is https://www.qqxs.cc/fenlei1/1/, and the goal is to download the novel cover images.
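1. The straightforward object-oriented approach
Determine the URL ==> fetch the response ==> use XPath to extract the image URLs from the response ==> iterate over the image URLs and save each one to a file.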
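The last step of this pipeline, saving each image, might be sketched as follows. This is a minimal sketch, not the code used in this note: `save_covers`, `fetch`, and `out_dir` are hypothetical names, `fetch` stands in for whatever function returns the image bytes, and taking the file name from the last URL segment is an assumption. It checks whether the file already exists before requesting it, which is the point flagged in the notes at the top of the script.

```python
import os


def save_covers(image_urls, out_dir, fetch):
    """Save each image URL into out_dir, skipping files that already exist.

    fetch is a hypothetical callable that takes a URL and returns bytes.
    Returns the list of file paths that were newly written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for url in image_urls:
        # Assumption: the last path segment of the URL is a usable file name.
        filename = url.rstrip("/").split("/")[-1]
        path = os.path.join(out_dir, filename)
        if os.path.exists(path):
            # Already downloaded: skip before making any request.
            continue
        data = fetch(url)
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)
    return written
```

Because the existence check runs before `fetch` is called, running the function a second time makes no further requests for files that are already on disk.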
With the retrying module, a request that times out while fetching the response is retried automatically.
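To illustrate what retrying on failure amounts to, here is a small standard-library sketch. It is not the retrying module's implementation: `retry_on_error` and `flaky` are hypothetical names, and `max_attempts` only mirrors the role of the `stop_max_attempt_number` argument used in the script.

```python
import functools


def retry_on_error(max_attempts=3):
    """Re-run the wrapped function until it succeeds or attempts run out."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # any exception triggers another attempt
                    last_error = exc
            # All attempts failed: re-raise the last error.
            raise last_error
        return wrapper
    return decorator


attempts = []


@retry_on_error(max_attempts=3)
def flaky():
    """Hypothetical function that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = flaky()
```

In the script itself the same role is played by decorating `get_response` with `@retry(stop_max_attempt_number=3)` from the retrying module.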
""
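The tricky part is checking whether the file you want already exists before requesting it; also learn how to use retry.
With multithreading, be very careful: variables are shared between threads, so avoid defining shared variables in the initializer where possible.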
"""
import requests
import json
import os
import time
from lxml import etree
from retrying import retry
class ImageDownload:
    def __init__(self, pages=2):
        """
        :param pages: number of novel listing pages to download
        """
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36"
        }
        self.start_url = "https://www.qqxs.cc/fenlei1/1/"
        self.start_time = time.time()
        self.pages = pages
        # self.num=1

    @retry(stop_max_attempt_number=3)  # retry: maximum number of attempts to get a response
    def get_response(self, url):
        "Return the raw (undecoded) response"
        response = requests.get(url, headers=self.headers, timeout=5)
        assert response.status_code == 200  # a non-200 status code triggers a retry
        return response

    def parse_response(self, url):
        "Parse the response and return the image URLs"
        html_response = etree.HTML(self.get_response(url).content.decode('gbk'))
        cover_urls = html_response.xpath("//div[@id='al