Web Scraping: Scraping Encrypted Data

HQ_JSY

Category: Web scraping

Article link: https://blog.csdn.net/JSYhq/article/details/88712654


I. Case study

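1. Before scraping a new website, first determine whether the data you want is loaded dynamically, that is, fetched by JavaScript after the initial page has loaded!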
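One way to run this check is to compare what the browser shows with the raw page source that a plain HTTP request returns, since no JavaScript runs in that request. The sketch below is a heuristic under that assumption, not a definitive test; `fetch_raw_html` and `looks_dynamically_loaded` are hypothetical helper names, and it uses only the standard library (`urllib.request`) so it runs without `requests` installed.

```python
import html
import urllib.request

# Hypothetical helper: fetch the page source as served, without running JavaScript
def fetch_raw_html(url: str, user_agent: str, timeout: float = 10.0) -> str:
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')

# Hypothetical helper: True if text copied from the rendered page is missing
# from the raw source, which suggests it was loaded dynamically
def looks_dynamically_loaded(raw_html: str, sample: str) -> bool:
    # Unescape entities such as &amp; so encoded text still matches
    return sample not in html.unescape(raw_html)
```

Pass a short piece of text or an attribute value you can see in the browser's Elements panel as `sample`. If the result is `True`, look in the browser's Network panel (XHR/Fetch filter) for the request that actually delivers the data.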

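```python
# Goal: scrape the image data from Jandan (煎蛋网)  http://jandan.net/ooxx
import base64

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
}
url = 'http://jandan.net/ooxx/page-62#comments'
page_text = requests.get(url=url, headers=headers).text

# Parse out the "ciphertext" of each image address
tree = etree.HTML(page_text)
code_list = tree.xpath('//span[@class="img-hash"]/text()')
for code in code_list:
    # The hash is Base64, one of the most common page-level "encryption"
    # schemes (strictly an encoding); it decodes to a protocol-relative URL
    img_url = 'http:' + base64.b64decode(code).decode()
    img_name = img_url.split('/')[-1]
    # Download with requests so the same headers are sent; urlretrieve
    # would send urllib's default User-Agent instead
    img_data = requests.get(url=img_url, headers=headers).content
    with open(img_name, 'wb') as fp:
        fp.write(img_data)
    print(img_name, 'downloaded successfully!')
```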
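Base64 is an encoding rather than encryption: anyone can reverse it without a key, which is why the case code needs nothing more than `base64.b64decode`. The decoding step can be isolated and checked offline, as in the sketch below. It assumes, as the case code does, that each `img-hash` value is plain Base64 of a protocol-relative URL such as `//host/path/name.jpg`; `decode_img_hash` is a hypothetical helper, and the padding repair is an added safeguard, not something the original page is known to require.

```python
import base64

# Hypothetical helper: turn one img-hash value into an absolute image URL,
# assuming the hash is plain Base64 of a protocol-relative URL
def decode_img_hash(img_hash: str, scheme: str = 'http:') -> str:
    cleaned = img_hash.strip()
    # Restore any '=' padding the page may have dropped
    cleaned += '=' * (-len(cleaned) % 4)
    try:
        # validate=True rejects characters outside the Base64 alphabet
        decoded = base64.b64decode(cleaned, validate=True).decode('utf-8')
    except ValueError as exc:
        # binascii.Error and UnicodeDecodeError are both subclasses of ValueError
        raise ValueError(f'not a Base64-encoded URL: {img_hash!r}') from exc
    # Only prepend the scheme to protocol-relative URLs
    return scheme + decoded if decoded.startswith('//') else decoded
```

Inside the loop of the case code, `img_url = decode_img_hash(code)` could replace the one-line decode, so that a malformed hash raises a clear `ValueError` instead of a bare decoding error.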

II. Integrated matching principles
