Scraping src images with a Python crawler

Reader_a

Category: Python. Tags: python, crawler, programming language

Article link: https://blog.csdn.net/weixin_54250368/article/details/122192440

Scraping images with a Python crawler

Libraries required:
os
time
requests
lxml

The source code is as follows:

import os
import time
import requests
from lxml import etree

# It is best to send a full set of headers, though User-Agent and Cookie alone may be enough
headers = {
    'User-Agent': 'xxxxxx',
    'Accept': 'xxxxxx',
    'Accept-Encoding': 'xxxxxx',
    'Accept-Language': 'xxxxxx',
    'Connection': 'xxxxxx',
    'Cookie': 'xxxxxx',
    'Referer': 'xxxxxx'
}

# Single-page scrape
# First request the source of the page that holds the images you want
# Print it to the console to analyse it, or inspect the page with F12 in the browser
response = requests.get('http://xxxxxx.com/xxx?xxx', headers=headers)

# print(response.content.decode())
html = etree.HTML(response.content.decode())
# Use XPath to get the image sources,
# here the file attribute under img tags
imgs = html.xpath("//img//@file")

# Create the output folder once, before the loop
os.makedirs('./xx', exist_ok=True)

for i in imgs:
    # Sleep for 3 seconds between requests
    time.sleep(3)
    # The link may need to be joined to the site address
    url2 = "http://xxxxxx.com/" + i
    print(url2)
    # Name under which the image is saved:
    # split on '/' and take the last piece
    file_name = url2.split('/')[-1]
    print(file_name)
    # Request the image itself
    # content holds the raw bytes, so use content when saving an image
    img_data = requests.get(url2, headers=headers).content
    img_path = './xx/' + file_name
    with open(img_path, 'wb') as f:
        f.write(img_data)
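
The comments above note that the link may need to be joined to the site address, and that the file name is the last piece after splitting on '/'. A minimal sketch of one way to make both steps more robust, using only the standard library, is shown below. The helper names absolute_url and file_name_from_url and the example.com addresses are hypothetical and not part of the original script. Unlike a plain split('/'), this version drops any query string from the file name.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlsplit


def absolute_url(page_url: str, link: str) -> str:
    """Resolve a link found on page_url into an absolute URL (hypothetical helper)."""
    return urljoin(page_url, link)


def file_name_from_url(url: str, default: str = "image") -> str:
    """Return the last path segment, ignoring any query string or fragment."""
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    # Fall back to a default name when the path ends in '/'
    return name or default


# Placeholder page address, for illustration only
page = "http://example.com/forum/thread?id=1"
for link in ["data/attachment/a.jpg", "/static/b.png?v=2", "http://cdn.example.com/c.gif"]:
    full = absolute_url(page, link)
    print(full, "->", file_name_from_url(full))
```

urljoin handles relative, root-relative, and already absolute links alike, so the loop in the script would not need to know which form the page uses.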

XPath selection
[Image: XPath selection]
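
To try out an XPath expression before pointing it at a real page, it can help to run it against a small inline snippet. The sketch below assumes a made-up fragment in which some img tags keep the image address in a file attribute (as the script above expects) and others only have src. The markup and variable names are hypothetical, so check the real page with F12 to see which attribute it uses.

```python
from lxml import etree

# Hypothetical markup, for illustration only
sample = """
<div>
  <img file="data/a.jpg" src="static/none.gif">
  <img src="static/b.png">
  <p><img file="data/c.jpg"></p>
</div>
"""

html = etree.HTML(sample)

# Every file attribute on img tags, anywhere in the document
from_file = html.xpath("//img/@file")
# Every src attribute on img tags
from_src = html.xpath("//img/@src")
# Prefer file and fall back to src when file is missing
either = [img.get("file") or img.get("src") for img in html.xpath("//img")]

print(from_file)
print(from_src)
print(either)
```

If the target page stores the image address in src, swapping @file for @src in the script's expression should be the only change needed.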
