Web Scraping Practice (2): Downloading Images

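This article shows two ways to download images with a Python scraper. The first uses the requests library to fetch wallpaper results from Baidu Image Search and save them to disk. The second downloads an image and saves it through the PIL (Pillow) library. It also mentions multithreaded downloading as a possible optimization. The article is aimed at beginners learning the basics of scraping and image handling.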

1. Code

Method 1: fetch Baidu Image Search results with requests and save them

import re
from time import sleep

import requests


class MyImage:
    def __init__(self, keyword):
        # Headers sent with every request made through this session
        headers = {"User-Agent": "Python-urllib/2.6"}
        self.session = requests.Session()
        self.session.headers.update(headers)
        self.keyword = keyword

    def get_all_image_info(self):
        """Search Baidu Images and return a list of {"desc": ..., "url": ...} dicts."""
        url = "https://image.baidu.com/search/index"
        params = {"word": self.keyword, "tn": "baiduimage"}
        res = self.session.get(url, params=params, timeout=10)
        res.raise_for_status()
        # Thumbnail URLs and source-page titles are embedded as JSON-like text in the HTML
        image_urls = re.findall(r'"thumbURL":"(.*?\.jpg)"', res.text)
        image_names = re.findall(r'"fromPageTitle":"(.*?)"', res.text)
        images = []
        # zip() stops at the shorter list instead of raising IndexError when the counts differ
        for index, (title, image_url) in enumerate(zip(image_names, image_urls)):
            # Remove <strong> highlight tags, then characters that are illegal in file names
            name = re.sub(r"<[^>]+>", "", title)
            name = re.sub(r'[\\/:*?"<>|\s]', "", name)
            # Prefix the index so duplicate or empty titles do not overwrite each other
            images.append({"desc": f"{index}_{name}", "url": image_url})
        return images

    def download_image(self, image_name, image_url):
        res = self.session.get(image_url, timeout=10)
        res.raise_for_status()
        with open(f"{image_name}.jpg", "wb") as file_object:
            # Write the raw image bytes
            file_object.write(res.content)
        # Pause briefly between requests
        sleep(0.5)

    def download_all_image(self):
        for item in self.get_all_image_info():
            self.download_image(item["desc"], item["url"])


if __name__ == "__main__":
    m = MyImage("壁纸")  # search keyword: "wallpaper"
    m.download_all_image()
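The parsing step in get_all_image_info can be checked offline, without hitting Baidu. A minimal sketch, assuming the page lists "thumbURL" and "fromPageTitle" values in matching order (the same assumption the code above makes); parse_image_info and clean_name are hypothetical helper names, and the sample string is made up rather than real Baidu output. The title cleanup strips HTML tags such as <strong> with a regex instead of deleting individual characters.

```python
import re
from typing import Dict, List

# Patterns modeled on the ones used in get_all_image_info
THUMB_RE = re.compile(r'"thumbURL":"(.*?\.jpg)"')
TITLE_RE = re.compile(r'"fromPageTitle":"(.*?)"')


def clean_name(title: str) -> str:
    """Strip HTML tags such as <strong>, then characters Windows forbids in file names."""
    name = re.sub(r"<[^>]+>", "", title)
    return re.sub(r'[\\/:*?"<>|\s]', "", name)


def parse_image_info(page_text: str) -> List[Dict[str, str]]:
    """Pair each thumbnail URL with its cleaned page title, by position."""
    urls = THUMB_RE.findall(page_text)
    titles = TITLE_RE.findall(page_text)
    # zip() drops unmatched extras instead of raising IndexError
    return [
        {"desc": f"{i}_{clean_name(t)}", "url": u}
        for i, (t, u) in enumerate(zip(titles, urls))
    ]


if __name__ == "__main__":
    # Hypothetical page fragment, for illustration only
    sample = (
        '{"thumbURL":"https://example.com/a.jpg","fromPageTitle":"<strong>壁纸</strong> 高清"},'
        '{"thumbURL":"https://example.com/b.jpg","fromPageTitle":"风景/壁纸"}'
    )
    for item in parse_image_info(sample):
        print(item)
```

Keeping the parsing in a pure function like this makes it easy to re-test quickly when the page markup changes.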
    

Addendum:


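# Use a regular expression to find an image URL in a page
import re
from urllib.parse import urljoin

import requests

url = "https://www.baidu.com/"
res = requests.get(url, timeout=10)
res.encoding = "utf-8"  # match the charset declared in the returned HTML: charset=utf-8
html_str = res.text

# Non-greedy .*? so the match starts at the first <img tag instead of the last one on the line
pat = r'<img.*?src=["\']?(.*?)["\']? width'
result = re.findall(pat, html_str)  # e.g. ['//www.baidu.com/img/bd_logo1.png']
if not result:
    raise SystemExit("no <img ... width> tag matched; the page layout may have changed")
image_url = urljoin(url, result[0])  # e.g. https://www.baidu.com/img/bd_logo1.png

# Download the image
res = requests.get(image_url, timeout=10)
res.raise_for_status()
print(len(res.content), "bytes")  # res.text would decode the binary image as garbled text
with open("baidu_logo.png", "wb") as fp:  # "wb": open in binary write mode
    fp.write(res.content)  # res.content is bytes, res.text is str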
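A regex over raw HTML breaks easily when attribute order or quoting changes. As an alternative technique (not the method used above), the standard library's html.parser can collect every <img> src attribute directly. A minimal sketch; ImgSrcParser and find_image_urls are hypothetical names, and the sample markup is made up:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with no (or an empty) src
                self.srcs.append(src)


def find_image_urls(html: str, base_url: str) -> List[str]:
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    # Resolve protocol-relative (//...) and relative paths against the page URL
    return [urljoin(base_url, s) for s in parser.srcs]


if __name__ == "__main__":
    # Hypothetical markup for illustration
    html = '<div><img src="//www.baidu.com/img/bd_logo1.png" width="270"></div>'
    print(find_image_urls(html, "https://www.baidu.com/"))
```

The parser handles quoted and unquoted attributes the same way, and urljoin replaces hand-written string concatenation such as prefixing "https:".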

Method 2: download with requests, then open and save the image with PIL

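import os
from io import BytesIO

import requests
from PIL import Image

url = "https://www.baidu.com/img/PCtm_d9c8750bed0b3c7d089fa7d55720d6cf.png"

# Keep TLS certificate verification on; verify=False is insecure and not needed here
res = requests.get(url, timeout=10)
res.raise_for_status()
# Decode the downloaded bytes into a PIL image in memory
img = Image.open(BytesIO(res.content))
save_dir = r'C:\Users\SDS\eclipse-workspace\shujufenxi\src\n'  # the author's local folder; change to your own
os.makedirs(save_dir, exist_ok=True)
# PNG is lossless, so a quality= argument would be ignored here
img.save(os.path.join(save_dir, 'image.png'))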
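A quality= setting only affects lossy formats such as JPEG; Pillow's PNG writer ignores it. If the goal is a smaller JPEG copy of the downloaded image, a minimal sketch (assuming Pillow is installed; to_jpeg_bytes is a hypothetical helper name) converts to RGB first, because JPEG cannot store an alpha channel:

```python
from io import BytesIO

from PIL import Image


def to_jpeg_bytes(image_bytes: bytes, quality: int = 85) -> bytes:
    """Re-encode downloaded image bytes as JPEG at the given quality."""
    img = Image.open(BytesIO(image_bytes))
    # Convert anything that isn't plain RGB (e.g. RGBA, P, LA) so the JPEG encoder accepts it
    if img.mode != "RGB":
        img = img.convert("RGB")
    out = BytesIO()
    img.save(out, format="JPEG", quality=quality)
    return out.getvalue()


if __name__ == "__main__":
    # Build a small RGBA PNG in memory to stand in for res.content
    buf = BytesIO()
    Image.new("RGBA", (4, 4), (255, 0, 0, 128)).save(buf, format="PNG")
    jpeg = to_jpeg_bytes(buf.getvalue())
    print(jpeg[:2] == b"\xff\xd8")  # JPEG data starts with the SOI marker FF D8
```

Note that convert("RGB") simply drops the alpha channel; semi-transparent areas are not blended onto a background.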

2. Multithreaded download (to be added)
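One possible approach, sketched under assumptions: a thread pool from the standard library's concurrent.futures downloads several (name, url) pairs at once. download_all and fetch_with_requests are hypothetical names; the fetch function is a parameter so the example can run without network access, and each call uses requests.get rather than one shared requests.Session, which sidesteps questions about sharing a Session across threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def fetch_with_requests(url: str) -> bytes:
    import requests  # imported lazily so download_all can be used without requests
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    return res.content


def download_all(
    items: Iterable[Tuple[str, str]],
    save_dir: str,
    fetch: Callable[[str], bytes] = fetch_with_requests,
    max_workers: int = 4,
) -> List[str]:
    """Download (name, url) pairs concurrently; return the sorted paths written."""
    os.makedirs(save_dir, exist_ok=True)

    def worker(name: str, url: str) -> str:
        path = os.path.join(save_dir, f"{name}.jpg")
        data = fetch(url)  # fetch first, so a failed download leaves no empty file
        with open(path, "wb") as fp:
            fp.write(data)
        return path

    written = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, n, u): u for n, u in items}
        for fut in as_completed(futures):
            try:
                written.append(fut.result())
            except Exception as exc:  # one failed image should not stop the others
                print(f"failed: {futures[fut]} ({exc})")
    return sorted(written)
```

max_workers caps how many requests run at the same time; keeping it small stays polite to the server, for the same reason Method 1 sleeps between downloads.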
