Scraping Copyright-Free Images with Python

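This article uses Python's requests library to show how to scrape copyright-free images, walks through the basic steps of a scraper, and notes that a follow-up will cover the same task with the scrapy framework.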

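This is another review exercise. As usual, to keep things low-key, every URL in the code is obfuscated with Base64 encoding. There are two approaches, scrapy and requests; the requests version comes first: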

import requests
import re
import os
import base64
from lxml import etree
from urllib.parse import urljoin


def get_text(url):
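    # The Referer value is stored Base64-encoded; decode it to a str before sending.
    # Note that the HTTP header name is spelled 'Referer'.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
        'Referer': base64.b64decode(b'aHR0cHM6Ly9waXhhYmF5LmNvbS96aC8=').decode('utf-8'),
    }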
    try:
        response = requests.get(url, headers=headers, timeout=30)
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException as err:
        print(err)
        return ''


def get_urls(url_template, length):
    return [url_template.format(i) for i in range(1,length+1)]


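def img_download(img):
    # Save into ./Pixabay/Wallpaper, creating the folder on first use.
    # Writing to a full path avoids calling os.chdir() on every download.
    directory = os.path.join(os.getcwd(), 'Pixabay', 'Wallpaper')
    os.makedirs(directory, exist_ok=True)
    name = img.split('/')[-1]
    try:
        response = requests.get(img, timeout=30)
        response.raise_for_status()
        with open(os.path.join(directory, name), 'wb') as image_file:
            image_file.write(response.content)
    except (requests.RequestException, OSError) as err:
        print(err)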


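def clean_str(string):
    # Swap the thumbnail size (340 or 480) for 1280, but only in the file-name
    # suffix, so digits elsewhere in the URL are left alone.
    string = re.sub(r'(?<=_)(?:340|480)(?=\.jpg$)', '1280', string)
    return string.replace('__', '_')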


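# The URL template and patterns are stored Base64-encoded; decode them to str
# so that str.format() and re.findall() can work with them.
url_template = base64.b64decode(b'aHR0cHM6Ly9waXhhYmF5LmNvbS96aC9pbWFnZXMvc2VhcmNoLyVFNSVBMyU4MSVFNyVCQSVCOC8/cGFnaT17fQ==').decode('utf-8')
pattern1 = base64.b64decode(b'aHR0cHM6Ly9jZG5cLnBpeGFiYXlcLmNvbS9waG90by9cZCsvXGQrL1xkKy9cZCsvXGQrL1x3K1stXHcrXSstXGQrX19cZCtcLmpwZw==').decode('utf-8')
pattern2 = base64.b64decode(b'aHR0cHM6Ly9jZG5cLnBpeGFiYXlcLmNvbS9waG90by9cZCsvXGQrL1xkKy9cZCsvXGQrL1x3K1stXHcrXSstXGQrX1xkK1wuanBn').decode('utf-8')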
length = 1
urls = get_urls(url_template, length)
for url in urls:
    resp = get_text(url)
    imgs = re.findall(pattern1, resp)
    imgs = list(set(map(clean_str, imgs)))
    
    for img in imgs:
        img_download(img)
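
To see what the extraction loop does without touching the network, here is a minimal sketch that runs the same find, normalize, and de-duplicate steps on a made-up HTML fragment. The host `cdn.example.com`, the simplified pattern, and the helper names `to_large` and `extract_image_urls` are assumptions for illustration only, not the values used in the script.

```python
import re

# Hypothetical, simplified pattern: thumbnail names end in '__<size>.jpg'.
THUMB_PATTERN = r'https://cdn\.example\.com/photo/[\w/-]+__\d+\.jpg'


def to_large(url):
    """Rewrite a thumbnail URL such as '...__340.jpg' to '..._1280.jpg'."""
    return re.sub(r'__\d+\.jpg$', '_1280.jpg', url)


def extract_image_urls(html):
    """Return the unique large-image URLs found in html, sorted for stable output."""
    thumbs = re.findall(THUMB_PATTERN, html)
    return sorted(set(map(to_large, thumbs)))


# Two sizes of the same picture plus one other picture.
sample_html = (
    '<img src="https://cdn.example.com/photo/2019/06/01/sky-123__340.jpg">'
    '<img src="https://cdn.example.com/photo/2019/06/01/sky-123__480.jpg">'
    '<img src="https://cdn.example.com/photo/2019/06/02/sea-456__340.jpg">'
)
print(extract_image_urls(sample_html))
```

Because both thumbnail sizes of the same picture normalize to one URL, the set keeps a single entry for it, so each image would be downloaded only once.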

That's basically it. The scrapy version will follow in a later post.
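
The URL constants in the script are Base64-encoded byte strings. As a rough sketch of how such values could be produced and read back with the standard library, the snippet below round-trips a placeholder template. The helper names `encode_text` and `decode_text` and the `example.com` URL are hypothetical. Base64 is an encoding, not encryption, so it only hides the address from a casual glance.

```python
import base64


def encode_text(text):
    """Return text as Base64-encoded bytes, the form used for the constants."""
    return base64.b64encode(text.encode('utf-8'))


def decode_text(encoded):
    """Turn Base64-encoded bytes back into a str usable with format() or re."""
    return base64.b64decode(encoded).decode('utf-8')


# Hypothetical URL template; '{}' is the page-number placeholder.
template = 'https://example.com/search/?page={}'
encoded = encode_text(template)
print(encoded)
print(decode_text(encoded).format(1))
```

Decoding must happen before calling `.format()`, since `bytes` objects have no `format` method.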
