[Web Crawler Series] A Beginner-Friendly Code Example for Batch-Downloading Images from the Web

baiyan_ben1900

Category: Data Analysis. Tags: python, big data, crawler

Article link: https://blog.csdn.net/baiyan83/article/details/110676132

A Simple Web Crawler Exercise for Beginners

The requests module
The BeautifulSoup module
Implementation

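As the volume of data on the web keeps growing, we need a way to locate and retrieve the data we care about, and web crawling is a good approach. Python is well suited both to big-data processing and to fetching and processing web data, which makes it a good tool for the job.
Below is a short, straightforward example that crawls a page and batch-downloads its images. It can be run as-is and has been tested, which makes it handy for working through the logic as practice.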

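The requests module

Its main job is to fetch data from the web. Python's built-in urllib module can do something similar, but the resulting code is less concise and more cumbersome to use than requests, so requests is the more convenient choice here.

It can be installed with the following command: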

pip install requests
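
For comparison, here is a minimal sketch of the standard-library route mentioned above. The helper name `fetch_text`, the default User-Agent string, and the 10-second timeout are illustrative assumptions and are not part of the original example.

```python
from urllib.request import Request, urlopen


def fetch_text(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch a URL with the standard library and return the decoded body.

    Hypothetical helper for illustration only.
    """
    # urllib needs an explicit Request object to carry custom headers.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        # The charset has to be looked up and applied by hand.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (needs network access, so it is left commented out):
# html = fetch_text("https://xclient.info/")
```

With requests, the same fetch is roughly `requests.get(url, headers=..., timeout=10).text`, with the decoding step handled by the library.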

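The BeautifulSoup module

It handles the data-matching work, that is, picking the elements we want out of the HTML. Python's built-in re module could do this with regular expressions, but again the code is less concise and more cumbersome than with BeautifulSoup, so BeautifulSoup is the more convenient choice here.
It can be installed with the following command. The code below passes 'lxml' to BeautifulSoup as the parser, so lxml has to be installed as well:

pip install bs4 lxml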
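
To make the matching step concrete, the sketch below runs BeautifulSoup on a small inline HTML string instead of a live page. The sample HTML, the helper name `extract_image_urls`, and the choice of the built-in "html.parser" are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (not taken from any real site).
SAMPLE_HTML = """
<div class="index-main row">
  <img src="https://example.com/a.png">
  <img alt="no src attribute here">
</div>
<div class="sidebar"><img src="https://example.com/ad.png"></div>
"""


def extract_image_urls(html, container_class="index-main"):
    """Return the src of every <img> inside <div>s carrying container_class."""
    # "html.parser" ships with Python, so this sketch does not need lxml.
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_=container_class):
        for img in div.find_all("img"):
            src = img.get("src")  # None when the tag has no src attribute
            if src:
                urls.append(src)
    return urls


print(extract_image_urls(SAMPLE_HTML))
```

Only the image inside the `index-main` container is returned. The image in the sidebar and the tag without a `src` attribute are skipped.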

Implementation

from bs4 import BeautifulSoup  # easier to use than the re module; replaces regular-expression matching
import requests  # easier to use than urllib.request
import os
import shutil

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Connection": "close",
    "Cookie": "_gauges_unique_hour=1; _gauges_unique_day=1; _gauges_unique_month=1; _gauges_unique_year=1; _gauges_unique=1;",
    "Referer": "http://www.infoq.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}

url = 'https://xclient.info/'


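def download_jpg(image_url, image_localpath):
    # Stream the image so it is copied to disk without loading it all into memory
    response = requests.get(image_url, stream=True, timeout=10)
    if response.status_code == 200:
        with open(image_localpath, 'wb') as f:
            # Have urllib3 undo gzip/deflate encoding while the raw stream is read
            response.raw.decode_content = True
            shutil.copyfileobj(response.raw, f)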


# Fetch the page and download every image inside the matching divs
def craw3(url):
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    # print(soup.prettify())
    # Create the output directory once, before the loop
    img_dir = os.path.abspath('./img/')
    os.makedirs(img_dir, exist_ok=True)
    for pic_href in soup.find_all('div', class_='index-main row'):
        for pic in pic_href.find_all('img'):
            imgurl = pic.get('src')
            if imgurl is not None:
                filename = os.path.basename(imgurl)
                imgpath = os.path.join(img_dir, filename)
                print('Downloading: imgurl is %s, imgpath is %s' % (imgurl, imgpath))
                download_jpg(imgurl, imgpath)

if __name__ == '__main__':
    craw3(url)
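
The crawl function above passes each `src` value straight to `requests.get` and builds the file name by calling `os.path.basename` on the whole URL. If a page uses relative or protocol-relative `src` values, or appends query strings, the request would fail or the file name would come out odd. The following is a hedged sketch of one way to normalize both. The helper names `resolve_image_url` and `local_filename` are hypothetical, the sample `src` values are made up, and whether the target site needs this is an assumption.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)


def local_filename(image_url, default="image"):
    """Derive a file name from the URL path, ignoring any query string."""
    path = urlparse(image_url).path
    name = posixpath.basename(unquote(path))
    # Fall back to a placeholder when the path ends in "/" and has no file part.
    return name or default


page = "https://xclient.info/"
# Made-up src values covering the relative, protocol-relative, and absolute cases.
for src in ["/img/a.png", "//cdn.example.com/x/b.jpg?v=2", "https://example.com/c.gif"]:
    absolute = resolve_image_url(page, src)
    print(absolute, "->", local_filename(absolute))
```

In the crawl loop, these helpers would be applied to `imgurl` before `download_jpg` is called.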
