Programmer wallpapers I recently found on Runoob (菜鸟教程): grabbing them all with a Python web crawler!

Latest recommended article published on 2021-11-18 16:12:36

喧啸


Categories: Python web crawler, learning journey, notes. Tags: python, crawler

Article link: https://blog.csdn.net/qq_42372829/article/details/108207356


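Whether you are already a programmer or on your way to becoming one, these wallpapers are a handy way to show it. Pick one you like, or one that suits you, and set it as your desktop background!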

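First, a look at some of the wallpapers that have already been downloaded:
Wow, an anime-style wallpaper made of source code.
Honestly, I don't know how to describe this one; it has real atmosphere.
And yes, this is how we programmers do things.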

Since three picked at random already look this good, I want all the rest too, so let's crawl them.


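The program parses the page with XPath to extract information, uses the requests library to fetch the page content, and uses the os module to create a folder, in the directory the program is run from, for the downloaded wallpapers.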

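The program is modular, split into a fetch module and a storage module. The fetch module uses requests to get the full page source and handles exceptions, so a failed connection is reported instead of crashing the program. It is simple, and it is the most basic request pattern in web crawling.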

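def get_message_page():
    # headers is global so that get_img_page can reuse the same User-Agent
    global headers
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
    }
    url = 'https://www.runoob.com/w3cnote/17-wallpaper-for-programmer.html'
    try:
        # The request itself must sit inside try, otherwise a failed connection is never caught
        response = requests.get(url=url, headers=headers, timeout=10)
    except requests.RequestException:
        print('Connection failed!!')
        return None
    if response.status_code == 200:
        print('Request succeeded!!')
        # Let requests guess the encoding from the body so the text decodes correctly
        response.encoding = response.apparent_encoding
        return response.text
    print('Unexpected status code:', response.status_code)
    return None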
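
The fetch function above makes a single attempt. As a minimal sketch, assuming a few retries are acceptable for this site, the wrapper below retries any zero-argument callable. `fetch_with_retries` and its parameters are hypothetical names introduced here for illustration; they are not part of the original program.

```python
import time


def fetch_with_retries(fetch, retries=3, delay=1.0, errors=(ConnectionError,)):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: `fetch` is any zero-argument callable.
    Returns fetch()'s result, or None if every attempt raised one of `errors`.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except errors as exc:
            print(f'Attempt {attempt}/{retries} failed: {exc}')
            if attempt < retries:
                # Wait a little before the next attempt
                time.sleep(delay)
    return None
```

With requests, this could be called as `fetch_with_retries(lambda: requests.get(url, headers=headers, timeout=10), errors=(requests.RequestException,))`, so that only network errors trigger another attempt.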

The storage module parses the page with XPath, collects the image links into a list, then iterates over the list and saves each image as binary data.

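def get_img_page(html):
    tree = etree.HTML(html)
    # Absolute XPaths copied from the page layout; they break if the site changes its markup
    #title_list = tree.xpath('/html/body/div[3]/div/div[1]/div/div[2]/div/h3/text()')
    img_list = tree.xpath('/html/body/div[3]/div/div[1]/div/div[2]/div/p/a/@href')
    # This code assumes the hrefs are protocol-relative (//host/path), so it prepends the scheme
    image_list = ['https:' + img for img in img_list]
    print(image_list)
    # Create the output folder if it does not exist yet
    os.makedirs('./image_data', exist_ok=True)

    for number, img_url in enumerate(image_list):
        response = requests.get(url=img_url, headers=headers, timeout=10)
        if response.status_code != 200:
            # Skip failed downloads instead of saving an error page as an image
            print(img_url, 'download failed, status', response.status_code)
            continue
        img_path = './image_data/' + str(number) + '.jpg'
        with open(img_path, 'wb') as f:
            f.write(response.content)
        print(img_url, 'saved successfully!!')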
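
The `'https:' + img` step only works when every link starts with `//`, and every file is saved with a `.jpg` extension regardless of its real type. As a hedged sketch of a more tolerant approach, the two helpers below use only the standard library. `absolute_url`, `image_filename`, and `PAGE_URL` are hypothetical names introduced here, and the `example.com` links in the usage note are made up for illustration.

```python
import os
from urllib.parse import urljoin, urlsplit

# The page that the links were extracted from
PAGE_URL = 'https://www.runoob.com/w3cnote/17-wallpaper-for-programmer.html'


def absolute_url(href, base=PAGE_URL):
    """Resolve an href against the page URL.

    Handles protocol-relative (//host/path), root-relative (/path),
    and already absolute links.
    """
    return urljoin(base, href)


def image_filename(url, index):
    """Build '<index><ext>' from the URL path, falling back to .jpg."""
    ext = os.path.splitext(urlsplit(url).path)[1]
    return f'{index}{ext or ".jpg"}'
```

For example, `absolute_url('//example.com/a/b.png')` resolves to `https://example.com/a/b.png`, and `image_filename` of that URL with index 3 gives `3.png`.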

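Run the program and all the wallpapers are saved in a folder (image_data) under the directory the program was run from. The complete code follows.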

"""
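I recently found wallpapers for programmers on Runoob (菜鸟教程).
Only kids make choices; I want them all.
So here is a small crawler to satisfy that greedy idea,
shared with everyone as well.
"""
import requests
from lxml import etree
import os

def get_message_page():
    # headers is global so that get_img_page can reuse the same User-Agent
    global headers
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
    }
    url = 'https://www.runoob.com/w3cnote/17-wallpaper-for-programmer.html'
    try:
        # The request itself must sit inside try, otherwise a failed connection is never caught
        response = requests.get(url=url, headers=headers, timeout=10)
    except requests.RequestException:
        print('Connection failed!!')
        return None
    if response.status_code == 200:
        print('Request succeeded!!')
        # Let requests guess the encoding from the body so the text decodes correctly
        response.encoding = response.apparent_encoding
        return response.text
    print('Unexpected status code:', response.status_code)
    return None

def get_img_page(html):
    tree = etree.HTML(html)
    # Absolute XPaths copied from the page layout; they break if the site changes its markup
    #title_list = tree.xpath('/html/body/div[3]/div/div[1]/div/div[2]/div/h3/text()')
    img_list = tree.xpath('/html/body/div[3]/div/div[1]/div/div[2]/div/p/a/@href')
    # This code assumes the hrefs are protocol-relative (//host/path), so it prepends the scheme
    image_list = ['https:' + img for img in img_list]
    print(image_list)
    # Create the output folder if it does not exist yet
    os.makedirs('./image_data', exist_ok=True)

    for number, img_url in enumerate(image_list):
        response = requests.get(url=img_url, headers=headers, timeout=10)
        if response.status_code != 200:
            # Skip failed downloads instead of saving an error page as an image
            print(img_url, 'download failed, status', response.status_code)
            continue
        img_path = './image_data/' + str(number) + '.jpg'
        with open(img_path, 'wb') as f:
            f.write(response.content)
        print(img_url, 'saved successfully!!')


def main():
    html = get_message_page()
    # Stop early if the page could not be fetched
    if html is None:
        return
    get_img_page(html)

if __name__ == '__main__':
    main()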
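
Because every download is written with a `.jpg` name, it can be worth checking afterwards that the saved files really are images. The sketch below is one possible check, not part of the original program: it compares the first bytes of each file with the well-known signatures of JPEG, PNG, and GIF. `sniff_image_type` and `report_folder` are hypothetical helper names.

```python
import os

# Leading bytes ("magic numbers") of common image formats
SIGNATURES = {
    b'\xff\xd8\xff': 'jpeg',
    b'\x89PNG\r\n\x1a\n': 'png',
    b'GIF87a': 'gif',
    b'GIF89a': 'gif',
}


def sniff_image_type(path):
    """Return 'jpeg', 'png', 'gif', or None based on the file's first bytes."""
    with open(path, 'rb') as f:
        head = f.read(8)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def report_folder(folder='./image_data'):
    """Map each file name in `folder` to its detected type (None if unrecognized)."""
    return {name: sniff_image_type(os.path.join(folder, name))
            for name in sorted(os.listdir(folder))}
```

Calling `report_folder()` after a run returns a dictionary from file name to detected type; a value of `None` suggests the server returned something other than an image, such as an HTML error page.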

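With that, all the wallpapers have been crawled and saved. If you want them, leave a comment or follow the author's official account, Crawler 乐趣, to get them. Technical discussion is welcome.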
