Notes on Learning Web Scraping 3 (Scraping Images)

firewolf0

Article link: https://blog.csdn.net/firewolf0/article/details/105576960

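I spend a lot of time at my computer, and a crisp, good-looking desktop wallpaper can set a good mood for the whole day. This time I use 彼岸图网 at http://pic.netbian.com/4kfengjing/index.html and scrape its 4K landscape pictures to use as wallpapers. For practice, scraping the pictures on the first 10 pages is enough.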

Scraping images is essentially the same as scraping text. The key step is finding the download address of each image.

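Open http://pic.netbian.com/4kfengjing/index.html with the browser's Inspect Element tool. To scrape the first 10 pages, look at the source behind the page-number links at the bottom of the page: the addresses follow a regular pattern. For the second page, the last part of the address becomes index_2.html, and so on for later pages. Only the first page is different: it is index.html, with no _1.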

link = 'http://pic.netbian.com/4kfengjing/'
link_add = []
link_add.append(link)
for i in range(2,11):
    link_add.append(link+'index_'+str(i)+'.html')
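The same pattern can be wrapped in a small function, which makes it easy to check that the list starts at the bare directory address and ends at index_10.html. This is only a sketch: the function name build_page_urls and its parameters are hypothetical and not part of the original code.

```python
def build_page_urls(base, pages):
    """Return the list-page URLs for pages 1..pages (hypothetical helper)."""
    urls = [base]  # page 1 has no index_1.html; the directory URL serves index.html
    for i in range(2, pages + 1):
        urls.append(base + 'index_' + str(i) + '.html')
    return urls


link_add = build_page_urls('http://pic.netbian.com/4kfengjing/', 10)
print(len(link_add))
print(link_add[1])
print(link_add[-1])
```

Passing the page count as a parameter means that scraping more or fewer pages only requires changing one number.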

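Inspecting the 4K landscape pictures on each page shows that they all sit inside the div tag whose class is slist, as img tags beneath it, so a single line of code is enough to find the image addresses.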

pic_list = soup.find('div',class_='slist').find_all('img')
for pic in pic_list:
    pic_url = 'http://pic.netbian.com'+pic['src']
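The string concatenation above assumes that every src value starts with a slash. As a hedged alternative, urljoin from the standard library's urllib.parse resolves a src against the page address whether it is root-relative, page-relative, or already absolute. The sample src values below are made up for illustration and are not taken from the site.

```python
from urllib.parse import urljoin

page_url = 'http://pic.netbian.com/4kfengjing/index_2.html'

# Made-up src values covering the three forms a src attribute can take
samples = [
    '/uploads/example/a.jpg',                        # root-relative
    'thumbs/b.jpg',                                  # relative to the page
    'http://pic.netbian.com/uploads/example/c.jpg',  # already absolute
]

pic_urls = [urljoin(page_url, src) for src in samples]
for url in pic_urls:
    print(url)
```

In the scraping loop this would amount to replacing the concatenation with urljoin(page_address, pic['src']), where page_address is whichever list page is being parsed.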

Each image needs a file name when it is saved. I simply slice the end of the image address and use that as the name.
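Slicing a fixed number of characters off the end of the address works only while every file name has the same length. One possible alternative, sketched here with the standard library, takes everything after the last slash of the URL path. The helper name name_from_url and the example addresses are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def name_from_url(pic_url):
    """Return the last path segment of pic_url (hypothetical helper)."""
    path = urlsplit(pic_url).path      # drops the scheme, host, and any ?query
    return posixpath.basename(path)    # text after the final '/'


# Made-up addresses for illustration
print(name_from_url('http://pic.netbian.com/uploads/example/sample-picture.jpg?v=1'))
print(name_from_url('http://pic.netbian.com/uploads/example/x.jpg'))
```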
The complete code is as follows:

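import os
import requests
from bs4 import BeautifulSoup

header = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
link = 'http://pic.netbian.com/4kfengjing/'

# Page 1 is the bare directory URL; pages 2-10 are index_2.html ... index_10.html
link_add = []
link_add.append(link)
for i in range(2,11):
    link_add.append(link+'index_'+str(i)+'.html')

save_dir = 'd:\\picture\\'
os.makedirs(save_dir, exist_ok=True)  # open() fails if the folder does not exist

for page_url in link_add:
    r = requests.get(page_url,headers=header,timeout=10)
    soup = BeautifulSoup(r.text,'lxml')
    # Every picture is an img tag inside the div with class "slist"
    pic_list = soup.find('div',class_='slist').find_all('img')
    for pic in pic_list:
        pic_url = 'http://pic.netbian.com'+pic['src']
        pic_name = pic['src'][-18:]  # last 18 characters of the address serve as the file name
        # Send the same headers and timeout as for the list pages
        picture = requests.get(pic_url,headers=header,timeout=10).content
        with open(save_dir+pic_name,'wb') as f:  # the with block closes the file on exit
            f.write(picture)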
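The script above stops at the first request that fails. The sketch below shows one way to skip failed pictures and pause between downloads. To keep it runnable without network access, the download step is passed in as a function. The names download_all, fetch, and fake_fetch are hypothetical; in the real script, fetch could be a small wrapper that returns requests.get(url, headers=header, timeout=10).content.

```python
import os
import tempfile
import time


def download_all(pic_urls, fetch, save_dir, delay=0.0):
    """Save each URL's bytes into save_dir; return the URLs that failed."""
    os.makedirs(save_dir, exist_ok=True)
    failed = []
    for pic_url in pic_urls:
        try:
            data = fetch(pic_url)
        except Exception:
            failed.append(pic_url)      # skip this picture and carry on
            continue
        pic_name = pic_url.rsplit('/', 1)[-1]   # text after the last '/'
        with open(os.path.join(save_dir, pic_name), 'wb') as f:
            f.write(data)
        time.sleep(delay)               # pause between successful downloads
    return failed


def fake_fetch(url):
    """Stand-in for a real HTTP download; fails for one made-up URL."""
    if url.endswith('broken.jpg'):
        raise IOError('simulated network error')
    return b'fake image bytes'


demo_dir = tempfile.mkdtemp()
demo_urls = ['http://example.com/a.jpg', 'http://example.com/broken.jpg']
failed = download_all(demo_urls, fake_fetch, demo_dir)
print(failed)
print(os.listdir(demo_dir))
```

Returning the list of failed addresses makes it possible to retry just those pictures later instead of rerunning all 10 pages.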
