[Web Scraping with Python (13)] Scraping Images from Quanjing with Python

Most recently recommended article published on 2024-09-12 18:31:52

lys_828

Article link: https://blog.csdn.net/lys_828/article/details/104905571

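Target site: the landscape wallpaper topic on Quanjing (全景网), https://www.quanjing.com/creative/topic/1. The page looks like this:
[Screenshot of the target page]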

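1. Page analysis

As with the earlier posts that scraped text, the first step is to parse the page and find where the image data lives; the images can then be downloaded from those addresses.

Inspecting the page shows that the image URLs we need are in the src attribute of the elements matched by the CSS selector 【a.item.lazy img】, that is, every img nested inside an a element that carries both the item and lazy classes.
[Screenshot of the page source in the browser inspector]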
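To make the selector concrete, here is a small sketch that reproduces what a.item.lazy img matches, using only the standard library so it runs without network access. The HTML snippet, the class LazyItemImages, and the function select_lazy_item_images are all made up for illustration; the real page is parsed with BeautifulSoup's select() in the next section.

```python
from html.parser import HTMLParser

# Made-up HTML that imitates the structure described above (not copied from the site).
SAMPLE_HTML = """
<div>
  <a class="item lazy" href="/p/1"><img src="http://example.com/1.jpg"></a>
  <a class="item" href="/p/2"><img src="http://example.com/2.jpg"></a>
  <a class="lazy item big" href="/p/3"><span><img src="http://example.com/3.jpg"></span></a>
  <img src="http://example.com/4.jpg">
</div>
"""


class LazyItemImages(HTMLParser):
    """Collect the src of every <img> nested inside an <a> with classes item and lazy."""

    def __init__(self):
        super().__init__()
        self.anchor_stack = []  # one bool per open <a>: does it carry both classes?
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            classes = (attrs.get("class") or "").split()
            self.anchor_stack.append("item" in classes and "lazy" in classes)
        elif tag == "img" and any(self.anchor_stack) and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_stack:
            self.anchor_stack.pop()


def select_lazy_item_images(html):
    parser = LazyItemImages()
    parser.feed(html)
    parser.close()
    return parser.sources


print(select_lazy_item_images(SAMPLE_HTML))
# ['http://example.com/1.jpg', 'http://example.com/3.jpg']
```

Class order does not matter and extra classes are allowed, so the third link matches; the second link lacks the lazy class and the last image is not inside a link at all, so both are skipped.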

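2. First function: collect the image URLs

Start by importing the required libraries, then write the function. This first function is almost identical to the ones in the earlier posts.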

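import requests
from bs4 import BeautifulSoup
from uuid import uuid1
import os

def get_image():
    # Landscape wallpaper topic page on Quanjing
    url = 'https://www.quanjing.com/creative/topic/1'
    html = requests.get(url)
    # print(html)
    # Parse the HTML with the lxml parser
    soup = BeautifulSoup(html.text, 'lxml')
    # print(soup)
    # Every <img> inside an <a> that has both the item and lazy classes
    images = soup.select('a.item.lazy img')
    # print(images)
    # Only look at the first 10 images for now
    for img in images[:10]:
        print(img['src'])
        img_url = img['src']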

--> Output (only part of it is shown here; each URL corresponds to one image, and entering it in a browser downloads that image):

http://mpic.tiankong.com/34f/7d3/34f7d3b8ff1da0cc98cfbf1e2969ba25/640.jpg@!240h
http://mpic.tiankong.com/0c3/116/0c3116e0c6bda3d67a18b34f8659bd5d/640.jpg@!240h
http://mpic.tiankong.com/2bf/756/2bf756b76df89186a44f0709dfe8d8bd/640.jpg@!240h
http://mpic.tiankong.com/fff/a07/fffa07d7409dfcad0ca9f996f42a9112/640.jpg@!240h
http://mpic.tiankong.com/e91/d37/e91d3759a45d9ebb70f745fb489f7ae2/640.jpg@!240h
http://mpic.tiankong.com/deb/c7c/debc7cdc05fd3a72b61779a219e66568/640.jpg@!240h
http://mpic.tiankong.com/ae9/6d3/ae96d301c743b06a291bb5efbc629cf6/640.jpg@!240h
http://mpic.tiankong.com/207/f03/207f03e0d631bbe7094ccacf63528715/640.jpg@!240h
http://mpic.tiankong.com/28d/939/28d939ff9556f07f13c6e33c0c52d4d5/640.jpg@!240h
http://mpic.tiankong.com/62f/901/62f9011a11df31a9be68729a67be020e/640.jpg@!240h
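The URLs in this sample output all appear to share one shape: a host, two short path segments, a long hexadecimal segment, and a final 640.jpg@!240h segment. The sketch below splits one of them with urllib.parse to make that shape explicit. The function name describe_image_url and the key names are hypothetical, and the assumption that every URL on the site follows this pattern is based only on the ten lines above.

```python
from urllib.parse import urlsplit


def describe_image_url(url):
    """Split an image URL into its host, its long hex segment, and its last segment."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"unexpected URL shape: {url}")
    return {
        "host": parts.netloc,
        "image_id": segments[-2],      # assumed to identify the image
        "last_segment": segments[-1],  # file name plus the @!... suffix
    }


sample = "http://mpic.tiankong.com/34f/7d3/34f7d3b8ff1da0cc98cfbf1e2969ba25/640.jpg@!240h"
info = describe_image_url(sample)
print(info["host"])          # mpic.tiankong.com
print(info["image_id"])      # 34f7d3b8ff1da0cc98cfbf1e2969ba25
print(info["last_segment"])  # 640.jpg@!240h
```

If the pattern holds, the long hex segment would be one more candidate for a stable file name, since it stays the same no matter when the image is downloaded.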

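3. Second function: download the images

Downloaded images need somewhere to live, so the function first creates a folder for them. This uses the common idiom of checking whether the folder exists and creating it if it does not.

The image content is then written to a file. Images are binary data (as are audio and video), so the file is opened in 'wb' mode and the bytes are read from the response's .content attribute.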

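def download(url):
    # Create the target folder on first use
    if not os.path.exists('./picture'):
        os.makedirs('picture')
    # uuid1() gives each file a unique name; 'wb' writes raw bytes
    with open('./picture/{}.jpg'.format(uuid1()), 'wb') as f:
        f.write(requests.get(url).content)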
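The folder creation and the binary write can be tried offline. The sketch below is one possible variant, not the function used in this post: the helper save_bytes is hypothetical, the bytes are a made-up stand-in for image data, and os.makedirs is called with exist_ok=True, which makes the separate existence check unnecessary.

```python
import os
import tempfile


def save_bytes(data, folder, filename):
    """Write raw bytes to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(folder, filename)
    with open(path, "wb") as f:  # 'wb' = write binary, no text encoding involved
        f.write(data)
    return path


# 20 made-up bytes standing in for what response.content would return
fake_image = b"\xff\xd8\xff\xe0" + b"\x00" * 16

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "picture")
    first = save_bytes(fake_image, folder, "a.jpg")
    second = save_bytes(fake_image, folder, "b.jpg")  # folder exists by now
    saved_names = sorted(os.listdir(folder))
    first_size = os.path.getsize(first)

print(saved_names, first_size)  # ['a.jpg', 'b.jpg'] 20
```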

Finally, call download from inside the loop of the first function:

download(img_url)

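--> Output (uuid1() from the uuid module returns a different unique identifier on every call, which is used here to name each image):
[Screenshot of the downloaded images in the picture folder]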
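A quick way to see what these names look like is to generate a few identifiers directly. Note that uuid1() builds its value from the host ID, a sequence number, and the current time, while uuid4() is the variant based on random numbers; uuid4 is not used in this post and appears here only for comparison.

```python
from uuid import uuid1, uuid4

a = uuid1()
b = uuid1()
r = uuid4()

print(a)  # 36 characters: 32 hex digits plus 4 hyphens
print(b)  # differs from the previous line
print(r)

print(a != b, a.version, r.version)  # True 1 4
```

Either function works for avoiding file name collisions; the choice only matters if you would rather not embed host and time information in the names.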

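4. Extension

If you would rather not name the image files with such long identifiers, you can use the datetime module from the earlier posts to get the current date and combine it with a counter to name both the folder and the image files.

The change is mainly to the second function. The complete code is as follows: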
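The naming scheme itself can be sketched without downloading anything. The helper build_image_path below is hypothetical, the date is an arbitrary example, and the zero padding is an optional extra that is not part of the original approach; it only keeps files in numeric order when a file manager sorts them by name.

```python
from datetime import date
import os


def build_image_path(day, count, width=3):
    """Return '<YYYY-MM-DD>_pic/<zero-padded count>.jpg' for the given date and counter."""
    folder = f"{day.isoformat()}_pic"  # same text as str(day)
    return os.path.join(folder, f"{count:0{width}d}.jpg")


example_day = date(2020, 3, 16)  # arbitrary date, for illustration only
paths = [build_image_path(example_day, n) for n in (1, 2, 10)]
for p in paths:
    print(p)

# With padding, sorting by name matches numeric order
print(sorted(paths) == paths)  # True
```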

import requests
from bs4 import BeautifulSoup
from datetime import date
import os

def get_image():
    url = 'https://www.quanjing.com/creative/topic/1'
    html = requests.get(url)
    # print(html)
    soup = BeautifulSoup(html.text, 'lxml')
    # print(soup)
    images = soup.select('a.item.lazy img')
    # print(images)
    # count numbers the images from 1 and doubles as the file name
    count = 1
    for img in images[:50]:
        # print(img['src'])
        img_url = img['src']
        # Announce the download before it starts
        print(f'Downloading image {count}......')
        download(img_url, count)
        count += 1

def download(url, count):
    # The folder is named after today's date: YYYY-MM-DD_pic
    today = str(date.today())
    if not os.path.exists(f'./{today}_pic'):
        os.makedirs(f'{today}_pic')
    # Files are named 1.jpg, 2.jpg, ...
    with open('./{}_pic/{}.jpg'.format(today, count), 'wb') as f:
        f.write(requests.get(url).content)

get_image()
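In the code above a single failed request stops the whole run. One possible refinement, sketched here under stated assumptions, is to catch the error, record the URL, and carry on. The names download_all and fake_fetch are hypothetical, and the fetch step is passed in as a function so the sketch can run offline with simulated responses.

```python
import os
import tempfile


def download_all(urls, fetch, folder):
    """Save fetch(url) for each URL as 1.jpg, 2.jpg, ...; skip and record failures."""
    os.makedirs(folder, exist_ok=True)
    saved, failed = [], []
    for count, url in enumerate(urls, start=1):
        try:
            data = fetch(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            failed.append((url, str(exc)))
            continue
        path = os.path.join(folder, f"{count}.jpg")
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved, failed


def fake_fetch(url):
    """Stand-in for a real request; fails for any URL containing 'bad'."""
    if "bad" in url:
        raise IOError("simulated network error")
    return b"fake image bytes"


test_urls = [
    "http://example.com/ok1.jpg",
    "http://example.com/bad.jpg",
    "http://example.com/ok2.jpg",
]

with tempfile.TemporaryDirectory() as tmp:
    saved, failed = download_all(test_urls, fake_fetch, tmp)
    saved_names = [os.path.basename(p) for p in saved]

print(saved_names)  # ['1.jpg', '3.jpg']
print(failed)       # [('http://example.com/bad.jpg', 'simulated network error')]
```

With the real site, fetch could be something like lambda u: requests.get(u, timeout=10).content, so that a slow or broken image is skipped instead of hanging or ending the run.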