Scraping Bing Wallpapers with Python

林⁣熙

Category: Python programming. Tags: python, web scraping

Original article: https://blog.csdn.net/weixin_42577686/article/details/100126405

Required modules

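requests: an HTTP library written in Python and released under the Apache 2.0 license. Its API is more concise than that of the urllib2 module.
Requests supports HTTP keep-alive and connection pooling, cookie-based session persistence, file uploads, automatic decoding of response content, internationalized URLs, and automatic encoding of POST data.
It is a high-level wrapper built on top of urllib3, which makes network requests far more convenient to write. Most of the HTTP operations a browser performs can be done with Requests in a few lines, although it does not execute JavaScript.
It aims to be modern, internationalization-aware, and friendly.
When a Session object is used, Requests keeps connections alive and reuses them automatically.
lxml: a Python library for parsing XML and HTML that supports XPath. XPath is a query language for selecting nodes in an HTML or XML document. The xpath() method returns a list: an expression that selects elements yields Element objects, while an expression that selects attributes (such as @src) or text nodes (text()) yields strings.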
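The difference between the XPath return types is easier to see on a small example. The sketch below parses a hand-written HTML snippet (hypothetical markup, not the real page) and runs three queries: one selecting elements, one selecting an attribute, and one selecting text nodes.

```python
from lxml import etree

# A tiny hand-written page standing in for the real site (hypothetical markup).
SAMPLE_HTML = """
<html><body>
  <div class="card"><img src="http://example.com/a_640x480.jpg"><h3>First</h3></div>
  <div class="card"><img src="http://example.com/b_640x480.jpg"><h3>Second</h3></div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

elements = tree.xpath('//img')        # list of Element objects
sources = tree.xpath('//img/@src')    # list of attribute values (strings)
titles = tree.xpath('//h3/text()')    # list of text nodes (strings)

# An Element still needs .get() to read an attribute; the other two
# queries already hand back plain strings.
print(elements[0].get('src'))
print(sources)
print(titles)
```

Selecting @src directly, as the script in this article does, avoids a second step of reading the attribute from each element.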


import os

import requests
from lxml import etree

# Bing wallpaper gallery; later pages look like https://bing.ioliu.cn/?p=3
url = 'https://bing.ioliu.cn/'
# Request headers that mimic a browser
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
    # Referer tells the server which page the request came from
    'Referer': 'http://bing.ioliu.cn'
}
# Running count of downloaded images
number = 0

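# Download the page
html = requests.get(url, headers=header).text

# Build the parse tree used for XPath queries
etree_html = etree.HTML(html)

# Get the image URLs
# //     selects matching nodes anywhere in the document
# @name  selects the value of the attribute called name
img_url = etree_html.xpath('//img/@src')

# Create the picture folder if it does not already exist
os.makedirs('picture', exist_ok=True)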


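# Download each image and save it to the picture folder
for img_list in img_url:
    # str.replace(old, new[, count]):
    #   old   -- the substring to be replaced
    #   new   -- the string that replaces old
    #   count -- optional integer; replace at most count occurrences
    # Swap the thumbnail resolution for the full-size one
    img_list = img_list.replace('640x480', '1920x1080')
    # print(img_list)
    # Fetch the image bytes
    img = requests.get(img_list, headers=header).content
    number += 1
    print('Downloading image {}'.format(number))

    # os.path.join keeps the path portable; 'picture\\{}.jpg' only works on Windows
    img_name = os.path.join('picture', '{}.jpg'.format(number))
    with open(img_name, 'wb') as save_img:
        # Write the image data
        save_img.write(img)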
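
The script above fetches only the first page. The comment next to the URL mentions a ?p=3 form, so a natural extension is to build one URL per page and reuse the same download loop. The helpers below are a minimal sketch of that idea using only the standard library. The function names are hypothetical, and two things are assumptions rather than facts from the article: that page 1 is the bare URL, and that the src values may sometimes be relative and need resolving against the site address.

```python
import os
from urllib.parse import urlencode, urljoin

BASE_URL = 'https://bing.ioliu.cn/'


def page_url(page, base=BASE_URL):
    """Return the listing URL for a page number (assumes ?p=N paging)."""
    if page <= 1:
        return base
    return '{}?{}'.format(base, urlencode({'p': page}))


def full_size_url(src, base=BASE_URL, old='640x480', new='1920x1080'):
    """Resolve src against base, then swap the resolution tag in the URL."""
    return urljoin(base, src).replace(old, new)


def image_path(index, directory='picture'):
    """Build a portable file path such as picture/3.jpg."""
    return os.path.join(directory, '{}.jpg'.format(index))


# Show the URLs and paths that a three-page crawl would use.
for page in range(1, 4):
    print(page_url(page))
print(full_size_url('http://h1.ioliu.cn/bing/A_640x480.jpg'))
print(image_path(1))
```

Keeping these steps as small pure functions means they can be checked without touching the network, and the download loop only has to call requests.get on the values they return.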