05_运行百度爬取图片

最新推荐文章于 2021-08-13 03:59:32 发布

kinbridge

最新推荐文章于 2021-08-13 03:59:32 发布

阅读量167

点赞数

分类专栏： Python

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/kinbridge/article/details/100067648

版权

Python 专栏收录该内容

7 篇文章 0 订阅

订阅专栏

测试代码：

import requests #首先导入库

import re

#设置默认配置

MaxSearchPage = 20 # 收索页数

CurrentPage = 0 # 当前正在搜索的页数

DefaultPath = "pictures" # 默认储存位置

NeedSave = 0 # 是否需要储存

#图片链接正则和下一页的链接正则

def imageFiler(content): # 通过正则获取当前页面的图片地址数组

return re.findall('"objURL":"(.*?)"',content,re.S)

def nextSource(content): # 通过正则获取下一页的网址

next = re.findall('<div id="page">.*<a href="(.*?)" class="n">',content,re.S)[0]

print("---------" + "http://image.baidu.com" + next)

return next

#爬虫主体

def spidler(source):

content = requests.get(source).text # 通过链接获取内容

imageArr = imageFiler(content) # 获取图片数组

global CurrentPage

print("Current page:" + str(CurrentPage) + "**********************************")

for imageUrl in imageArr:

print(imageUrl)

global NeedSave

if NeedSave: # 如果需要保存图片则下载图片，否则不下载图片

global DefaultPath

try:

# 下载图片并设置超时时间,如果图片地址错误就不继续等待了

picture = requests.get(imageUrl,timeout=10)

except:

print("Download image error! errorUrl:" + imageUrl)

continue

# 创建图片保存的路径

imageUrl = imageUrl.replace('/','').replace(':','').replace('?','')

pictureSavePath = DefaultPath + imageUrl

fp = open(pictureSavePath,'wb') # 以写入二进制的方式打开文件

fp.write(picture.content)

fp.close()

global MaxSearchPage

if CurrentPage <= MaxSearchPage: #继续下一页爬取

if nextSource(content):

CurrentPage += 1

# 爬取完毕后通过下一页地址继续爬取

spidler("http://image.baidu.com" + nextSource(content))

#爬虫的开启方法

def beginSearch(page=1,save=0,savePath=" D:/pictures/"):

# (page:爬取页数,save:是否储存,savePath:默认储存路径)

global MaxSearchPage,NeedSave,DefaultPath

MaxSearchPage = page

NeedSave = save #是否保存，值0不保存，1保存

DefaultPath = savePath #图片保存的位置

key = input("Please input you want search：")

StartSource = "http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=" + str(key) + "&ct=201326592&v=flip" # 分析链接可以得到,替换其`word`值后面的数据来搜索关键词

spidler(StartSource)

#调用开启的方法就可以通过关键词搜索图片了

beginSearch(page=5,save=1) # page=5是下载前5页，save=1保存图片

运行报错

解决办法链接：

https://mp.csdn.net/postedit/100067179

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
05_运行百度爬取图片

测试代码： import requests #首先导入库 import re #设置默认配置 MaxSearchPage = 20 # 收索页数 CurrentPage = 0 # 当前正在搜索的页数 DefaultPath = "pictures" # 默认储存位置 NeedSave = 0 # 是否需要储存 #图片链接正...
复制链接

扫一扫

专栏目录

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。