Introduction to Python Web Scraping
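Packages to import
re: regular expressions (standard library)
requests: third-party package for making HTTP requests
urllib: standard-library package for making HTTP requests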
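As a quick illustration of what re does in a scraper, here is a minimal sketch that pulls quoted values out of a block of text. The sample string and the example.com addresses are made up for the demonstration; a real page will look different.

```python
import re

# Made-up page fragment used only for this demonstration.
sample = '{"objURL":"http://example.com/a.jpg","x":1},{"objURL":"http://example.com/b.png"}'

# (.*?) is a non-greedy group: it stops at the first closing quote.
# re.S lets "." match newline characters as well.
urls = re.findall(r'"objURL":"(.*?)"', sample, re.S)
print(urls)
```

The non-greedy `?` matters here: a plain `(.*)` would run on to the last quote in the text and return one long, useless match.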
Two ways to download a file
1: urllib.request.urlretrieve()
Example:
urllib.request.urlretrieve("http://www.zzti.edu.cn/lqj7vq.jpg","aaa.jpg")
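A bare urlretrieve() call raises an exception when the request fails. The sketch below wraps it in a small helper; `download` is a hypothetical name, not part of urllib. To keep the example runnable without network access, a local file:// URL stands in for an http:// one.

```python
import os
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


def download(url, dest):
    """Save url to dest; return dest on success, None on failure."""
    try:
        # urlretrieve returns a (filename, headers) tuple.
        saved_path, _headers = urllib.request.urlretrieve(url, dest)
        return saved_path
    except (urllib.error.URLError, OSError) as exc:
        print("download failed:", exc)
        return None


# Offline demonstration: a file:// URL stands in for a real web address.
workdir = tempfile.mkdtemp()
source = Path(workdir) / "source.bin"
source.write_bytes(b"fake image bytes")
target = os.path.join(workdir, "aaa.jpg")
result = download(source.as_uri(), target)
```

With a real address the call looks the same, for example `download("http://www.zzti.edu.cn/lqj7vq.jpg", "aaa.jpg")`.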
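2: requests.get(), then write the response body to a file opened in binary mode
picture=requests.get(imageUrl)
with open(savePath,'wb') as fp:
    fp.write(picture.content)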
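Method 2 comes down to writing raw bytes to a file opened in 'wb' mode. Here is a minimal sketch of that step on its own, assuming a hypothetical helper named `save_bytes`. In a real scraper the `data` argument would be `picture.content` from requests; a short byte string is used here so the example runs offline.

```python
import os
import tempfile


def save_bytes(data, save_path):
    """Write raw bytes to save_path, creating parent folders if needed."""
    folder = os.path.dirname(save_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # The with statement closes the file even if write() raises.
    with open(save_path, "wb") as fp:
        return fp.write(data)


# Stand-in bytes; a real call would pass picture.content instead.
demo_path = os.path.join(tempfile.mkdtemp(), "img", "demo.jpg")
written = save_bytes(b"\xff\xd8\xff", demo_path)
```

Creating the folder first avoids a FileNotFoundError when the target directory does not exist yet.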
A simple example
import re
import requests

# Folder for the downloaded images; it must already exist.
defaultPath="F:/python3.5.2/img/2/"

def getImageUrl(content):
    # Extract every "objURL" value from the page source.
    return re.findall('"objURL":"(.*?)"',content,re.S)

def spider(content):
    # content is the URL of the search results page.
    htmlcontent=requests.get(content).text
    imageList=getImageUrl(htmlcontent)
    for imageUrl in imageList:
        print("Downloading: "+imageUrl)
        picture=requests.get(imageUrl)
        # Build a file name from the URL by dropping "/", "?" and ":".
        imageUrl=imageUrl.replace("/","").replace("?","").replace(":","")
        savePath=defaultPath+imageUrl
        with open(savePath,'wb') as fp:
            fp.write(picture.content)

if __name__ == "__main__":
    search=input("Enter the image topic to download: ")
    url="https://image.baidu.com/search/index?tn=baiduimage&ie=utf-8&word="+search+"&ct=201326592&v=flip"
    spider(url)
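The example above builds each file name by dropping "/", "?" and ":" from the image URL. Other characters that some file systems reject (such as "*", "|" or "&") can still slip through, and a long URL produces a very long name. One possible alternative is sketched below; `safe_filename` and its `max_len` parameter are hypothetical, not part of the original script.

```python
import re


def safe_filename(url, max_len=100):
    """Turn a URL into a string that is safe to use as a file name."""
    # Replace every character other than letters, digits, ".", "_" and "-".
    name = re.sub(r"[^A-Za-z0-9._-]", "_", url)
    # Keep the tail so the file extension is more likely to survive.
    return name[-max_len:]


print(safe_filename("http://example.com/a/b.jpg?x=1"))
# → http___example.com_a_b.jpg_x_1
```

Two different URLs can still map to the same name after replacement, so this is a starting point and not a guarantee of unique names.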