Scraping images from pic.netbian.com (彼岸图网) with Python

This script scrapes image thumbnails and their names from pic.netbian.com (the non-4K preview images, not the full 4K originals) and saves the downloaded images into a local folder.

import requests
import re

def get_Html(url):
	html = requests.get(url)
	html.encoding = 'gbk'  # the site serves its pages in GBK encoding
	return html.text

def get_Img(url):
	html = get_Html(url)
	# Each match holds the thumbnail path, the alt attribute and the <b>name</b> of one image
	imglist = re.findall('img src="(.*?)</b></a>', html)
	print(len(imglist))
	for item in imglist:
		# Thumbnail path, e.g. /uploads/allimg/...jpg
		imgurl = item.split('" alt')[0]
		# Image name as shown on the page
		imgname = item.split("<b>")[1]
		print(imgname)
		imgurls = "https://pic.netbian.com" + imgurl
		img = requests.get(imgurls)
		# Replace characters that Windows does not allow in file names
		for ch in '*?:"<>\\/|':
			imgname = imgname.replace(ch, " ")
		path = "D:\\cc\\111\\" + "_" + imgname + ".jpg"
		with open(path, "wb") as f:
			f.write(img.content)

def img_page(url):
	html = get_Html(url)
	# The pagination bar ends with the last page number followed by a "下一页" (next page) link;
	# grab the chunk in front of that link and pull the last page number out of it
	imgpagel = re.findall('</span><a (.*?)下一页', html)
	imgpage = re.findall('">(.*)</a>', imgpagel[0])
	print("Total " + imgpage[0] + " pages")
	print("-------------------")
	return imgpage[0]


# Loop over the page numbers
for j in range(1, 2):
	if j == 1:
		url = "https://pic.netbian.com/4kdongman/"
		# url = "https://pic.netbian.com/4kmeinv/"
		# url = "https://pic.netbian.com/4kfengjing/"
		imgpage = img_page(url)
	else:
		url = "https://pic.netbian.com/4kdongman/index_" + str(j) + ".html"
		# url = "https://pic.netbian.com/4kmeinv/index_" + str(j) + ".html"
		# url = "https://pic.netbian.com/4kfengjing/index_" + str(j) + ".html"

	if int(j) <= int(imgpage):
		print("Page " + str(j))
		get_Img(url)
	else:
		print("All pages downloaded")
		break
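
For reference, the parsing step can be checked on its own. One thumbnail entry on the list page is assumed to look roughly like the hypothetical sample below (not copied from the live site): the regex grabs everything between img src=" and </b></a>, and the two splits then separate the thumbnail path from the display name.

import re

# Hypothetical shape of a single thumbnail entry on the list page (assumption)
sample = ('<a href="/tupian/12345.html" target="_blank">'
          '<img src="/uploads/allimg/230101/sample.jpg" alt="Sample Name" />'
          '<b>Sample Name</b></a>')

item = re.findall('img src="(.*?)</b></a>', sample)[0]
path = item.split('" alt')[0]   # '/uploads/allimg/230101/sample.jpg'
name = item.split('<b>')[1]     # 'Sample Name'
print(path, name)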


You can change the range to control how many pages are scraped; the program above only downloads the first page of images.
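
For example, a minimal sketch of that change, assuming the chosen category really has at least five pages (img_page still enforces the real limit). The os.makedirs line is an optional addition not in the original script; open() will not create the hardcoded D:\cc\111\ folder by itself.

import os

# Optional: make sure the hardcoded save folder exists before downloading
os.makedirs("D:\\cc\\111", exist_ok=True)

# Fetch pages 1 through 5 instead of only page 1; the loop body stays the same as above
for j in range(1, 6):
	...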

PS: for learning purposes only.
