Python Crawler Challenge - 1

linfeng886

Category: python笔记. Tags: crawler, Python

Article link: https://blog.csdn.net/linfeng886/article/details/81988189

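Level 1

Level 1 URL:
http://www.heibanke.com/lesson/crawler_ex00/

Level 1 is simple: take the number shown on the current page, append it to the URL, and then repeat the same step on the new page.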

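Steps

First, visit the starting URL, http://www.heibanke.com/lesson/crawler_ex00/
Then use re (regular expressions), BeautifulSoup, or XPath to extract the number from the current page; I used re
Then append the number to the URL
Repeat this; after a few dozen rounds, a final page appears telling you that you have succeeded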
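
The extraction and URL-building steps above can be sketched on their own, without touching the network. This is a minimal sketch rather than the author's exact code: the helper names `extract_number` and `next_url` and the sample HTML fragment are made up for illustration, and the only thing assumed about the real page is that the number sits inside an `<h3>` tag, as the article's regular expression implies.

```python
import re

BASE_URL = "http://www.heibanke.com/lesson/crawler_ex00/"

# Same idea as the article's pattern: the first run of digits inside an <h3> tag.
H3_NUMBER = re.compile(r"<h3>.*?(\d+).*?</h3>", re.S)


def extract_number(html):
    """Return the first number found inside an <h3> element, or None."""
    match = H3_NUMBER.search(html)
    return match.group(1) if match else None


def next_url(number, base=BASE_URL):
    """Append the number to the base URL to get the next page's address."""
    return base + str(number)


# Hypothetical page fragment, invented for illustration only.
sample_html = "<h3>The next number is 12345, keep going.</h3>"
number = extract_number(sample_html)
print(number)
print(next_url(number))
```

Returning `None` when no number is found gives the caller a clear signal that the final page has probably been reached.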

Code

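import re
import time

import requests


def add_number_to_url(num):
	url = 'http://www.heibanke.com/lesson/crawler_ex00/' + str(num)
	# Send a GET request with the requests library
	response = requests.get(url)
	# Get the HTML of the page
	html = response.text
	# Regular expression that captures the first number inside the <h3> tag
	# (a raw string, so that \d is not treated as a string escape).
	# Regex tutorials are easy to find online, so they are not covered here
	pattern = re.compile(r'<h3>.*?(\d+).*?</h3>', re.S)
	nums = re.findall(pattern, html)
	# findall returns a list; an empty list means we have reached the final success page
	if len(nums) > 0:
		# Print the number that was found
		print(nums[0])
		# Add a short delay; leaving it out is fine too
		time.sleep(0.01)
		# Recurse, passing the number in, to visit the next link
		add_number_to_url(nums[0])
	else:
		# Empty list: the level is cleared, so print the success page
		print('ok')
		print(html)


if __name__ == "__main__":
	num = ''
	add_number_to_url(num)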
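
The recursive function above works for a few dozen pages. For comparison, here is a sketch of the same walk written as a loop, which does not depend on Python's recursion limit. It is an alternative, not the author's code: the `crawl` function, its `fetch` parameter, the `max_steps` guard, and the fake pages are all assumptions introduced so that the example runs offline.

```python
import re

BASE_URL = "http://www.heibanke.com/lesson/crawler_ex00/"
H3_NUMBER = re.compile(r"<h3>.*?(\d+).*?</h3>", re.S)


def crawl(fetch, base=BASE_URL, max_steps=1000):
    """Follow the chain of numbers until a page has no number in its <h3>.

    `fetch` is any callable that takes a URL and returns that page's HTML.
    Returns the list of numbers visited and the HTML of the final page.
    """
    visited = []
    url = base
    for _ in range(max_steps):
        html = fetch(url)
        match = H3_NUMBER.search(html)
        if match is None:
            # No number left: treat this as the final success page.
            return visited, html
        visited.append(match.group(1))
        url = base + match.group(1)
    raise RuntimeError("no final page after %d steps" % max_steps)


# Made-up pages standing in for the real site, so the sketch runs offline.
fake_pages = {
    BASE_URL: "<h3>Start with 11</h3>",
    BASE_URL + "11": "<h3>Next is 22</h3>",
    BASE_URL + "22": "<h3>Well done, you passed!</h3>",
}
numbers, final_html = crawl(fake_pages.__getitem__)
print(numbers)
print(final_html)
```

To run it against the real site, one option is to pass `lambda url: requests.get(url, timeout=10).text` as `fetch`, after `import requests`.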

Closing notes:

Source code: click here

Feel free to follow my WeChat official account, 疯子的Python笔记
