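Level 1
Link: http://www.heibanke.com/lesson/crawler_ex00/
The task is to keep replacing the number at the end of the URL with the number shown on the page, until the page says you may enter the next level. The rough idea is to extract the number from the page with a Python regular expression, append it to the URL, and repeat.
After entering a few numbers by hand, I found that
the first page reads 输入数字XXXXX ("enter the number XXXXX"), while every later page reads 输入的数字是XXXXX. ("the number to enter is XXXXX."), with a trailing period.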
So the following regular expression extracts the number from the page in both cases:
r'数字[^\d]*(\d+)[\.<]'
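As a quick offline check of that pattern, here is a minimal sketch that applies it to two strings. The HTML snippets and the helper name extract_number are hypothetical, modeled only on the two phrasings described above; the real page markup may differ.

```python
# -*- coding: utf-8 -*-
import re

# The pattern from the text: the word "数字", any run of non-digits,
# the digits to capture, then either a period or the start of a tag.
PATTERN = re.compile(r'数字[^\d]*(\d+)[\.<]')


def extract_number(html):
    """Return the first captured number as a string, or None if absent."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical snippets, one per phrasing (assumed, not copied from the site).
first_page = '<h3>你需要在网址后输入数字12345</h3>'
later_page = '<h3>下一个你需要输入的数字是67890. 继续</h3>'
final_page = '<h3>恭喜</h3>'

print(extract_number(first_page))
print(extract_number(later_page))
print(extract_number(final_page))
```

On the first snippet the number is followed directly by "<", on the second by ".", which is why the pattern ends with the character class [\.<].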
The script is then written with the re and requests modules:
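# coding=utf-8
import requests
import re

url = 'http://www.heibanke.com/lesson/crawler_ex00/'
# Decode the response bytes as UTF-8 so the str pattern below can match
r = requests.get(url).content.decode('utf-8')
number = re.findall(r'数字[^\d]*(\d+)[\.<]', r)
index = 1  # number of pages followed so far
while number:
    # Append the extracted number to the base URL and fetch the next page
    website = url + number[0]
    r = requests.get(website).content.decode('utf-8')
    number = re.findall(r'数字[^\d]*(\d+)[\.<]', r)
    print(website)
    index += 1
else:
    # Reached once a page no longer contains a number
    print("End")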
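The same loop can also be sketched as a function that takes the page fetcher as a parameter, which makes it possible to try the logic without touching the network. The names follow_numbers, fetch, max_steps, and the example.test pages are all assumptions made for this illustration.

```python
# -*- coding: utf-8 -*-
import re

PATTERN = re.compile(r'数字[^\d]*(\d+)[\.<]')


def follow_numbers(base_url, fetch, max_steps=1000):
    """Follow the chain of numbers starting at base_url.

    fetch is any callable that takes a URL and returns the page text.
    Returns the list of URLs visited after the start page.
    """
    visited = []
    page = fetch(base_url)
    for _ in range(max_steps):  # guard against an endless chain
        match = PATTERN.search(page)
        if match is None:
            break  # no number on the page: the chain has ended
        next_url = base_url + match.group(1)
        visited.append(next_url)
        page = fetch(next_url)
    return visited


# Hypothetical in-memory pages standing in for the real site.
FAKE_PAGES = {
    'http://example.test/': '<h3>输入数字111</h3>',
    'http://example.test/111': '<h3>输入的数字是222.</h3>',
    'http://example.test/222': '<h3>done</h3>',
}

chain = follow_numbers('http://example.test/', FAKE_PAGES.get)
print(chain)
```

Against the real site, something like `lambda u: requests.get(u).content.decode('utf-8')` could be passed as fetch instead of the dictionary lookup.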
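Level 2
The general idea is to POST the username and password with the requests module. Since the password is a number no greater than 30, a for loop can try each value in turn: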
# coding=utf-8
import requests

url = 'http://www.heibanke.com/lesson/crawler_ex01/'
# Message the page shows when the password is wrong
wrongNotify = '您输入的密码错误, 请重新输入'
for index in range(1, 31):
    data = {'username': 'aha', 'password': index}
    html = requests.post(url, data).content.decode('utf-8')
    if wrongNotify in html:
        print("Attempt %s: password %s is wrong" % (index, index))
        continue
    # No error message in the response: this password was accepted
    print("Attempt %s: the password is %s" % (index, index))
    break
Alternatively, keep incrementing the number until the correct answer appears:
# coding=utf-8
import requests

# Message the page shows when the password is wrong
wrongNotify = '您输入的密码错误, 请重新输入'
website = 'http://www.heibanke.com/lesson/crawler_ex01/'
index = 1
while True:
    data = {'username': 'Thare', 'password': index}
    html = requests.post(website, data).content.decode('utf-8')
    if wrongNotify not in html:
        print("\nThe password is: %d" % index)
        break
    print("Attempt %d: password %d is wrong" % (index, index))
    index += 1
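Both variants share one idea: submit a candidate and treat the absence of the error message as success. A minimal sketch of that idea with the submit step passed in as a parameter is shown here, so it can be exercised offline. The names find_password, submit, and fake_submit are hypothetical, and the accepted value 17 is made up for the demonstration.

```python
# -*- coding: utf-8 -*-
WRONG_NOTIFY = '您输入的密码错误, 请重新输入'


def find_password(submit, candidates):
    """Return the first candidate whose response lacks the error message.

    submit is any callable taking a password and returning the response text.
    Returns None if every candidate is rejected.
    """
    for candidate in candidates:
        if WRONG_NOTIFY not in submit(candidate):
            return candidate
    return None


def fake_submit(password):
    """Hypothetical stand-in for the site: accepts 17, rejects the rest."""
    return 'ok' if password == 17 else WRONG_NOTIFY


print(find_password(fake_submit, range(1, 31)))
```

Passing a bounded range gives the for-loop behaviour, while passing an unbounded iterator such as itertools.count(1) would give the loop-until-found behaviour.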