Learn now: https://edu.csdn.net/course/play/24756/280660?utm_source=blogtoedu
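# https://www.biedoul.com
from urllib import request

# Crawl the last four pages of Biedoul jokes (IDs 31095 to 31098).
for i in range(31095, 31099):
    i = str(i)
    z = "laugh" + i + ".html"  # local file name for this page
    url = "https://www.biedoul.com/index/" + i
    resp = request.urlopen(url)
    print(resp.read().decode('utf-8'))  # print the page source
    request.urlretrieve(url, z)  # download the page again and save it to z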
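The loop above requests every page twice: once with urlopen for printing and once with urlretrieve for saving. A minimal sketch of fetching each page only once and writing the same bytes to disk might look like this. The names `fetch` and `save_page` and the `fetcher` parameter are hypothetical helpers introduced for illustration, and the code assumes the site serves UTF-8, as the original `decode('utf-8')` call does.

```python
from urllib import request


def fetch(url):
    """Download url once and return the raw response bytes."""
    with request.urlopen(url) as resp:
        return resp.read()


def save_page(page_id, fetcher=fetch):
    """Fetch one page, save it as laugh<page_id>.html, and return its text.

    `fetcher` is a hypothetical hook so the function can be tried offline.
    """
    url = "https://www.biedoul.com/index/" + str(page_id)
    data = fetcher(url)  # single request per page
    filename = "laugh" + str(page_id) + ".html"
    with open(filename, "wb") as f:
        f.write(data)
    # Assumption: the page is UTF-8 encoded.
    return data.decode("utf-8")
```

Calling `print(save_page(i))` inside the loop would then replace the separate urlopen and urlretrieve calls, halving the number of requests sent to the server.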
# Exercise: to crawl range(1, 31099), how can yield be used in place of range?
#-----------------------------------------------
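# Crawl www.biedoul.com
from urllib import request
import time    # for time.sleep
import random  # for random delays

# Idea: after fetching each page, sleep for a random interval before fetching
# the next one, so the crawl does not burden the server. The trade-off is that
# the crawl itself becomes very slow.

def foo(num):
    # Yield num, num + 1, ..., 31098, the same values as range(num, 31099).
    # Yielding before incrementing ensures the first page is not skipped.
    while num < 31099:
        yield num
        num = num + 1

# Crawl the Biedoul joke pages.
for i in foo(1):
    i = str(i)
    z = "laugh" + i + ".html"  # local file name for this page
    url = "https://www.biedoul.com/index/" + i
    resp = request.urlopen(url)
    print(resp.read().decode('utf-8'))
    request.urlretrieve(url, z)
    time.sleep(random.random() * 3)  # pause for 0 to 3 seconds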
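The generator above hard-codes the upper bound 31099. A small sketch of a more general version, which takes both bounds and yields the same values as `range(start, stop)`, could look like the following. The names `page_ids` and `random_delay` are hypothetical and not part of the original script. In Python 3, `range` is itself lazy, so the generator is mainly useful as practice with `yield`.

```python
import random


def page_ids(start, stop):
    """Yield start, start + 1, ..., stop - 1, like range(start, stop)."""
    num = start
    while num < stop:
        yield num  # yield first, so start itself is included
        num += 1


def random_delay(max_seconds=3.0):
    """Return a random pause length in the interval [0, max_seconds)."""
    return random.random() * max_seconds
```

With these helpers, the crawl loop could be written as `for i in page_ids(1, 31099):` followed by `time.sleep(random_delay())` after each download.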