1. Get the User-Agent string that your scraper will send with its requests.
2. Then wrap it in the request headers:
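```python
from urllib import request


def DouBanSpide(i):
    # Each Top 250 page lists 25 films, so page i starts at offset i * 25
    url = "https://movie.douban.com/top250?start=" + str(i * 25)
    # Send a browser's User-Agent instead of urllib's default one
    user_agent = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
    req = request.Request(url=url, headers=user_agent)
    html = request.urlopen(req)
    # Decode the response body and hand the HTML to the cleaning function
    Douban_data_wash(html.read().decode())
```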
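Unless you override it, `urllib` identifies itself with a `Python-urllib/3.x` User-Agent, and replacing that with a browser string is the point of steps 1 and 2. The sketch below builds, but does not send, one `Request` per page so the URLs and headers can be checked offline. It assumes the list is split into 10 pages of 25 films; `build_top250_requests`, `PAGE_SIZE`, `BASE_URL` and `HEADERS` are hypothetical names used only for illustration.

```python
from urllib import request

# Assumption: 25 films per page, 10 pages in total (start=0, 25, ..., 225)
PAGE_SIZE = 25
BASE_URL = "https://movie.douban.com/top250?start="
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}


def build_top250_requests(pages=10):
    """Build, but do not send, one Request object per Top 250 page."""
    return [
        request.Request(url=BASE_URL + str(i * PAGE_SIZE), headers=HEADERS)
        for i in range(pages)
    ]


# Without a custom header, urllib's opener sends "Python-urllib/<version>"
default_ua = dict(request.build_opener().addheaders)["User-agent"]
print(default_ua)

reqs = build_top250_requests()
print(reqs[1].full_url)  # https://movie.douban.com/top250?start=25
# urllib normalises header names with str.capitalize(), hence "User-agent"
print(reqs[0].get_header("User-agent"))
```

Each of these `Request` objects can then be passed to `request.urlopen`, exactly as `DouBanSpide` does for a single page.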
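3. Slice the downloaded Douban HTML with `split` to cut out the fields you want: rank, movie title, rating, number of ratings, and the one-line recommendation blurb.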
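The `split` slicing in step 3 can be tried offline on a small string. The sketch below runs on a hypothetical, simplified fragment modelled on two Top 250 entries; the titles, numbers and quotes are placeholders, not real Douban data, and the real markup has many more tags. `between` and `parse_entries` are illustrative helper names; `between` applies the same `split(start)[i + 1].split(end)[0]` pattern that the rank line uses.

```python
# Hypothetical, simplified markup for two entries (placeholder data)
SAMPLE = (
    '<li><em class="">1</em>'
    '<span class="title">Movie A</span>'
    '<span class="rating_num" property="v:average">9.5</span>'
    '<span>1000人评价</span>'
    '<span class="inq">Quote A</span></li>'
    '<li><em class="">2</em>'
    '<span class="title">Movie B</span>'
    '<span class="rating_num" property="v:average">9.1</span>'
    '<span>800人评价</span>'
    '<span class="inq">Quote B</span></li>'
)


def between(text, start, end, n):
    """Text between the (n+1)-th occurrence of `start` and the next `end`."""
    return text.split(start)[n + 1].split(end)[0]


def parse_entries(text):
    """Extract rank, title, rating, vote count and quote using str.split only."""
    entries = []
    for i in range(text.count('<em class="">')):
        # The vote count is the number right before the i-th "人评价"
        votes = text.split("人评价")[i].rsplit("<span>", 1)[-1]
        entries.append({
            "rank": int(between(text, '<em class="">', "</em>", i)),
            "title": between(text, '<span class="title">', "</span>", i),
            "rating": float(between(text, 'property="v:average">', "</span>", i)),
            "votes": int(votes),
            "quote": between(text, '<span class="inq">', "</span>", i),
        })
    return entries


for entry in parse_entries(SAMPLE):
    print(entry)
```

Because every field is located by counting occurrences, the indexes only line up when each entry contains exactly one of each marker. On the real page an entry may carry more than one `<span class="title">` (for an original-language title) and some entries may have no quote, so cutting the page into per-entry chunks first (for example on `<li>`) and then splitting each chunk is the safer variant.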
python
rank = text1.split('<em class=\"\">')[i+1].split("</em>")[0]
title = text1.split('</span>')[i].split(