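Environment setup
First, take a look at the page we want to scrape.
Analysis
The page loads its content dynamically: each time you scroll to the bottom, it fetches another batch of data. The requests it sends while scrolling look like this. Only the start parameter changes, advancing by 20 (the page size, limit) with each request: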
https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=0&limit=20
https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=20&limit=20
https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=40&limit=20
https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=60&limit=20
https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=80&limit=20
https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=100&limit=20
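A minimal sketch that generates these page URLs with the standard library, to confirm the pattern. The helper name build_page_urls and its total argument are hypothetical, not from the original; the base URL and query parameters are copied from the requests above.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/j/chart/top_list"

def build_page_urls(total=120, limit=20):
    """Return one list-API URL per page; start advances by `limit` each time."""
    urls = []
    for start in range(0, total, limit):
        query = urlencode({
            "type": "5",
            "interval_id": "100:90",  # urlencode escapes ':' as %3A
            "action": "",
            "start": start,
            "limit": limit,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls

for u in build_page_urls():
    print(u)
```

With the defaults this prints the six URLs listed above, from start=0 to start=100.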
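Analyzing the response shows that it is a JSON array with one object per movie. The fields we need are title, score, and rank.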
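A minimal sketch of the parsing step, assuming only what the scraper code relies on: the body is a JSON array and each element has title, score, and rank keys. The two-item body and the helper extract_movies are hypothetical placeholders, not real Douban data; the real API returns more fields per movie.

```python
import json

# Hypothetical, trimmed-down response body; the values are placeholders.
sample_body = '[{"rank": 1, "score": "9.0", "title": "Movie A"}, {"rank": 2, "score": "8.9", "title": "Movie B"}]'

def extract_movies(items):
    """Keep only the fields the scraper uses, as (rank, score, title) tuples."""
    return [(item["rank"], item["score"], item["title"]) for item in items]

movies = extract_movies(json.loads(sample_body))
for rank, score, title in movies:
    print(rank, score, title)
```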
Now write the code and start scraping:
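import requests
url = 'https://movie.douban.com/j/chart/top_list'
headers = {
    'User-Agent': '***************************',  # your browser's User-Agent string
}
# start = 0, 20, 40, 60, 80: five pages of 20 movies, 100 in total
for page in range(0, 100, 20):
    params = {
        'type': '5',
        'interval_id': '100:90',  # requests encodes this as 100%3A90
        'action': '',
        'start': str(page),
        'limit': '20',
    }
    response = requests.get(url=url, headers=headers, params=params, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    page_text = response.json()  # a list of movie dicts
    for movie in page_text:
        name = movie['title']
        score = movie['score']
        rank = movie['rank']
        print(rank, score, name)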
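The loop above hard-codes 100 results. A sketch of a variant that keeps requesting pages until one comes back empty, under the assumption (not verified in the original) that the API returns an empty list once start runs past the end. The names crawl and fetch_with_requests are hypothetical. The fetcher is passed in as a function, so the paging logic can be tried without network access.

```python
import time

def crawl(fetch_page, limit=20, max_pages=50, delay=0.0):
    """Request pages until one is empty; return (rank, score, title) tuples.

    fetch_page(start, limit) must return the parsed JSON list for that page.
    """
    results = []
    for page in range(max_pages):
        items = fetch_page(page * limit, limit)
        if not items:  # assumed end-of-list signal: an empty page
            break
        results.extend((m["rank"], m["score"], m["title"]) for m in items)
        time.sleep(delay)  # pause between requests to go easy on the server
    return results

def fetch_with_requests(start, limit):
    """Real fetcher using the same endpoint and parameters as the script above."""
    import requests  # imported here so crawl() can be used without requests installed

    response = requests.get(
        "https://movie.douban.com/j/chart/top_list",
        headers={"User-Agent": "***************************"},  # your browser's UA
        params={"type": "5", "interval_id": "100:90", "action": "",
                "start": str(start), "limit": str(limit)},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

Usage: rows = crawl(fetch_with_requests, delay=1.0). The max_pages cap guards against looping forever if the empty-page assumption does not hold.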
Let's look at the results:
100 records in total. Perfect.
Done.
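If the loop appends (rank, score, name) tuples to a list instead of printing them, the count can be checked in code rather than by eye. A minimal sketch, assuming ranks are integers numbered from 1; check_results is a hypothetical helper.

```python
def check_results(rows, expected=100):
    """True if rows has `expected` entries whose ranks are exactly 1..expected."""
    ranks = sorted(rank for rank, _score, _name in rows)
    return len(rows) == expected and ranks == list(range(1, expected + 1))
```

This also catches duplicated or skipped pages, which a plain length check would miss.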