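This post scrapes the Douban Movie Top250 list, collecting each film's basic information (title, director, cast, and so on) along with its poster image, plot synopsis, and comment count.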
A screenshot of the run is shown below:
1. Build the request headers
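The list spans 10 pages with 25 films per page. page_start is the starting offset of each page: 0 for the first page, 25 for the second, and so on. To scrape every page, iterate from 0 up to (but not including) 250 in steps of 25, i.e. range(0, 250, 25).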
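The pagination arithmetic above can be sketched as follows. This is a minimal illustration, not the post's own code: the base URL `https://movie.douban.com/top250` and the query key `start` are assumptions (the text only names the offset variable `page_start`), and the helper names `page_offsets` and `page_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed list URL; the original text does not spell it out.
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # films per page, as stated in the text
TOTAL = 250      # total number of films in the list


def page_offsets(total=TOTAL, page_size=PAGE_SIZE):
    """Return the starting offset of every page: 0, 25, ..., 225."""
    return list(range(0, total, page_size))


def page_url(page_start):
    """Build the URL of the page starting at `page_start`.

    The query key `start` is an assumption made for this sketch.
    """
    return f"{BASE_URL}?{urlencode({'start': page_start})}"


if __name__ == "__main__":
    # Print one URL per page without sending any request.
    for page_start in page_offsets():
        print(page_url(page_start))
```

Because range excludes its stop value, the last offset is 225, which covers films 226 through 250, so ten requests are enough.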
The request headers can be built quickly with https://curlconverter.com/ (usage is described at https://blog.csdn.net/Pangaoyang_/article/details/140873357?spm=1001.2014.3001.5502 ).
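# Cookies copied from a browser session. Note: keys such as '__utma' appear twice below;
# in a Python dict literal the later value silently overwrites the earlier one.
cookies = {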
'll': '"118282"',
'bid': 'qpeBkdWNQ30',
'__utma': '30149280.1285408772.1722931171.1722931171.1722931171.1',
'__utmc': '30149280',
'__utmz': '30149280.1722931171.1.1.utmcsr=cn.bing.com|utmccn=(referral)|utmcmd=referral|utmcct=/',
'__utmt': '1',
'__utmb': '30149280.1.10.1722931171',
'__utma': '223695111.549597820.1722931184.1722931184.1722931184.1',
'__utmb': '223695111.0.10.1722931184',
'__utmc': '223695111',
'__utmz': '223695111.1722931184.1.1.utmcsr=cn.bing.com|utmccn=(referral)|utmcmd=referral|utmcct=/',
'_pk_ref.100001.4cf6': '%5B%22%22%2C%22%22%2C1722931184%2C%22https%3A%2F%2Fcn.bing.com%2F%22%5D',
'_pk_id.100001.4cf6': '39e7e842a6abee49.1722931184.',
'_pk_ses.100001.4cf6': '1',
'ap_v': '0,6.0',
'__yadk_uid': '5tRoftzrzq0L8EylRtLcRgAgQ8c6kVkb',
}
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
# 'cookie': 'll="118282"; bid=qpeBkdWNQ30; __utma=30149280.1285408772.1722931171.1722931171.1722931171.1; __utmc=30149280; __utmz=30149280.1722931171.1.1.utmcsr=cn.bing.com|utmccn=(referral)|utmcmd=referral|utmcct=/; __utmt=1; __utmb=30149280.1.10.1722931171; __utma=223695111.549597820.1722931184.1722931184.1722