Scraping the Douban Top 250 movies.

![screenshot](https://i-blog.csdnimg.cn/blog_migrate/d1567a37f15da7bcf247b1acf9e77513.png)
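```python
import csv

import requests
from lxml import etree

headers = {
    # Paste the Cookie copied from your own logged-in browser session here
    'Cookie': 'your own cookie',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36',
}
# Each page lists 25 movies; "start" is the offset of the first movie on the page
url = "https://movie.douban.com/top250?start={}&filter="
mov_list = []

# newline="" stops the csv module from writing blank lines between rows on Windows;
# utf-8-sig can encode any character (gbk raises UnicodeEncodeError outside its range)
# and its BOM lets Excel detect the encoding
with open(r"C:\Users\Administrator\Documents\Tencent Files\1936705477\FileRecv\test.csv",
          "w", encoding="utf-8-sig", newline="") as fp:
    header = ["电影名", "评价人数"]  # column names: "movie title", "number of ratings"
    writer = csv.DictWriter(fp, fieldnames=header)
    writer.writeheader()
    # 250 movies / 25 per page = 10 pages (start = 0, 25, ..., 225)
    for page in range(10):
        response = requests.get(url=url.format(page * 25), headers=headers)
        response.raise_for_status()  # fail loudly if the request is blocked
        element = etree.HTML(response.text)
        # One <li> per movie in the ranking list
        li = element.xpath('//*[@id="content"]/div/div[1]/ol/li')
        for item in li:  # "item", so the page counter is not shadowed
            # Chinese title: the first title span inside the link
            title = item.xpath("./div[1]/div[2]/div[1]/a/span[1]/text()")[0]
            # Rating-count text shown under the stars
            pingjia = item.xpath("./div[1]/div[2]/div[2]/div[1]/span[4]/text()")[0]
            data = {"电影名": title, "评价人数": pingjia}
            writer.writerow(data)
            mov_list.append(data)  # also keep the rows in memory
print(mov_list)
```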
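The loop above only needs ten requests because the ranking has 250 entries at 25 per page. The sketch below makes that pagination explicit and also turns the rating-count text into a number. It assumes the count text contains a single run of digits (the sample `"123456人评价"` is a made-up value), and the helper names `page_urls` and `parse_count` are hypothetical, not part of the original script.

```python
import re

BASE_URL = "https://movie.douban.com/top250?start={}&filter="
PAGE_SIZE = 25  # movies per page, matching start = i * 25 in the script above
TOTAL = 250     # size of the ranking


def page_urls(total=TOTAL, page_size=PAGE_SIZE):
    """Return one URL per page: start = 0, 25, ..., 225 (hypothetical helper)."""
    return [BASE_URL.format(start) for start in range(0, total, page_size)]


def parse_count(text):
    """Extract the first run of digits from rating-count text (hypothetical helper)."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    return int(match.group())


if __name__ == "__main__":
    urls = page_urls()
    print(len(urls))                   # 10
    print(urls[-1])                    # https://movie.douban.com/top250?start=225&filter=
    print(parse_count("123456人评价"))  # 123456 (made-up sample text)
```

Storing the count as an integer makes the CSV column sortable in a spreadsheet; the raw text works too if you only need to read it.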
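Getting detected as a crawler:

![screenshot](https://i-blog.csdnimg.cn/blog_migrate/282f1c1b7fbd75d219d24341f06a27a6.png)

When that happens, log in to (or sign up for) a Douban account, open the browser's developer tools, find the Cookie, and add it to the `headers` of the request.

![screenshot](https://i-blog.csdnimg.cn/blog_migrate/04ac1122205cb830c0241bea771dd2da.png)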
```python
headers = {
    'Cookie': 'your own cookie',  # the value copied from developer tools
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36',
}
```
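Hardcoding the cookie in the script makes it easy to leak when sharing code. A minimal sketch, assuming you store the copied Cookie string in an environment variable named `DOUBAN_COOKIE` (a hypothetical name), builds the same headers at run time. It also shows how to split the copied `name=value; name2=value2` string into a dict; the helper names `cookie_header_to_dict` and `build_headers` are illustrative, not from the original post.

```python
import os

USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36")


def cookie_header_to_dict(raw):
    """Split a copied Cookie header 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in raw.split(";"):
        # partition splits at the first "=" only, so values may contain "="
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def build_headers(cookie=None):
    """Build request headers; read DOUBAN_COOKIE (hypothetical) when no cookie is given."""
    if cookie is None:
        cookie = os.environ.get("DOUBAN_COOKIE", "")
    headers = {"User-Agent": USER_AGENT}
    if cookie:  # only send a Cookie header when one is available
        headers["Cookie"] = cookie
    return headers
```

In the scraper you could then call `requests.get(url, headers=build_headers())`, or pass the parsed dict through the `cookies=` argument of `requests.get` instead of a raw `Cookie` header.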