Python
Scraping Douban movie reviews
Required libraries
lxml, requests
The code is as follows:
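import requests
from lxml import etree

# Define the request headers (a browser User-Agent)
head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}
base_url = 'https://movie.douban.com/review/best/?start='
list1 = []  # one list of review links per listing page

# Fetch the first 5 listing pages; the start offset advances by 20 per page
for i in range(5):
    url = base_url + '{}'.format(20 * i)
    r = requests.get(url, headers=head)
    domo = r.content.decode('utf8')
    # Parse the HTML and collect the review link under each title
    jiexi = etree.HTML(domo)
    _url = jiexi.xpath('//div[@class="main-bd"]/h2/a/@href')
    list1.append(_url)

# Print every review link, page by page
for _url in list1:
    for url1 in _url:
        print(url1)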
The output is as follows:
https://movie.douban.com/review/13714022/
https://movie.douban.com/review/13715350/
https://movie.douban.com/review
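
The pagination and the nested result list in the script can be illustrated offline. The sketch below is a minimal one that sends no request: it only builds the listing URLs from the `start` offset (steps of 20, as in the script) and shows how per-page link lists can be flattened into a single list. The helper names `build_page_urls` and `flatten`, and the sample links, are hypothetical and exist only for illustration.

```python
from itertools import chain

BASE_URL = "https://movie.douban.com/review/best/?start="


def build_page_urls(base_url, pages, page_size=20):
    """Return one listing URL per page, stepping the start offset by page_size."""
    return [f"{base_url}{page_size * i}" for i in range(pages)]


def flatten(nested):
    """Collapse a list of per-page link lists into one flat list."""
    return list(chain.from_iterable(nested))


# The same five offsets the script requests: 0, 20, 40, 60, 80
page_urls = build_page_urls(BASE_URL, 5)
for u in page_urls:
    print(u)

# Hypothetical per-page results, standing in for what xpath() would return
sample = [["/review/1/", "/review/2/"], ["/review/3/"]]
print(flatten(sample))
```

With a flat list, a single loop is enough to print the links; calling `list1.extend(_url)` instead of `list1.append(_url)` in the script would have the same effect.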