Douban Movie Crawler: A Simple Small Crawler Example

Author: 丙丁火

Column: Crawlers

Article link: https://blog.csdn.net/caicaibird0531/article/details/96149501

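```python
# encoding: utf-8
# Scrape the "now playing" list for Guangzhou from Douban Movie and save it as JSON.

import json

import requests
from lxml import etree

url = 'https://movie.douban.com/cinema/nowplaying/guangzhou/'

# Send a browser-like User-Agent and a Referer; Douban may reject requests
# that use the default python-requests User-Agent.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 '
                  'Safari/537.36',
    'Referer': 'https://movie.douban.com/'
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
html = etree.HTML(response.text)

# Take the first <ul class="lists"> on the page; each <li> child describes
# one film through its data-* attributes.
ul = html.xpath("//ul[@class='lists']")[0]
lis = ul.xpath("./li")
movies = []
for li in lis:
    poster = li.xpath(".//img/@src")  # list of matches, may be empty
    movie = {
        "title": li.get("data-title"),        # li.get returns None if the attribute is missing
        "score": li.get("data-score"),
        "duration": li.get("data-duration"),
        "region": li.get("data-region"),
        "actors": li.get("data-actors"),
        "poster": poster[0] if poster else None,
    }
    movies.append(movie)

# ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes.
with open('doubanfilm.json', 'w', encoding='utf-8') as fp:
    json.dump(movies, fp, ensure_ascii=False, indent=2)
```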
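To check the extraction logic without hitting the network, the same steps can be run against a small hand-written fragment. This sketch rests on two assumptions: the HTML below is a hypothetical, well-formed imitation of the `data-*` attributes the script reads (not a copy of Douban's real markup), and it swaps lxml for the standard library's `xml.etree.ElementTree`, whose limited XPath subset covers these queries, so it runs without extra packages. Real pages are usually not well-formed XML, so for live HTML lxml's `etree.HTML` remains the right tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the Douban markup; all values are made up.
SAMPLE = """
<div>
  <ul class="lists">
    <li data-title="Film A" data-score="8.1" data-duration="110 min"
        data-region="China" data-actors="Actor 1 / Actor 2">
      <img src="https://example.com/a.jpg"/>
    </li>
    <li data-title="Film B" data-score="7.0" data-duration="95 min"
        data-region="USA" data-actors="Actor 3"/>
  </ul>
</div>
"""

FIELDS = ("title", "score", "duration", "region", "actors")


def parse_movies(markup):
    """Return one dict per <li> child of the first <ul class="lists">."""
    root = ET.fromstring(markup.strip())
    ul = root.find(".//ul[@class='lists']")  # ElementTree's XPath subset
    if ul is None:
        return []
    movies = []
    for li in ul.findall("li"):  # direct <li> children only, like "./li"
        movie = {name: li.get("data-" + name) for name in FIELDS}
        img = li.find(".//img")
        movie["poster"] = img.get("src") if img is not None else None
        movies.append(movie)
    return movies


movies = parse_movies(SAMPLE)
print(movies[0]["title"], movies[0]["poster"])  # Film A https://example.com/a.jpg
print(movies[1]["poster"])                      # None
```

The second film has no `<img>`, which shows why guarding the poster lookup matters: indexing an empty XPath result with `[0]` would raise `IndexError`.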

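The script passes `ensure_ascii=False` to `json.dump` so that Chinese titles are stored as readable characters; with the default `ensure_ascii=True`, Python's `json` module escapes every non-ASCII character as `\uXXXX`. Both forms decode to the same data. The snippet below uses only the standard library and a made-up record to show the difference, then writes and reads back a file the way a consumer of `doubanfilm.json` might; the temporary directory is illustrative, not where the script actually writes.

```python
import json
import os
import tempfile

record = [{"title": "电影", "score": "8.1"}]  # made-up sample data

escaped = json.dumps(record)                       # default ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps the characters as-is
print(escaped)
print(readable)

# Write and read back with an explicit UTF-8 encoding, as the scraper does.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doubanfilm.json")
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(record, fp, ensure_ascii=False)
    with open(path, encoding="utf-8") as fp:
        loaded = json.load(fp)

print(loaded == record)  # True
```

Opening the file with `encoding='utf-8'` on both sides matters when `ensure_ascii=False` is used, because otherwise the platform's default encoding is applied and may not be able to represent the characters.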