Scraping AJAX data with a crawler

Using a crawler to fetch Douban movie rankings

Analysis

Modules used

  • urllib.request
  • json

Code

- With a small change to the URL (start=0&limit=100), the first 100 records can be retrieved in a single request.
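The query string can also be built from its parameters instead of being edited by hand. Below is a minimal sketch using the standard library's `urllib.parse.urlencode`. The helper names `build_url` and `page_urls` are hypothetical. The assumption that `start` is an offset and `limit` a page size is inferred from the URL itself and is not documented by Douban.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/j/chart/top_list"


def build_url(start=0, limit=100):
    """Hypothetical helper: rebuild the top_list URL for one page."""
    params = {
        "type": 11,                 # same values as in the original URL
        "interval_id": "100:90",    # urlencode escapes ':' as %3A
        "action": "",
        "start": start,             # assumed: offset of the first entry
        "limit": limit,             # assumed: number of entries returned
    }
    return BASE_URL + "?" + urlencode(params)


def page_urls(total, page_size=20):
    """Yield one URL per page until `total` entries are covered."""
    for start in range(0, total, page_size):
        yield build_url(start, min(page_size, total - start))


if __name__ == "__main__":
    print(build_url())
    for url in page_urls(100, page_size=20):
        print(url)
```

With the default arguments, `build_url()` reproduces the exact URL used in the spider below. Fetching in smaller pages is gentler on the server than one large request.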

from urllib import request
import json
import os


class DouBanMovieSpider:
    """
    Douban movie chart: drama (剧情片) ranking
    """
    def __init__(self):
        # AJAX endpoint seen in the browser's network panel; start/limit control paging
        self.url = "https://movie.douban.com/j/chart/top_list?type=11&interval_id=100%3A90&action=&start=0&limit=100"
        self.headers = {
            # Send a browser User-Agent so the request looks like a normal visit
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
        }

    def load_page(self):
        """
        Request the endpoint and get the JSON text.
        """
        try:
            req = request.Request(self.url, headers=self.headers)
            # `with` closes the connection; timeout avoids hanging forever
            with request.urlopen(req, timeout=10) as response:
                html = response.read().decode("utf-8")   # a str containing JSON
        except Exception as e:
            print("load_page error:{}".format(e))
            return

        self.parse_page(html)

    def parse_page(self, html):
        """
        "Parse" the page: the response is already JSON, so just extract the fields.
        """
        try:
            text = json.loads(html)   # a list of dicts, one per movie

            movie_list = []
            for t in text:
                movie_info = {
                    "rating": t['rating'][0],   # 'rating' is a list; its first item is the score
                    "rank": t['rank'],
                    "title": t['title'],
                }
                movie_list.append(movie_info)
        except Exception as e:
            print("parse_page error:{}".format(e))
            return

        self.write_info(movie_list)

    def write_info(self, movie):
        """
        Save the extracted data to a JSON file.
        """
        path = "../text/doubanmovie.json"
        # Create ../text if needed, otherwise open() raises FileNotFoundError
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w', encoding="utf-8") as f:
            # ensure_ascii=False keeps Chinese titles readable instead of \uXXXX escapes
            json.dump(movie, f, ensure_ascii=False)
        print("write success")


if __name__ == "__main__":
    dbm = DouBanMovieSpider()
    dbm.load_page()
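Because `parse_page` mixes field extraction with file output, it is hard to check without a live request. Below is a minimal sketch of a pure function that performs the same extraction, so it can be tested against a hand-written payload. The function name `extract_movies` is hypothetical. The sample payload is made up and only mimics the three fields the spider reads: `rating` as a list whose first item is the score, `rank`, and `title`.

```python
import json


def extract_movies(json_text):
    """Return [{'rating', 'rank', 'title'}, ...] from top_list JSON text."""
    movies = []
    for item in json.loads(json_text):
        movies.append({
            "rating": item["rating"][0],   # first item of the rating list is the score
            "rank": item["rank"],
            "title": item["title"],
        })
    return movies


if __name__ == "__main__":
    # Hypothetical sample payload; real responses carry many more fields.
    sample = json.dumps([
        {"rating": ["9.6", "50"], "rank": 1, "title": "肖申克的救赎"},
        {"rating": ["9.6", "50"], "rank": 2, "title": "霸王别姬"},
    ], ensure_ascii=False)
    print(json.dumps(extract_movies(sample), ensure_ascii=False))
```

`parse_page` could call such a function and then hand the result to `write_info`. That keeps network, parsing, and storage separately testable.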

  • Retrieved data
[{"rating": "9.6", "rank": 1, "title": "肖申克的救赎"}, {"rating": "9.6", "rank": 2, "title": "霸王别姬"}, {"rating": "9.6", "rank": 3, "title": "控方证人"}, {"rating": "9.5", "rank": 4, "title": "美丽人生"}, {"rating": "9.5", "rank": 5, "title": "辛德勒的名单"}, {"rating": "9.4", "rank": 6, "title": "这个杀手不太冷"}, {"rating": "9.4", "rank": 7, "title": "阿甘正传"}, {"rating": "9.4", "rank": 8, "title": "十二怒汉"}, {"rating": "9.4", "rank": 9, "title": "泰坦尼克号 3D版"}, {"rating": "9.4", "rank": 10, "title": "背靠背,脸对脸"}, {"rating": "9.4", "rank": 11, "title": "灿烂人生"}, {"rating": "9.4", "rank": 12, "title": "茶馆"}, {"rating": "9.4", "rank": 13, "title": "十二怒汉"}, {"rating": "9.4", "rank": 14, "title": "控方证人"}, {"rating": "9.3", "rank": 15, "title": "盗梦空间"}, {"rating": "9.3", "rank": 16, "title": "泰坦尼克号"}, {"rating": "9.3", "rank": 17, "title": "千与千寻"}, {"rating": "9.3", "rank": 18, "title": "忠犬八公的故事"}, {"rating": "9.3", "rank": 19, "title": "放牛班的春天"}, {"rating": "9.3", "rank": 20, "title": "熔炉"}, {"rating": "9.3", "rank": 21, "title": "城市之光"}, {"rating": "9.3", "rank": 22, "title": "巴黎圣母院"}, {"rating": "9.2", "rank": 23, "title": "三傻大闹宝莱坞"}, {"rating": "9.2", "rank": 24, "title": "海上钢琴师"}, {"rating": "9.2", "rank": 25, "title": "星际穿越"}, {"rating": "9.2", "rank": 26, "title": "楚门的世界"}, {"rating": "9.2", "rank": 27, "title": "触不可及"}, {"rating": "9.2", "rank": 28, "title": "教父"}, {"rating": "9.2", "rank": 29, "title": "活着"}, {"rating": "9.2", "rank": 30, "title": "天堂电影院"}, {"rating": "9.2", "rank": 31, "title": "乱世佳人"}, {"rating": "9.2", "rank": 32, "title": "鬼子来了"}, {"rating": "9.2", "rank": 33, "title": "辩护人"}, {"rating": "9.2", "rank": 34, "title": "素媛"}, {"rating": "9.2", "rank": 35, "title": "小鞋子"}, {"rating": "9.2", "rank": 36, "title": "摩登时代"}, {"rating": "9.2", "rank": 37, "title": "七武士"}, {"rating": "9.2", "rank": 38, "title": "东京物语"}, {"rating": "9.2", "rank": 39, "title": "生活多美好"}, {"rating": "9.2", "rank": 40, "title": "超感猎杀:完结特别篇"}, {"rating": "9.2", "rank": 41, "title": "洞"}, {"rating": "9.2", 
"rank": 42, "title": "切腹"}, {"rating": "9.2", "rank": 43, "title": "哀乐中年"}, {"rating": "9.2", "rank": 44, "title": "狐妖小红娘剧场版:王权富贵"}, {"rating": "9.1", "rank": 45, "title": "摔跤吧!爸爸"}, {"rating": "9.1", "rank": 46, "title": "无间道"}, {"rating": "9.1", "rank": 47, "title": "蝙蝠侠:黑暗骑士"}, {"rating": "9.1", "rank": 48, "title": "指环王3:王者无敌"}, {"rating": "9.1", "rank": 49, "title": "飞越疯人院"}, {"rating": "9.1", "rank": 50, "title": "两杆大烟枪"}, {"rating": "9.1", "rank": 51, "title": "窃听风暴"}, {"rating": "9.1", "rank": 52, "title": "末代皇帝"}, {"rating": "9.1", "rank": 53, "title": "饮食男女"}, {"rating": "9.1", "rank": 54, "title": "钢琴家"}, {"rating": "9.1", "rank": 55, "title": "教父2"}, {"rating": "9.1", "rank": 56, "title": "美国往事"}, {"rating": "9.1", "rank": 57, "title": "狩猎"}, {"rating": "9.1", "rank": 58, "title": "无人知晓"}, {"rating": "9.1", "rank": 59, "title": "完美的世界"}, {"rating": "9.1", "rank": 60, "title": "忠犬八公物语"}, {"rating": "9.1", "rank": 61, "title": "海蒂和爷爷"}, {"rating": "9.1", "rank": 62, "title": "爱·回家"}, {"rating": "9.1", "rank": 63, "title": "芙蓉镇"}, {"rating": "9.1", "rank": 64, "title": "攻壳机动队2:无罪"}, {"rating": "9.1", "rank": 65, "title": "沉静如海"}, {"rating": "9.1", "rank": 66, "title": "地下"}, {"rating": "9.1", "rank": 67, "title": "熊的故事"}, {"rating": "9.1", "rank": 68, "title": "南海十三郎"}, {"rating": "9.1", "rank": 69, "title": "寻子遇仙记"}, {"rating": "9.1", "rank": 70, "title": "生之欲"}, {"rating": "9.1", "rank": 71, "title": "天堂回信"}, {"rating": "9.1", "rank": 72, "title": "鳄鱼波鞋走天涯"}, {"rating": "9.1", "rank": 73, "title": "剃头匠"}, {"rating": "9.1", "rank": 74, "title": "女人步上楼梯时"}, {"rating": "9.1", "rank": 75, "title": "丛林赤子心"}, {"rating": "9.1", "rank": 76, "title": "情迷意乱"}, {"rating": "9.1", "rank": 77, "title": "无言的山丘"}, {"rating": "9.1", "rank": 78, "title": "战争与和平"}, {"rating": "9.0", "rank": 79, "title": "我不是药神"}, {"rating": "9.0", "rank": 80, "title": "怦然心动"}, {"rating": "9.0", "rank": 81, "title": "少年派的奇幻漂流"}, {"rating": "9.0", "rank": 82, "title": "当幸福来敲门"}, {"rating": 
"9.0", "rank": 83, "title": "罗马假日"}, {"rating": "9.0", "rank": 84, "title": "搏击俱乐部"}, {"rating": "9.0", "rank": 85, "title": "闻香识女人"}, {"rating": "9.0", "rank": 86, "title": "指环王1:魔戒再现"}, {"rating": "9.0", "rank": 87, "title": "狮子王"}, {"rating": "9.0", "rank": 88, "title": "死亡诗社"}, {"rating": "9.0", "rank": 89, "title": "指环王2:双塔奇兵"}, {"rating": "9.0", "rank": 90, "title": "音乐之声"}, {"rating": "9.0", "rank": 91, "title": "穿条纹睡衣的男孩"}, {"rating": "9.0", "rank": 92, "title": "小森林 夏秋篇"}, {"rating": "9.0", "rank": 93, "title": "一一"}, {"rating": "9.0", "rank": 94, "title": "小森林 冬春篇"}, {"rating": "9.0", "rank": 95, "title": "我爱你"}, {"rating": "9.0", "rank": 96, "title": "大独裁者"}, {"rating": "9.0", "rank": 97, "title": "红鳉鱼"}, {"rating": "9.0", "rank": 98, "title": "莫娣"}, {"rating": "9.0", "rank": 99, "title": "从海底出击"}, {"rating": "9.0", "rank": 100, "title": "大路"}]

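Summary

  • Scraping AJAX-loaded data essentially means fetching the JSON that the AJAX request returns (although not every AJAX response is JSON).
  • When scraping, focus on where the data actually comes from; the rendered page content often matters much less.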
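Because an AJAX response is not always JSON (it may be an HTML fragment or an error page), it can help to check the `Content-Type` header before calling `json.loads`. Below is a minimal sketch. `decode_body` is a hypothetical helper, and it assumes the server labels JSON with a media type ending in `json` (for example `application/json`), which not every site does.

```python
import json


def decode_body(body, content_type):
    """Parse `body` (bytes) as JSON if the header says so, else return text."""
    # e.g. "application/json; charset=utf-8" -> media type and parameters
    media_type, _, params = content_type.partition(";")
    charset = "utf-8"  # fallback when no charset parameter is given
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    text = body.decode(charset)
    # Treat application/json, text/json, application/*+json, ... as JSON
    if media_type.strip().lower().endswith("json"):
        return json.loads(text)
    return text
```

With `urllib.request`, the header value is available as `response.headers.get("Content-Type", "")`.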