Scraping the top 100 high-rated Douban movies with a Python crawler

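Approach:

  • Open the relevant Douban page and capture its network traffic
  • Find the URL that returns JSON and scrape that URL
  • Work out the headers, the URL, and whether the request is a GET or a POST
  • Process the JSON once it comes back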
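
As a rough sketch of the third bullet, the query string on the captured URL can be assembled from named parameters instead of being copied in percent-encoded form. The helper name `build_search_url` and its argument names are made up for illustration; the endpoint and parameter names are taken from the URL used in this post, where `%E7%83%AD%E9%97%A8` is the percent-encoded tag 热门.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL captured in this post
BASE_URL = "https://movie.douban.com/j/search_subjects"


def build_search_url(tag, page_limit=100, page_start=0):
    """Build the JSON endpoint URL (hypothetical helper, not a library function)."""
    params = {
        "type": "movie",
        "tag": tag,                # urlencode percent-encodes non-ASCII text as UTF-8
        "page_limit": page_limit,  # how many entries to request
        "page_start": page_start,  # offset of the first entry
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("热门")
print(url)
```

The resulting string can then be passed to `requests.get` together with the headers.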
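Processing the JSON:
  1. First, use jsonpath to get the movie titles
  2. As a beginner, I don't yet know how to get the title and the rating in a single query, so I call jsonpath.jsonpath twice
  3. Interleave the two resulting lists into one; chain is used for this
  4. Walk the merged list and insert separators and line breaks
  5. Finally, save the result locally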
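
To make steps 3 and 4 concrete, here is a small sketch on made-up data. The lists `names` and `rates` are hypothetical stand-ins for the two jsonpath results, and the line-building step is one possible way to do the formatting, not necessarily the one used in the script.

```python
from itertools import chain

# Hypothetical sample values standing in for the two jsonpath results
names = ["Movie A", "Movie B"]
rates = ["8.5", "7.9"]

# zip pairs each title with its rating; chain.from_iterable flattens the pairs
flat = list(chain.from_iterable(zip(names, rates)))
print(flat)

# Walk the flat list two items at a time to build one "title : rating" line per movie
lines = [f"{flat[i]} : {flat[i + 1]}" for i in range(0, len(flat), 2)]
print("\n".join(lines))
```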
import requests
import json
import jsonpath
from itertools import chain


url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&page_limit=100&page_start=0"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36"
}


r = requests.get(url=url, headers=headers)
# Pretty-print the parsed response. Passing the decoded text to json.dumps
# would write a single escaped JSON string instead of the object itself.
ret = json.dumps(r.json(), ensure_ascii=False, indent=4)
with open("douban.json", "w", encoding="utf-8") as f:
    f.write(ret)

#
# with open("douban.json","r",encoding="utf-8") as f:
#     ret4 = json.load(f)
#     print(ret4)
#     print(type(ret4))

# print(r.json())
# res = r.json()['subjects'][0]['title']
# print(r.json()['subjects'][0]['title'])
# print(type(res))
# print(r.json())
# print(type(r.json()))
# "$..title" and "$..rate" collect every title / rate value anywhere in the response
name = jsonpath.jsonpath(r.json(), '$..title')
rate = jsonpath.jsonpath(r.json(), '$..rate')
# Interleave the two lists: [title1, rate1, title2, rate2, ...]
want = list(chain.from_iterable(zip(name, rate)))

# Put "\n" before each movie and ":" between its title and rating.
# Building a new list avoids inserting into `want` while iterating over it.
formatted = []
for i in range(0, len(want), 2):
    formatted.extend(["\n", want[i], ":", want[i + 1]])
want = formatted

print(want)
str1 = " ".join(want)
print(str1)
with open("want.txt","w",encoding="utf-8") as f:
    f.write(str1)
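
On the point about needing two jsonpath calls, one possible alternative, sketched here on made-up data, is to loop over the `subjects` list once and read both fields from each entry. This assumes every entry carries both `title` and `rate`, which the two jsonpath queries suggest but this post does not show directly; `title_rate_lines` and the sample values are hypothetical.

```python
# Hypothetical sample shaped like the parsed response: a "subjects" list of entries
sample = {
    "subjects": [
        {"title": "Movie A", "rate": "8.5"},
        {"title": "Movie B", "rate": "7.9"},
    ]
}


def title_rate_lines(data):
    """Return one 'title : rate' string per entry (illustrative helper)."""
    return [
        f"{item['title']} : {item['rate']}"
        for item in data.get("subjects", [])  # an absent key yields no lines
    ]


text = "\n".join(title_rate_lines(sample))
print(text)
```

With the real response, `r.json()` would take the place of `sample`, and `text` could be written to want.txt as before.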