Scraping the Most Upvoted Articles and Comments on Reddit

Reddit

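Earlier we got familiar with APIs and learned how to make requests, authenticate, and parse API responses. Now we will tie these concepts together and explore popular articles and comments on Reddit.
Reddit is a community-driven sharing site. Users can submit articles and links, and other users can upvote them (to show they like them) or downvote them (to show they don't). Users can also comment on submissions, and comments can be upvoted and downvoted as well. Reddit has many sub-communities (subreddits), such as r/python, where discussion centers on one topic and voting surfaces the best articles.
In this article we will learn how to do the following: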

  • Getting a list of trending articles in a subreddit.
  • Exploring the comments of a single article.
  • Posting our own comment on the article.

Authenticating With The API

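The Reddit API requires authentication. With the GitHub API we authenticated with a token; here we use OAuth. OAuth is fairly involved, so an access token is provided: 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk
header = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk"}
The format differs slightly from GitHub's: the value starts with bearer rather than token.
We also need to add a User-Agent to the header, which tells Reddit that Dataquest is accessing the API:
header = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}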
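The header construction above can be wrapped in a small helper so that the token and user agent are not retyped for every request. This is only a sketch: `build_headers` is a hypothetical name chosen for illustration, not part of requests or of the Reddit API.

```python
def build_headers(token, user_agent="Dataquest/1.0"):
    """Return the headers for an OAuth request to Reddit.

    build_headers is a hypothetical helper name used only in this sketch.
    """
    return {
        # Reddit expects the "bearer" scheme, unlike the "token" scheme used with GitHub.
        "Authorization": "bearer " + token,
        # Identifies the client that is calling the API.
        "User-Agent": user_agent,
    }


headers = build_headers("13426216-4U1ckno9J5AiK72VRbpEeBaMSKk")
print(headers)
```

The resulting dictionary is the same one that is passed as headers= in the requests that follow.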

  • Retrieve the top articles of the past day from the /r/python subreddit:
import requests

headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}
# "t" selects the time window for the ranking; "day" means the past day.
params = {"t": "day"}
response = requests.get("https://oauth.reddit.com/r/python/top", headers=headers, params=params)
python_top = response.json()
'''
{'data': {'approved_by': None,
     'archived': False,
     'author': 'ingvij',
     ...
     'ups': 43,
     'url': 'http://hkupty.github.io/2016/Functional-Programming-Concepts-Idioms-and-Philosophy/',
     'user_reports': [],
     'visited': False},
     'kind': 't3'}
...
'''
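To see what the params argument in the request above does, the query string can be built by hand. requests appends params to the URL in the same way; the sketch below reproduces that step with the standard library only, so it runs without network access. The variable names are illustrative.

```python
from urllib.parse import urlencode

base_url = "https://oauth.reddit.com/r/python/top"
params = {"t": "day"}

# requests appends params to the URL as a query string;
# urlencode reproduces that step by hand.
full_url = base_url + "?" + urlencode(params)
print(full_url)
```

Changing the value of t changes only the query string, so the same endpoint can serve other time windows.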

Getting The Most Upvoted Article

  • The returned JSON data is a dictionary, and the article information is nested under the children key.
{'data': {'after': None,
  'before': None,
  'children': [{'data': {'approved_by': None,
     'archived': False,
     ...
     'title': 'Functional Philosophy and applying it to Python',
     'ups': 53,
     'url': 'http://hkupty.github.io/2016/Functional-Programming-Concepts-Idioms-and-Philosophy/',
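  • So we first extract the value of data, then the value of children inside it. This is a list of dictionaries, each holding the information for one article.
# Find the article with the most upvotes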
python_top_articles = python_top["data"]["children"]
most_upvoted = ""
most_upvotes = 0
most_upvote_name = ""
for article in python_top_articles:
    ar = article["data"]
    if ar["ups"] >= most_upvotes:
        most_upvoted = ar["id"]
        most_upvotes = ar["ups"]
        most_upvote_name = ar["title"]
print(most_upvote_name)
'''
Functional Philosophy and applying it to Python
'''
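The loop above can also be written with the built-in max() and a key function. The sketch below runs on a small hand-made list shaped like python_top["data"]["children"]; the ids, titles, and vote counts are invented for illustration, and `most_upvoted_article` is a hypothetical helper name.

```python
# Hypothetical sample shaped like python_top["data"]["children"]; the values are invented.
sample_articles = [
    {"data": {"id": "aaa111", "title": "First post", "ups": 12}},
    {"data": {"id": "bbb222", "title": "Second post", "ups": 53}},
    {"data": {"id": "ccc333", "title": "Third post", "ups": 7}},
]


def most_upvoted_article(children):
    """Return the inner data dict of the child with the most upvotes, or None if the list is empty."""
    if not children:
        return None
    best = max(children, key=lambda child: child["data"]["ups"])
    return best["data"]


top = most_upvoted_article(sample_articles)
print(top["id"], top["title"], top["ups"])
```

One difference to keep in mind: when two articles are tied, max() returns the first one it meets, whereas the loop with >= keeps the last one.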

Getting Article Comments

Now that we know the ID of the most upvoted article, we can explore its comments through the /r/{subreddit}/comments/{articleID} endpoint. We prepend https://oauth.reddit.com/ to get the full request URL:

headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}
response = requests.get("https://oauth.reddit.com/r/python/comments/4b7w9u", headers=headers)

comments = response.json()

Getting The Most Upvoted Comment

  • The comment data is structured as follows:
{'data': {'approved_by': None,
      'archived': False,
      'author': 'larsga',
      ...
      'replies': {'data': {'after': None,
        'before': None,
        'children': [{'data': {'approved_by': None,
           'archived': False,
           'author': 'Deto',
           ...
           },
          ...
          ]
          }
          ...
          'url': 'https://www.reddit.com/r/Python/comments/4b6bew/using_pilpillow_with_mozjpeg/',
         'user_reports': [],
         'visited': False
         }
  • Find the most upvoted comment:
# The response is a list of two listings: the article itself, then its comments.
comments_list = comments[1]["data"]["children"]
most_upvoted_comment = ""
most_upvotes_comment = 0
for comment in comments_list:
    co = comment["data"]
    if co["ups"] >= most_upvotes_comment:
        most_upvoted_comment = co["id"]
        most_upvotes_comment = co["ups"]
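The loop above only looks at top-level comments, while the structure shown earlier nests further comments under replies. A minimal sketch of a recursive walk that also covers replies follows. It rests on two assumptions that are not stated in the text: a comment without replies carries a non-dictionary value (such as an empty string) under replies, and entries without an ups field count as zero. The sample data and the name `iter_comments` are invented for illustration.

```python
def iter_comments(children):
    """Yield every comment data dict, descending into nested replies."""
    for child in children:
        data = child["data"]
        yield data
        replies = data.get("replies")
        # Assumption: a comment without replies holds a non-dict value here (for example "").
        if isinstance(replies, dict):
            yield from iter_comments(replies["data"]["children"])


# Hypothetical sample shaped like comments[1]["data"]["children"]; the values are invented.
sample = [
    {"data": {"id": "c1", "ups": 5, "replies": {"data": {"children": [
        {"data": {"id": "c2", "ups": 9, "replies": ""}},
    ]}}}},
    {"data": {"id": "c3", "ups": 7, "replies": ""}},
]

# Assumption: entries that lack "ups" are treated as having zero upvotes.
best = max(iter_comments(sample), key=lambda c: c.get("ups", 0))
print(best["id"], best["ups"])
```

With real data, the same call would be iter_comments(comments[1]["data"]["children"]).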

Upvoting A Comment

We can also vote through the /api/vote endpoint. Voting sends data to the server, so we use a POST request with the following parameters:

  • dir – vote direction: 1, 0, or -1. 1 is an upvote, -1 is a downvote, and 0 clears an existing vote.
  • id – the id of the article or comment to upvote.
payload = {"dir": 1, "id": "d16y4ry"}
headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}
response = requests.post("https://oauth.reddit.com/api/vote", json=payload, headers=headers)
status = response.status_code
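Because dir only accepts three values, it can help to validate the payload before sending the POST request. The following is a minimal sketch; `build_vote_payload` is a hypothetical helper name, and the validation is an added convenience rather than something the API requires on the client side.

```python
def build_vote_payload(thing_id, direction):
    """Return the payload for /api/vote after checking the vote direction.

    build_vote_payload is a hypothetical helper used only in this sketch.
    """
    if direction not in (1, 0, -1):
        raise ValueError("dir must be 1, 0, or -1")
    return {"dir": direction, "id": thing_id}


payload = build_vote_payload("d16y4ry", 1)
print(payload)
```

The returned dictionary can be passed to requests.post exactly as in the code above.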