Shiyanlou: Lou+ Data Analysis and Mining, Challenge 3 [Collecting Data with the GitHub API]

Challenge Description

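Every repository on GitHub has an Issues page by default. Issues serve as the repository's issue tracker: feature requests from developers and bugs found by users can both be submitted as Issues. For example, the well-known data analysis library Pandas is hosted on GitHub as pandas-dev/pandas.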

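The Issues API returns data in JSON format. By default, a single request returns at most the 30 most recent entries from the Issues page. A single entry looks like this: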

{
"url": "https://api.github.com/repos/pandas-dev/pandas/issues/22658",
"repository_url": "https://api.github.com/repos/pandas-dev/pandas",
"labels_url": "https://api.github.com/repos/pandas-dev/pandas/issues/22658/labels{/name}",
"comments_url": "https://api.github.com/repos/pandas-dev/pandas/issues/22658/comments",
"events_url": "https://api.github.com/repos/pandas-dev/pandas/issues/22658/events",
"html_url": "https://github.com/pandas-dev/pandas/pull/22658",
"id": 358602608,
"node_id": "MDExOlB1bGxSZXF1ZXN0MjE0Mjk4MzQ0",
"number": 22658,
"title": "DOC iteritems docstring update and examples",
"user": {
    "login": "Ecboxer",
    "id": 20912214,
    "node_id": "MDQ6VXNlcjIwOTEyMjE0",
    "avatar_url": "https://avatars3.githubusercontent.com/u/20912214?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/Ecboxer",
    "html_url": "https://github.com/Ecboxer",
    "followers_url": "https://api.github.com/users/Ecboxer/followers",
    "following_url": "https://api.github.com/users/Ecboxer/following{/other_user}",
    "gists_url": "https://api.github.com/users/Ecboxer/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/Ecboxer/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/Ecboxer/subscriptions",
    "organizations_url": "https://api.github.com/users/Ecboxer/orgs",
    "repos_url": "https://api.github.com/users/Ecboxer/repos",
    "events_url": "https://api.github.com/users/Ecboxer/events{/privacy}",
    "received_events_url": "https://api.github.com/users/Ecboxer/received_events",
    "type": "User",
    "site_admin": false
    },
"labels": [],
"state": "open",
"locked": false,
"assignee": null,
"assignees": [],
"milestone": null,
"comments": 2,
"created_at": "2018-09-10T12:34:25Z",
"updated_at": "2018-09-10T17:10:02Z",
"closed_at": null,
"author_association": "NONE",
"pull_request": {
    "url": "https://api.github.com/repos/pandas-dev/pandas/pulls/22658",
    "html_url": "https://github.com/pandas-dev/pandas/pull/22658",
    "diff_url": "https://github.com/pandas-dev/pandas/pull/22658.diff",
    "patch_url": "https://github.com/pandas-dev/pandas/pull/22658.patch"
    },
"body": "Updated iteritems docstring to start with an infinitive and added a short example\r\n\r\n- [ ] closes #xxxx\r\n- [ ] tests added / passed\r\n- [ ] passes `git diff upstream/master -u -- \"*.py\" | flake8 --diff`\r\n- [ ] whatsnew entry\r\n"
},
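Note that the sample entry above is actually a pull request: its html_url points to /pull/22658 and it carries a "pull_request" object. The Issues API lists pull requests alongside plain issues. The challenge does not ask for any filtering, but if the two ever need to be told apart, a minimal sketch could look like this. The helper name is_pull_request and the second item in the list are hypothetical and exist only for illustration.

```python
def is_pull_request(item):
    """Return True if an Issues API item is actually a pull request."""
    # Pull requests carry a "pull_request" object; plain issues do not.
    return "pull_request" in item


# Two trimmed items: the first is modeled on the sample above,
# the second is a hypothetical plain issue made up for this sketch.
items = [
    {"number": 22658,
     "title": "DOC iteritems docstring update and examples",
     "pull_request": {"url": "https://api.github.com/repos/pandas-dev/pandas/pulls/22658"}},
    {"number": 1, "title": "hypothetical plain issue"},
]

plain_issues = [item for item in items if not is_pull_request(item)]
pull_requests = [item for item in items if is_pull_request(item)]
print(len(plain_issues), len(pull_requests))
```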

 

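In this challenge, you need to write a function named issues in the file ~/Code/github_data.py. The issues function accepts one parameter, repo, which specifies the repository name (for example, the name of the Pandas repository is pandas-dev/pandas).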

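Complete the issues function so that it fetches the most recent issue entries of the specified repository. The number of entries is whatever the Issues API address returns and is not necessarily 30. Then convert the JSON into a DataFrame and return it from the issues function as issues_df. The DataFrame must have the following layout (first 3 rows shown as an example):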

   number  title                                               user_name
0  22658   DOC iteritems docstring update and examples         Ecboxer
1  22657   DOC: Follows ISO 639-1 code                         KangYoosam
2  22655   BUG: Column Offset with to_html(index=False) w...   simonjayhawkins

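Where:

  • number: the issue number, taken from the number field in the sample JSON.
  • title: the issue title, taken from the title field in the sample JSON.
  • user_name: the name of the user who submitted the issue, taken from the user.login field in the sample JSON.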
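The field mapping above can be sketched on a single entry without any network access. The helper name to_row is an assumption made for this illustration, and the sample dictionary is a trimmed copy of the JSON entry shown earlier.

```python
def to_row(issue):
    """Map one Issues API item to the three columns required by the challenge."""
    return {
        "number": issue["number"],
        "title": issue["title"],
        # user.login is nested one level down, inside the "user" object.
        "user_name": issue["user"]["login"],
    }


# Trimmed copy of the sample entry; all other fields are omitted.
sample = {
    "number": 22658,
    "title": "DOC iteritems docstring update and examples",
    "user": {"login": "Ecboxer", "id": 20912214},
}

row = to_row(sample)
print(row)
```

A list of such dictionaries, one per entry, is a shape that pandas.DataFrame accepts directly, with the dictionary keys becoming the column names.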

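Challenge Requirements

  • The code must be written in the file ~/Code/github_data.py.
  • The function must be named issues and must return issues_df.
  • When testing, run github_data.py with /home/shiyanlou/anaconda3/bin/python to avoid missing-module errors.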

Challenge Solution Code

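import requests
import pandas as pd


def issues(repo):
    """Return the most recent issues of `repo` (e.g. "pandas-dev/pandas") as a DataFrame."""
    url = "https://api.github.com/repos/{}/issues".format(repo)
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

    # Keep only the three fields required by the challenge.
    issues_list = []
    for issue in response.json():
        issues_list.append({'number': issue['number'],
                            'title': issue['title'],
                            'user_name': issue['user']['login']})

    # Fix the column order explicitly; this also gives an empty result the right columns.
    issues_df = pd.DataFrame(issues_list, columns=['number', 'title', 'user_name'])
    return issues_df


if __name__ == "__main__":
    print(issues("numpy/numpy"))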
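The solution makes a single request with the API's default settings. As a possible extension that the challenge does not require, the sketch below shows how the state, per_page, and page query parameters of the Issues API could be passed to request more entries or a later page. It deliberately uses the standard library's urllib instead of requests so that it has no third-party dependency. The names API_ROOT, issues_url, and fetch_issues are assumptions made for this illustration, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(repo, per_page=30, page=1, state="open"):
    """Build the Issues API URL for `repo` with explicit query parameters."""
    query = urllib.parse.urlencode(
        {"state": state, "per_page": per_page, "page": page}
    )
    return "{}/repos/{}/issues?{}".format(API_ROOT, repo, query)


def fetch_issues(repo, **kwargs):
    """Fetch one page of entries and return the decoded JSON list.

    Performs a real network request; unauthenticated calls are rate limited.
    """
    request = urllib.request.Request(
        issues_url(repo, **kwargs),
        headers={"Accept": "application/vnd.github+json"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Building the URL needs no network access.
print(issues_url("pandas-dev/pandas", per_page=100))
```

Calling fetch_issues("pandas-dev/pandas", per_page=100) would then return up to 100 entries from the first page, in the same JSON shape as the sample entry shown earlier.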

 
