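Working with APIs
17.1 Using a Web API
A web API is the part of a website designed to interact with programs, which request specific information through specially crafted URLs.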
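17.1.1 Git and GitHub
Git is a distributed version control system.
GitHub is a platform that hosts project repositories.
GitHub users can star the projects they like to show their support.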
17.1.2 Requesting Data Using an API Call
Enter the following URL in your browser:
https://api.github.com/search/repositories?q=language:python&sort=stars
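This call breaks down as follows:
- https://api.github.com/ directs the request to the part of GitHub that responds to API calls
- search/repositories tells the API to search across all repositories on GitHub
- ?q=language:python&sort=stars is the query string: ? signals that arguments follow, q stands for query, = begins the query itself, language:python restricts the results to repositories whose primary language is Python, & separates one parameter from the next, and sort=stars sorts the results by star count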
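The breakdown above can be checked mechanically. The following is a minimal sketch, assuming only Python's standard urllib.parse module; it splits the same URL into host, endpoint path, and query parameters without sending any request.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.github.com/search/repositories?q=language:python&sort=stars"

# Split the URL into its components, then decode the query string.
parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.netloc)  # the host that answers API calls
print(parts.path)    # the endpoint being called
print(params)        # each parameter name mapped to a list of values
```

parse_qs() maps every parameter to a list because a query string may repeat the same name more than once.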
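After the request is sent, the browser displays the response, as shown in the figure below:
Notes on the returned data:
- "total_count" is the total number of Python repositories currently on GitHub
- "incomplete_results" is false when GitHub was able to process the query completely, and true when the results are incomplete
- "items" holds the returned results: detailed information about the most popular Python projects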
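To make the three keys concrete, here is a small sketch that works on a hand-written dictionary instead of a live response. The sample values and the describe() helper are hypothetical and exist only for illustration; a real response carries far more fields per item.

```python
# Hypothetical, heavily trimmed stand-in for the JSON the API returns.
sample_response = {
    "total_count": 2,
    "incomplete_results": False,
    "items": [
        {"name": "example-repo-a", "stargazers_count": 120},
        {"name": "example-repo-b", "stargazers_count": 45},
    ],
}


def describe(response):
    """Return a one-line summary of a search response dictionary."""
    status = "incomplete" if response["incomplete_results"] else "complete"
    return (f"{response['total_count']} matches, "
            f"{len(response['items'])} returned, results {status}")


summary = describe(sample_response)
print(summary)
```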
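17.1.3 Installing Requests
Installing Requests the way the textbook describes may fail; the following method is recommended instead.
Installation steps:
- Press Win + R, type cmd, and press Enter to open a command prompt
- Run pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn requests
- Once the installation finishes, verify it: run python to start the Python interpreter, then enter import requests. If no error appears, the installation succeeded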
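The manual check in the last step can also be scripted. This is a rough sketch, assuming only the standard importlib module; is_installed() is a hypothetical helper name, and it reports whether a package can be found without actually importing it.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# json ships with Python, so it serves as a known-good comparison.
print(f"json found: {is_installed('json')}")
print(f"requests found: {is_installed('requests')}")
```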
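17.1.4 Processing an API Response
Section 17.1.2 used the API directly in the browser. In this section we call the same API from a Python program and store the data it returns.
Create a new file named python_repos.py: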
# python_repos.py
import requests
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(response_dict.keys())  # print all the keys in the dictionary
Output:
Status code: 200
dict_keys(['total_count', 'incomplete_results', 'items'])
For the contents of these three keys, see the screenshot in section 17.1.2.
17.1.5 Working with the Response Dictionary
Let's perform some operations on the returned data.
# python_repos.py
import requests
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")
# Examine the first repository
repo_dict = repo_dicts[0]
print(f"\nKeys:{len(repo_dict)}")
for key in sorted(repo_dict.keys()):
    print(key)
Output:
Status code: 200
Total repositories: 8657291
Repositories returned: 30
Keys:78
allow_forking
archive_url
archived
assignees_url
blobs_url
branches_url
clone_url
collaborators_url
comments_url
commits_url
compare_url
…
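- Exploring the repository information:
response_dict['items'] is a list in which each element represents one repository. print(f"Repositories returned: {len(repo_dicts)}") reports how many repositories the response contains.
- Examining the first repository:
repo_dicts[0] is the first repository, and its value is a dictionary. print(f"\nKeys:{len(repo_dict)}") prints how many keys that dictionary has, and the for loop prints every key. The output shown in these notes is only an excerpt.
sorted() is a function that returns a new sorted list and leaves the original data untouched. Contrast this with sort(), a list method that sorts the list in place and returns None.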
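The difference between sorted() and sort() is easy to see on a small list. The key names below are taken from the repository dictionary; the list itself is just an illustrative sample.

```python
keys = ["name", "html_url", "description", "owner"]

# sorted() builds and returns a new list; the original is left untouched.
ordered = sorted(keys)
print(ordered)
print(keys)

# list.sort() reorders the list in place and returns None.
result = keys.sort()
print(result)
print(keys)
```

This is why the loop in python_repos.py can iterate over sorted(repo_dict.keys()) directly: the call hands back a list ready for use, whereas sort() would hand back None.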
The first repository contains a lot of useful information. Let's print some of it to get familiar with the data.
# python_repos.py
import requests
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")
# Examine the first repository
repo_dict = repo_dicts[0]
print("\nSelected information about first repository:")
print(f"Name: {repo_dict['name']}")  # project name
print(f"Owner: {repo_dict['owner']['login']}")  # owner's login name
print(f"Stars:{repo_dict['stargazers_count']}")  # number of stars
print(f"Repository: {repo_dict['html_url']}")  # URL of the repository
print(f"Created: {repo_dict['created_at']}")  # creation time
print(f"Updated:{repo_dict['updated_at']}")  # time of the last update
print(f"Description: {repo_dict['description']}")  # repository description
Output:
Status code: 200
Total repositories: 8618098
Repositories returned: 30
Selected information about first repository:
Name: public-apis
Owner: public-apis
Stars:179823
Repository: https://github.com/public-apis/public-apis
Created: 2016-03-20T23:49:42Z
Updated:2022-02-11T08:00:24Z
Description: A collective list of free APIs
17.1.6 Summarizing the Top Repositories
Rewrite the program so that it prints the key information for every repository returned.
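# python_repos.py
import requests
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")
# Print information about each repository
print("\nSelected information about each repository:")
for repo_dict in repo_dicts:
    print(f"Name: {repo_dict['name']}")  # project name
    print(f"Owner: {repo_dict['owner']['login']}")  # owner's login name
    print(f"Stars:{repo_dict['stargazers_count']}")  # number of stars
    print(f"Repository: {repo_dict['html_url']}")  # URL of the repository
    print(f"Created: {repo_dict['created_at']}")  # creation time
    print(f"Updated:{repo_dict['updated_at']}")  # time of the last update
    print(f"Description: {repo_dict['description']}")  # repository description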
Output:
Status code: 200
Total repositories: 8659049
Repositories returned: 30
Selected information about each repository:
Name: public-apis
Owner: public-apis
Stars:179827
Repository: https://github.com/public-apis/public-apis
Created: 2016-03-20T23:49:42Z
Updated:2022-02-11T08:56:12Z
Description: A collective list of free APIs
Name: system-design-primer
Owner: donnemartin
Stars:161836
Repository: https://github.com/donnemartin/system-design-primer
Created: 2017-02-26T16:15:28Z
Updated:2022-02-11T09:00:28Z
Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Name: Python
Owner: TheAlgorithms
Stars:129133
Repository: https://github.com/TheAlgorithms/Python
Created: 2016-07-16T09:44:01Z
Updated:2022-02-11T08:30:36Z
Description: All Algorithms implemented in Python
Name: Python-100-Days
Owner: jackfrued
Stars:114901
Repository: https://github.com/jackfrued/Python-100-Days
Created: 2018-03-01T16:05:52Z
Updated:2022-02-11T09:00:10Z
Description: Python - 100天从新手到大师
Name: youtube-dl
Owner: ytdl-org
Stars:106084
Repository: https://github.com/ytdl-org/youtube-dl
Created: 2010-10-31T14:35:07Z
Updated:2022-02-11T08:11:11Z
Description: Command-line program to download videos from YouTube.com and other video sites
Name: models
Owner: tensorflow
Stars:72738
Repository: https://github.com/tensorflow/models
Created: 2016-02-05T01:15:20Z
Updated:2022-02-11T08:44:46Z
Description: Models and examples built with TensorFlow
Name: thefuck
Owner: nvbn
Stars:66703
Repository: https://github.com/nvbn/thefuck
Created: 2015-04-08T15:08:04Z
Updated:2022-02-11T07:55:26Z
Description: Magnificent app which corrects your previous console command.
Name: django
Owner: django
Stars:62285
Repository: https://github.com/django/django
Created: 2012-04-28T02:47:18Z
Updated:2022-02-11T08:59:37Z
Description: The Web framework for perfectionists with deadlines.
Name: transformers
Owner: huggingface
Stars:57983
Repository: https://github.com/huggingface/transformers
Created: 2018-10-29T13:56:00Z
Updated:2022-02-11T08:48:25Z
Description: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Name: flask
Owner: pallets
Stars:57901
Repository: https://github.com/pallets/flask
Created: 2010-04-06T11:11:59Z
Updated:2022-02-11T07:30:18Z
Description: The Python micro framework for building web applications.
Name: keras
Owner: keras-team
Stars:53933
Repository: https://github.com/keras-team/keras
Created: 2015-03-28T00:35:42Z
Updated:2022-02-11T05:00:09Z
Description: Deep Learning for humans
Name: awesome-machine-learning
Owner: josephmisiti
Stars:53149
Repository: https://github.com/josephmisiti/awesome-machine-learning
Created: 2014-07-15T19:11:19Z
Updated:2022-02-11T07:00:12Z
Description: A curated list of awesome Machine Learning frameworks, libraries and software.
Name: HelloGitHub
Owner: 521xueweihan
Stars:52221
Repository: https://github.com/521xueweihan/HelloGitHub
Created: 2016-05-04T06:24:11Z
Updated:2022-02-11T08:52:09Z
Description: :octocat: 分享 GitHub 上有趣、入门级的开源项目。Share interesting, entry-level open source projects on GitHub.
Name: ansible
Owner: ansible
Stars:52088
Repository: https://github.com/ansible/ansible
Created: 2012-03-06T14:58:02Z
Updated:2022-02-11T08:36:52Z
Description: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.
Name: scikit-learn
Owner: scikit-learn
Stars:48988
Repository: https://github.com/scikit-learn/scikit-learn
Created: 2010-08-17T09:43:38Z
Updated:2022-02-11T07:39:00Z
Description: scikit-learn: machine learning in Python
Name: requests
Owner: psf
Stars:46871
Repository: https://github.com/psf/requests
Created: 2011-02-13T18:38:17Z
Updated:2022-02-11T06:50:09Z
Description: A simple, yet elegant, HTTP library.
Name: face_recognition
Owner: ageitgey
Stars:43076
Repository: https://github.com/ageitgey/face_recognition
Created: 2017-03-03T21:52:39Z
Updated:2022-02-11T07:35:36Z
Description: The world’s simplest facial recognition api for Python and the command line
Name: scrapy
Owner: scrapy
Stars:42756
Repository: https://github.com/scrapy/scrapy
Created: 2010-02-22T02:01:14Z
Updated:2022-02-11T05:53:26Z
Description: Scrapy, a fast high-level web crawling & scraping framework for Python.
Name: cpython
Owner: python
Stars:42740
Repository: https://github.com/python/cpython
Created: 2017-02-10T19:23:51Z
Updated:2022-02-11T08:35:54Z
Description: The Python programming language
Name: big-list-of-naughty-strings
Owner: minimaxir
Stars:41917
Repository: https://github.com/minimaxir/big-list-of-naughty-strings
Created: 2015-08-08T20:57:20Z
Updated:2022-02-11T07:40:37Z
Description: The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.
Name: fastapi
Owner: tiangolo
Stars:41617
Repository: https://github.com/tiangolo/fastapi
Created: 2018-12-08T08:21:47Z
Updated:2022-02-11T08:09:02Z
Description: FastAPI framework, high performance, easy to learn, fast to code, ready for production
Name: manim
Owner: 3b1b
Stars:41423
Repository: https://github.com/3b1b/manim
Created: 2015-03-22T18:50:58Z
Updated:2022-02-11T08:58:12Z
Description: Animation engine for explanatory math videos
Name: faceswap
Owner: deepfakes
Stars:40308
Repository: https://github.com/deepfakes/faceswap
Created: 2017-12-19T09:44:13Z
Updated:2022-02-11T08:06:20Z
Description: Deepfakes Software For All
Name: localstack
Owner: localstack
Stars:38638
Repository: https://github.com/localstack/localstack
Created: 2016-10-25T23:48:03Z
Updated:2022-02-11T08:00:37Z
Description: 💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!
Name: rich
Owner: Textualize
Stars:34926
Repository: https://github.com/Textualize/rich
Created: 2019-11-10T15:28:09Z
Updated:2022-02-11T08:25:12Z
Description: Rich is a Python library for rich text and beautiful formatting in the terminal.
Name: PayloadsAllTheThings
Owner: swisskyrepo
Stars:34191
Repository: https://github.com/swisskyrepo/PayloadsAllTheThings
Created: 2016-10-18T07:29:07Z
Updated:2022-02-11T08:14:45Z
Description: A list of useful payloads and bypass for Web Application Security and Pentest/CTF
Name: Real-Time-Voice-Cloning
Owner: CorentinJ
Stars:33326
Repository: https://github.com/CorentinJ/Real-Time-Voice-Cloning
Created: 2019-05-26T08:56:15Z
Updated:2022-02-11T08:53:59Z
Description: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Name: CppCoreGuidelines
Owner: isocpp
Stars:33177
Repository: https://github.com/isocpp/CppCoreGuidelines
Created: 2015-08-19T20:22:52Z
Updated:2022-02-11T08:05:07Z
Description: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
Name: shadowsocks
Owner: shadowsocks
Stars:33103
Repository: https://github.com/shadowsocks/shadowsocks
Created: 2012-04-20T13:10:49Z
Updated:2022-02-11T01:45:24Z
Description: None
Name: pandas
Owner: pandas-dev
Stars:32680
Repository: https://github.com/pandas-dev/pandas
Created: 2010-08-24T01:37:33Z
Updated:2022-02-11T08:33:24Z
Description: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
…
17.1.7 Monitoring API Rate Limits
Enter the following URL in your browser:
https://api.github.com/rate_limit
The response looks like this:
{"resources":{"core":{"limit":60,"remaining":60,"reset":1644573984,"used":0,"resource":"core"},"graphql":{"limit":0,"remaining":0,"reset":1644573984,"used":0,"resource":"graphql"},"integration_manifest":{"limit":5000,"remaining":5000,"reset":1644573984,"used":0,"resource":"integration_manifest"},"search":{"limit":10,"remaining":10,"reset":1644570444,"used":0,"resource":"search"}},"rate":{"limit":60,"remaining":60,"reset":1644573984,"used":0,"resource":"core"}}
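The part of this response that matters for the search calls in these notes is the "search" entry. The sketch below uses a trimmed copy of the response shown above and only standard-library modules; it reads the remaining quota and converts the "reset" value, a Unix timestamp in seconds, into a readable UTC time.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the response shown above (only the "search" resource).
raw = ('{"resources": {"search": {"limit": 10, "remaining": 10, '
       '"reset": 1644570444, "used": 0}}}')

search = json.loads(raw)["resources"]["search"]

# "reset" is the moment, in seconds since the Unix epoch, when the quota refills.
reset_time = datetime.fromtimestamp(search["reset"], tz=timezone.utc)

print(f"Search calls remaining: {search['remaining']} of {search['limit']}")
print(f"Quota resets at: {reset_time.isoformat()}")
```

If "remaining" drops to 0, further search calls are rejected until the reset time has passed.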
17.2 Visualizing Repositories Using Plotly
Next we visualize the popular Python projects retrieved in the previous section.
Create a new file named python_visual.py:
# python_visual.py
import requests
from plotly.graph_objs import Bar
from plotly import offline
# Code copied from the previous section
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
repo_names, stars = [], []  # two lists to hold the project names and star counts
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])
# Build the visualization
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars
}]
my_layout = {
    'title': 'Most-Starred Python Projects on GitHub',
    'xaxis': {'title': 'Repository'},
    'yaxis': {'title': 'Stars'},
}
fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')
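Run the program; the result is shown in the figure below:
Running the program produced two different results. For reasons I could not determine, the first and second projects sometimes go missing from the chart.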
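17.2.1 Refining Plotly Charts
- Customize the bars by adding a marker key to the dictionary in data
- Adjust the chart title, the axis titles, and the tick label sizes by editing my_layout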
# python_visual.py
import requests
from plotly.graph_objs import Bar
from plotly import offline
# Code copied from the previous section
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
repo_names, stars = [], []  # two lists to hold the project names and star counts
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])
# Build the visualization
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'}
    },  # bar color, plus the width and color of the bar outline
    'opacity': 0.6  # opacity of the bars
}]
my_layout = {
    'title': 'Most-Starred Python Projects on GitHub',
    'titlefont': {'size': 28},
    'xaxis': {
        'title': 'Repository',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
    'yaxis': {
        'title': 'Stars',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
}
fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')
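Note: because the data is fetched live from the API, it differs from run to run. In general, it is better to download the data to your local machine once and load it from there, so that the program works with stable data.
Run the program; see the figure below: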
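The advice about keeping a local copy of the data can be sketched as follows. This is a minimal illustration using the standard json module; the sample dictionary stands in for the result of r.json(), and the file name python_repos.json is an assumption made for this example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dictionary returned by r.json().
response_dict = {
    "total_count": 1,
    "incomplete_results": False,
    "items": [{"name": "example-repo", "stargazers_count": 10}],
}

# A temporary directory keeps this sketch self-contained; a real script
# would more likely write the file next to itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = Path(tmp_dir) / "python_repos.json"

    # Save a snapshot of the response once ...
    data_path.write_text(json.dumps(response_dict, ensure_ascii=False),
                         encoding="utf-8")

    # ... and load it on later runs instead of calling the API again.
    loaded = json.loads(data_path.read_text(encoding="utf-8"))

print(loaded == response_dict)
```

Working from a saved snapshot also avoids using up the rate limit while the chart is being adjusted.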
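17.2.2 Adding Custom Tooltips
Hovering the cursor over a bar displays the information that bar represents; this is called a tooltip.
We'll modify the code from the previous section so that the tooltip shows each project's owner and description.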
# python_visual.py
import requests
from plotly.graph_objs import Bar
from plotly import offline
# Code copied from the previous section
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
repo_names, stars, labels = [], [], []  # three lists to hold the project names, star counts, and hover labels
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])
    owner = repo_dict['owner']['login']
    description = repo_dict['description']
    label = f"{owner}<br />{description}"  # <br /> is the HTML line break
    labels.append(label)
# Build the visualization
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'hovertext': labels,  # list of tooltip texts, one per bar
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'}
    },  # bar color, plus the width and color of the bar outline
    'opacity': 0.6  # opacity of the bars
}]
my_layout = {
    'title': 'Most-Starred Python Projects on GitHub',
    'titlefont': {'size': 28},
    'xaxis': {
        'title': 'Repository',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
    'yaxis': {
        'title': 'Stars',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
}
fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')
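Before this change, the chart looks like this:
After this change, the chart looks like this:
Note:
Every time the program runs, one or more projects go missing from the chart.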
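One detail of the tooltip labels deserves attention: the output in section 17.1.6 includes a repository whose description is None, and the f-string would then show the literal text "None" in the tooltip. The sketch below shows one possible way to handle that case; build_label() and the fallback text are hypothetical choices, and the two records are trimmed from the output listed earlier.

```python
def build_label(repo_dict):
    """Return hover text: owner on the first line, description on the second."""
    owner = repo_dict["owner"]["login"]
    # The description is None when a repository has no description set.
    description = repo_dict["description"] or "No description provided"
    return f"{owner}<br />{description}"


# Two records trimmed from the output listed in section 17.1.6.
repos = [
    {"owner": {"login": "pallets"},
     "description": "The Python micro framework for building web applications."},
    {"owner": {"login": "shadowsocks"}, "description": None},
]

labels = [build_label(repo) for repo in repos]
for label in labels:
    print(label)
```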
17.2.3 Adding Clickable Links to the Chart
Plotly allows HTML in text elements, so we can turn the x-axis labels into links that open each project's GitHub home page.
# python_visual.py
import requests
from plotly.graph_objs import Bar
from plotly import offline
# Code copied from the previous section
# Make an API call and store the response
url = r"https://api.github.com/search/repositories?q=language:python&sort=stars"
headers = {'Accept': 'application/vnd.github.v3+json'}  # explicitly request version 3 of the GitHub API
r = requests.get(url, headers=headers)  # call get() and assign the response to r
print(f"Status code: {r.status_code}")  # status_code indicates whether the request succeeded; 200 means success
response_dict = r.json()  # convert the returned JSON into a dictionary stored in response_dict
print(f"Total repositories: {response_dict['total_count']}")  # print the total number of repositories
# Explore information about the repositories
repo_dicts = response_dict['items']
repo_links, stars, labels = [], [], []  # three lists to hold the project links, star counts, and hover labels
for repo_dict in repo_dicts:
    repo_name = repo_dict['name']
    repo_url = repo_dict['html_url']
    repo_link = f"<a href='{repo_url}'>{repo_name}</a>"
    # Build a link to the project; the format is <a href='URL'>link text</a>
    repo_links.append(repo_link)
    stars.append(repo_dict['stargazers_count'])
    owner = repo_dict['owner']['login']
    description = repo_dict['description']
    label = f"{owner}<br />{description}"  # <br /> is the HTML line break
    labels.append(label)
# Build the visualization
data = [{
    'type': 'bar',
    'x': repo_links,
    'y': stars,
    'hovertext': labels,  # list of tooltip texts, one per bar
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'}
    },  # bar color, plus the width and color of the bar outline
    'opacity': 0.6  # opacity of the bars
}]
my_layout = {
    'title': 'Most-Starred Python Projects on GitHub',
    'titlefont': {'size': 28},
    'xaxis': {
        'title': 'Repository',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
    'yaxis': {
        'title': 'Stars',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
}
fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')
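Run the program. The x-axis labels are now links: hovering over one changes the cursor to a hand pointer, and clicking it opens that project's GitHub home page.
The result is shown in the figure below: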
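To see exactly what string ends up as an x-axis label, the link-building step can be isolated in a small function. This is only a sketch: build_link() is a hypothetical helper name, and the record is trimmed from the output listed in section 17.1.6.

```python
def build_link(repo_dict):
    """Return an HTML anchor that shows the repo name and points to its page."""
    return f"<a href='{repo_dict['html_url']}'>{repo_dict['name']}</a>"


# One record trimmed from the output listed in section 17.1.6.
repo = {"name": "flask", "html_url": "https://github.com/pallets/flask"}

link = build_link(repo)
print(link)
```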
17.2.4 More About Plotly and the GitHub API
Two good resources for learning Plotly:
- Plotly User Guide in Python
- Python Figure Reference on the Plotly website
To learn more about the GitHub API, consult its documentation.
17.3 The Hacker News API
For a different site, such as Hacker News, the URL of a call that returns information about a top article becomes:
https://hacker-news.firebaseio.com/v0/item/19155826.json
The rest is omitted.
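The URL above follows a simple pattern: a fixed base, the word item, and the article's numeric ID followed by .json. A minimal sketch of that pattern, with item_url() as a hypothetical helper name:

```python
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL that returns one Hacker News item as JSON."""
    return f"{BASE_URL}/item/{item_id}.json"


print(item_url(19155826))
```

The resulting URL can be passed to requests.get() in the same way as the GitHub URL used throughout section 17.1.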
17.4 Summary
Omitted.