Day 18: Downloading Data and Web APIs
-
Summary of commonly used Python modules
-
Accessing and analyzing CSV data files
- Using the csv module
```python
import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
```
- The enumerate() function: enumerate() takes an iterable (such as a list, tuple, or string) and wraps it into an indexed sequence, yielding each item together with its index. It is typically used in a for loop.
Signature: enumerate(sequence, start=0)
```python
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    for index, column_header in enumerate(header_row):
        print(index, column_header)
```
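As a minimal standalone illustration of enumerate() and its optional start argument (the seasons list here is made up for the example):

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Default: indices start at 0.
for index, season in enumerate(seasons):
    print(index, season)

# The optional start argument shifts the first index.
print(list(enumerate(seasons, start=1))[0])   # (1, 'Spring')
```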
- Iterating through the CSV file and extracting data: for + append
```python
import csv
from datetime import datetime

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
        dates.append(current_date)
        highs.append(high)
        lows.append(low)
```
- Error handling
```python
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            print(current_date, 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
```
-
JSON format
-
pygal.i18n no longer exists: No module named 'pygal.i18n' error:
- Use pygal_maps_world.i18n instead:
- OS X
$ pip install pygal_maps_world
- Windows
> python -m pip install pygal_maps_world
- Then change from pygal.i18n import COUNTRIES to:

```python
from pygal_maps_world.i18n import COUNTRIES
```
-
module 'pygal' has no attribute 'Worldmap' error
- Use pygal_maps_world.maps instead:
```python
import pygal_maps_world.maps

wm = pygal_maps_world.maps.World()
```
-
Web API
-
Web APIs are used to interact with websites and request data (returned as JSON or CSV).
-
The requests package lets Python request information from a website and inspect the response it gets back.
- Installing the requests package
- OS X
$ pip install --user requests
- Windows
> python -m pip install --user requests
-
Processing the API response dictionary
```python
import requests

# Make an API call and store the response.
url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a dictionary.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository.
repo_dict = repo_dicts[0]
print("\nKeys:", len(repo_dict))
for key in repo_dict.keys():
    print(key)
```
-
Digging deeper into the repositories
```python
# Examine each repository.
for repo_dict in repo_dicts:
    print("\nSelected information about repository:")
    print('Name:', repo_dict['name'])
    print('Owner:', repo_dict['owner']['login'])
    print('Stars:', repo_dict['stargazers_count'])
    print('Repository:', repo_dict['html_url'])
    print('Created:', repo_dict['created_at'])
    print('Updated:', repo_dict['updated_at'])
    print('Description:', repo_dict['description'])
```
-
'NoneType' object has no attribute 'decode' error: this error appears when running the following code:
```python
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description'],
    }
    plot_dicts.append(plot_dict)

# Visualize the results.
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-starred Python Projects on GitHub'
chart.x_labels = names
chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')
```
Two solutions are possible. The first method, i.e. changing

'label': repo_dict['description'],

to

'label': str(repo_dict['description']),

is simple and convenient.
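The cast works because some GitHub repositories have their description set to null, which requests deserializes as None, and pygal then fails trying to decode a None label. A minimal sketch (the repo_dict literal here is hypothetical):

```python
# Hypothetical entry for a repository with no description.
repo_dict = {'name': 'example', 'stargazers_count': 10, 'description': None}

label = repo_dict['description']
print(label is None)   # True: pygal cannot decode this as a label
print(str(label))      # 'None': a plain string pygal can render
```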
-
The Hacker News API covers the following three points:
- Using the list returned by one Web API call to dynamically build URLs for further Web API calls, then calling the API again to fetch each item's data;
- The dict.get() method: when you are not sure whether a key exists in a dictionary, use dict.get(), which returns the value associated with the key if it exists, and otherwise returns the default supplied as its second argument;
- The itemgetter() function from the operator module, used together with sorted(): passing the key 'comments' to itemgetter() extracts the value associated with 'comments' from each dictionary in the list, and sorted() then orders the list by those values.
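The dict.get() and itemgetter() behaviors described above can be sketched in isolation (the d and posts data below are made up for illustration):

```python
from operator import itemgetter

# dict.get() returns a default instead of raising KeyError.
d = {'title': 'Example post'}
print(d.get('descendants', 0))   # key missing -> default 0 is returned
print(d.get('title'))            # key present -> its value is returned

# itemgetter('comments') pulls the 'comments' value from each dict;
# sorted() then orders the list by that value.
posts = [{'name': 'a', 'comments': 5},
         {'name': 'b', 'comments': 12},
         {'name': 'c', 'comments': 1}]
ranked = sorted(posts, key=itemgetter('comments'), reverse=True)
print([p['name'] for p in ranked])   # most-commented first
```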
```python
import requests
from operator import itemgetter

# Make an API call and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print('Status code:', r.status_code)

# Process information about each submission.
submission_ids = r.json()

# Empty list to hold the dictionaries for the top submissions.
submission_dicts = []

# Take the IDs of the top 30 submissions.
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission, building the URL
    # from the ID stored in the submission_ids list.
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
           str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()

    # Build a dictionary for the current submission.
    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        'comments': response_dict.get('descendants', 0),
    }
    submission_dicts.append(submission_dict)

submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)

for submission_dict in submission_dicts:
    print('\nTitle:', submission_dict['title'])
    print('Discussion link:', submission_dict['link'])
    print('Comments:', submission_dict['comments'])
```
A sample of the data returned by the code above:
```json
[{"title": "Glitter bomb tricks parcel thieves", "link": "http://news.ycombinator.com/item?id=18706193", "comments": 304},
 {"title": "Stop Learning Frameworks", "link": "http://news.ycombinator.com/item?id=18706785", "comments": 175},
 {"title": "Reasons Python Sucks", "link": "http://news.ycombinator.com/item?id=18706174", "comments": 175},
 {"title": "I need to copy 2000+ DVDs in 3 days. What are my options?", "link": "http://news.ycombinator.com/item?id=18690587", "comments": 167},
 {"title": "SpaceX Is Raising $500M at a $30.5B Valuation", "link": "http://news.ycombinator.com/item?id=18706506", "comments": 139},
 ......... ]
```
-