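Python Study Notes, Day 18: Downloading Data and Web APIs

Day 18: Downloading Data and Web APIs

  • Summary of commonly used Python modules
    Commonly used Python modules

  • Accessing and analyzing CSV data files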

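    • Using the csv module
      import csv
      
      filename = 'sitka_weather_07-2014.csv'
      with open(filename) as f:
      	# csv.reader yields each line of the file as a list of strings
      	reader = csv.reader(f)
      	# next() returns the first line, which holds the column headers
      	header_row = next(reader)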
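To try csv.reader without the weather file on disk, the sketch below feeds it a few in-memory lines through io.StringIO. The sample rows and column names are made up for illustration and are not taken from sitka_weather_07-2014.csv.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file
sample = io.StringIO(
    "date,max_temp,min_temp\n"
    "2014-07-01,64,50\n"
    "2014-07-02,71,55\n"
)

reader = csv.reader(sample)
header_row = next(reader)   # first line: the column headers
rows = list(reader)         # remaining lines, each a list of strings

print(header_row)
print(rows)
```

Note that every value comes back as a string, which is why the later examples convert temperatures with int().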
    
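    • The enumerate() function: enumerate() wraps an iterable (such as a list, tuple, or string) and yields (index, item) pairs, so a loop sees both each item and its position. It is typically used in a for loop.
      enumerate(iterable, start=0)
      
      Sample:
      	with open(filename) as f:
      		reader = csv.reader(f)
      		header_row = next(reader)
      		# Print each column header together with its index
      		for index, column_header in enumerate(header_row):
      			print(index, column_header)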
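The effect of the start argument is easiest to see on a small list. The header names below are hypothetical; the last step, building a name-to-index lookup, is one possible use and not something the original notes do.

```python
# Hypothetical header row
header_row = ['date', 'max_temp', 'mean_temp', 'min_temp']

# Default: indexes start at 0, matching list positions
indexed = list(enumerate(header_row))

# start=1 only changes the numbering, not which items are yielded
numbered = list(enumerate(header_row, start=1))

# Build a name -> index lookup so columns can be addressed by name
column_index = {name: index for index, name in enumerate(header_row)}

print(indexed)
print(numbered)
print(column_index['min_temp'])
```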
      
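    • Iterating over a CSV file and extracting data: for + append
      import csv
      from datetime import datetime
      
      with open(filename) as f:
      	reader = csv.reader(f)
      	header_row = next(reader)
      
      	dates, highs, lows = [], [], []
      	for row in reader:
      		# The date is read from column 0, the high from column 1, the low from column 3
      		current_date = datetime.strptime(row[0], "%Y-%m-%d")
      		high = int(row[1])
      		low = int(row[3])
      		dates.append(current_date)
      		highs.append(high)
      		lows.append(low)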
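The format string passed to datetime.strptime() has to match the layout of the text being parsed. A small sketch with made-up date strings shows a match, a second layout, and what happens on a mismatch:

```python
from datetime import datetime

# "%Y-%m-%d" matches a four-digit year, month, and day separated by hyphens
current_date = datetime.strptime("2014-07-01", "%Y-%m-%d")
print(current_date.year, current_date.month, current_date.day)

# A different layout needs a different format string
other = datetime.strptime("7/1/2014", "%m/%d/%Y")
print(other == current_date)

# Text that does not match the format raises ValueError
try:
    datetime.strptime("", "%Y-%m-%d")
except ValueError as err:
    error_message = str(err)
    print("Could not parse:", error_message)
```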
      
    • Error handling
      with open(filename) as f:
      	reader = csv.reader(f)
      	header_row = next(reader)
      
      	dates, highs, lows = [], [], []
      	for row in reader:
      		try:
      			current_date = datetime.strptime(row[0], "%Y-%m-%d")
      			high = int(row[1])
      			low = int(row[3])
      		except ValueError:
      			# Report the raw date field; current_date may be unset or stale here
      			print(row[0], 'missing data')
      		else:
      			dates.append(current_date)
      			highs.append(high)
      			lows.append(low)
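The try/except/else pattern above can be exercised without a data file. The following is a minimal sketch: read_temperatures() is a hypothetical helper name, and the in-memory rows (one of them with empty temperature fields) are invented for the test.

```python
import csv
import io
from datetime import datetime


def read_temperatures(f):
    """Parse an open CSV file; return (dates, highs, lows, skipped)."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    dates, highs, lows, skipped = [], [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])   # int('') raises ValueError
            low = int(row[3])
        except ValueError:
            # Remember which row was incomplete instead of stopping
            skipped.append(row[0])
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows, skipped


# Hypothetical sample: the second data row has no temperature readings
sample = io.StringIO(
    "date,max,mean,min\n"
    "2014-02-15,75,64,52\n"
    "2014-02-16,,,\n"
    "2014-02-17,80,68,56\n"
)
dates, highs, lows, skipped = read_temperatures(sample)
print(highs, lows, skipped)
```

Collecting the skipped rows in a list, rather than only printing them, makes it easy to check afterwards how much data was dropped.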
      
  • JSON format

    • pygal.i18n no longer exists, which causes the error "No module named 'pygal.i18n'":

      • Use pygal_maps_world.i18n instead:
        • OS X
          $ pip install pygal_maps_world
          
        • Windows
          > python -m pip install pygal_maps_world
          
      • Change 'from pygal.i18n import COUNTRIES' to
        from pygal_maps_world.i18n import COUNTRIES
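If the same script has to run on machines with different pygal setups, one option is to try the new import first and fall back to the old one. This is only a sketch of that idea, not something the original notes do; the empty-dict fallback for the case where neither module is installed is an assumption made so the script still loads.

```python
try:
    # Newer setups: the country table lives in the separate pygal_maps_world package
    from pygal_maps_world.i18n import COUNTRIES
except ImportError:
    try:
        # Older pygal releases provided it as pygal.i18n
        from pygal.i18n import COUNTRIES
    except ImportError:
        # Neither is installed: fall back to an empty mapping (assumption)
        COUNTRIES = {}

# COUNTRIES maps two-letter country codes to country names
for code, name in sorted(COUNTRIES.items())[:3]:
    print(code, name)
```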
        
        
    • The error "module 'pygal' has no attribute 'Worldmap'"

      • Use pygal_maps_world instead:
        import pygal_maps_world.maps
        
        wm = pygal_maps_world.maps.World()
        
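  • Web API

    • A Web API is the part of a website designed for programs to interact with: a program requests data, which is returned in a format such as JSON or CSV.

    • The requests package lets Python request information from a website and examine the response that comes back.

      • Installing the requests package
        • OS X
          $ pip install --user requests
          
        • Windows
          > python -m pip install --user requests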
      
    • Processing the response dictionary

      	import requests
      	
      	# Make the API call and store the response
      	url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
      	r = requests.get(url)
      	print("Status code:", r.status_code)
      	
      	# Store the API response in a dictionary
      	response_dict = r.json()
      	print("Total repositories:", response_dict['total_count'])
      	
      	# Explore information about the repositories
      	repo_dicts = response_dict['items']
      	print("Repositories returned:", len(repo_dicts))
      	
      	# Examine the first repository
      	repo_dict = repo_dicts[0]
      	print("\nKeys:", len(repo_dict))
      	for key in repo_dict.keys():
      		print(key)
      
    • Taking a closer look at the repositories

      	# Examine each repository in turn
      	for repo_dict in repo_dicts:
      		print("\nSelected information about each repository:")
      		print('Name:', repo_dict['name'])
      		print('Owner:', repo_dict['owner']['login'])
      		print('Stars:', repo_dict['stargazers_count'])
      		print('Repository:', repo_dict['html_url'])
      		print('Created:', repo_dict['created_at'])
      		print('Updated:', repo_dict['updated_at'])
      		print('Description:', repo_dict['description'])
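The same field selection can be tried offline against a hand-written dictionary. In the sketch below, describe_repo() is a hypothetical helper, and response_dict is a trimmed, invented stand-in that only mimics the shape of the search response (a 'total_count' plus an 'items' list whose entries have a nested 'owner' dictionary).

```python
def describe_repo(repo_dict):
    """Return the selected fields of one repository as printable lines."""
    return [
        'Name: ' + repo_dict['name'],
        'Owner: ' + repo_dict['owner']['login'],        # nested dictionary
        'Stars: ' + str(repo_dict['stargazers_count']),
        'Repository: ' + repo_dict['html_url'],
    ]


# Hypothetical, trimmed-down data in the shape of the API response
response_dict = {
    'total_count': 2,
    'items': [
        {'name': 'alpha', 'owner': {'login': 'alice'},
         'stargazers_count': 120, 'html_url': 'https://github.com/alice/alpha'},
        {'name': 'beta', 'owner': {'login': 'bob'},
         'stargazers_count': 45, 'html_url': 'https://github.com/bob/beta'},
    ],
}

for repo_dict in response_dict['items']:
    print('\n'.join(describe_repo(repo_dict)))
    print()
```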
      
    • The error "'NoneType' object has no attribute 'decode'" appears when running the following code:

      	import pygal
      	from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS
      	
      	names, plot_dicts = [], []
      	for repo_dict in repo_dicts:
      		names.append(repo_dict['name'])
      		plot_dict = {
      			'value': repo_dict['stargazers_count'],
      			# 'description' can be None, which triggers the error
      			'label': repo_dict['description'],
      			}
      		plot_dicts.append(plot_dict)
      		
      	# Build the visualization
      	my_style = LS('#333366', base_style=LCS)
      	
      	my_config = pygal.Config()
      	my_config.x_label_rotation = 45
      	my_config.show_legend = False
      	my_config.title_font_size = 24
      	my_config.label_font_size = 14
      	my_config.major_label_font_size = 18
      	my_config.truncate_label = 15
      	my_config.show_y_guides = False
      	my_config.width = 1000
      	
      	chart = pygal.Bar(my_config, style=my_style)
      	chart.title = 'Most-starred Python Projects on GitHub'
      	chart.x_labels = names
      	
      	chart.add('', plot_dicts)
      	chart.render_to_file('python_repos.svg')
      

      Two fixes are available. The first is to change:

      'label': repo_dict['description'],
      

      to:

      'label': str(repo_dict['description']),
      

      which is both simple and convenient.
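With str(), a repository that has no description ends up labelled with the literal text 'None'. A possible variant, shown here as a sketch and not as the fix the notes use, substitutes a placeholder instead. build_plot_dicts() and the placeholder wording are hypothetical, and the sample repositories are invented.

```python
def build_plot_dicts(repo_dicts, placeholder='No description provided'):
    """Return (names, plot_dicts) with a usable label for every repository."""
    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        plot_dicts.append({
            'value': repo_dict['stargazers_count'],
            # `or` swaps in the placeholder when the description is None or empty
            'label': repo_dict['description'] or placeholder,
        })
    return names, plot_dicts


# Hypothetical sample data: the second repository has no description
repo_dicts = [
    {'name': 'alpha', 'stargazers_count': 120, 'description': 'A demo project'},
    {'name': 'beta', 'stargazers_count': 45, 'description': None},
]
names, plot_dicts = build_plot_dicts(repo_dicts)
print(names)
print(plot_dicts)
```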

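    • Hacker News API. This example covers three points:

      • Using the list returned by one Web API call to build further API URLs dynamically, then calling the API again with those URLs to fetch the data;
      • The dict.get() method: when it is uncertain whether a key is present in a dictionary, dict.get() returns the value for the key if it exists and the value given as the second argument if it does not;
      • The itemgetter() function from the operator module, used together with sorted(). Passing it the key 'comments' makes it extract the value associated with 'comments' from each dictionary in the list, and sorted() then orders the list by those values.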
      import requests
      from operator import itemgetter
      
      # Make the API call and store the response
      url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
      r = requests.get(url)
      print('Status code:', r.status_code)
      
      # Process information about each submission
      submission_ids = r.json()
      # Empty list that will hold one dictionary per top story
      submission_dicts = []
      
      # Take the IDs of the first 30 top stories
      for submission_id in submission_ids[:30]:
      	# Make a separate API call for each submission,
      	# building the URL from the ID stored in submission_ids
      	url = ('https://hacker-news.firebaseio.com/v0/item/' +
      		str(submission_id) + '.json')
      	submission_r = requests.get(url)
      	print(submission_r.status_code)
      
      	response_dict = submission_r.json()
      
      	# Build a dictionary for the submission currently being processed
      	submission_dict = {
      		'title': response_dict['title'],
      		'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
      		# 'descendants' is missing when there are no comments; default to 0
      		'comments': response_dict.get('descendants', 0),
      	}
      	submission_dicts.append(submission_dict)
      
      # Sort by number of comments, most discussed first
      submission_dicts = sorted(submission_dicts,
      	key=itemgetter('comments'), reverse=True)
      
      for submission_dict in submission_dicts:
      	print('\nTitle:', submission_dict['title'])
      	print('Discussion link:', submission_dict['link'])
      	print('Comments:', submission_dict['comments'])
      

      Data produced by the code above:

      [{"title": "Glitter bomb tricks parcel thieves", 
      "link": "http://news.ycombinator.com/item?id=18706193", 
      "comments": 304}, 
      {"title": "Stop Learning Frameworks", 
      "link": "http://news.ycombinator.com/item?id=18706785", 
      "comments": 175}, 
      {"title": "Reasons Python Sucks", 
      "link": "http://news.ycombinator.com/item?id=18706174", 
      "comments": 175}, 
      {"title": "I need to copy 2000+ DVDs in 3 days. What are my options?", 
      "link": "http://news.ycombinator.com/item?id=18690587", 
      "comments": 167}, 
      {"title": "SpaceX Is Raising $500M at a $30.5B Valuation", 
      "link": "http://news.ycombinator.com/item?id=18706506", 
      "comments": 139}, 
      .........
      ]
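The three points from this example (building item URLs from IDs, dict.get() with a default, and sorting with itemgetter()) can be checked without any network access. The sketch below replaces the API calls with a hand-written dictionary; the IDs, titles, and the item_url() helper are hypothetical.

```python
from operator import itemgetter


def item_url(submission_id):
    """Build the item URL for one submission ID."""
    return ('https://hacker-news.firebaseio.com/v0/item/' +
            str(submission_id) + '.json')


# Hypothetical item responses keyed by ID; the second has no 'descendants' key
fake_responses = {
    101: {'title': 'First story', 'descendants': 12},
    102: {'title': 'Job posting'},
    103: {'title': 'Third story', 'descendants': 40},
}

submission_dicts = []
for submission_id, response_dict in fake_responses.items():
    submission_dicts.append({
        'title': response_dict['title'],
        'api_url': item_url(submission_id),
        # get() returns 0 instead of raising KeyError when the key is absent
        'comments': response_dict.get('descendants', 0),
    })

# itemgetter('comments') pulls the sort key out of each dictionary
submission_dicts = sorted(submission_dicts,
                          key=itemgetter('comments'), reverse=True)

for submission_dict in submission_dicts:
    print(submission_dict['comments'], submission_dict['title'])
```

Using response_dict['descendants'] directly would raise KeyError for the second entry, which is the case dict.get() is meant to cover.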
      