Data:
The 1.usa.gov data from bit.ly
Reading the data:
### read data
import json
path = 'pydata-book-master/ch02/usagov_bitly_data2012-03-16-1331923249.txt'
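# each line of the file is one JSON object; the with block closes the file after reading
with open(path) as f:
    records = [json.loads(line) for line in f]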
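Each line of the file is a standalone JSON object, so the list comprehension yields one dict per line. Here is a minimal sketch of the same parsing on two made-up lines. The sample values are hypothetical, and the `tz` field name is an assumption based on the time zone counting task described in these notes.

```python
import json

# Two hypothetical lines in the same one-JSON-object-per-line layout
sample_lines = [
    '{"a": "Mozilla/5.0", "tz": "America/New_York"}',
    '{"a": "GoogleMaps/RochesterNY", "tz": ""}',
]

# Parse each line into a dict, the same way the real file is parsed
sample_records = [json.loads(line) for line in sample_lines]

# Each record is a plain dict, so fields are read by key
print(sample_records[0]['tz'])
```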
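Task:
Count the number of records for each time zone, first with plain Python code and then with the pandas library, and finally try using plot to chart the data for the top 10 time zones.
Approach 1: custom Python functions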
### own functions
# func1: count each time_zone
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
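A quick way to check the counting function is to run it on a few hand-made records. The sketch below repeats `get_counts` so it runs on its own. The `sample_records` list is hypothetical, and reading the time zone from a `tz` key is an assumption about the record layout. Records without that key are skipped so the lookup does not raise a `KeyError`.

```python
def get_counts(sequence):
    """Count how many times each item appears (same logic as func1)."""
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

# Hypothetical records; the real ones come from the file read earlier
sample_records = [
    {'tz': 'America/New_York'},
    {'tz': 'America/Chicago'},
    {'tz': 'America/New_York'},
    {'a': 'record without a time zone'},
]

# Keep only records that actually carry a 'tz' key (assumed field name)
time_zones = [rec['tz'] for rec in sample_records if 'tz' in rec]

counts = get_counts(time_zones)
print(counts['America/New_York'])  # 2
```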
# func2: count each time_zone
from collections import defaultd