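Libraries and methods used in this example: the json library to read the JSON file, pandas value_counts for counting, fillna to replace missing values, replacement of empty strings, and the plot method on the counts to draw charts.
plot(kind='barh', stacked=True) draws a stacked horizontal bar chart; normed_subset = count_subset.div(count_subset.sum(1), axis=0) normalizes the bars so that each row sums to 1.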
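As a rough illustration of the cleaning, counting, and normalization steps listed above, here is a minimal sketch on made-up data. The sample time zones, the replacement labels 'Missing' and 'Unknown', and the column names group_a and group_b are assumptions for illustration only and are not taken from the data file.

```python
import numpy as np
import pandas as pd

# Made-up sample values (not from the real file): one NaN and two empty strings
tz = pd.Series(['America/New_York', '', np.nan,
                'America/New_York', 'Europe/London', ''])

clean_tz = tz.fillna('Missing')        # replace missing values (NaN)
clean_tz[clean_tz == ''] = 'Unknown'   # replace empty strings
tz_counts = clean_tz.value_counts()    # count occurrences of each value

# Hypothetical count table: one row per time zone, one column per group
count_subset = pd.DataFrame(
    {'group_a': [30, 10], 'group_b': [10, 30]},
    index=['America/New_York', 'Europe/London'],
)

# Divide each row by its own total so that every row sums to 1
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)
```

With matplotlib installed, normed_subset.plot(kind='barh', stacked=True) would then draw bars of equal total length, which makes the proportions easier to compare across rows than the raw counts.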
# The file is in JSON format; Python's built-in json module converts a JSON string into a dict object
import json
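# Raw string, so the backslashes in the Windows path are not read as escape sequences
path = r'F:\workspace\python\pydata-book-master\ch02\usagov_bitly_data2012-03-16-1331923249.txt'
with open(path) as f:
    records = [json.loads(line) for line in f]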
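To show what the line-by-line parsing produces without needing the data file, here is a small sketch using two made-up lines held in memory. The field names "tz" and "c" and their values are assumptions for illustration.

```python
import json

# Two made-up lines, each holding one JSON object (hypothetical fields)
lines = [
    '{"tz": "America/New_York", "c": "US"}',
    '{"tz": "", "c": "GB"}',
]

# json.loads turns each JSON string into a Python dict
sample_records = [json.loads(line) for line in lines]

# Individual fields are then read with ordinary dict access
first_tz = sample_records[0]['tz']
```

Each element of sample_records is a plain dict, so a list of such records can be handed directly to DataFrame.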
# Next, count the time zones with pandas value_counts
from pandas import DataFrame,Series
import pandas as pd
import numpy as np
frame=DataFrame(records)
frame
frame['tz'][:10]
tz_count=frame['tz'].value_counts()
tz_count[:10]
Out[12]:
Americ