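Counting Time Zones with pandas
DataFrame is the most important data structure in pandas; it represents data as a table. Creating a DataFrame from a set of raw records is straightforward: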
import numpy as np
import pandas as pd
from pandas import DataFrame, Series

# records: the set of raw records loaded beforehand
frame = DataFrame(records)
print(frame)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3560 entries, 0 to 3559
Data columns:
_heartbeat_ 120 non-null values
a 3440 non-null values
al 3094 non-null values
c 2919 non-null values
cy 2919 non-null values
g 3440 non-null values
gr 2919 non-null values
h 3440 non-null values
hc 3440 non-null values
hh 3440 non-null values
kw 93 non-null values
l 3440 non-null values
ll 2919 non-null values
nk 3440 non-null values
r 3440 non-null values
t 3440 non-null values
tz 3440 non-null values
u 3440 non-null values
dtypes: float64(4), object(14)
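As a minimal, self-contained sketch of this step: the records below are made-up stand-ins (not the real data set), and the sketch assumes each raw record is a dict whose keys may differ from one record to the next. Keys that a record lacks show up as NaN in the resulting DataFrame, which is why the column counts above differ.

```python
import pandas as pd

# Hypothetical sample records; the real data set has 3560 entries.
sample_records = [
    {"tz": "America/New_York", "c": "US"},
    {"tz": "", "c": "US"},          # time zone key present but empty (unknown)
    {"_heartbeat_": 1331923247},    # record with no 'tz' key at all (missing)
]

sample_frame = pd.DataFrame(sample_records)

# Each distinct key becomes a column; absent keys are filled with NaN.
print(sample_frame.shape)
print(sample_frame["tz"].isna().sum())
```

Here the empty string and the absent key are two different cases: only the absent key counts as a missing (NaN) value.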
Next, look at the first ten values of the tz column:
print(frame['tz'][:10])
0 America/New_York
1 America/Denver
2 America/New_York
3 America/Sao_Paulo
4 America/New_York
5 America/New_York
6 Europe/Warsaw
7
8
9
Name: tz
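The output shown for frame above is the summary view, which is mainly used for large DataFrame objects (recent pandas versions print a truncated table instead, and frame.info() gives a similar summary). The Series object returned by frame['tz'] has a value_counts method, which gives us the information we need: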
tz_counts = frame['tz'].value_counts()
print(tz_counts)
America/New_York 1251
521
America/Chicago 400
America/Los_Angeles 382
America/Denver 191
Europe/London 74
Asia/Tokyo 37
Pacific/Honolulu 36
Europe/Madrid 35
America/Sao_Paulo 33
Europe/Berlin 28
Europe/Rome 27
America/Rainy_River 25
Europe/Amsterdam 22
America/Phoenix 20
America/Indianapolis 20
Europe/Warsaw 16
America/Mexico_City 15
Europe/Stockholm 14
Europe/Paris 14
America/Vancouver 12
Pacific/Auckland 11
Europe/Prague 10
Europe/Oslo 10
Europe/Moscow 10
Europe/Helsinki 10
Asia/Hong_Kong 10
America/Puerto_Rico 10
Asia/Istanbul 9
Asia/Calcutta 9
America/Montreal 9
Europe/Lisbon 8
Europe/Vienna 6
Europe/Athens 6
Chile/Continental 6
Australia/NSW 6
Asia/Bangkok 6
America/Edmonton 6
Europe/Copenhagen 5
Europe/Budapest 5
Asia/Seoul 5
America/Anchorage 5
Europe/Zurich 4
Europe/Bucharest 4
Europe/Brussels 4
Asia/Dubai 4
Asia/Beirut 4
America/Winnipeg 4
America/Halifax 4
Europe/Dublin 3
Europe/Bratislava 3
Asia/Kuala_Lumpur 3
Asia/Karachi 3
Asia/Jerusalem 3
Asia/Jakarta 3
Asia/Harbin 3
America/Managua 3
America/Bogota 3
Africa/Cairo 3
Europe/Vilnius 2
Europe/Riga 2
Europe/Malta 2
Europe/Belgrade 2
Asia/Amman 2
America/Recife 2
America/Guayaquil 2
America/Chihuahua 2
Africa/Ceuta 2
Europe/Volgograd 1
Europe/Uzhgorod 1
Europe/Sofia 1
Europe/Skopje 1
Europe/Ljubljana 1
Australia/Queensland 1
Asia/Yekaterinburg 1
Asia/Riyadh 1
Asia/Pontianak 1
Asia/Novosibirsk 1
Asia/Nicosia 1
Asia/Manila 1
Asia/Kuching 1
America/Tegucigalpa 1
America/St_Kitts 1
America/Santo_Domingo 1
America/Montevideo 1
America/Monterrey 1
America/Mazatlan 1
America/Lima 1
America/La_Paz 1
America/Costa_Rica 1
America/Caracas 1
America/Argentina/Mendoza 1
America/Argentina/Cordoba 1
America/Argentina/Buenos_Aires 1
Africa/Lusaka 1
Africa/Johannesburg 1
Africa/Casablanca 1
Length: 97
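To make the behavior of value_counts concrete, here is a small sketch on a made-up Series (the values are hypothetical, not taken from the data set). It assumes the default arguments, under which counts are sorted in descending order and missing values are left out, while empty strings are counted like any other value. That is why the listing above contains an entry with a blank label.

```python
import pandas as pd

# Hypothetical time zone column: '' marks an unknown zone, None a missing one.
tz = pd.Series([
    "America/New_York", "", "America/New_York",
    "Europe/Warsaw", "", None, "America/New_York",
])

counts = tz.value_counts()  # descending by count; missing values excluded by default
print(counts)
```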
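Next, we use a plotting library to produce a figure from this data. To do so, we first fill in a substitute value for the unknown or missing time zones in the records. The fillna method replaces missing (NA) values, while unknown values (empty strings) can be replaced through boolean array indexing: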
clean_tz = frame['tz'].fillna('Missing')  # fill missing (NA) values
clean_tz[clean_tz == ''] = 'Unknown'      # replace empty strings via a boolean mask
tz_counts = clean_tz.value_counts()
print(tz_counts[:10])
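The two cleaning steps can be checked in isolation on a tiny made-up Series (the values below are hypothetical). This is only a sketch of the same pattern: fillna handles the missing entry, and the boolean mask clean == '' selects the empty strings for replacement.

```python
import pandas as pd

# Hypothetical column with one missing value (None) and two empty strings.
tz = pd.Series(["America/New_York", "", None, "Europe/Warsaw", ""])

clean = tz.fillna("Missing")     # returns a new Series; tz itself is unchanged
clean[clean == ""] = "Unknown"   # boolean mask picks out the empty strings

print(clean.tolist())
# ['America/New_York', 'Unknown', 'Missing', 'Europe/Warsaw', 'Unknown']
```

Order matters little here, but filling missing values first means the comparison with '' runs on a column that no longer contains NA entries.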
The plot method of the counts object then produces a horizontal bar chart:
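import matplotlib.pyplot as plt  # Note: when running in the PyCharm IDE, this import and the plt.show() call below are required, otherwise the figure will not appear
tz_counts[:10].plot(kind='barh', rot=0)
plt.show()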
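For environments without a display (for example a script run on a server), a variation of the same plot can be written to a file instead. The sketch below uses made-up counts, and both the Agg backend and the file name tz_counts.png are assumptions chosen for illustration, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts, already sorted in descending order.
counts = pd.Series({"America/New_York": 5, "Unknown": 3, "Europe/London": 1})

ax = counts[:10].plot(kind="barh", rot=0)  # returns a matplotlib Axes object
ax.set_xlabel("Number of records")
plt.tight_layout()
plt.savefig("tz_counts.png")  # hypothetical file name; use plt.show() interactively
```

The plot call draws one horizontal bar per entry, so slicing to the first ten entries keeps the chart readable.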