Counting how often each type occurs with a custom function
Counting word frequencies
Option 1
def get_counts(sequence):
    # Tally how many times each item appears in the sequence.
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
Option 2
from collections import defaultdict

def get_counts2(sequence):
    # defaultdict(int) starts every missing key at 0, so no membership test is needed.
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts
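A minimal usage sketch for Option 2, assuming a small made-up list of time zone strings (the sample data is hypothetical, not from the original dataset). It also shows one behaviour worth knowing: indexing a defaultdict with a missing key inserts that key.

```python
from collections import defaultdict

def get_counts2(sequence):
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts

# Hypothetical sample data for illustration only.
time_zones = ["America/New_York", "Europe/London", "America/New_York"]
counts = get_counts2(time_zones)
print(counts["America/New_York"])   # 2

# .get() looks a key up without inserting it.
print(counts.get("Asia/Seoul", 0))  # 0
print("Asia/Seoul" in counts)       # False

# Square-bracket indexing on a missing key inserts it with value 0.
print(counts["Asia/Tokyo"])         # 0
print("Asia/Tokyo" in counts)       # True
```

If the inserted zero entries are unwanted, looking keys up with .get() avoids them.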
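Option 3: value_counts() on a DataFrame column
# frame is a pandas DataFrame that has a 'tz' column.
tz_counts = frame['tz'].value_counts()
# value_counts() sorts by count in descending order, so this shows the 10 most frequent values.
tz_counts[:10]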
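To try Option 3 without the original data, here is a small sketch that assumes pandas is installed and builds a made-up `frame` with a 'tz' column (the rows are hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for the article's `frame`.
frame = pd.DataFrame({
    "tz": [
        "America/New_York", "Europe/London", "America/New_York",
        "Asia/Tokyo", "America/New_York", "Europe/London",
    ]
})

# Counts per distinct value, most frequent first.
tz_counts = frame["tz"].value_counts()
print(tz_counts.head(10))
```

By default value_counts() leaves missing values out of the result; passing dropna=False keeps them.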
In terms of efficiency, I recommend Option 2.
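One way to check the efficiency claim on your own machine is a quick timeit comparison. This is only a sketch: the sample data and repeat count are arbitrary assumptions, and the result can vary with the data and the Python version.

```python
import timeit
from collections import defaultdict

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

def get_counts2(sequence):
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts

# Hypothetical sample data: 10,000 items drawn from 5 labels.
sample = ["a", "b", "c", "d", "e"] * 2000

# Time 20 runs of each function on the same input.
t1 = timeit.timeit(lambda: get_counts(sample), number=20)
t2 = timeit.timeit(lambda: get_counts2(sample), number=20)
print(f"plain dict:  {t1:.4f} s")
print(f"defaultdict: {t2:.4f} s")
```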
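Where the idea comes from
This felt like an extension of an earlier example. That's right: it is the check for whether an element is already in a list. Swap the list for a dictionary and the same pattern can be used to count frequencies.
def check(x, L):
    # Return True if x is already in L; otherwise record it and return False.
    if x in L:
        return True
    else:
        L.append(x)
        return False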
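The step from the list check to counting can be sketched side by side. `count_item` is a hypothetical helper named here for illustration, and the explicit `return False` in `check` is an assumption about the intended behaviour.

```python
def check(x, L):
    """Return True if x was already in L; otherwise record it and return False."""
    if x in L:
        return True
    L.append(x)
    return False  # assumption: explicit False on first sighting

def count_item(x, counts):
    """Hypothetical dict analogue: same membership test, but keep a tally."""
    if x in counts:
        counts[x] += 1
    else:
        counts[x] = 1

seen, counts = [], {}
for tz in ["Asia/Tokyo", "Europe/London", "Asia/Tokyo"]:
    check(tz, seen)
    count_item(tz, counts)

print(seen)    # ['Asia/Tokyo', 'Europe/London']
print(counts)  # {'Asia/Tokyo': 2, 'Europe/London': 1}
```

The list only remembers that a value was seen, while the dictionary also remembers how many times.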
Finding the top N types
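Option 1: a custom function
def top_counts(count_dict, n):
    # Build (count, key) pairs so that sorting orders them by count.
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    # The list is in ascending order, so the last n pairs are the n largest.
    return value_key_pairs[-n:]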
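A short usage sketch for top_counts, assuming a small made-up counts dictionary (the numbers are hypothetical). It shows the shape and order of the result.

```python
def top_counts(count_dict, n):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]

# Hypothetical counts for illustration only.
counts = {"America/New_York": 3, "Europe/London": 2, "Asia/Tokyo": 1}

print(top_counts(counts, 2))
# [(2, 'Europe/London'), (3, 'America/New_York')]
```

Two details to keep in mind: the pairs come back as (count, key) in ascending order, so the most frequent item is last, and n = 0 returns the whole list because the slice [-0:] is the same as [0:].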
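Option 2: use a library function
from collections import Counter

# time_zones is the sequence of values to count; n is how many top entries to return.
counts = Counter(time_zones)
counts.most_common(n)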
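A self-contained sketch of the Counter approach, assuming a small made-up `time_zones` list and n = 2 (both hypothetical).

```python
from collections import Counter

# Hypothetical sample data for illustration only.
time_zones = [
    "America/New_York", "Europe/London", "America/New_York",
    "Asia/Tokyo", "America/New_York", "Europe/London",
]

counts = Counter(time_zones)
print(counts.most_common(2))
# [('America/New_York', 3), ('Europe/London', 2)]
```

Unlike the custom function, most_common returns (key, count) pairs with the most frequent item first.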
To be continued