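Choosing how many groups (bins) to divide the data into for counting
Note: the number of bins must be chosen with care. Too few bins produce a large statistical error; too many make the pattern hard to see.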
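Number of bins: when the data set has fewer than 100 values, divide it into 5 to 12 bins, depending on how much data there is.
Bin width: the distance between the two endpoints of each bin. The two quantities are related by:
number of bins = range / bin width, where range = maximum - minimum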
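A minimal sketch of the formula above, using the hypothetical helper names `bin_count` and `bin_edges`. When the range is not an exact multiple of the bin width, rounding up with `math.ceil` keeps every bin exactly one width wide. With plain floor division, a range of 128 and a width of 5 give 128 // 5 = 25 bins, and `plt.hist(a, 25)` would then split the range into 25 equal bins of 5.12 rather than 5.

```python
import math


def bin_count(data, width):
    """Number of bins of the given width needed to cover [min(data), max(data)]."""
    value_range = max(data) - min(data)  # the range: maximum - minimum
    # Round up so a last, partial bin is not squeezed into the others
    # when the range is not an exact multiple of the width.
    return max(1, math.ceil(value_range / width))


def bin_edges(data, width):
    """Edges min, min + width, ..., reaching at least max(data)."""
    lo = min(data)
    n = bin_count(data, width)
    return [lo + i * width for i in range(n + 1)]


print(bin_count([70, 200], 5))  # 26 (range 130 is a multiple of 5)
print(bin_count([70, 198], 5))  # 26 (range 128, rounded up from 25.6)
print(bin_edges([70, 83], 5))   # [70, 75, 80, 85]
```

The edge list can be passed directly as the `bins` argument of `plt.hist`, which accepts either an integer bin count or a sequence of edges.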
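Use cases
- Distribution of user ages
- Distribution of users' click counts over a period of time
- Distribution of the times at which users are active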
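All of these use cases reduce to the same step: put each value into a bin and count the values per bin. A minimal pure-Python sketch of that step, assuming randomly generated, hypothetical user ages and a helper name (`histogram_counts`) made up for this illustration:

```python
import random
from collections import Counter


def histogram_counts(data, lo, width, n_bins):
    """Count values per bin; bin k covers [lo + k*width, lo + (k+1)*width).

    Assumes every value lies in [lo, lo + n_bins*width]; a value equal to the
    right-most edge is folded into the last bin, as numpy/matplotlib do.
    """
    counts = Counter()
    for x in data:
        k = int((x - lo) // width)  # index of the bin that x falls into
        k = min(k, n_bins - 1)      # right-most edge belongs to the last bin
        counts[k] += 1
    return [counts[k] for k in range(n_bins)]


# Hypothetical ages, randomly generated only for illustration.
random.seed(0)
ages = [random.randint(18, 60) for _ in range(100)]

# range 60 - 18 = 42, bin width 6 -> 42 / 6 = 7 bins
counts = histogram_counts(ages, lo=18, width=6, n_bins=7)
for k, c in enumerate(counts):
    right = ")" if k < len(counts) - 1 else "]"
    print(f"[{18 + 6 * k}, {18 + 6 * (k + 1)}{right}: {c}")
```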
Example
We have 250 values, each the running time of a movie. Count the number of movies (or their frequency) in each running-time interval.
```python
# The sample data can be generated with:
# import random
# print([random.randint(70, 200) for i in range(250)])
from matplotlib import pyplot as plt
from matplotlib import font_manager

# CJK font, only needed for Chinese labels (pass fontproperties=my_font);
# the path is system-specific
my_font = font_manager.FontProperties(fname='/usr/share/fonts/cjkuni-uming/uming.ttc')

a = [154, 191, 101, 153, 95, 118, 184, 98, 159, 120, 191, 159, 115, 172, 131, 172, 191, 102, 180, 139, 189, 170, 174,
     188, 111, 132, 148, 74, 195, 151, 92, 181, 172, 124, 149, 151, 172, 139, 111, 124, 152, 167, 167, 185, 195, 170,
     172, 147, 175, 154, 151, 182, 169, 91, 136, 113, 112, 70, 153, 82, 148, 110, 178, 194, 87, 133, 148, 180, 151,
     173, 127, 148, 186, 197, 162, 138, 196, 150, 103, 76, 130, 78, 71, 128, 187, 91, 90, 161, 72, 112, 98, 190, 93,
     182, 182, 93, 150, 138, 76, 135, 187, 196, 169, 134, 129, 151, 146, 109, 152, 88, 119, 100, 120, 122, 119, 182,
     95, 183, 110, 181, 81, 160, 138, 89, 97, 166, 182, 127, 108, 87, 158, 73, 88, 162, 105, 128, 79, 79, 193, 162,
     181, 128, 130, 145, 129, 111, 87, 169, 87, 105, 86, 92, 149, 80, 106, 198, 188, 140, 179, 149, 125, 163, 95, 131,
     185, 187, 143, 82, 193, 148, 157, 179, 146, 107, 82, 87, 75, 98, 75, 112, 102, 163, 152, 112, 160, 129, 84, 186,
     161, 200, 196, 93, 141, 86, 95, 83, 93, 112, 127, 149, 109, 145, 92, 130, 195, 85, 178, 175, 194, 170, 170, 177,
     180, 158, 148, 93, 190, 134, 72, 158, 156, 180, 146, 80, 156, 86, 70, 163, 172, 185, 183, 77, 132, 107, 167, 173,
     134, 130, 167, 96, 118, 189, 82, 170, 118, 119, 168, 82, 174, 119]

plt.figure(figsize=(20, 8), dpi=80)

# Compute the number of bins
d = 5  # bin width
# (200 - 70) // 5 = 26; exact here because the range is a multiple of d
num_bins = (max(a) - min(a)) // d
plt.hist(a, num_bins)

# Put the x-axis ticks on the bin edges: 70, 75, ..., 200
plt.xticks(range(min(a), max(a) + d, d))
plt.grid()
plt.show()
```
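`plt.hist` also returns the counts it draws (`n, bins, patches = plt.hist(...)`), and its binning matches `numpy.histogram`. The sketch below computes the per-bin counts with `numpy.histogram` without plotting, assuming data regenerated with the commented-out `random.randint(70, 200)` line (seeded, so the values differ from the list above). It passes explicit edges, which keeps every bin exactly `d` wide even when the range is not a multiple of `d`; the same list can be given to `plt.hist(a, bins=edges)`.

```python
import random

import numpy as np

# Stand-in for the movie lengths: same generator as the commented-out
# lines, seeded for reproducibility (values differ from the list above).
random.seed(1)
a = [random.randint(70, 200) for _ in range(250)]

d = 5
# Explicit edges 70, 75, ..., 200: every bin is exactly d wide and the
# edges coincide with the ticks set by plt.xticks.
edges = list(range(70, 200 + d, d))
counts, bin_edges = np.histogram(a, bins=edges)

print(len(counts))   # 26 bins
print(counts.sum())  # 250: every value lands in some bin
```

In NumPy every bin is half-open, [left, right), except the last one, which also includes its right edge, so a running time of exactly 200 is still counted.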
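To plot frequencies instead of counts, only one change is needed: pass `density=True`. Older tutorials use `normed=True`, but that parameter was deprecated and has since been removed from Matplotlib. Note that `density=True` scales the bars so that their total area, not their total height, is 1.

```python
plt.hist(a, num_bins, density=True)
```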
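To see what `density=True` normalizes, here is a minimal NumPy sketch, again on seeded random data standing in for the movie lengths. The density heights multiplied by the bin width sum to 1, while weighting each value by `1/len(a)` gives per-bin relative frequencies whose heights sum to 1.

```python
import random

import numpy as np

random.seed(2)
a = [random.randint(70, 200) for _ in range(250)]
edges = np.arange(70, 205, 5)  # edges 70, 75, ..., 200 (bin width 5)
widths = np.diff(edges)

# density=True: height = count / (total * bin width), so the area is 1.
density, _ = np.histogram(a, bins=edges, density=True)
print(round(float((density * widths).sum()), 6))  # 1.0

# Relative frequency per bin: weight every value by 1/len(a),
# so the heights themselves sum to 1.
rel_freq, _ = np.histogram(a, bins=edges, weights=np.ones(len(a)) / len(a))
print(round(float(rel_freq.sum()), 6))  # 1.0

# With equal-width bins the two differ only by the bin-width factor.
print(np.allclose(rel_freq, density * 5))  # True
```

The same `weights` argument works with `plt.hist`, e.g. `plt.hist(a, num_bins, weights=np.ones(len(a)) / len(a))`, if the y-axis should show the share of movies per interval rather than a density.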