Notebook: Jupyter view, GitHub repository
Preparing the data
Prepare the data used to illustrate how colors are mapped to values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Load the midwest dataset and preview it before a first visualization:
midwest = pd.read_csv('../../data/midwest_filter.csv')
midwest.head(10)
(index) | PID | county | state | area | poptotal | popdensity | popwhite | popblack | popamerindian | popasian | ... | percprof | poppovertyknown | percpovertyknown | percbelowpoverty | percchildbelowpovert | percadultpoverty | percelderlypoverty | inmetro | category | dot_size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 561 | ADAMS | IL | 0.052 | 66090 | 1270.961540 | 63917 | 1702 | 98 | 249 | ... | 4.355859 | 63628 | 96.274777 | 13.151443 | 18.011717 | 11.009776 | 12.443812 | 0 | AAR | 250.944411 |
1 | 562 | ALEXANDER | IL | 0.014 | 10626 | 759.000000 | 7054 | 3496 | 19 | 48 | ... | 2.870315 | 10529 | 99.087145 | 32.244278 | 45.826514 | 27.385647 | 25.228976 | 0 | LHR | 185.781260 |
2 | 563 | BOND | IL | 0.022 | 14991 | 681.409091 | 14477 | 429 | 35 | 16 | ... | 4.488572 | 14235 | 94.956974 | 12.068844 | 14.036061 | 10.852090 | 12.697410 | 0 | AAR | 175.905385 |
3 | 564 | BOONE | IL | 0.017 | 30806 | 1812.117650 | 29344 | 127 | 46 | 150 | ... | 4.197800 | 30337 | 98.477569 | 7.209019 | 11.179536 | 5.536013 | 6.217047 | 1 | ALU | 319.823487 |
4 | 565 | BROWN | IL | 0.018 | 5836 | 324.222222 | 5264 | 547 | 14 | 5 | ... | 3.367680 | 4815 | 82.505140 | 13.520249 | 13.022889 | 11.143211 | 19.200000 | 0 | AAR | 130.442161 |
5 | 566 | BUREAU | IL | 0.050 | 35688 | 713.760000 | 35157 | 50 | 65 | 195 | ... | 3.275891 | 35107 | 98.372002 | 10.399635 | 14.158819 | 8.179287 | 11.008586 | 0 | AAR | 180.023052 |
6 | 567 | CALHOUN | IL | 0.017 | 5322 | 313.058824 | 5298 | 1 | 8 | 15 | ... | 3.209601 | 5241 | 98.478016 | 15.149781 | 13.787761 | 12.932331 | 21.085271 | 0 | LAR | 129.021269 |
7 | 568 | CARROLL | IL | 0.027 | 16805 | 622.407407 | 16519 | 111 | 30 | 61 | ... | 3.055727 | 16455 | 97.917287 | 11.710726 | 17.225462 | 10.027037 | 9.525052 | 0 | AAR | 168.395572 |
8 | 569 | CASS | IL | 0.024 | 13437 | 559.875000 | 13384 | 16 | 8 | 23 | ... | 3.206799 | 13081 | 97.350599 | 13.875086 | 17.994784 | 11.914343 | 13.660180 | 0 | AAR | 160.436363 |
9 | 571 | CHRISTIAN | IL | 0.042 | 34418 | 819.476190 | 34176 | 82 | 51 | 89 | ... | 3.089998 | 33788 | 98.169562 | 11.708299 | 16.320612 | 9.569700 | 11.490641 | 0 | AAR | 193.478750 |
10 rows × 29 columns
Bubble plot with categorical colors
# Use the values of the category column as the color key
categories = midwest['category'].unique()
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]
# Set the figure size and resolution
fig, ax = plt.subplots(1, 1, figsize=(16, 10), dpi=80)
# Draw one scatter series per category; bubble size comes from the dot_size column
for i, category in enumerate(categories):
    data = midwest.loc[midwest['category']==category, :]
    ax.scatter('area', 'poptotal', s='dot_size', data=data,
               c=[colors[i]]*len(data), edgecolors='black', linewidths=.5, label=category)
ax.set_xlabel('Area', fontsize=14)
ax.set_ylabel('Poptotal', fontsize=14)
ax.set_title('Bubble Plot', loc='left', fontsize=16)
ax.legend(fontsize=12, labelspacing=0.8, borderpad=.7)
plt.show()
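One caveat with the color list built for this plot: tab10 holds only 10 distinct colors, so sampling it at evenly spaced fractions repeats colors as soon as there are more than 10 categories. The sketch below illustrates this with 14 placeholder labels (an assumption made for illustration, not read from the dataset) and shows one possible alternative, indexing tab20 by integer position.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical category labels, used only for illustration
categories = [f"cat{i}" for i in range(14)]
n = len(categories)

# Sampling tab10 at evenly spaced fractions, as in the bubble plot code
sampled = [plt.cm.tab10(i / float(n - 1)) for i in range(n)]

# Alternative: index tab20 by integer position, one color per category
indexed = [plt.cm.tab20(i) for i in range(n)]

print(len(set(sampled)), "distinct colors when sampling tab10")
print(len(set(indexed)), "distinct colors when indexing tab20")
```

Whether repeated colors matter depends on how many categories the data actually contains; with 10 or fewer, the original sampling keeps them distinct.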
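The bubble plot maps area and poptotal to the x and y axes, dot_size to the bubble size, and category to the bubble color. Now consider a question: if the bubble color encodes a continuous numeric value rather than a categorical label, how should the corresponding colorbar be built?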
Bubble plot with a segmented colorbar
Suppose we use the percprof column as the value shown on the colorbar and split its range into color segments:
midwest['percprof'].describe()
count 332.000000
mean 3.842269
std 1.647157
min 0.520291
25% 2.870991
50% 3.400538
75% 4.318381
max 14.089892
Name: percprof, dtype: float64
Based on these summary statistics, we can split the range 0-15 into segments:
from matplotlib.colors import ListedColormap, BoundaryNorm
fig, ax = plt.subplots(1, 1, figsize=(20, 10), dpi=80)
bounds = [0, 2.88, 3.5, 4.5, 12, 15]
colors = ['#7FCAB2', '#EA9F7B', '#9FAED2', '#E99CCB', '#ADD696']
cmap = ListedColormap(colors)
norms = BoundaryNorm(bounds, cmap.N)
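# The BoundaryNorm already fixes the value range, so vmin/vmax are not passed
# (recent Matplotlib versions reject a norm instance combined with vmin/vmax)
fc = ax.scatter('area', 'poptotal', s='dot_size', data=midwest, c='percprof',
                cmap=cmap, norm=norms,
                edgecolors='black', linewidths=.5)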
ax.set_xlabel('Area', fontsize=14)
ax.set_ylabel('Poptotal', fontsize=14)
ax.set_title('Bubble Plot', loc='left', fontsize=16)
cb = plt.colorbar(fc, ax=ax)
cb.set_label('percprof', fontsize=16)
plt.show()
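To see how the segmented mapping assigns colors, the following minimal sketch applies the same bounds and colors to a few sample values. The sample values are illustrative assumptions, one picked inside each segment, and are not taken from the dataset.

```python
import numpy as np
from matplotlib.colors import ListedColormap, BoundaryNorm, to_hex

bounds = [0, 2.88, 3.5, 4.5, 12, 15]
colors = ['#7FCAB2', '#EA9F7B', '#9FAED2', '#E99CCB', '#ADD696']
cmap = ListedColormap(colors)
norm = BoundaryNorm(bounds, cmap.N)

# Illustrative sample values, one per segment (not taken from the dataset)
values = np.array([1.0, 3.0, 4.0, 10.0, 14.0])

# With as many colors as segments, the norm returns the segment index
indices = norm(values)

# An integer passed to the colormap selects that entry of the color list
mapped = [to_hex(cmap(int(i))).upper() for i in indices]

for v, i, c in zip(values, indices, mapped):
    print(f"{v:5.1f} -> segment {int(i)} -> {c}")
```

Every value inside a segment gets exactly the same color, which is what makes the colorbar appear as discrete blocks.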
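In the plot's colorbar, the segments are not spaced linearly: every segment has the same length even though the boundaries, which were chosen from the percentiles, are unevenly spaced. To make the color scale linear in the data values, set the colorbar's spacing parameter, which accepts 'uniform' (the default) or 'proportional':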
fig, ax = plt.subplots(1, 1, figsize=(20, 10), dpi=80)
bounds = [0, 2.88, 3.5, 4.5, 12, 15]
colors = ['#7FCAB2', '#EA9F7B', '#9FAED2', '#E99CCB', '#ADD696']
cmap = ListedColormap(colors)
norms = BoundaryNorm(bounds, cmap.N)
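# The BoundaryNorm already fixes the value range, so vmin/vmax are not passed
# (recent Matplotlib versions reject a norm instance combined with vmin/vmax)
fc = ax.scatter('area', 'poptotal', s='dot_size', data=midwest, c='percprof',
                cmap=cmap, norm=norms,
                edgecolors='black', linewidths=.5)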
ax.set_xlabel('Area', fontsize=14)
ax.set_ylabel('Poptotal', fontsize=14)
ax.set_title('Bubble Plot', loc='left', fontsize=16)
cb = plt.colorbar(fc, ax=ax, spacing='proportional')
cb.set_label('percprof', fontsize=16)
plt.show()
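The difference between the two spacing modes can be expressed as the share of the colorbar's length that each segment occupies. The following sketch computes those shares with plain NumPy for the bounds used here; it is an arithmetic illustration of the idea, not a call into Matplotlib's internals.

```python
import numpy as np

bounds = np.array([0, 2.88, 3.5, 4.5, 12, 15])
n_segments = len(bounds) - 1

# spacing='uniform': every segment gets the same share of the colorbar
uniform = np.full(n_segments, 1 / n_segments)

# spacing='proportional': the share is proportional to the segment's value range
proportional = np.diff(bounds) / (bounds[-1] - bounds[0])

for lo, hi, u, p in zip(bounds[:-1], bounds[1:], uniform, proportional):
    print(f"[{lo:5.2f}, {hi:5.2f})  uniform={u:.3f}  proportional={p:.3f}")
```

With proportional spacing the 4.5-12 segment takes half of the bar, while the narrow 2.88-3.5 segment shrinks to about 4% of it.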
Bubble plot with a continuous colorbar
from matplotlib.colors import LinearSegmentedColormap
fig, ax = plt.subplots(1, 1, figsize=(20, 10), dpi=80)
bounds = [0, 2.88, 3.5, 4.5, 12, 15]
# Colormap node positions must lie in [0, 1], so rescale the bounds linearly
nodes = [(v-0)/15 for v in bounds]
colors = ['#7FCAB2', '#EA9F7B', '#9FAED2', '#E99CCB', '#ADD696', '#E6CCA4']
cmap = LinearSegmentedColormap.from_list('mymap', list(zip(nodes, colors)))
fc = ax.scatter('area', 'poptotal', s='dot_size', data=midwest, c='percprof',
cmap=cmap, vmin=0, vmax=15,
edgecolors='black', linewidths=.5)
ax.set_xlabel('Area', fontsize=14)
ax.set_ylabel('Poptotal', fontsize=14)
ax.set_title('Bubble Plot', loc='left', fontsize=16)
cb = plt.colorbar(fc, ax=ax)
cb.set_label('percprof', fontsize=16)
plt.show()
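With a continuous colormap, a data value becomes a color in two steps: it is first normalized to a position in [0, 1], then the colormap interpolates between the neighboring node colors. The sketch below reproduces those steps by hand, assuming that passing vmin=0 and vmax=15 to scatter behaves like an explicit Normalize(vmin=0, vmax=15); the value 3.5 is just an example.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, Normalize, to_rgba

bounds = [0, 2.88, 3.5, 4.5, 12, 15]
nodes = [v / 15 for v in bounds]
colors = ['#7FCAB2', '#EA9F7B', '#9FAED2', '#E99CCB', '#ADD696', '#E6CCA4']
cmap = LinearSegmentedColormap.from_list('mymap', list(zip(nodes, colors)))

# Stand-in for passing vmin=0, vmax=15 to scatter
norm = Normalize(vmin=0, vmax=15)

# Step 1: data value -> position in [0, 1]
position = float(norm(3.5))

# Step 2: position -> RGBA color, interpolated between neighboring nodes
rgba = cmap(position)

print(position, rgba)
```

The two ends of the range map to the first and last colors of the list, and everything in between is blended.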
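The continuous colormap above places its color nodes at percentile-based positions. Next, we pick a color gradient whose colors are spaced evenly (linearly rather than by percentile) and redraw the plot: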
fig, ax = plt.subplots(1, 1, figsize=(20, 10), dpi=80)
# Use a simple colormap whose colors are spaced evenly over the value range
colors = ['#024CEB', '#02BBA9', '#65FF00', '#FEFF00', '#FF8800', '#D40608']
cmap = LinearSegmentedColormap.from_list('mymap', colors)
fc = ax.scatter('area', 'poptotal', s='dot_size', data=midwest, c='percprof',
cmap=cmap, vmin=0, vmax=15,
edgecolors='black', linewidths=.5)
ax.set_xlabel('Area', fontsize=14)
ax.set_ylabel('Poptotal', fontsize=14)
ax.set_title('Bubble Plot', loc='left', fontsize=16)
cb = plt.colorbar(fc, ax=ax)
cb.set_label('percprof', fontsize=16)
plt.show()
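As the plot shows, most bubbles take their colors from the range below 4, so choosing between linear and percentile-based segmentation is a trade-off you have to make for your own data.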
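To make that trade-off concrete, the following sketch compares percentile-based and linear boundaries on synthetic right-skewed data. The lognormal sample, the seed, and the choice of four segments are all assumptions made for illustration; they only stand in for a skewed column such as percprof.

```python
import numpy as np

# Synthetic right-skewed values standing in for a column like percprof
rng = np.random.default_rng(0)
values = rng.lognormal(mean=1.2, sigma=0.4, size=400)

n_segments = 4

# Percentile-based boundaries: roughly equal numbers of points per segment
quantile_bounds = np.percentile(values, np.linspace(0, 100, n_segments + 1))

# Linear boundaries: equal-width segments between the minimum and maximum
linear_bounds = np.linspace(values.min(), values.max(), n_segments + 1)

quantile_counts, _ = np.histogram(values, bins=quantile_bounds)
linear_counts, _ = np.histogram(values, bins=linear_bounds)

print("percentile bounds:", np.round(quantile_bounds, 2), quantile_counts)
print("linear bounds:    ", np.round(linear_bounds, 2), linear_counts)
```

Percentile boundaries spread the points evenly across colors but distort the value scale, while linear boundaries keep the scale honest at the cost of crowding most points into one or two colors.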