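Preface
Sometimes a dataset contains categorical variables with non-numeric values, such as gender, city, or marital status. These variables often appear in pairs, and the relationship between them is usually many-to-many: gender and city, for example. To work with this kind of data, we group by one variable and then count the values of the other, which makes the data much easier to read.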
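I. What the raw data looks like
The variables are in a many-to-many relationship; marriage is the marital status. Suppose we want to see how marital status is distributed within each city. That requires grouping the data and counting within each group, which can be done as follows.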
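One quick way to confirm the many-to-many relationship is to count how many distinct values of one variable occur for each value of the other. The following is a minimal sketch that rebuilds the sample data used in this post; the names `statuses_per_city` and `cities_per_status` are made up for illustration.

```python
import pandas as pd

data = {"city": ["tianjin", "beijing", "beijing", "shanghai", "shanghai",
                 "beijing", "shanghai", "tianjin", "shanghai", "tianjin",
                 "tianjin", "shanghai", "beijing", "shanghai", "beijing"],
        "marriage": ["un", "ma", "div", "ma", "un", "div", "un", "ma",
                     "ma", "div", "un", "div", "un", "div", "un"]}
df = pd.DataFrame(data)

# Number of distinct marital statuses observed in each city
statuses_per_city = df.groupby("city")["marriage"].nunique()
# Number of distinct cities observed for each marital status
cities_per_status = df.groupby("marriage")["city"].nunique()

print(statuses_per_city)
print(cities_per_status)
```

If both results contain values greater than 1, each city maps to several statuses and each status maps to several cities, which is the many-to-many case.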
II. Processing steps and code
1. Code
The code is as follows:
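import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

if __name__ == "__main__":
    # Sample data: one row per person, with city and marital status
    data = {"city": ["tianjin", "beijing", "beijing", "shanghai", "shanghai", "beijing", "shanghai", "tianjin", "shanghai", "tianjin", "tianjin", "shanghai", "beijing", "shanghai", "beijing"],
            "marriage": ["un", "ma", "div", "ma", "un", "div", "un", "ma", "ma", "div", "un", "div", "un", "div", "un"]}
    dataframe = pd.DataFrame(data)
    print(dataframe)
    # Group the rows by marital status; i is the status, j is that group's rows
    dg = dataframe.groupby("marriage")
    stand = pd.DataFrame()
    for i, j in dg:
        # sort must be the boolean False; the string "False" is truthy.
        # rename(i) labels the column with the status and works whether
        # value_counts names its result "city" (older pandas) or "count" (pandas 2.x).
        counts = j["city"].value_counts(sort=False).rename(i)
        stand = pd.concat([stand, counts], axis=1)
    print(stand)
    # One bar per city, stacked by marital status
    stand.plot.bar(width=0.1, rot=0, stacked=True)
    plt.title("Chart of city-marriage")
    plt.ylabel('marriage-Count')  # y-axis label
    plt.xlabel('city')  # x-axis label
    plt.legend(loc=0, prop={'size': 10})  # show the legend
    plt.show()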
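As an alternative to the explicit loop, the same city-by-status count table can be sketched with `pd.crosstab`. This is a swapped-in technique, not the approach used in the script above, and the variable name `table` is chosen here for illustration.

```python
import pandas as pd

data = {"city": ["tianjin", "beijing", "beijing", "shanghai", "shanghai",
                 "beijing", "shanghai", "tianjin", "shanghai", "tianjin",
                 "tianjin", "shanghai", "beijing", "shanghai", "beijing"],
        "marriage": ["un", "ma", "div", "ma", "un", "div", "un", "ma",
                     "ma", "div", "un", "div", "un", "div", "un"]}
dataframe = pd.DataFrame(data)

# Rows are cities, columns are marital statuses, cells are counts
table = pd.crosstab(dataframe["city"], dataframe["marriage"])
print(table)
```

One difference worth noting: `crosstab` fills a city/status combination that never occurs with 0, whereas concatenating per-group counts leaves it as NaN.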
2. Result
The stacked bar chart looks like this. It shows how each city's total breaks down by marital status.
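The chart stacks raw counts, so cities with different totals produce bars of different heights. If the shares themselves are needed, one option is to normalize each city's row to sum to 1. The following is a minimal sketch using `pd.crosstab` with `normalize="index"`; the variable name `share` is an assumption made for illustration.

```python
import pandas as pd

data = {"city": ["tianjin", "beijing", "beijing", "shanghai", "shanghai",
                 "beijing", "shanghai", "tianjin", "shanghai", "tianjin",
                 "tianjin", "shanghai", "beijing", "shanghai", "beijing"],
        "marriage": ["un", "ma", "div", "ma", "un", "div", "un", "ma",
                     "ma", "div", "un", "div", "un", "div", "un"]}
dataframe = pd.DataFrame(data)

# normalize="index" divides each row by its row total,
# so every city's shares add up to 1
share = pd.crosstab(dataframe["city"], dataframe["marriage"], normalize="index")
print(share.round(2))
```

Calling `share.plot.bar(stacked=True)` on this table draws bars of equal height, which makes the proportions directly comparable across cities.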