python astype category: Python category-comparison charts, bar chart series, percentage stacked bar chart

Latest recommended article published 2024-07-21 15:31:31

weixin_39915721

Tags: python astype category, python percentage calculation

Percentage stacked bar chart

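A percentage stacked bar chart compares, within each category, each value's percentage of the category total; the values are drawn as two-dimensional vertical rectangles stacked to 100%.

In a bar chart the X-axis variable is usually categorical and the Y-axis variable numeric:
- first compute the share of the category you most want to highlight, usually the data series with the largest share;
- then sort the data in descending order of that share.
If the legend variable is ordinal (ordered), the legend must be shown in that order.
If the legend variable is unordered (nominal), it is best sorted by average share:
- place the category with the largest share at the bottom, closest to the X axis;
- this makes the change in each category's share easier to follow from bar to bar.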
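The ordering rules above can be sketched with pandas. This is a minimal illustration on made-up counts, not the article's dataset: the frame `counts`, its labels, and the names `shares`, `legend_order` and `x_order` are all assumptions introduced here.

```python
import pandas as pd

# Hypothetical counts: rows are legend categories, columns are x-axis categories.
counts = pd.DataFrame(
    {"Ideal": [40, 40, 20], "Fair": [10, 30, 60], "Good": [20, 30, 50]},
    index=["A", "B", "C"],
)

# Share of each legend category within every x-axis category (each column sums to 1).
shares = counts / counts.sum(axis=0)

# Legend order: largest average share first, so it can be drawn closest to the x axis.
legend_order = shares.mean(axis=1).sort_values(ascending=False).index.tolist()

# X-axis order: descending share of the highlighted (largest) legend category.
x_order = shares.loc[legend_order[0]].sort_values(ascending=False).index.tolist()

print(legend_order)
print(x_order)
```

Here category `C` has the largest average share, so the x axis is sorted by the share of `C` in each column.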

Drawing a percentage stacked bar chart with plotnine

In plotnine, setting the position parameter of geom_bar() to 'fill' draws a percentage stacked bar chart:

import pandas as pd
import numpy as np
from plotnine import *

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\StackedCD.csv')

# Convert every value column to shares of its column total
value_cols = df.columns[1:]
SumCol_df = df[value_cols].sum(axis=0)
df[value_cols] = df[value_cols].div(SumCol_df, axis=1)

# Mean share of each Clarity level across the value columns
meanRow_df = df[value_cols].mean(axis=1)

# X-axis order: descending share of the Clarity level with the largest mean share
Per_df = df.loc[meanRow_df.idxmax(), value_cols].sort_values(ascending=False)

# Legend order: Clarity levels by ascending mean share
Sing_df = df['Clarity'][meanRow_df.sort_values(ascending=True).index]

mydata = pd.melt(df, id_vars='Clarity')

# The category order fixes the display order of the legend and the x axis
mydata['Clarity'] = pd.Categorical(mydata['Clarity'], categories=list(Sing_df))
mydata['variable'] = pd.Categorical(mydata['variable'], categories=list(Per_df.index))

base_plot = (ggplot(mydata, aes(x='variable', y='value', fill='Clarity'))
    # position='fill' rescales every stack to a total height of 1
    + geom_bar(stat="identity", color="black", position='fill', width=0.7, size=0.25)
    + scale_fill_brewer(palette="GnBu")
    + theme(
        legend_title=element_text(size=18, face="plain", color="black"),
        legend_text=element_text(size=16, face="plain", color="black"),
        axis_title=element_text(size=18, face="plain", color="black"),
        axis_text=element_text(size=16, face="plain", color="black"),
        aspect_ratio=1.15,
        figure_size=(6.5, 6.5),
        dpi=50
    )
)

print(base_plot)

base_plot.save(r'd:\python\out\Bar_Plot4.pdf')
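The display order in the listing above comes from the category order of the `Clarity` and `variable` columns. As a minimal sketch of setting such an order explicitly in pandas, the example below uses `CategoricalDtype` on made-up long-format data; `long_df` and `clarity_order` are hypothetical names, and the order itself is assumed for illustration rather than computed from the article's CSV.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical long-format data, shaped like the output of pd.melt.
long_df = pd.DataFrame({
    "Clarity": ["A", "B", "C", "A", "B", "C"],
    "variable": ["Fair", "Fair", "Fair", "Good", "Good", "Good"],
    "value": [0.1, 0.3, 0.6, 0.2, 0.3, 0.5],
})

# Assumed display order, chosen here only for illustration.
clarity_order = ["C", "B", "A"]

# Attach the explicit order to the column through an ordered categorical dtype.
long_df["Clarity"] = long_df["Clarity"].astype(
    CategoricalDtype(categories=clarity_order, ordered=True)
)

print(long_df["Clarity"].cat.categories.tolist())
print(long_df.sort_values("Clarity")["Clarity"].tolist())
```

Sorting the column then follows the declared order rather than alphabetical order, which is the property the chart relies on.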

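Drawing a percentage stacked bar chart with matplotlib

In matplotlib, a percentage stacked bar chart is drawn with plt.bar():
- first convert the multi-column data into each category's percentage of its column total;
- then call plt.bar() once per data series, setting the bottom parameter to the running sum of the series already drawn;
- finally format the Y-axis tick labels as percentages.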
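The role of the bottom parameter can be sketched with NumPy alone. The array `shares` below is invented for illustration (each column already sums to 1), and `bottoms` and `tops` are hypothetical names, not part of the original listing.

```python
import numpy as np

# Hypothetical shares: one row per data series, one column per x-axis category.
shares = np.array([
    [0.6, 0.5, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.4],
])

# The bottom of series i is the sum of series 0..i-1 in the same column.
bottoms = np.cumsum(shares, axis=0) - shares

# The top edge of each segment; the last series ends at 1 in every column.
tops = bottoms + shares

print(bottoms)
```

Each row `i` could then be drawn with `plt.bar(x, shares[i], bottom=bottoms[i])`. Accumulating `bottom_y` inside a loop produces the same values one series at a time.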

from matplotlib import colors
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
import pandas as pd

plt.rcParams["font.sans-serif"] = 'SimHei'    # font with CJK glyphs, avoids garbled Chinese text
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly
plt.rc('axes', axisbelow=True)                # draw grid lines behind the bars

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\StackedCD.csv')
df = df.set_index("Clarity")

# Convert every column to shares of its column total
SumCol_df = df.sum(axis=0)
df = df.div(SumCol_df, axis=1)

# Mean share of each Clarity level; the largest one sets the x-axis order
meanRow_df = df.mean(axis=1)
Per_df = df.loc[meanRow_df.idxmax(), :].sort_values(ascending=False)
Sing_df = meanRow_df.sort_values(ascending=False).index
df = df.loc[:, Per_df.index]

n_row, n_col = df.shape
x_value = np.arange(n_col)

# One colour per data series
cmap = plt.get_cmap('YlOrRd_r', n_row)
color = [colors.rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]

bottom_y = np.zeros(n_col)
fig = plt.figure(figsize=(5, 5))
#plt.subplots_adjust(left=0.1, right=0.9, top=0.7, bottom=0.1)

# Draw the series from largest to smallest mean share, stacking each on the previous ones
for i in range(n_row):
    label = Sing_df[i]
    plt.bar(x_value, df.loc[label, :], bottom=bottom_y, width=0.5, color=color[i], label=label, edgecolor='k', linewidth=0.25)
    bottom_y = bottom_y + df.loc[label, :].values

plt.xticks(x_value, df.columns, size=10)    # set the x-axis ticks
plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))    # show y ticks as percentages
plt.legend(loc=(1, 0.3), ncol=1, frameon=False)
plt.grid(axis="y", c=(166/256, 166/256, 166/256))

ax = plt.gca()    # current Axes, used to style the plot borders
ax.spines['top'].set_color('none')      # hide the top spine
ax.spines['right'].set_color('none')    # hide the right spine
ax.spines['left'].set_color('none')     # hide the left spine

plt.show()
