高发疾病分布特征分析

高发疾病分布特征分析

仅列举三种高发疾病,对其患病者年龄分布、性别分布、危险因素等进行可视化以探究一些有趣的规律。

一、数据可视化

1.1 不同年龄病别总患病率

# 安装库专用

# 通过如下命令设定镜像
options(repos = 'http://mirrors.ustc.edu.cn/CRAN/')
# 查看镜像是否修改
getOption('repos')
# 尝试下载R包
#若有需要,进行安装
#install.packages('plot3D')
#install.packages("reshape2")
#install.packages("pheatmap")

‘http://mirrors.ustc.edu.cn/CRAN/’

Installing package into 'C:/Users/天涯过客/Documents/R/win-library/4.0'
(as 'lib' is unspecified)



package 'pheatmap' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\天涯过客\AppData\Local\Temp\RtmpAX6TBf\downloaded_packages
library(reshape2)



```R
## 对应分析研究两个分类变量之间详细的依赖关系
library(ca)
zzhdata<-as.data.frame(zzhdata)
head(zzhdata)
A data.frame: 6 × 4
年龄肿瘤患病率糖尿病患病率心血管疾病患病率
<chr><dbl><dbl><dbl>
11~4岁 2.45 0.01 0.05
25~14岁 3.23 0.12 1.15
315~24岁 4.14 1.35 1.63
425~34岁 7.43 5.32 2.50
535~44岁12.9010.67 5.80
645~54岁17.0418.2414.75
rownames(zzhdata)<-zzhdata$年龄
head(zzhdata)
zzh<-zzhdata[,-1]
A data.frame: 6 × 4
年龄肿瘤患病率糖尿病患病率心血管疾病患病率
<chr><dbl><dbl><dbl>
1~4岁1~4岁 2.45 0.01 0.05
5~14岁5~14岁 3.23 0.12 1.15
15~24岁15~24岁 4.14 1.35 1.63
25~34岁25~34岁 7.43 5.32 2.50
35~44岁35~44岁12.9010.67 5.80
45~54岁45~54岁17.0418.2414.75
## 卡方检验判断两个变量是否独立
(result <- chisq.test(zzh))
	Pearson's Chi-squared test

data:  zzh
X-squared = 57.665, df = 14, p-value = 2.999e-07

明显不独立,年龄与得病有关联

## 使用马赛克图进行可视化数据
par(family = "STKaiti")
mosaicplot(zzh,main = "",color = c("red","blue","green","orange"))

在这里插入图片描述

数据中 65岁以上得病的最多,事实是否是这样,可以使用对应分析

## 对应分析分析
smca <- ca(zzh)
summary(smca)
Warning message in abbreviate(rnames.temp, 4):
"abbreviate不适用于非ASCII字元"
Warning message in abbreviate(cnames.temp, 4):
"abbreviate不适用于非ASCII字元"




Principal inertias (eigenvalues):

 dim    value      %   cum%   scree plot               
 1      0.111859  88.1  88.1  **********************   
 2      0.015068  11.9 100.0  ***                      
        -------- -----                                 
 Total: 0.126927 100.0                                 


Rows:
      name   mass  qlt  inr     k=1 cor ctr    k=2 cor ctr  
1 |   14岁 |    6 1000  126 | -1477 756 108 |  839 244 258 |
2 |  514岁 |   10 1000   97 |  -867 606  67 |  700 394 322 |
3 |   1524 |   16 1000   76 |  -721 841  73 |  313 159 102 |
4 |   2534 |   34 1000  121 |  -673 994 136 |  -51   6   6 |
5 |   3544 |   65 1000  171 |  -572 971 189 |  -99  29  42 |
6 |   4554 |  110 1000  111 |  -330 854 107 | -136 146 136 |
7 |   5564 |  175 1000   23 |   -99 580  15 |  -84 420  82 |
8 | 65岁及 |  585 1000  275 |   241 978 305 |   37  22  52 |

Columns:
        name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
1 | 肿瘤患病 |  245 1000  528 | -512 960 575 |  105  40 180 |
2 | 糖尿病患 |  282 1000   96 |  -77 135  15 | -194 865 703 |
3 | 心血管疾 |  472 1000  375 |  312 963 410 |   61  37 118 |
plot(smca,main = "data")

在这里插入图片描述

1.2 不同疾病与危险因素关系

#导入数据
library(readxl)
zzhdata1<-read_excel("zzh.xlsx")
zzhdata1<-as.data.frame(zzhdata1)
head(zzhdata1)
A data.frame: 3 × 10
病别死亡率吸烟吸二手烟饮酒摄入水果含量低摄入蔬菜含量低摄入高含糖饮料过多摄入反式脂肪含量高的饮食摄入钠含量高身体锻炼少
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
1因肿瘤死亡率 33.11 2.315.131.720.040.000.00 1.370.39
2因糖尿病人率 12.6310.440.753.970.002.620.00 0.006.86
3因心血管疾病死亡人率17.93 4.134.043.870.320.922.3417.192.77
rownames(zzhdata1)<-zzhdata1$病别死亡率
zzh1<-zzhdata1[,-1]
head(zzh1)
A data.frame: 3 × 9
吸烟吸二手烟饮酒摄入水果含量低摄入蔬菜含量低摄入高含糖饮料过多摄入反式脂肪含量高的饮食摄入钠含量高身体锻炼少
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
因肿瘤死亡率33.11 2.315.131.720.040.000.00 1.370.39
因糖尿病人率12.6310.440.753.970.002.620.00 0.006.86
因心血管疾病死亡人率17.93 4.134.043.870.320.922.3417.192.77
## 卡方检验判断两个变量是否独立
(result <- chisq.test(zzh1))
Warning message in chisq.test(zzh1):
"Chi-squared approximation may be incorrect"




	Pearson's Chi-squared test

data:  zzh1
X-squared = 63.478, df = 16, p-value = 1.344e-07

明显不独立,各危险因素与得病有关

## 使用马赛克图进行可视化数据
par(family = "STKaiti")
mosaicplot(zzh1,main = "",color = c("red","blue","green","orange"))

在这里插入图片描述

## 对应分析分析
smca <- ca(zzh1)
summary(zzh1)
      吸烟          吸二手烟           饮酒       摄入水果含量低 
 Min.   :12.63   Min.   : 2.310   Min.   :0.750   Min.   :1.720  
 1st Qu.:15.28   1st Qu.: 3.220   1st Qu.:2.395   1st Qu.:2.795  
 Median :17.93   Median : 4.130   Median :4.040   Median :3.870  
 Mean   :21.22   Mean   : 5.627   Mean   :3.307   Mean   :3.187  
 3rd Qu.:25.52   3rd Qu.: 7.285   3rd Qu.:4.585   3rd Qu.:3.920  
 Max.   :33.11   Max.   :10.440   Max.   :5.130   Max.   :3.970  
 摄入蔬菜含量低 摄入高含糖饮料过多 摄入反式脂肪含量高的饮食  摄入钠含量高   
 Min.   :0.00   Min.   :0.00       Min.   :0.00             Min.   : 0.000  
 1st Qu.:0.02   1st Qu.:0.46       1st Qu.:0.00             1st Qu.: 0.685  
 Median :0.04   Median :0.92       Median :0.00             Median : 1.370  
 Mean   :0.12   Mean   :1.18       Mean   :0.78             Mean   : 6.187  
 3rd Qu.:0.18   3rd Qu.:1.77       3rd Qu.:1.17             3rd Qu.: 9.280  
 Max.   :0.32   Max.   :2.62       Max.   :2.34             Max.   :17.190  
   身体锻炼少   
 Min.   :0.390  
 1st Qu.:1.580  
 Median :2.770  
 Mean   :3.340  
 3rd Qu.:4.815  
 Max.   :6.860  
plot(smca,main = "data")

在这里插入图片描述

二、省份高发疾病聚类分析

#整理数据结构
zzhmap<-as.data.frame(zzhmap)
rownames(zzhmap)<-zzhmap$地区
head(zzhmap)
A data.frame: 6 × 4
地区肿瘤糖尿病心血管疾病
<chr><dbl><dbl><dbl>
安徽省安徽省177.434010.557792306.7001
北京市北京市122.4444 9.587492254.4389
重庆市重庆市155.9377 9.136834263.5222
福建省福建省184.177713.693854258.7584
甘肃省甘肃省123.3265 9.468219270.6441
广东省广东省141.0060 9.122044265.8059
library(RSNNS)
zzh1<-zzhmap
## 数据max-min归一化到0-1之间
zzhmap[,2:4] <- normalizeData(zzhmap[,2:4] ,"0_1")
## 计算组内平方和  组间平方和
tot_withinss <- vector()
betweenss <- vector()
for(ii in 1:3){
  k1 <- kmeans(zzhmap[,c(2:4)],ii)
  tot_withinss[ii] <- k1$tot.withinss
  betweenss[ii] <- k1$betweenss
}

kmeanvalue <- data.frame(kk = 1:3,
                         tot_withinss = tot_withinss,
                         betweenss = betweenss)

library(ggplot2)
library(gridExtra)
library(ggdendro)
library(cluster)
library(ggfortify)
p1 <- ggplot(kmeanvalue,aes(x = kk,y = tot_withinss))+
  theme_bw()+
  geom_point() + geom_line() +labs(y = "value") +
  ggtitle("Total within-cluster sum of squares")+
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_continuous("kmean 聚类个数",kmeanvalue$kk)

p2 <- ggplot(kmeanvalue,aes(x = kk,y = betweenss))+
  theme_bw()+
  geom_point() +geom_line() +labs(y = "value") +
  ggtitle("The between-cluster sum of squares") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_continuous("kmean 聚类个数",kmeanvalue$kk)

grid.arrange(p1,p2,nrow=2)
Warning message:
"package 'ggplot2' was built under R version 4.0.4"
Warning message:
"package 'gridExtra' was built under R version 4.0.4"
Warning message:
"package 'ggdendro' was built under R version 4.0.4"
Warning message:
"package 'ggfortify' was built under R version 4.0.4"

在这里插入图片描述

聚为3类试试

set.seed(245)
k3 <- kmeans(zzhmap[,c(2:4)],3)
summary(k3)
             Length Class  Mode   
cluster      33     -none- numeric
centers       9     -none- numeric
totss         1     -none- numeric
withinss      3     -none- numeric
tot.withinss  1     -none- numeric
betweenss     1     -none- numeric
size          3     -none- numeric
iter          1     -none- numeric
ifault        1     -none- numeric
k3
K-means clustering with 3 clusters of sizes 13, 15, 5

Cluster means:
       肿瘤    糖尿病 心血管疾病
1 0.7772742 0.3120611  0.3378636
2 0.5109675 0.3515765  0.6114281
3 0.7988403 0.8529758  0.5135196

Clustering vector:
          安徽省           北京市           重庆市           福建省 
               1                2                1                3 
          甘肃省           广东省   广西壮族自治区           贵州省 
               2                1                1                2 
          海南省           河北省         黑龙江省           河南省 
               1                2                2                2 
            香港           湖北省           湖南省     内蒙古自治区 
               1                2                2                2 
          江苏省           江西省           吉林省           辽宁省 
               1                1                2                3 
            澳门   宁夏回族自治区           青海省           陕西省 
               1                2                3                2 
          山东省           上海市           山西省           四川省 
               1                3                1                1 
          天津市       西藏自治区 新疆维吾尔自治区           云南省 
               2                2                3                2 
          浙江省 
               1 

Within cluster sum of squares by cluster:
[1] 0.8845299 1.2972184 0.6098054
 (between_SS / total_SS =  45.1 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
table(k3$cluster)
 1  2  3 
13 15  5 
## 对聚类结果可视化
clusplot(zzhmap[,2:4],k3$cluster,main = "kmean cluster number=3")

在这里插入图片描述

## 可视化轮廓图,表示聚类效果
sis1 <- silhouette(k3$cluster,dist(zzhmap[,2:4],method = "euclidean"))


plot(sis1,main = " kmean silhouette",
     col = c("red", "green", "blue"))

在这里插入图片描述

#将标签写入
zzh1$cluster<-k3$cluster
zzh1$cluster<-as.factor(zzh1$cluster)
zzh1
A data.frame: 33 × 5
地区肿瘤糖尿病心血管疾病cluster
<chr><dbl><dbl><dbl><fct>
安徽省安徽省 177.4339810.557792306.70011
北京市北京市 122.44437 9.587492254.43892
重庆市重庆市 155.93773 9.136834263.52221
福建省福建省 184.1776713.693854258.75843
甘肃省甘肃省 123.32650 9.468219270.64412
广东省广东省 141.00597 9.122044265.80591
广西壮族自治区广西壮族自治区 162.7869811.077840326.34911
贵州省贵州省 127.34759 9.623574353.27642
海南省海南省 130.03196 6.988748252.50531
河北省河北省 152.20792 9.953459413.76942
黑龙江省黑龙江省 154.56845 8.592230403.36122
河南省河南省 161.2594210.194828375.61722
香港香港 179.37401 7.028075140.18351
湖北省湖北省 143.84933 7.914730322.38322
湖南省湖南省 125.6642210.088037325.22812
内蒙古自治区内蒙古自治区 140.78155 7.733346399.82992
江苏省江苏省 169.28257 9.862698230.79511
江西省江西省 148.75230 7.840917282.56281
吉林省吉林省 136.6291912.264530393.94522
辽宁省辽宁省 170.2183916.819898359.03903
澳门澳门 168.01064 7.336665154.93941
宁夏回族自治区宁夏回族自治区 138.58892 9.340040320.00432
青海省青海省 166.1175312.825846380.02113
陕西省陕西省 130.72657 9.576642342.57102
山东省山东省 166.39694 9.149294334.19791
上海市上海市 136.9924515.868634150.50423
山西省山西省 156.94740 9.088144327.07221
四川省四川省 178.8854510.061232256.66341
天津市天津市 111.4989210.064153259.54942
西藏自治区西藏自治区 74.01227 5.202867476.80352
新疆维吾尔自治区新疆维吾尔自治区152.5780916.351338416.89983
云南省云南省 111.64295 9.702974278.61502
浙江省浙江省 140.48702 7.514895159.60051
## 可视化多个图像窗口
par(family = "STKaiti",mfrow=c(2,2))
layout(matrix(c(1,2,3,3),2,2,byrow = TRUE))
hist(zzh1$肿瘤,breaks = 10,col = "lightblue",main = "肿瘤死亡率",xlab = "肿瘤死亡率")
smoothScatter(zzh1$肿瘤,zzh1$心血管疾病, nbin = 84,main = "肿瘤与心血管疾病散点图",xlab = "肿瘤",ylab = "心血管疾病")
## 添加第3个图像
boxplot(糖尿病~cluster,data = zzh1,main = "糖尿病各类别箱线图",ylab = "糖尿病死亡率",col='red')

在这里插入图片描述

#使用矩阵散点图分析变量两两之间的关系
library(GGally)
ggscatmat(data = zzh1[,2:5],columns = 1:4,color = "cluster",alpha = 0.8)+
  theme_bw(base_family = "STKaiti",base_size = 10)+
  theme(plot.title = element_text(hjust = 0.5))+
  ggtitle("矩阵散点图")
Warning message:
"package 'GGally' was built under R version 4.0.4"
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

在这里插入图片描述

#心血管疾病各省份有较大的差异,而其他两个疾病,差异不太大

地图可视化

三、相关关系可视化

 ## 可视化各地区疾病死亡率分布
 
ggplot(zzh1,aes(x=reorder(地区,肿瘤),y=肿瘤))+
  theme_bw(base_family = "STKaiti")+
  geom_bar(aes(fill=肿瘤),stat = "identity",show.legend = F)+
  coord_flip()+
  scale_fill_gradient(low = "#56B1F7", high = "#132B43")+
  labs(x="地区",y="肿瘤死亡率",title="不同地区肿瘤死亡率")+
  theme(axis.text.x = element_text(vjust = 0.5),
        plot.title = element_text(hjust = 0.5))

在这里插入图片描述

library(treemap)
treemap(zzh1,index = c("cluster","地区"),vSize = "肿瘤",
        vColor = "心血管疾病",type="value",palette="RdYlGn",
        title = "不同分类下各地区肿瘤和心血管疾病死亡率(每10万人)",fontfamily.title = "STKaiti",
        title.legend = "心血管疾病死亡率(每10万人)",fontfamily.legend="STKaiti")

在这里插入图片描述

library(reshape2)
mydata<-melt(zzhmap,id.vars=c("地区"),variable.name="type",value.name="number")
library(ggthemes)

g <- ggplot(mydata, aes(type, number))

g + geom_boxplot(aes(fill=factor(type))) +

theme(axis.text.x = element_text(angle=65, vjust=0.6)) +

labs(title="Box plot",

caption="Source: mpg",

x="疾病类型",

y="死亡率(每10万人)")

在这里插入图片描述

四、时间序列预测(仅举个例)


#设置工作路径
setwd("D:/LengPY")

#导入数据
library(readxl)
zzhtime<-read_excel("zzhtime.xlsx")
head(zzhtime)
A tibble: 6 × 4
年份肿瘤患病数糖尿病患病数心血管疾病患病数
<dbl><dbl><dbl><dbl>
20107232465976781094 89573623
20117385317179596503 92559905
20127531173782982973 95781404
20137678629186551170 99166354
20147839116689875882102751559
20157985653492381953106171194
zzhtime<-as.data.frame(zzhtime)
rownames(zzhtime)<-zzhtime$年份
zzhtime
A data.frame: 10 × 4
年份肿瘤患病数糖尿病患病数心血管疾病患病数
<dbl><dbl><dbl><dbl>
201020107232465976781094 89573623
201120117385317179596503 92559905
201220127531173782982973 95781404
201320137678629186551170 99166354
201420147839116689875882102751559
201520157985653492381953106171194
201620168139370192191000109763513
201720178327129990886585113270798
201820188513099091078791116709173
201920198692340091976596120332012
# display_numbers = TRUE参数设定在每个热图格子中显示相应的数值,number_color参数设置数值字体的颜色
pheatmap(zzhtime, display_numbers = TRUE,number_color = "blue")

在这里插入图片描述

library(reshape2)
zzh<-melt(zzhtime,id.vars=c("年份"),variable.name="type",value.name="number")
zzh
A data.frame: 30 × 3
年份typenumber
<dbl><fct><dbl>
2010肿瘤患病数 72324659
2011肿瘤患病数 73853171
2012肿瘤患病数 75311737
2013肿瘤患病数 76786291
2014肿瘤患病数 78391166
2015肿瘤患病数 79856534
2016肿瘤患病数 81393701
2017肿瘤患病数 83271299
2018肿瘤患病数 85130990
2019肿瘤患病数 86923400
2010糖尿病患病数 76781094
2011糖尿病患病数 79596503
2012糖尿病患病数 82982973
2013糖尿病患病数 86551170
2014糖尿病患病数 89875882
2015糖尿病患病数 92381953
2016糖尿病患病数 92191000
2017糖尿病患病数 90886585
2018糖尿病患病数 91078791
2019糖尿病患病数 91976596
2010心血管疾病患病数 89573623
2011心血管疾病患病数 92559905
2012心血管疾病患病数 95781404
2013心血管疾病患病数 99166354
2014心血管疾病患病数102751559
2015心血管疾病患病数106171194
2016心血管疾病患病数109763513
2017心血管疾病患病数113270798
2018心血管疾病患病数116709173
2019心血管疾病患病数120332012
p <- ggplot(zzh,aes(x=年份,y=number,colour=type,group=type,fill=type)) +
			geom_line(size =0.8)
p

在这里插入图片描述

library(ggfortify)
library(gridExtra)
library(forecast)
library(tseries)
Warning message:
"package 'tseries' was built under R version 4.0.4"
ARMAdata <- ts(zzhtime$肿瘤患病数)
plot.ts(ARMAdata)

在这里插入图片描述

autoplot(ARMAdata)+ggtitle("序列变化趋势")

在这里插入图片描述

## 白噪声检验
Box.test(ARMAdata,type ="Ljung-Box")
	Box-Ljung test

data:  ARMAdata
X-squared = 6.4337, df = 1, p-value = 0.0112
## p-value <0.05 ,说明不是白噪声

## 平稳性检验,单位根检验
adf.test(ARMAdata)
	Augmented Dickey-Fuller Test

data:  ARMAdata
Dickey-Fuller = 2.3927, Lag order = 2, p-value = 0.99
alternative hypothesis: stationary

p-value = 0.99,说明数据是不平稳的

## R提供了自动寻找序列合适的参数的函数
auto.arima(ARMAdata)
Series: ARMAdata 
ARIMA(0,2,0) 

sigma^2 estimated as 2.216e+10:  log likelihood=-106.41
AIC=214.82   AICc=215.49   BIC=214.9
## 可以发现较好的ARMA模型为ARMA(2,1)

## 对数据建立ARIMA(0,2,0) 模型,并预测后面的数据
ARMAmod <- arima(ARMAdata,order = c(0,2,0))
summary(ARMAmod)
Call:
arima(x = ARMAdata, order = c(0, 2, 0))


sigma^2 estimated as 2.094e+10:  log likelihood = -106.41,  aic = 214.82

Training set error measures:
                   ME     RMSE      MAE        MPE      MAPE      MASE
Training set 20262.74 133154.1 97914.24 0.02360526 0.1221592 0.0603633
                    ACF1
Training set -0.05418788
## 对拟合残差进行白噪声检验
Box.test(ARMAmod$residuals,type ="Ljung-Box")
	Box-Ljung test

data:  ARMAmod$residuals
X-squared = 0.039151, df = 1, p-value = 0.8431
## p-value = 0.8431 ,说明是白噪声

## 可视化模型未来的预测值
par(family = "STKaiti")
plot(forecast(ARMAmod,h=5))

在这里插入图片描述

forecast(ARMAmod,h=5)
   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
11       88715811 88530378 88901243 88432216 88999405
12       90508221 90093581 90922860 89874084 91142357
13       92300631 91606806 92994455 91239518 93361744
14       94093041 93077386 95108696 92539730 95646351
15       95885451 94510247 97260654 93782259 97988643

五、不同性别人口结构图

zzhdata
A tibble: 8 × 3
年龄男性肿瘤患病率女性肿瘤患病率
<chr><dbl><dbl>
1~4岁 -1863.878 2242.352
5~14岁 -2417.117 3143.431
15~24岁 -2128.797 5847.729
25~34岁 -3257.57711319.571
35~44岁 -6452.43119250.083
45~54岁 -11316.96222654.410
55~64岁 -19488.94122559.742
65岁及以上-51925.85936035.886
zzhdata$group<-c(1:8)
head(zzhdata)
A tibble: 6 × 4
年龄男性肿瘤患病率女性肿瘤患病率group
<chr><dbl><dbl><int>
1~4岁 -1863.878 2242.3521
5~14岁 -2417.117 3143.4312
15~24岁 -2128.797 5847.7293
25~34岁 -3257.57711319.5714
35~44岁 -6452.43119250.0835
45~54岁-11316.96222654.4106
colnames(zzhdata)<-c('group_name','male','female','group')
data_use<-zzhdata
### 人口金字塔
library(ggplot2)
library(grid)
## 作图
# 函数
vplayout <- function(x, y){
  viewport(layout.pos.row = x, layout.pos.col = y)
}
# 参数
v_max <- 52000
dig_temp <- nchar(as.character(v_max))
lim_1 <- c(-v_max, 0)
lim_2 <- c(0, v_max)
by <- round(v_max/(5 * 10^(dig_temp - 2))) * 10^(dig_temp - 2)
bre_1 <- seq(from = 0, to = -v_max, by = -by)
bre_2 <- seq(from = 0, to = v_max, by = by)
lab_1 <- seq(from = 0, to = v_max, by = by)
lab_2 <- seq(from = 0, to = v_max, by = by)
mg_1 <- unit(c(0, 0.0, 0.3, 0.5), "lines")  # 上右下左
mg_2 <- unit(c(0, 0.5, 0.3, 0.0), "lines")
mg_3 <- unit(c(0, 0.0, 2.3, 0.0), "lines")
mg_l <- margin(0, 0, 0, 0, 'lines')
text_x <- rep(2, nrow(data_use))
text_y <- 1:nrow(data_use)
text_lab <- gsub('岁', '', data_use$group_name)
text_lab <- gsub('及以上', '+', text_lab)
title_x <- 2
title_y <- 2
title_lab <- '肿瘤患病年龄结构'
# 图形
p_1 <- ggplot(data_use) +
  geom_bar(aes(group, male), fill = 'skyblue', stat="identity", position="dodge") +
  scale_y_continuous(limits = lim_1, breaks = bre_1, labels = lab_1) +
  scale_x_continuous(limits = c(0, (nrow(data_use) + 1)), breaks = 1:nrow(data_use), labels = NULL, expand = expand_scale(), position = 'top') +
  theme(plot.margin = mg_1, axis.text = element_text(margin = mg_l)) +
  xlab(NULL) +
  ylab('男') +
  coord_flip() + 
  guides(fill = FALSE)
p_2 <- ggplot(data_use) +
  geom_bar(aes(group, female), fill = 'firebrick1', stat="identity", position="dodge") +
  scale_y_continuous(limits = lim_2, breaks = bre_2, labels = lab_2) +
  scale_x_continuous(limits = c(0, (nrow(data_use) + 1)), breaks = 1:nrow(data_use), labels = NULL, expand = expand_scale()) +
  theme(plot.margin = mg_2, axis.text = element_text(margin = mg_l)) +
  xlab(NULL) +
  ylab('女') +
  coord_flip() +
  guides(fill = FALSE)
p_3 <- ggplot() + 
  geom_text(aes(x = text_x, y= text_y, label = text_lab), size = 3.6) +
  scale_x_continuous(limits = c(0, 4), breaks = NULL, expand = expand_scale()) +
  scale_y_continuous(limits = c(0, (nrow(data_use) + 1)), breaks = NULL, expand = expand_scale()) +
  labs(x = NULL, y = NULL) + 
  theme(plot.margin = mg_3,
        axis.text = element_text(margin = mg_l),
        panel.grid.major =element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank()) 
p_4 <- ggplot() + 
  geom_text(aes(x = title_x, y= title_y, label = title_lab), size = 6) +
  scale_x_continuous(limits = c(0, 4), breaks = NULL, expand = expand_scale()) +
  scale_y_continuous(limits = c(0, 4), breaks = NULL, expand = expand_scale()) +
  labs(x = NULL, y = NULL) + 
  theme(plot.margin = mg_l,
        axis.text = element_text(margin = mg_l),
        panel.grid.major =element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank()) 
# 分面画图
grid.newpage()  ##新建页面
pushViewport(viewport(layout = grid.layout(12, 11))) 
print(p_1, vp = vplayout(2:12, 1:5))
print(p_2, vp = vplayout(2:12, 7:11))
print(p_3, vp = vplayout(2:12, 6))
print(p_4, vp = vplayout(1, 1:11))
Warning message:
"`expand_scale()` is deprecated; use `expansion()` instead."
Warning message:
"`expand_scale()` is deprecated; use `expansion()` instead."
Warning message:
"`expand_scale()` is deprecated; use `expansion()` instead."
Warning message:
"`expand_scale()` is deprecated; use `expansion()` instead."
Warning message:
"`expand_scale()` is deprecated; use `expansion()` instead."
Warning message:
"`expand_scale()` is deprecated; use `expansion()` instead."

在这里插入图片描述

  • 3
    点赞
  • 16
    收藏
    觉得还不错? 一键收藏
  • 6
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 6
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值