Note: this article collects the points I found useful while reading 《R语言医学数据分析实战》.
I. Introduction to R
1. Common usage
(1) Install a package:
install.packages("xxxx")
(2) Get help:
help("xxxx")
?xxxx
(3) Set the working directory:
setwd("xxxx/xxx/xxx")
(4) Save the workspace image:
save.image("xxxx")
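As a minimal sketch of the working-directory and workspace commands above, the following uses a temporary directory as a stand-in for a real project folder. The file name demo_workspace.RData and the variable names are assumptions made for illustration only.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Use a temporary directory in place of a real project folder
demo_dir <- tempdir()
setwd(demo_dir)

# Write all objects of the global workspace to a file in the working directory
save.image("demo_workspace.RData")

# Check that the image file was created where we expect it
saved <- file.exists(file.path(demo_dir, "demo_workspace.RData"))
print(saved)

# Return to the original working directory
setwd(old_wd)
```

A saved image can later be restored with load(), which brings the stored objects back under their original names.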
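II. Creating datasets
1. R data structures
R's data structures: vectors, factors, matrices, arrays, lists, and data frames.
1) Vectors: created with c(); regular sequences can also be built with the : operator, seq(), and rep().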
> x2<-1:5
> x2
[1] 1 2 3 4 5
> x1<-seq(from=2,to=10,by=2)
> x1
[1] 2 4 6 8 10
> x3<-rep('a',times=4)
> x3
[1] "a" "a" "a" "a"
> x4<-seq(from=3,to=100,by=7)
> x4
[1] 3 10 17 24 31 38 45 52 59 66 73 80 87 94
> x4[-(1:3)]
[1] 24 31 38 45 52 59 66 73 80 87 94
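Common functions:
length(x): number of elements; quantile(x): quantiles of x; scale(x): standardizes x (centers it to mean 0 and scales it to standard deviation 1).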
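The three functions listed above can be tried on a small vector. This is only an illustrative sketch; the vector x and the helper name z are assumptions, not taken from the book.

```r
x <- c(2, 4, 6, 8, 10)

n <- length(x)        # number of elements
q <- quantile(x)      # 0%, 25%, 50%, 75% and 100% quantiles
z <- scale(x)         # returns a one-column matrix of standardized values

print(n)
print(q)

# After standardization the mean is (numerically) 0 and the sd is 1
print(mean(z))
print(sd(as.vector(z)))
```

Note that scale() returns a matrix rather than a plain vector, so as.vector() is handy when a vector is needed afterwards.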
2) Factors: created with factor().
> sex<-c(1,2,2,1,2)
> sex.f<-factor(sex,levels=c(1,2),labels = c("Male","Female"))
> sex.f
[1] Male Female Female Male Female
Levels: Male Female
> levels(sex.f)
[1] "Male" "Female"
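Once a factor exists, it is typically used for counting and grouping. The sketch below reuses the sex.f example from above; the names counts and codes are introduced here for illustration.

```r
sex <- c(1, 2, 2, 1, 2)
sex.f <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))

# Frequency of each level
counts <- table(sex.f)
print(counts)

# A factor stores integer codes internally; the labels are only an attribute
codes <- as.integer(sex.f)
print(codes)

# Convert back to plain character strings
print(as.character(sex.f))
```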
3) Matrices: created with matrix().
> M<-matrix(1:6,nrow=2)
> M
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> M1<-matrix(5:10,nrow = 3)
> M1
[,1] [,2]
[1,] 5 8
[2,] 6 9
[3,] 7 10
> dim(M) # dimensions of the matrix
[1] 2 3
> dim(M1)
[1] 3 2
> M %*% M1 # matrix multiplication
[,1] [,2]
[1,] 58 85
[2,] 76 112
> t(M1) # matrix transpose
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
> M2<-matrix(1:4,nrow = 2)
> det(M2) # value of the determinant
[1] -2
> solve(M2) # inverse matrix
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5
> rowSums(M1)
[1] 13 15 17
> rowMeans(M1)
[1] 6.5 7.5 8.5
> M1[1:2,1:2] # first two rows and first two columns of the matrix
[,1] [,2]
[1,] 5 8
[2,] 6 9
> M1[,1:1]
[1] 5 6 7
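Two details of the matrix examples above are worth checking by hand: that solve() really returns the inverse, and that selecting a single column drops the result to a plain vector unless drop = FALSE is given. The following is a small sketch; the names I2 and col1 are assumptions for illustration.

```r
M1 <- matrix(5:10, nrow = 3)
M2 <- matrix(1:4, nrow = 2)

# Inverse times the original matrix should give the identity (up to rounding)
I2 <- solve(M2) %*% M2
print(round(I2, 10))

# Selecting one column returns a vector by default ...
print(M1[, 1])

# ... while drop = FALSE keeps the result as a 3 x 1 matrix
col1 <- M1[, 1, drop = FALSE]
print(dim(col1))

# matrix() fills column by column unless byrow = TRUE is set
Mb <- matrix(1:6, nrow = 2, byrow = TRUE)
print(Mb)
```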
4) Arrays: created with array(), or by assigning dim() to a vector to give it dimensions.
> A<-1:24
> dim(A)<-c(3,4,2)
> A
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
, , 2
[,1] [,2] [,3] [,4]
[1,] 13 16 19 22
[2,] 14 17 20 23
[3,] 15 18 21 24
> dim1<-c("A1","A2","A3")
> dim2<-c("B1","B2","B3","B4")
> dim3<-c("C1","C2")
> array(1:24,dim=c(3,4,2),dimnames=list(dim1,dim2,dim3))
, , C1
B1 B2 B3 B4
A1 1 4 7 10
A2 2 5 8 11
A3 3 6 9 12
, , C2
B1 B2 B3 B4
A1 13 16 19 22
A2 14 17 20 23
A3 15 18 21 24
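Elements of an array can be selected by position or, when dimnames are set, by name. A short sketch based on the named array above (the variable name A_named is an assumption):

```r
A_named <- array(1:24, dim = c(3, 4, 2),
                 dimnames = list(c("A1", "A2", "A3"),
                                 c("B1", "B2", "B3", "B4"),
                                 c("C1", "C2")))

# Select by position: row 2, column 3, first layer
print(A_named[2, 3, 1])

# Select by name: same row and column, second layer
print(A_named["A2", "B3", "C2"])

# Leaving an index empty keeps that whole dimension: this is a 3 x 4 matrix
layer1 <- A_named[, , "C1"]
print(dim(layer1))
```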
5) Lists: created with list().
> list1 <- list(a = 1, b = 1:5, c = c("red", "blue", "green"))
> list1
$a
[1] 1
$b
[1] 1 2 3 4 5
$c
[1] "red" "blue" "green"
> set.seed(123) # set the random seed so the result is reproducible
> dat <- rnorm(10) # draw a random sample of 10 values from the standard normal distribution
> bp <- boxplot(dat)
> class(bp)
[1] "list"
> bp
$stats
[,1]
[1,] -1.26506123
[2,] -0.56047565
[3,] -0.07983455
[4,] 0.46091621
[5,] 1.71506499
$n
[1] 10
$conf
[,1]
[1,] -0.5901626
[2,] 0.4304935
$out
numeric(0)
$group
numeric(0)
$names
[1] "1"
> bp$stats
[,1]
[1,] -1.26506123
[2,] -0.56047565
[3,] -0.07983455
[4,] 0.46091621
[5,] 1.71506499
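The bp$stats call above uses the $ operator to extract one element of a list. The sketch below shows the common ways to access list elements, reusing list1 from earlier; the element name d and the variable sub are added here purely for illustration.

```r
list1 <- list(a = 1, b = 1:5, c = c("red", "blue", "green"))

# $ and [[ ]] both return the element itself
print(list1$b)
print(list1[["c"]][2])

# Single brackets return a sub-list, not the element
sub <- list1["a"]
print(class(sub))

# New elements can be added by assignment
list1$d <- TRUE
print(names(list1))
```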
6) Data frames: created with data.frame().
> ID<-1:5
> age<-c(25,34,38,28,52)
> sex<-c("male", "female", "male", "female", "male")
> pain<-c(1,2,3,2,3)
> pain.f<-factor(pain,levels = 1:3,labels = c("mild","medium","severe"))
> patients<-data.frame(ID,sex,age,pain.f)
> patients
ID sex age pain.f
1 1 male 25 mild
2 2 female 34 medium
3 3 male 38 severe
4 4 female 28 medium
5 5 male 52 severe
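A data frame behaves like a list of equal-length columns, so columns are reached with $ and rows are filtered with a logical condition. A brief sketch using the patients data from above (the name older and the age cutoff of 30 are assumptions for illustration):

```r
ID <- 1:5
age <- c(25, 34, 38, 28, 52)
sex <- c("male", "female", "male", "female", "male")
pain <- c(1, 2, 3, 2, 3)
pain.f <- factor(pain, levels = 1:3, labels = c("mild", "medium", "severe"))
patients <- data.frame(ID, sex, age, pain.f)

# Extract a single column and summarize it
print(mean(patients$age))

# Keep only the rows that satisfy a condition
older <- patients[patients$age > 30, ]
print(older)

# Inspect the structure: column names, types and first values
str(patients)
```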
7) Type checking and conversion: use is.<type>() to test a data type and as.<type>() to convert to it.
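For instance, is.numeric() and as.numeric() are one such pair. The sketch below also shows a common pitfall when converting a factor whose labels look like numbers; all variable names are assumptions for illustration.

```r
x <- c("1", "2", "3")

print(is.numeric(x))      # FALSE: x holds character strings
print(is.character(x))    # TRUE

y <- as.numeric(x)        # convert the strings to numbers
print(is.numeric(y))      # TRUE

# Pitfall: as.numeric() on a factor returns the internal codes,
# so convert to character first to recover the labelled values
f <- factor(c("10", "20", "10"))
f_codes <- as.numeric(f)
f_values <- as.numeric(as.character(f))
print(f_codes)
print(f_values)
```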
2. Getting data
1) Built-in datasets: the datasets package contains close to 100 datasets.
> data(package="datasets")
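The call above opens a listing in the console or a viewer. As a sketch of how the same information might be used programmatically, the result of data() can be stored and inspected, and a single dataset can be loaded by name. ToothGrowth is chosen here only as an example of a dataset shipped with the datasets package.

```r
# The $results component is a character matrix with one row per dataset
ds <- data(package = "datasets")$results
print(head(ds[, c("Item", "Title")]))

# Load one dataset into the workspace and look at its structure
data("ToothGrowth")
str(ToothGrowth)
print(head(ToothGrowth, 3))
```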
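2) Simulating data from specific distributions
> r1 <- rnorm(n = 100, mean = 0, sd = 1) # random numbers from a normal distribution
> r2 <- runif(n = 10000, min = 0, max = 100) # random numbers from a uniform distribution
> r3 <- rbinom(n = 80, size = 100, prob = 0.1) # random numbers from a binomial distribution
> r4 <- rpois(n = 50, lambda = 1) # random numbers from a Poisson distribution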
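Because these functions draw random values, results differ from run to run unless a seed is set first. The sketch below illustrates this with set.seed() and compares a sample mean with its theoretical value; the seed 123 and the variable names are assumptions for illustration.

```r
# Same seed, same random numbers
set.seed(123)
first_draw <- rnorm(n = 100, mean = 0, sd = 1)
set.seed(123)
second_draw <- rnorm(n = 100, mean = 0, sd = 1)
print(identical(first_draw, second_draw))

# Binomial draws lie between 0 and size; their mean should be near size * prob = 10
binom_draw <- rbinom(n = 80, size = 100, prob = 0.1)
print(range(binom_draw))
print(mean(binom_draw))
```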
3) Reading data in different file formats
1) txt and csv formats
> patient.data<-read.table("patients.txt",header = TRUE)
> patient.data
ID sex age pain.f
1 1 male 25 mild
2 2 female 34 severe
3 3 male 38 medium
4 4 female 28 medium
5 5 male 52 severe
> patient.data1<-read.csv("patients.csv",header = TRUE)
> patient.data1
ID sex age pain.f
1 1 male 25 mild
2 2 female 34 severe
3 3 male 38 medium
4 4 female 28 medium
5 5 male 52 severe
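To try read.table() and read.csv() without the original files, one option is to write a small data frame to temporary files first and read it back. This is a sketch under that assumption; the data values and file names below are made up for illustration and are not the book's patients.txt or patients.csv.

```r
demo <- data.frame(ID = 1:3,
                   sex = c("male", "female", "male"),
                   age = c(25, 34, 38))

txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

# Whitespace-separated text file and comma-separated file, both without row names
write.table(demo, txt_file, row.names = FALSE, quote = FALSE)
write.csv(demo, csv_file, row.names = FALSE)

# read.table() needs header = TRUE; read.csv() assumes a header by default
from_txt <- read.table(txt_file, header = TRUE)
from_csv <- read.csv(csv_file)

print(from_txt)
print(all.equal(from_txt, from_csv))
```

read.csv() is essentially read.table() with defaults suited to comma-separated files, which is why the two results agree here.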
2) xls and xlsx formats: use a third-party package (openxlsx, readxl, or gdata)
> library(openxlsx)
> patient.data2<-read.xlsx("patients.xlsx",sheet=1)
> patient.data2
ID sex age pain.f
1 1 male 25 mild
2 2 female 34 severe
3 3 male 38 medium
4 4 female 28 medium
5 5 male 52 severe
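3) Formats produced by other software such as SPSS, SAS, and Stata: use the foreign package (the example here reads an SPSS .sav file)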
> library(foreign)
> patients.data <- read.spss("patients.sav", to.data.frame = TRUE)
> patients.data
ID sex age pain.f
1 1 male 25 mild
2 2 female 34 severe
3 3 male 38 medium
4 4 female 28 medium
5 5 male 52 severe
> View(patients.data)
4) Exporting data
write.csv(patient.data,file = "patient_data.csv")
save(patient.data,file = "patient_data.rdata") # save as an R data file
load("patient_data.rdata")
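Two behaviours of these export functions are easy to overlook: write.csv() writes row names as an extra first column by default, and load() restores objects under the names they were saved with. The sketch below demonstrates both with temporary files; the data frame df_demo and the file names are assumptions for illustration.

```r
df_demo <- data.frame(ID = 1:2, age = c(25, 34))
csv_file <- tempfile(fileext = ".csv")

# Default: row names are written, and read back as an extra column named "X"
write.csv(df_demo, file = csv_file)
cols_default <- names(read.csv(csv_file))
print(cols_default)

# row.names = FALSE keeps only the real columns
write.csv(df_demo, file = csv_file, row.names = FALSE)
cols_clean <- names(read.csv(csv_file))
print(cols_clean)

# save() and load(): the object comes back under its original name
rdata_file <- tempfile(fileext = ".rdata")
save(df_demo, file = rdata_file)
rm(df_demo)
load(rdata_file)
print(df_demo)
```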
5) Importing and exporting data with the rio package
> library(rio)
> data("infert")
> str(infert)
'data.frame': 248 obs. of 8 variables:
$ education : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ...
$ age : num 26 42 39 34 35 36 23 32 21 28 ...
$ parity : num 6 1 6 4 3 4 1 2 1 2 ...
$ induced : num 1 1 2 2 1 2 0 0 0 0 ...
$ case : num 1 1 1 1 1 1 1 1 1 1 ...
$ spontaneous : num 2 0 0 0 1 1 0 0 1 0 ...
$ stratum : int 1 2 3 4 5 6 7 8 9 10 ...
$ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ...
> export(infert, "infert.csv")
> convert("infert.csv", "infert.sav")
> infert.data <- import("infert.sav")
> infert.data
education age parity induced case spontaneous stratum pooled.stratum
1 0-5yrs 26 6 1 1 2 1 3
2 0-5yrs 42 1 1 1 0 2 1
3 0-5yrs 39 6 2 1 0 3 4