# Recoding data in R

1. Recoding with logical expressions.
2. Recoding with the cut function.
3. Recoding with the recode function from the car package.

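(I) Using logical expressions
(1) Suppose we need to split the continuous variable x below into three groups at the cut points 10 and 20, with the new group codes 1, 2 and 3: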

> x=c(4,12,50,18,50,22,23,46,8,46,36,18,10,14,35,48,23,17,29,30)
> x2=1*(x<=10)+2*(x>10&x<=20)+3*(x>20)
> x2
[1] 1 2 3 2 3 3 3 3 1 3 3 2 1 2 3 3 3 2 3 3
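This works because R turns TRUE into 1 and FALSE into 0 in arithmetic, and because the three conditions are mutually exclusive and together cover every value. For each element, exactly one term contributes its group code. The sketch below makes that explicit, using the same x values that are printed later in this post (in_group is only an illustrative name):

```r
# The x values used throughout this post
x <- c(4, 12, 50, 18, 50, 22, 23, 46, 8, 46, 36, 18, 10, 14, 35, 48, 23, 17, 29, 30)

# One logical column per group
in_group <- cbind(g1 = x <= 10, g2 = x > 10 & x <= 20, g3 = x > 20)

# Mutually exclusive and exhaustive: every row contains exactly one TRUE
stopifnot(all(rowSums(in_group) == 1))

# TRUE/FALSE become 1/0, so only the code of the matching group survives
x2 <- 1 * in_group[, "g1"] + 2 * in_group[, "g2"] + 3 * in_group[, "g3"]
print(x2)
```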

• Convert the numeric codes above into character codes:
> labels=c("A","B","C")
> x3=labels[x2]
> x3
[1] "A" "B" "C" "B" "C" "C" "C" "C" "A" "C" "C" "B" "A" "B" "C" "C" "C" "B" "C" "C"
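Indexing labels by the codes assumes that every element gets a code from 1 to 3. If the conditions leave a gap, the uncovered values get code 0. R silently drops zero indices, so the result comes out shorter than x instead of raising an error. The sketch below shows the problem and adds a simple guard. The variables bad and good are hypothetical names used only for illustration:

```r
labels <- c("A", "B", "C")
x <- c(4, 10, 12)

# Buggy conditions: x == 10 satisfies neither x < 10 nor x > 10, so it gets code 0
bad <- 1 * (x < 10) + 2 * (x > 10 & x <= 20) + 3 * (x > 20)
print(bad)           # 1 0 2
print(labels[bad])   # only two labels: index 0 is silently dropped

# Correct conditions cover every value
good <- 1 * (x <= 10) + 2 * (x > 10 & x <= 20) + 3 * (x > 20)

# Guard: stop if any non-missing code falls outside 1..3
stopifnot(all(good %in% 1:3 | is.na(good)))
print(labels[good])  # "A" "A" "B"
```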

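• Suppose we want to split the following sample monthly incomes into three groups: "low income" (below 20,000), "middle income" (20,000 to 60,000) and "high income" (above 60,000):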
> income<-c(130065,82961,133076,123028,108945,173466,17477)
> income
[1] 130065  82961 133076 123028 108945 173466  17477
Note: the commas inside c() must be ASCII commas. Typing the full-width comma "，" instead produces `Error: unexpected input`.
> newcodes=c("Low income","Middle income","High income")
> index=1*(income<20000)+2*(income>=20000&income<=60000)+3*(income>60000)
> income=newcodes[index]
> income
[1] "High income" "High income" "High income" "High income" "High income" "High income" "Low income"
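Two small refinements may be worth considering here. First, storing the groups in a new variable keeps the raw income values available. Second, turning the result into a factor whose levels follow newcodes keeps the groups in low/middle/high order rather than alphabetical order, and it still lists empty groups. The name income_group in this sketch is only illustrative:

```r
income <- c(130065, 82961, 133076, 123028, 108945, 173466, 17477)
newcodes <- c("Low income", "Middle income", "High income")
index <- 1 * (income < 20000) + 2 * (income >= 20000 & income <= 60000) + 3 * (income > 60000)

# Leave income untouched; store the groups separately with a fixed level order
income_group <- factor(newcodes[index], levels = newcodes)

# The empty "Middle income" group still appears, with a count of 0
print(table(income_group))
```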


(2) Using the ifelse function

> x
[1]  4 12 50 18 50 22 23 46  8 46 36 18 10 14 35 48 23 17 29 30
> (x2=ifelse(x<=30,1,2))
[1] 1 1 2 1 2 1 1 2 1 2 2 1 1 1 2 2 1 1 1 1
> (x3=ifelse(x<=30,"A","B"))
[1] "A" "A" "B" "A" "B" "A" "A" "B" "A" "B" "B" "A" "A" "A" "B" "B" "A" "A" "A" "A"


> y
[1] "B" "A" "C" "C" "B" "A" "D" "B" "C" "D"
> (y2=ifelse(y %in% c("A","C"),"Group1","Group2"))
[1] "Group2" "Group1" "Group1" "Group1" "Group2" "Group1" "Group2" "Group2" "Group1" "Group2"
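The choice between `%in%` and `==` matters when the data contain missing values. `%in%` never returns NA, so a missing value is simply treated as "not in the set" and lands in the "no" branch. `==` propagates NA, and ifelse() then returns NA for that element. A small sketch with one NA added to y:

```r
y <- c("B", "A", NA, "C")

# %in% treats NA as "not in the set", so it goes to Group2
g1 <- ifelse(y %in% c("A", "C"), "Group1", "Group2")
print(g1)  # "Group2" "Group1" "Group2" "Group1"

# == propagates NA, so the missing element stays NA
g2 <- ifelse(y == "A" | y == "C", "Group1", "Group2")
print(g2)  # "Group2" "Group1" NA "Group1"
```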


> x
[1]  4 12 50 18 50 22 23 46  8 46 36 18 10 14 35 48 23 17 29 30
> (x2=ifelse(x<=10,1,ifelse(x<=20,2,3)))
[1] 1 2 3 2 3 3 3 3 1 3 3 2 1 2 3 3 3 2 3 3


> y
[1] "B" "A" "C" "C" "B" "A" "D" "B" "C" "D"
> y2=ifelse(y%in%c("A","E"),1,ifelse(y=="C",2,3))
> y2
[1] 3 1 2 2 3 1 3 3 2 3
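When many categories need recoding, deeply nested ifelse() calls can become hard to read. As an alternative technique not used in the original post, you can index a named lookup vector by the category names and then fill in the "else" code afterwards. The sketch below reproduces the result above. The name lookup is hypothetical:

```r
y <- c("B", "A", "C", "C", "B", "A", "D", "B", "C", "D")

# Hypothetical lookup table mirroring the rules above: A and E -> 1, C -> 2
lookup <- c(A = 1, E = 1, C = 2)

# Categories missing from the table come back as NA
y2 <- unname(lookup[y])

# The "else" branch: everything not listed gets code 3
y2[is.na(y2)] <- 3
print(y2)
```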


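(II) Using the cut function
The cut function recodes data according to the break points (breaks) we set. It converts a numeric vector into a factor that records which interval each value falls into.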

cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE)

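• x is a numeric vector.
• breaks gives the cut points. If breaks is a vector, x is split at those values. If breaks is a single integer k (at least 2), the range of x is divided into k intervals of equal width. The number of values in each interval can therefore differ.
• labels gives the names of the resulting groups. If it is NULL (the default), the groups are labelled with interval notation such as "(0,10]". If it is FALSE, cut() returns plain integer group codes instead of a factor.
• include.lowest = FALSE (the default) means that a value equal to the lowest break (or the highest break, when right = FALSE) falls outside every interval and becomes NA. Set it to TRUE to include that value.
• right = TRUE (the default) means each interval is open on the left and closed on the right, e.g. (10,20].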
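The effect of include.lowest and right is easiest to see on values that sit exactly on the break points. This sketch cuts the same five values four ways. The variable names a, b, c2 and d are only for illustration:

```r
x <- c(0, 5, 10, 15, 20)

# Default right = TRUE: intervals (0,10] and (10,20]; 0 falls outside both -> NA
a <- cut(x, breaks = c(0, 10, 20))

# include.lowest = TRUE closes the first interval on the left: [0,10]
b <- cut(x, breaks = c(0, 10, 20), include.lowest = TRUE)

# right = FALSE: intervals [0,10) and [10,20); now 20 falls outside -> NA
c2 <- cut(x, breaks = c(0, 10, 20), right = FALSE)

# right = FALSE with include.lowest = TRUE closes the last interval instead: [10,20]
d <- cut(x, breaks = c(0, 10, 20), right = FALSE, include.lowest = TRUE)

print(data.frame(x, a, b, c2, d))
```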

> x
[1]  4 12 50 18 50 22 23 46  8 46 36 18 10 14 35 48 23 17 29 30
> x2=cut(x,breaks = c(0,10,20,max(x)),labels = c(1,2,3))
> x2
[1] 1 2 3 2 3 3 3 3 1 3 3 2 1 2 3 3 3 2 3 3
Levels: 1 2 3
> as.vector(x2)
[1] "1" "2" "3" "2" "3" "3" "3" "3" "1" "3" "3" "2" "1" "2" "3" "3" "3" "2" "3" "3"
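as.vector() returns the labels as character strings. To get numbers back, convert the labels themselves with as.numeric(as.character(...)), or ask cut() for integer codes directly with labels = FALSE. Calling as.numeric() directly on a factor returns the internal level positions, not the label values. In this example the two happen to coincide, but in general they do not. A sketch:

```r
x <- c(4, 12, 50, 18, 50, 22, 23, 46, 8, 46, 36, 18, 10, 14, 35, 48, 23, 17, 29, 30)
x2 <- cut(x, breaks = c(0, 10, 20, max(x)), labels = c(1, 2, 3))

# Convert the labels (not the internal level codes) to numbers
codes <- as.numeric(as.character(x2))

# Alternatively, labels = FALSE returns the integer group codes directly
codes2 <- cut(x, breaks = c(0, 10, 20, max(x)), labels = FALSE)
stopifnot(all(codes == codes2))

# Why as.character() matters: as.numeric() on a factor gives level positions
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                # 1 2 1
print(as.numeric(as.character(f)))  # 10 20 10
```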


> x3=cut(x,breaks = c(0,10,20,max(x)))
> x3
[1] (0,10]  (10,20] (20,50] (10,20] (20,50] (20,50] (20,50] (20,50] (0,10]  (20,50] (20,50] (10,20] (0,10]
[14] (10,20] (20,50] (20,50] (20,50] (10,20] (20,50] (20,50]
Levels: (0,10] (10,20] (20,50]


> score=round(rnorm(10,60,10))
> score
[1] 39 65 60 69 58 69 70 62 61 75
> score.cut=cut(score,breaks=5)
> score.cut
[1] (39,46.2]   (60.6,67.8] (53.4,60.6] (67.8,75]   (53.4,60.6] (67.8,75]   (67.8,75]   (60.6,67.8]
[9] (60.6,67.8] (67.8,75]
Levels: (39,46.2] (46.2,53.4] (53.4,60.6] (60.6,67.8] (67.8,75]
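With breaks = 5, the intervals all have the same width over the range of score, so the group sizes can be quite uneven: here one group is empty. If you want groups of roughly equal size instead, one option (a different technique from the one above) is to use sample quantiles as the break points. This sketch uses the scores printed above. Ties in the data mean the counts are only approximately equal:

```r
score <- c(39, 65, 60, 69, 58, 69, 70, 62, 61, 75)  # the scores printed above

# breaks = 5: five equal-width intervals over range(score); counts may be uneven
width_groups <- cut(score, breaks = 5)
print(table(width_groups))

# Roughly equal-count groups: quartiles as break points,
# include.lowest = TRUE so that the minimum is not turned into NA
q <- quantile(score, probs = seq(0, 1, by = 0.25))
count_groups <- cut(score, breaks = q, include.lowest = TRUE)
print(table(count_groups))
```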


> score.cut=cut(score,breaks=5,labels = F)
> score.cut
[1] 1 4 3 5 3 5 5 4 4 5
> score.cut=as.factor(score.cut)
> score.cut
[1] 1 4 3 5 3 5 5 4 4 5
Levels: 1 3 4 5
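Level 2 is missing because no score falls into the second interval, and as.factor() only creates levels for values that actually occur. To keep all five groups, including empty ones, you can pass the levels explicitly. A short sketch, with f_dropped and f_full as illustrative names:

```r
score <- c(39, 65, 60, 69, 58, 69, 70, 62, 61, 75)
codes <- cut(score, breaks = 5, labels = FALSE)

f_dropped <- as.factor(codes)          # levels 1 3 4 5: group 2 is lost
f_full <- factor(codes, levels = 1:5)  # every group kept, including empty ones
print(table(f_full))
```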


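(III) Using the recode function from the car package

The recode function in the car package recodes numeric vectors, character vectors and factors. Its main arguments are:

• var is the vector to recode: a numeric vector, a character vector or a factor.
• recodes is a character string that specifies the recoding rules.
• as.factor.result sets whether the result is a factor (TRUE) or not (FALSE). In car 3.0 and later this argument is named as.factor.
• levels gives the order of the levels of the new factor (by default the levels are sorted by name).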

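Writing the rules for the recodes argument

The value of recodes is a single string containing rules separated by semicolons:
recodes="rule1; rule2; ..."

(1) old=new recodes a single old value. For example, "0=NA" changes 0 to NA.
(2) c(...)=new maps several old values to one new value. For example, "c(7,8,9)='high'" changes 7, 8 and 9 to "high".
(3) start:end=new recodes a numeric range, with both ends included. lo and hi stand for the smallest and largest values. For example, "lo:19='C'".
(4) else=new covers every value not matched by the other rules. For example, "else=NA".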
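The sketch below uses one rule of each kind in a single recodes string. It assumes the car package is installed. The rules are deliberately written so that they do not overlap, so the result does not depend on how car resolves overlapping rules:

```r
library(car)  # assumes the car package is installed

v <- c(0, 3, 7, 8, 9, 15)

# Single value, vector of values, range, and else, in one string
r <- recode(v, "0=NA; c(7,8,9)='high'; 1:6='low'; else='other'")
print(r)  # NA "low" "high" "high" "high" "other"
```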

> library(carData)
> library(car)
> x
[1] 1 2 3 1 2 3 1 2 3
> recode(x,"c(1,2)='A';else='B'")
[1] "A" "A" "B" "A" "A" "B" "A" "A" "B"


> score
[1] 75 70 66 65 55 69 75 69 82 83
> recode(score,"lo:40=1;41:60=2;61:80=3;81:hi=4;else=NA")
[1] 3 3 3 3 2 3 3 3 4 4


> recode(score,"lo:40='A';41:60='B';61:80='C';81:hi='D';else=NA")
[1] "C" "C" "C" "C" "B" "C" "C" "C" "D" "D"
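Because each start:end range includes both ends, the rules above only cover whole-number scores. A score such as 40.5 falls between lo:40 and 41:60 and is sent to else=NA. The sketch below shows this, assuming car is installed, and then uses cut() from base R with right-closed intervals that share their end points, so no value can slip through a gap. The data in s are made up for illustration:

```r
library(car)  # assumes the car package is installed

s <- c(40, 40.5, 41, 60, 60.5)  # hypothetical scores, including non-integers

# 40.5 and 60.5 match no range, so they hit else=NA
r <- recode(s, "lo:40=1;41:60=2;61:80=3;81:hi=4;else=NA")
print(r)   # 1 NA 2 2 NA

# Base R alternative without gaps: (-Inf,40], (40,60], (60,80], (80,Inf)
r2 <- cut(s, breaks = c(-Inf, 40, 60, 80, Inf), labels = FALSE)
print(r2)  # 1 2 2 2 3
```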

