R -- factor (factors)

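This post shows how to convert a character variable to a factor in R and how to customize the grouping with the `levels` and `labels` arguments. It creates factors with `factor` and demonstrates a pitfall of calling `as.numeric` on a factor: the numbers returned are the level codes, not the original values. It also shows a case where the codes returned by `as.numeric` happen to coincide with the labels.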

factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x), nmax = NA)

  • x
    a vector of data, usually taking a small number of distinct values.

  • levels
    an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).

  • labels
    either an optional character vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.

  • exclude
    a vector of values to be excluded when forming the set of levels. This may be a factor with the same level set as x, or should be a character vector.
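
To make the arguments above concrete, here is a minimal sketch on a toy vector. The vector `x` and the variable names are made up for illustration and are not part of the TCGA data used below.

```r
# Hypothetical toy vector; all names here are illustrative only.
x <- c("low", "high", "mid", "low", NA, "high")

# Default: levels are the sorted unique values, and NA is not a level.
f_default <- factor(x)
levels(f_default)            # "high" "low" "mid"

# levels fixes the order of the levels, and therefore the integer codes.
f_ordered <- factor(x, levels = c("low", "mid", "high"))
as.integer(f_ordered)        # 1 3 2 1 NA 3

# labels renames the levels position by position.
f_labeled <- factor(x, levels = c("low", "mid", "high"),
                    labels = c("L", "M", "H"))
as.character(f_labeled)      # "L" "H" "M" "L" NA "H"

# Values listed in exclude are dropped from the levels and become NA.
f_excl <- factor(x, exclude = c("mid", NA))
levels(f_excl)               # "high" "low"
```

Note that `exclude = c("mid", NA)` keeps `NA` in the exclusion set; passing only `"mid"` would replace the default `exclude = NA`.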

rt3 <- readRDS("/mnt/d/TCGA/TCGA_clinical_ALL.rds")
print(rt3[1:4,1:3])
           sample     _PATIENT cancer type abbreviation
1 TCGA-OR-A5J1-01 TCGA-OR-A5J1                      ACC
2 TCGA-OR-A5J2-01 TCGA-OR-A5J2                      ACC
3 TCGA-OR-A5J3-01 TCGA-OR-A5J3                      ACC
4 TCGA-OR-A5J5-01 TCGA-OR-A5J5                      ACC
unique(rt3$'cancer type abbreviation')
  1. 'ACC'
  2. 'BLCA'
  3. 'BRCA'
  4. 'CESC'
  5. 'CHOL'
  6. 'COAD'
  7. 'DLBC'
  8. 'ESCA'
  9. 'GBM'
  10. 'HNSC'
  11. 'KICH'
  12. 'KIRC'
  13. 'KIRP'
  14. 'LAML'
  15. 'LGG'
  16. 'LIHC'
  17. 'LUAD'
  18. 'LUSC'
  19. 'MESO'
  20. 'OV'
  21. 'PAAD'
  22. 'PCPG'
  23. 'PRAD'
  24. 'READ'
  25. 'SARC'
  26. 'SKCM'
  27. 'STAD'
  28. 'TGCT'
  29. 'THCA'
  30. 'THYM'
  31. 'UCEC'
  32. 'UCS'
  33. 'UVM'
The column is originally a character vector:
typeof(rt3$'cancer type abbreviation')

‘character’

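After conversion to a factor, the values are stored as integers (each integer code stands for one level), so `typeof` reports 'integer' even though the object's class is factor: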
rt3$'cancer type abbreviation' <- factor(rt3$'cancer type abbreviation')
typeof(rt3$'cancer type abbreviation')

‘integer’
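
The difference between storage type and class can be checked on a small vector. This is a sketch with a made-up vector `x`, not the `rt3` data:

```r
# Hypothetical toy vector of cancer type abbreviations.
x <- c("BRCA", "ACC", "BRCA", "LUAD")
f <- factor(x)

typeof(x)       # "character"
typeof(f)       # "integer": the storage mode of the codes
class(f)        # "factor": what the object actually is
is.numeric(f)   # FALSE: a factor is not treated as a numeric variable

levels(f)       # "ACC" "BRCA" "LUAD"
as.integer(f)   # 2 1 2 3: positions in levels(f), not data values
```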

unique(rt3$'cancer type abbreviation')
  1. ACC
  2. BLCA
  3. BRCA
  4. CESC
  5. CHOL
  6. COAD
  7. DLBC
  8. ESCA
  9. GBM
  10. HNSC
  11. KICH
  12. KIRC
  13. KIRP
  14. LAML
  15. LGG
  16. LIHC
  17. LUAD
  18. LUSC
  19. MESO
  20. OV
  21. PAAD
  22. PCPG
  23. PRAD
  24. READ
  25. SARC
  26. SKCM
  27. STAD
  28. TGCT
  29. THCA
  30. THYM
  31. UCEC
  32. UCS
  33. UVM
Levels:
  1. 'ACC'
  2. 'BLCA'
  3. 'BRCA'
  4. 'CESC'
  5. 'CHOL'
  6. 'COAD'
  7. 'DLBC'
  8. 'ESCA'
  9. 'GBM'
  10. 'HNSC'
  11. 'KICH'
  12. 'KIRC'
  13. 'KIRP'
  14. 'LAML'
  15. 'LGG'
  16. 'LIHC'
  17. 'LUAD'
  18. 'LUSC'
  19. 'MESO'
  20. 'OV'
  21. 'PAAD'
  22. 'PCPG'
  23. 'PRAD'
  24. 'READ'
  25. 'SARC'
  26. 'SKCM'
  27. 'STAD'
  28. 'TGCT'
  29. 'THCA'
  30. 'THYM'
  31. 'UCEC'
  32. 'UCS'
  33. 'UVM'
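The factor splits the column into many groups (33 levels). Next, `levels` and `labels` are used to change the grouping: repeating a label merges the corresponding levels into one.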
rt3$'cancer type abbreviation' <- factor(rt3$'cancer type abbreviation',
                                        levels=c('ACC','BLCA','BRCA','CESC','CHOL','COAD','DLBC','ESCA','GBM','HNSC','KICH','KIRC','KIRP','LAML','LGG','LIHC','LUAD','LUSC','MESO','OV','PAAD','PCPG','PRAD','READ','SARC','SKCM','STAD','TGCT','THCA','THYM','UCEC','UCS','UVM'),
                                        labels=c('ACC','BLCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA','BRCA'))
typeof(rt3$'cancer type abbreviation')

‘integer’

unique(rt3$'cancer type abbreviation')
  1. ACC
  2. BLCA
  3. BRCA
Levels:
  1. 'ACC'
  2. 'BLCA'
  3. 'BRCA'
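
The same merging idea on a smaller scale, as a sketch with a made-up vector. The second form, assigning a named list to `levels()`, is an alternative from base R that the post itself does not use; duplicated `labels` need a reasonably recent R version (3.4.0 or later, as far as I know).

```r
# Hypothetical toy vector; not the rt3 data.
x <- c("ACC", "BLCA", "BRCA", "CESC", "CHOL", "ACC")

# Repeating a label maps several original values to one level.
f <- factor(x,
            levels = c("ACC", "BLCA", "BRCA", "CESC", "CHOL"),
            labels = c("ACC", "BLCA", "BRCA", "BRCA", "BRCA"))
levels(f)         # "ACC" "BLCA" "BRCA"
as.character(f)   # "ACC" "BLCA" "BRCA" "BRCA" "BRCA" "ACC"

# Alternative: build the factor first, then merge levels with a named list.
g <- factor(x)
levels(g) <- list(ACC = "ACC", BLCA = "BLCA",
                  BRCA = c("BRCA", "CESC", "CHOL"))
levels(g)         # "ACC" "BLCA" "BRCA"
```

The list form can be easier to read when many levels collapse into a few groups, because each group is written once.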

Pitfall when converting a factor with as.numeric

a <- as.numeric(rt3$'cancer type abbreviation')
unique(a)
  1. 1
  2. 2
  3. 3
b <- factor(c(2:8),levels=c("2","3","4","5","6","7","8"),labels=c("1","2","3","4","5","6","7"))
b
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
Levels:
  1. '1'
  2. '2'
  3. '3'
  4. '4'
  5. '5'
  6. '6'
  7. '7'
as.numeric(b)
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
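
In the example above, `as.numeric(b)` returns 1 to 7, not the original values 2 to 8: the result is the level code. When the levels themselves are numbers written as text, the original values can be recovered by going through the labels. A minimal sketch, with a made-up vector `x`:

```r
# Hypothetical numeric data stored as a factor.
x <- c(10, 20, 10, 5)
f <- factor(x)

levels(f)                     # "5" "10" "20"

# Pitfall: this returns the level codes, not the data.
as.numeric(f)                 # 2 3 2 1

# Recover the values by converting the labels instead of the codes.
as.numeric(as.character(f))   # 10 20 10 5
as.numeric(levels(f))[f]      # 10 20 10 5
```

The last form converts each level only once and then indexes by the codes; `?factor` mentions it as slightly more efficient than `as.numeric(as.character(f))`.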