prcomp error

prcomp raises the error:

Error in prcomp.default(pca_data, center = T, scale = T) :
cannot rescale a constant/zero column to unit variance

How to solve prcomp.default(): cannot rescale a constant/zero column to unit variance

I have a data set of 9 samples (rows) with 51608 variables (columns) and I keep getting the error whenever I try to scale it:

This works fine:

pca = prcomp(pca_data)

However,

pca = prcomp(pca_data, scale = T)

gives

Error in prcomp.default(pca_data, center = T, scale = T) :
cannot rescale a constant/zero column to unit variance

Obviously it’s a little hard to post a reproducible example. Any ideas what the deal could be?

Looking for constant columns:

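library(magrittr)  # provides the %>% pipe used below

# Count the distinct values in each column, then tabulate those counts
sapply(1:ncol(pca_data), function(x){
  unique(pca_data[, x]) %>% length
}) %>% table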

Output:

.
       2     3     4     5     6     7     8     9 
    3892  4189  2124  1783  1622  2078  5179 30741 

So no constant columns. Same for NAs:

is.na(pca_data) %>% sum

[1] 0

This works fine:

pca_data = scale(pca_data)

But afterwards both of these still give the exact same error:

pca = prcomp(pca_data)
pca = prcomp(pca_data, center = F, scale = F)

So why can’t I manage to get a scaled PCA on this data? OK, let’s make 100% sure that it’s not constant.

pca_data = pca_data + rnorm(nrow(pca_data) * ncol(pca_data))

Same errors. Numeric data?

sapply(1:nrow(pca_data), function(row){
  sapply(1:ncol(pca_data), function(column){
    !is.numeric(pca_data[row, column])
  })
}) %>% sum

Still the same errors. I’m out of ideas.

Edit: more details, and at least a hack that works around it.

Later, I was still having a hard time clustering this data, e.g.:

Error in hclust(d, method = "ward.D") : 
     NaN dissimilarity value in intermediate results. 

Trimming values under a certain cutoff (e.g. setting values < 1 to zero) had no effect. What finally worked was dropping every column that had more than x zeros. It worked when the number of zeros was <= 6, but allowing 7 or more gave errors. I have no idea whether this points to a general problem or whether the filter just happened to catch one problematic column. I would still be happy to hear any ideas why, because this should work just fine as long as no variable is all zeros (or constant in some other way).
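A minimal sketch of the zero-count trimming described above, assuming the data is a numeric matrix with samples in rows. The toy matrix and the name max_zeros are made up for illustration; this is not the exact code that was run on pca_data.

```r
# Toy stand-in for pca_data: 9 samples (rows) x 4 variables (columns).
set.seed(1)
toy <- cbind(
  dense  = rnorm(9),                      # no zeros
  sparse = c(0, 0, 0, 0, 0, 0, 0, 1, 2),  # 7 zeros
  mixed  = c(0, 0, 3, 1, 4, 1, 5, 9, 2),  # 2 zeros
  empty  = rep(0, 9)                      # all zeros (constant)
)

max_zeros <- 6  # hypothetical threshold, matching the "<= 6" case

# Count zeros per column and keep only columns at or below the threshold.
zero_counts <- colSums(toy == 0)
trimmed <- toy[, zero_counts <= max_zeros, drop = FALSE]

colnames(trimmed)
```

With this toy data only the dense and mixed columns survive. Note that any all-zero column is removed as a side effect, which may be why the filter happened to help.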

I don’t think you’re looking for zero variance columns correctly. Let’s try with some dummy data. First, an acceptable 10x100 matrix:

mat <- matrix(rnorm(1000, 0), nrow = 10)

And one with a zero-variance column. Let’s call it oopsmat.

const <- rep(0.1, nrow(mat)) # one value per row of mat
oopsmat <- cbind(const, mat)

The first few elements of oopsmat look like this:

  const                                                                                               
[1,]   0.1  0.75048899  0.5997527 -0.151815650  0.01002536  0.6736613 -0.225324647 -0.64374844 -0.7879052
[2,]   0.1  0.09143491 -0.8732389 -1.844355560  0.23682805  0.4353462 -0.148243210  0.61859245  0.5691021
[3,]   0.1 -0.80649512  1.3929716 -1.438738923 -0.09881381  0.2504555 -0.857300053 -0.98528008  0.9816383
[4,]   0.1  0.49174471 -0.8110623 -0.941413109 -0.70916436  1.3332522  0.003040624  0.29067871 -0.3752594
[5,]   0.1  1.20068447 -0.9811222  0.928731706 -1.97469637 -1.1374734  0.661594937  2.96029102  0.6040814

Let’s try scaled and unscaled PCAs on oopsmat:

PCs <- prcomp(oopsmat) #works
PCs <- prcomp(oopsmat, scale. = T) #not forgetting the dot
#Error in prcomp.default(oopsmat, scale. = T) :
#cannot rescale a constant/zero column to unit variance

This fails because scaling divides each column by its standard deviation, and a constant column has a standard deviation of zero. To identify the zero-variance column, we can use which as follows to get the variable name.

which(apply(oopsmat, 2, var)==0)
#const
#1
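
To see where the zero comes from, it may help to look at what scale() does with a constant column. The sketch below uses a made-up 3x2 matrix m (not part of the original example). It only illustrates the arithmetic and is not a description of everything prcomp does internally.

```r
# Hypothetical toy matrix: column "b" is constant.
m <- cbind(a = c(1, 2, 3), b = c(5, 5, 5))

# Per-column standard deviations; the constant column has sd 0.
apply(m, 2, sd)

# scale() centers each column and divides by its sd without raising an error,
# so the constant column becomes 0 / 0 = NaN.
scaled <- scale(m)
attr(scaled, "scaled:scale")  # the divisors that were used
is.nan(scaled[, "b"])         # TRUE for every row of column "b"
```

So calling scale() on its own can appear to work fine while quietly filling a constant column with NaN.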
And to remove zero-variance columns from the dataset, you can use the same apply expression, keeping only the columns whose variance is not equal to zero.

oopsmat[ , apply(oopsmat, 2, var) != 0]

Hope that helps make things clearer!
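
The two steps above could be wrapped up roughly as follows. This is a sketch under stated assumptions: drop_constant_cols and its tol argument are hypothetical names, not part of base R or stats, and the constant here is 0.5 so that its variance is exactly zero in floating point.

```r
# drop_constant_cols() is a hypothetical helper, not a base R function.
# It keeps columns whose standard deviation is defined and greater than tol.
drop_constant_cols <- function(x, tol = 0) {
  x <- as.matrix(x)
  sds <- apply(x, 2, sd)
  keep <- !is.na(sds) & sds > tol
  x[, keep, drop = FALSE]
}

set.seed(42)
mat <- matrix(rnorm(1000), nrow = 10)
oopsmat <- cbind(const = rep(0.5, nrow(mat)), mat)

clean <- drop_constant_cols(oopsmat)
dim(clean)                           # one column fewer than oopsmat

PCs <- prcomp(clean, scale. = TRUE)  # scaled PCA on the remaining columns
```

The default tol = 0 mirrors the exact comparison used above. A small positive tol might be preferable when a column is constant only up to rounding error, but that choice depends on the units of the data.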
