Computing Spearman correlation in Python with SciPy: scipy.stats.spearmanr returns NaNs

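This article looks at a problem encountered when using Spearman's ρ to analyze correlations between variables, and points out that the calculation returns NaN when the sample data contains many all-zero rows. An example with code shows how to identify and handle the problem so that data quality does not undermine the statistical analysis.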

Edit: Basically solved I think.

I am using spearmanr from scipy.stats to find the correlations between variables across a number of different samples. I have around 2500 variables and 36 samples (or 'observations').

If I calculate the correlations using all 36 samples, spearmanr works fine. If I use only the first 18 samples, it also works fine. However, if I use only the last 18 samples, I get runtime warnings and NaNs are returned.

These are the warnings:

/Home/s1215235/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1945: RuntimeWarning: invalid value encountered in true_divide
  return c / sqrt(multiply.outer(d, d))
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in greater
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in less
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1719: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)

This is the code:

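import numpy as np
from scipy import stats

# np.float was removed in NumPy 1.24; the built-in float is equivalent here
populationdata = np.vstack(thing).astype(float)
rho, pval = stats.spearmanr(populationdata[:, sampleindexes], axis=1)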

(populationdata is a NumPy array of floats; indexing with [:, sampleindexes] selects only some of the columns, i.e. a subset of the samples.)

And this is what rho is returned as:

[[ 1.                 nan         nan ...,  1.         -0.05882353 -0.08574929]
 [        nan         nan         nan ...,         nan         nan         nan]
 [        nan         nan         nan ...,         nan         nan         nan]
 ...,
 [ 1.                 nan         nan ...,  1.         -0.05882353 -0.08574929]
 [-0.05882353         nan         nan ..., -0.05882353  1.          0.68599434]
 [-0.08574929         nan         nan ..., -0.08574929  0.68599434  1.        ]]

Solution

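A comment noted that "There are a lot of 0s though." So populationdata[:, sampleindexes] probably has rows that are all 0. Such a row is constant, so all of its ranks are tied and their variance is zero; the correlation then divides zero by zero, and spearmanr returns nan for every pair involving that row. For example,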

In [3]: spearmanr([[0, 0, 0], [1, 2, 3]], axis=1)
/Users/warren/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py:1957: RuntimeWarning: invalid value encountered in true_divide
  return c / sqrt(multiply.outer(d, d))
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in greater
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in less
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1729: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
Out[3]: (nan, nan)
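If the constant rows carry no information for the analysis, one possible workaround is to drop them before calling spearmanr. The following is only a sketch under that assumption: the helper name spearman_without_constant_rows and the toy array are made up for illustration and do not come from the original question. It keeps the original layout of one variable per row and one sample per column.

```python
import numpy as np
from scipy import stats


def spearman_without_constant_rows(data):
    """Drop rows with no variation, then compute Spearman's rho on the rest.

    Returns (rho, pval, kept), where kept holds the original indices of
    the rows that were used.
    """
    data = np.asarray(data, dtype=float)
    # A row is constant when its largest and smallest values coincide.
    constant = data.max(axis=1) == data.min(axis=1)
    kept = np.flatnonzero(~constant)
    # With axis=1 each row is treated as one variable. If only two rows
    # remain, spearmanr returns scalars instead of matrices.
    rho, pval = stats.spearmanr(data[kept], axis=1)
    return rho, pval, kept


# Hypothetical toy data: 4 variables (rows) x 5 samples (columns).
toy = np.array([
    [0, 0, 0, 0, 0],  # all zeros: this row would produce nan
    [1, 2, 3, 4, 5],
    [5, 3, 4, 1, 2],
    [2, 1, 4, 3, 5],
])
rho, pval, kept = spearman_without_constant_rows(toy)
```

Here kept maps the rows and columns of rho back to the original variable indices, so the dropped variables can still be reported separately. Whether dropping them is appropriate depends on the analysis; a variable that is all zeros in one half of the samples may still be informative in the full data set.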
