2020-1区top-JASA-顶刊-A New Coefficient of Correlation-源代码

官方源代码

# The following function computes the new correlation coefficient, 
# and returns the P-value for testing independence unless otherwise specified. 
# Removes NAs and converts factor variables to integers automatically, 
# unless otherwise specified. 
# In general, it is safe to just supply x and y, 
# and leave the other parameters to their default values.

xi = function(x, y, pvalue = T, ties = T, method = "asymptotic", nperm = 1000, factor = T) {
	
	# x and y are the data vectors
	# The P-value for the test of independence is returned if pvalue = T. Otherwise, only the coefficient is returned. 
	# If ties = T, the algorithm assumes that the data has ties and employs the more elaborated theory for calculating the P-value. Otherwise, it uses the simpler theory. There is no harm in putting ties = T even if there are no ties.
	# method = "asymptotic" returns P-values computed by the asymptotic theory. If method = "permutation", a permutation test with nperm permutations is employed to estimate the P-value. Usually, there is no need for the permutation test. The asymptotic theory is good enough.
	# nperm is the number of permutations for the permutation test, if needed.
	# factor = T results in the algorithm checking whether x and y are factor variables and converting them to integers if they are. If it is known that the variables are numeric, a little bit of time can be saved by setting factor = F.
	
	
	# NAs are removed here:
	ok = complete.cases(x,y)
	x = x[ok]
	y = y[ok]	
	
	# Factor variables are converted to integers here:
	if (factor == T) {
		if (!is.numeric(x)) x = as.numeric(factor(x))
		if (!is.numeric(y)) y = as.numeric(factor(y))
	}
	
	
	n = length(x) 							# n is the sample size.
	
	PI = rank(x, ties.method = "random")		# PI is the rank vector for x, with ties broken at random 
	
	
	f = rank(y, ties.method = "max")/n		# f[i] is number of j s.t. y[j] <= y[i], divided by n.
	
	g = rank(-y, ties.method = "max")/n		# g[i] is number of j s.t. y[j] >= y[i], divided by n.
	
	ord = order(PI)							# order of the x's, ties broken at random.
	
	f = f[ord]								# Rearrange f according to ord.
	
	# xi is calculated in the next three lines:
	A1 = mean(abs(f[1:(n-1)] - f[2:n]))*(n-1)/(2*n)
	C = mean(g*(1-g))
	xi = 1 - A1/C
	
	
	# If P-value needs to be computed:
	if (pvalue == T) {
		
		# If there are no ties, return xi and theoretical P-value:
		if (ties == F) return(list(xi = xi, pval = 1 - pnorm(sqrt(n)*xi/sqrt(2/5))))
		
		# If there are ties, and the theoretical method is to be used for calculation P-values:
		if (method == "asymptotic") {
			
			# The following steps calculate the theoretical variance in the presence of ties:
			q = sort(f)
			ind = c(1:n)
			ind2 = 2*n - 2*ind + 1
			a = mean(ind2*q*q)/n
			c = mean(ind2*q)/n
			cq = cumsum(q)
			m = (cq + (n - ind)*q)/n
			b = mean(m^2)
			v = (a - 2*b + c^2)/(C^2)
			
			# Return xi and P-value:
			return(list(xi = xi, pval = 1 - pnorm(sqrt(n)*xi/sqrt(v))))
		}
		
		# If permutation test is to be used for calculating P-value:
		if (method == "permutation") {
			r = rep(0, nperm)
			for (i in 1:nperm) {
				x1 = runif(n, 0, 1)
				r[i] = xi(x1,y)$xi
			}
		
			# Return xi and P-value based on permutation test:
			return(list(xi = xi, pval = mean(r > xi)))
		}
		cat("Invalid method. Use either asymptotic or permutation.")
	}
	
	# If only xi is desired, return xi:
	else return(xi)
}

论文原文

在这里插入图片描述

Reference

  1. 2020-1区top-JASA-顶刊-A New Coefficient of Correlation
  2. github.com/czbiohub/xicor
  3. souravchatterjee.su.domains//xi.R
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

千行百行

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值