如何使用计算机来线性拟合,如何在给定协方差矩阵和拟合系数的情况下计算线性回归的p值(How do I calculate p values of a linear regression given t...

最新推荐文章于 2024-04-07 20:10:37 发布

weixin_39558221

最新推荐文章于 2024-04-07 20:10:37 发布

阅读量718

点赞数

文章标签：如何使用计算机来线性拟合

如何在给定协方差矩阵和拟合系数的情况下计算线性回归的p值(How do I calculate p values of a linear regression given the covariance matrix and fit coefficients)

我使用GSL库在C中执行了线性回归。我在R中执行了相同的回归。我可以使用“summary”命令在R中访问此回归的p值。

在C中，我有协方差矩阵，残差平方和值和拟合系数。使用这些，我如何计算p值？

任何人都可以确认它的有效性吗？与R相比，它给出的结果有所不同。

我已经将这行C代码隔离为不正确，但我不明白为什么：

double pv0=t0<0?2*(1-gsl_cdf_tdist_P(-t0,n-2)):2*(1-gsl_cdf_tdist_P(t0,n-2));//This is the p-value of the constant term

R给出的结果：

系数：

Estimate Std. Error t value Pr(>|t|)

(Intercept) -700.000 226.569 -3.090 0.05373 .

x 60.000 6.831 8.783 0.00311 **

C给出的结果：

Coefficients Estimate Std. Error t value Pr(>|t|)

(Intercept) -700.000000 226.568606 -3.089572 -550099700.000000

x 60.000000 6.831301 8.783101 -4.000000

I have performed a linear regression in C using the GSL library. I've performed the same regression in R. I can access the p values for this regression in R using the "summary" command.

In C, I have the covariance matrix, sum of squared residuals value and fit coefficients. Using these, how do I calculate the p values?

Can anyone confirm it's validity? The results it gives differ for me compared to R.

I have isolated this line of C code as incorrect, but I can't see why:

double pv0=t0<0?2*(1-gsl_cdf_tdist_P(-t0,n-2)):2*(1-gsl_cdf_tdist_P(t0,n-2));//This is the p-value of the constant term

Results given by R:

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -700.000 226.569 -3.090 0.05373 .

x 60.000 6.831 8.783 0.00311 **

Results given by C:

Coefficients Estimate Std. Error t value Pr(>|t|)

(Intercept) -700.000000 226.568606 -3.089572 -550099700.000000

x 60.000000 6.831301 8.783101 -4.000000

原文：https://stackoverflow.com/questions/26289968

更新时间：2019-10-15 02:51

最满意答案

gsl_cdf_tdist_P应该有两个double参数，并在0和1之间返回一个double 。

首先检查所有类型。

如果在所有类型都正确的情况下它返回0-1之外的值，则某处存在非常严重的问题。

gsl_cdf_tdist_P should expect two double arguments and return a double between 0 and 1.

Check all your types first.

If it's returning values outside 0-1 when all the types are correct, there's a very serious problem somewhere.

相关问答

这在实施上有所不同。 R中的lm使用基于QR分解的基础C代码。模型矩阵被分解为正交矩阵Q和三角矩阵R.这引起了其他人称之为“共线性检查”的情况。 R没有检查，QR分解的性质确保最小共线变量在拟合算法中变得“优先”。有关线性回归背景下QR分解的更多信息： https ： //www.stat.wisc.edu/~larget/math496/qr.html sklearn的代码基本上是numpy.linalg.lstsq的一个包装，它使欧几里德二次规范最小化。如果你的模型是Y = AX ，它将

...

如果您不知道错误之间的协方差，您可以采用迭代方法。您将首先使用普通最小二乘法，计算错误以及错误之间的协方差。然后，您将使用计算的协方差矩阵应用GLS并重新估计协方差矩阵。您将继续使用带有新协方差矩阵的GLS进行迭代，直到您收敛为止。这里有一个链接 (.pdf警告)给这个方法的例子，以及加权和迭代加权最小平方的相关讨论，其中你在GLS中假设的误差之间没有相关性。 If you do not know the covariance between the errors you can take

...

glht(ct1, linfct = 'cyl + hp = 0')将不起作用，因为ct1不是glht对象，不能通过as.glht强制转换为glht对象。我不知道是否有一个包或现有的功能来做这个，但这不是一个困难的工作自己。以下小功能： LinearCombTest

## if `.vcov` missing, use the one returned by `lm`

if (is.null(.v

...

我可以看到你正在做中心回归： sandipan的答案并不是你想要的，因为它通过通常的常规方程来估算：后者已经存在一个问题：求解正规方程会给出与使用lm不同的系数？在这里，我专注于前者。其实你已经很接近了。您已获得混合协方差C ： # y x1 x2

#y 10.4 -2.0 -0.6

#x1 -2.0 10.5 3.0

#x2 -0.6 3.0 4.4

根据您对E和F定义，您知道需要子矩阵才能继续。实际上，您可以执行矩阵子集而不是手动输入： E

...

首先，从统计上来说，这可能不是分析时态数据的最佳方法。虽然，关于你提出的方法，构建一个循环来获得这个非常简单： Coefs

for(i in 1:ncol(Data)){

Coefs[i,]

}

First of all, statisti

...

看起来你只有一个X和一个Y，所以输出是正确的。尝试这个： #coef_ : array, shape (n_features, ) or (n_targets, n_features)

print(regr.coef_)

#intercept_ : array Independent term in the linear model.

print(regr.intercept_)

http://scikit-learn.org/stable/modules/genera

...

你这么说 I added a column of 1's in x for constant term when using both methods

但是LinearRegression的文档说明了这一点 LinearRegression(fit_intercept=True, [...])

它默认适合拦截。这可以解释为什么你在常数项中有差异。现在对于其他系数，当两个变量高度相关时，可能会出现差异。让我们考虑两个列完全相同的极端情况。然后，通过增加另一个，可以补偿减小两者中任何一个前

...

首先， summary(full_lm)$coefficients[,4]返回p-values而不是系数。现在，为了真正回答你的问题，我相信你的一些变量会退出估计，因为它们与其他一些变量是完全一致的。如果运行summary(full_lm) ，则会看到对这些变量的估计在所有字段中都返回NA 。因此，它们不包含在summary(full_lm)$coefficients 。举个例子： x

eps

...

gsl_cdf_tdist_P应该有两个double参数，并在0和1之间返回一个double 。首先检查所有类型。如果在所有类型都正确的情况下它返回0-1之外的值，则某处存在非常严重的问题。 gsl_cdf_tdist_P should expect two double arguments and return a double between 0 and 1. Check all your types first. If it's returning values outside 0-1

...

Dirk的答案会更快，但如果方便的话，这里是纯R的实现(从summary.lm提取你需要的位，并假设非全级模型矩阵没有问题等) 例： set.seed(101)

p值计算： rss

rdf

resvar

...