# Evaluation Metrics in Regression Analysis

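Regression differs from classification: a regression method predicts a series of continuous values. Once the predictions are made, the question is how to judge their quality, and there is currently no single agreed standard for this in the academic community. Below are some commonly used metrics I have come across in papers; I hope they are useful to whoever finds this.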

1 MAE (Mean Absolute Error)

In statistics, the mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is given by

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^n \left| f_i-y_i\right| =\frac{1}{n}\sum_{i=1}^n \left| e_i \right|.$

As the name suggests, the mean absolute error is an average of the absolute errors $|e_i| = |f_i - y_i|$, where $f_i$ is the prediction and $y_i$ the true value. Note that alternative formulations may include relative frequencies as weight factors.
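As a minimal sketch of the formula above, the following plain-Python function computes the MAE of two equal-length sequences. The function name `mean_absolute_error` and the sample values are illustrative assumptions, not taken from the original sources.

```python
def mean_absolute_error(predictions, targets):
    """Average of |f_i - y_i| over all paired samples."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(f - y) for f, y in zip(predictions, targets))
    return total / len(predictions)


# Illustrative values only, not from any real dataset.
f = [2.5, 0.0, 2.0, 8.0]   # predictions f_i
y = [3.0, -0.5, 2.0, 7.0]  # true values y_i
print(mean_absolute_error(f, y))  # prints 0.5
```

The absolute errors here are 0.5, 0.5, 0 and 1, so the mean is 2.0 / 4 = 0.5.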

2 MSE (Mean Square Error)

If $\hat{Y}$ is a vector of $n$ predictions and $Y$ is the vector of the true values, then the (estimated) MSE of the predictor is

$\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(\hat{Y}_i - Y_i)^2.$
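The MSE formula can be sketched in a few lines of plain Python. The name `mean_squared_error` and the sample values are illustrative assumptions.

```python
def mean_squared_error(predictions, targets):
    """Average of (Y_hat_i - Y_i)^2 over all paired samples."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")
    total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return total / len(predictions)


# Illustrative values only, not from any real dataset.
y_hat = [2.5, 0.0, 2.0, 8.0]
y = [3.0, -0.5, 2.0, 7.0]
print(mean_squared_error(y_hat, y))  # prints 0.375
```

Because each error is squared, the largest error in this sample (1.0) accounts for two thirds of the summed squared error, although it is only half of the summed absolute error. MSE is therefore more sensitive to large individual errors.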

3 RMSE (Root Mean Square Error)

RMSE is equivalent in definition to RMSD (root-mean-square deviation); it is simply the square root of the MSE.

The RMSD of an estimator $\hat{\theta}$ with respect to an estimated parameter $\theta$ is defined as the square root of the mean square error:

$\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{MSE}(\hat{\theta})} = \sqrt{\operatorname{E}((\hat{\theta}-\theta)^2)}.$

For an unbiased estimator, the RMSD is the square root of the variance, known as the standard error.
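The reason can be seen from the standard decomposition of the MSE into variance and squared bias (a textbook identity, added here as a supporting step):

$\operatorname{MSE}(\hat{\theta}) = \operatorname{E}((\hat{\theta}-\theta)^2) = \operatorname{Var}(\hat{\theta}) + (\operatorname{E}(\hat{\theta})-\theta)^2.$

When the estimator is unbiased, $\operatorname{E}(\hat{\theta}) = \theta$, so the second term vanishes and $\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{Var}(\hat{\theta})}$.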

The RMSD of predicted values $\hat y_t$ for times t of a regression's dependent variable $y$ is computed for n different predictions as the square root of the mean of the squares of the deviations:

$\operatorname{RMSD}=\sqrt{\frac{\sum_{t=1}^n (\hat y_t - y_t)^2}{n}}.$
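A minimal sketch of this sample formula in plain Python follows. The function name `root_mean_square_deviation` and the sample values are illustrative assumptions.

```python
import math


def root_mean_square_deviation(predictions, targets):
    """Square root of the mean of the squared deviations."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
    return math.sqrt(mse)


# Illustrative values only, not from any real dataset.
y_hat = [2.5, 0.0, 2.0, 8.0]
y = [3.0, -0.5, 2.0, 7.0]
print(round(root_mean_square_deviation(y_hat, y), 4))  # prints 0.6124
```

Unlike the MSE, the result is expressed in the same units as $y$, which often makes it easier to interpret.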

In some disciplines, the RMSD is used to compare differences between two things that may vary, neither of which is accepted as the "standard". For example, when measuring the average difference between two time series $x_{1,t}$ and $x_{2,t}$, the formula becomes

$\operatorname{RMSD}= \sqrt{\frac{\sum_{t=1}^n (x_{1,t} - x_{2,t})^2}{n}}.$

4 Normalized Root-Mean-Square Deviation

The normalized root-mean-square deviation or error (NRMSD or NRMSE) is the RMSD divided by the range of observed values of the variable being predicted, or:

$\mathrm{NRMSD} = \frac{\mathrm{RMSD}}{y_{\max} - y_{\min}}$

The value is often expressed as a percentage, where lower values indicate less residual variance.

Coefficient of variation of the RMSD

The coefficient of variation of the RMSD, CV(RMSD), or more commonly CV(RMSE), is defined as the RMSD normalized to the mean of the observed values:

$\mathrm{CV(RMSD)} = \frac {\mathrm{RMSD}}{\bar y}.$

It is the same concept as the coefficient of variation except that RMSD replaces the standard deviation.
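The two normalizations differ only in the denominator: the range of the observed values for NRMSD, and their mean for CV(RMSD). A minimal plain-Python sketch follows; the function names `rmsd`, `nrmsd` and `cv_rmsd` and the sample values are illustrative assumptions.

```python
import math


def rmsd(predictions, observed):
    """Root-mean-square deviation between predictions and observations."""
    if len(predictions) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predictions, observed))
    return math.sqrt(squared / len(observed))


def nrmsd(predictions, observed):
    """RMSD divided by the range (max - min) of the observed values."""
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("NRMSD is undefined when all observed values are equal")
    return rmsd(predictions, observed) / value_range


def cv_rmsd(predictions, observed):
    """RMSD divided by the mean of the observed values."""
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("CV(RMSD) is undefined when the observed mean is zero")
    return rmsd(predictions, observed) / mean_observed


# Illustrative values only, not from any real dataset.
y_hat = [2.5, 0.0, 2.0, 8.0]
y = [3.0, -0.5, 2.0, 7.0]
print(f"NRMSD    = {nrmsd(y_hat, y):.1%}")    # prints NRMSD    = 8.2%
print(f"CV(RMSD) = {cv_rmsd(y_hat, y):.1%}")  # prints CV(RMSD) = 21.3%
```

Both denominators can be zero (or, for the mean, close to zero or negative), so the sketch raises an error in the degenerate cases rather than returning a misleading number.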

-------------------------------------------------------------------------------------------------------------------------------------------

Correlation Coefficient

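Correlation tables and scatter plots can show how two variables relate and in which direction, but they cannot state precisely how strongly the two are related. The statistician Karl Pearson therefore devised a statistic for this purpose, the correlation coefficient, which measures how closely variables are related. It is computed by the product-moment method: it is based on each variable's deviations from its own mean, and multiplies the two deviations to reflect the degree of correlation between the two variables. The focus here is the simple linear correlation coefficient.

Depending on the kind of relationship being described, the statistic goes by different names. The statistic for the linear relationship between two variables is called the correlation coefficient (its square is called the coefficient of determination); the statistics for a curvilinear relationship between two variables are called the nonlinear correlation coefficient and the nonlinear coefficient of determination; and the statistics for a multiple linear relationship are called the multiple correlation coefficient and the multiple coefficient of determination.

Correlation is a non-deterministic relationship, and the correlation coefficient quantifies the degree of linear correlation between variables. Depending on the object of study, the correlation coefficient is defined in several ways: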

-------------------------------------------------------------------------------------------------------------------------------------

6 Pearson's Correlation Coefficient

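Some papers call it COR (correlation).

(1) When the correlation coefficient is 0, X and Y have no linear relationship.

(2) When Y increases (decreases) as X increases (decreases), the two variables are positively correlated and the coefficient lies between 0.00 and 1.00.

(3) When Y decreases (increases) as X increases (decreases), the two variables are negatively correlated and the coefficient lies between -1.00 and 0.00.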

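The strength of the correlation is commonly judged from the absolute value of the coefficient:

0.6-0.8     strong correlation
0.4-0.6     moderate correlation
0.2-0.4     weak correlation
0.0-0.2     very weak or no correlation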

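Pearson's correlation coefficient is appropriate when the following conditions hold:

(1) The two variables are linearly related, and both are continuous data.

(2) The populations of the two variables are normally distributed, or follow a near-normal unimodal distribution.

(3) The observations of the two variables are paired, and the pairs are independent of one another.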
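The product-moment calculation, in which each variable's deviations from its mean are multiplied together, corresponds to the standard sample formula $r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\sqrt{\sum_i (y_i-\bar{y})^2}}$. A minimal plain-Python sketch follows; the function name `pearson_r` and the sample values are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two paired sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]  # deviations of x from its mean
    dy = [b - mean_y for b in y]  # deviations of y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Illustrative values only.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfectly linear, increasing
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfectly linear, decreasing
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x**2, symmetric about 0
```

The first two calls give coefficients of 1 and -1. The third gives 0 even though y is completely determined by x, which illustrates that the coefficient captures only the linear part of a relationship.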

7 Concordance Correlation Coefficient

In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for inter-rater reliability.

Definition:

Lawrence Lin gives the form of the concordance correlation coefficient $\rho_c$ as

$\rho_c = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2},$

where $\mu_x$ and $\mu_y$ are the means for the two variables, $\sigma^2_x$ and $\sigma^2_y$ are the corresponding variances, and $\rho$ is the correlation coefficient between the two variables.

This follows from its definition[1] as

$\rho_c = 1 - \frac{{\rm Expected\ orthogonal\ squared\ distance\ from\ the\ diagonal\ }x=y}{{\rm Expected\ orthogonal\ squared\ distance\ from\ the\ diagonal\ }x=y{\rm \ assuming\ independence}}.$

When the concordance correlation coefficient is computed on an N-length data set (i.e., two vectors of length N) the form is

$\hat{\rho}_c = \frac{2 s_{xy}}{s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2},$

where the mean is computed as

$\bar{x} = \frac{1}{N} \sum_{n=1}^N x_n$

and the variance

$s_x^2 = \frac{1}{N} \sum_{n=1}^N (x_n - \bar{x})^2$

and the covariance

$s_{xy} = \frac{1}{N} \sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y}) .$

Whereas the ordinary (Pearson's) correlation coefficient is unaffected by whether the biased or unbiased estimate of the variance is used, the concordance correlation coefficient is not. In the original article Lin suggested the 1/N normalization, while in another article Nickerson appears to have used 1/(N-1); the concordance correlation coefficient may therefore be computed slightly differently between implementations.
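The effect of the normalization can be checked with a small plain-Python sketch of the sample formula. The function name `concordance_cc` and its `ddof` parameter (0 for the 1/N form, 1 for the 1/(N-1) form) are illustrative assumptions, as are the sample values.

```python
def concordance_cc(x, y, ddof=0):
    """Sample concordance correlation coefficient.

    ddof=0 divides variances and covariance by N,
    ddof=1 divides them by N - 1.
    """
    n = len(x)
    if n != len(y) or n - ddof <= 0:
        raise ValueError("need two sequences of equal, sufficient length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    norm = n - ddof
    var_x = sum((a - mean_x) ** 2 for a in x) / norm
    var_y = sum((b - mean_y) ** 2 for b in y) / norm
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / norm
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("undefined when both variables are the same constant")
    return 2 * cov_xy / denominator


# Illustrative values only: y is x shifted by a constant 0.5, so Pearson's r is 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.5, 2.5, 3.5, 4.5, 5.5]
print(round(concordance_cc(x, y, ddof=0), 4))  # prints 0.9412
print(round(concordance_cc(x, y, ddof=1), 4))  # prints 0.9524
```

With the 1/N form the result is 4 / 4.25, and with the 1/(N-1) form it is 5 / 5.25. The constant offset between the two series lowers the concordance below 1 in both cases, and the two normalizations disagree in the second decimal place for this small sample.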

Relation to other measures of correlation

The concordance correlation coefficient is nearly identical to some of the measures called intra-class correlations, and comparisons of the concordance correlation coefficient with an "ordinary" intraclass correlation on different data sets found only small differences between the two correlations, in one case on the third decimal. It has also been stated that the ideas for concordance correlation coefficient "are quite similar to results already published by Krippendorff in 1970".

In the original article[1] Lin suggested a form for multiple classes (not just 2). Over ten years later a correction to this form was issued.

One example of the use of the concordance correlation coefficient is in a comparison of analysis methods for functional magnetic resonance imaging brain scans.

Reference:

http://en.wikipedia.org/wiki/Concordance_correlation_coefficient

http://en.wikipedia.org/wiki/Correlation_coefficient

http://en.wikipedia.org/wiki/Root_mean_square_error

http://blog.csdn.net/wsywl/article/details/5727327
