Residuals
Least squares leads to two identities about the residuals:
- the residuals sum to zero (positive and negative residuals cancel each other out)
- the residuals are uncorrelated with the predictor (here, the parents' heights)
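Both properties can be checked directly in R. A minimal sketch with simulated parent/child heights (the real galton data is only loaded later in these notes, so made-up numbers stand in here):

```r
# Simulated heights standing in for the galton data (assumed values)
set.seed(42)
parent <- rnorm(100, mean = 68, sd = 2)
child  <- 27 + 0.6 * parent + rnorm(100, sd = 2)

fit <- lm(child ~ parent)
res <- resid(fit)

sum(res)          # essentially 0: residuals cancel out
cor(res, parent)  # essentially 0: residuals uncorrelated with the predictor
```

Both quantities are zero up to floating-point precision for any least-squares fit with an intercept.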
The key appeal of regression is that it produces highly interpretable models. This contrasts with machine-learning algorithms, which often sacrifice interpretability for better predictive performance or for automation. Those are, of course, valuable properties in their own right. Still, the simplicity, parsimony, and interpretability of regression models should make them a first-choice tool for any practical problem.
We therefore use Ordinary Least Squares (OLS): choose the regression line that minimizes the sum of squared residuals over all observations (call this sum Q). In other words, OLS uses a squared loss function.
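That `lm()` finds the minimizer of Q can be checked numerically: any tweak to the fitted coefficients increases the residual sum of squares. A sketch with simulated data (names and values here are illustrative assumptions):

```r
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

# Q: the sum of squared residuals for a candidate line (b0, b1)
Q <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

b <- coef(lm(y ~ x))
Q(b[1], b[2])        # the minimum
Q(b[1] + 0.1, b[2])  # tweaking the intercept increases Q
Q(b[1], b[2] - 0.1)  # tweaking the slope increases Q
```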
R:
The cov function
Description
var, cov and cor compute the variance of x and the covariance or correlation of x and y if these are vectors. If x and y are matrices then the covariances (or correlations) between the columns of x and the columns of y are computed.
cov2cor scales a covariance matrix into the corresponding correlation matrix efficiently. (In short: cov computes the covariance of two vectors, a measure of how similar they are.)
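A quick sketch of that scaling: applying cov2cor to a covariance matrix recovers exactly what cor would have computed from the data (the small matrix below is an illustrative assumption):

```r
# Two correlated columns of toy data
m <- cbind(x = 1:10, y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))

V <- cov(m)               # covariance matrix
all.equal(cov2cor(V), cor(m))  # TRUE: same result as computing cor directly
```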
Usage
var(x, y = NULL, na.rm = FALSE, use)
cov(x, y = NULL, use = "everything",
method = c("pearson", "kendall", "spearman"))
cor(x, y = NULL, use = "everything",
method = c("pearson", "kendall", "spearman"))
cov2cor(V)
Arguments
x
a numeric vector, matrix or data frame.
y
NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).
na.rm
logical. Should missing values be removed?
use
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
method
a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.
V
symmetric numeric matrix, usually positive definite such as a covariance matrix.
Examples
var(1:10) # 9.166667
var(1:5, 1:5) # 2.5
## Two simple vectors
cor(1:10, 2:11) # == 1
## Correlation Matrix of Multivariate sample:
(Cl <- cor(longley))
## Graphical Correlation Matrix:
symnum(Cl) # highly correlated
## Spearman's rho and Kendall's tau
symnum(clS <- cor(longley, method = "spearman"))
symnum(clK <- cor(longley, method = "kendall"))
## How much do they differ?
i <- lower.tri(Cl)
cor(cbind(P = Cl[i], S = clS[i], K = clK[i]))
# For compatibility with 2.2.21
.get_course_path <- function() {
  tryCatch(swirl:::swirl_courses_dir(),
           error = function(c) file.path(find.package("swirl"), "Courses"))
}
galton <- read.csv(file.path(.get_course_path(),
                             "Regression_Models", "Introduction", "galton.csv"))
est <- function(slope, intercept) intercept + slope * galton$parent
sqe <- function(slope, intercept) sum((est(slope, intercept) - galton$child)^2)
attenu <- datasets::attenu
fname <- paste(.get_course_path(), "Regression_Models", "Residuals", "res_eqn.R", sep = "/")
# The OLS coefficients were not defined in the original snippet; fit them here
fit <- lm(child ~ parent, data = galton)
ols.ic <- coef(fit)[1]
ols.slope <- coef(fit)[2]
# Here are the vectors of variations or tweaks
sltweak <- c(.01, .02, .03, -.01, -.02, -.03)  # one for the slope
ictweak <- c(.1, .2, .3, -.1, -.2, -.3)        # one for the intercept
lhs <- numeric()
rhs <- numeric()
# Left side of eqn: sum of squared residuals of the tweaked regression line
for (n in 1:6) lhs[n] <- sqe(ols.slope + sltweak[n], ols.ic + ictweak[n])
# Right side of eqn: sum of squared original residuals + sum of squares of the two tweaks
for (n in 1:6) rhs[n] <- sqe(ols.slope, ols.ic) + sum(est(sltweak[n], ictweak[n])^2)
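The identity behind that loop (tweaking the OLS line increases Q by exactly the squared size of the tweak, because the residuals are orthogonal to both the intercept and the predictor) can be verified without the swirl course files. A self-contained sketch with simulated data; the names `est_sim`/`sqe_sim` are illustrative stand-ins for the lesson's `est`/`sqe`:

```r
set.seed(7)
parent <- rnorm(200, mean = 68, sd = 2)
child  <- 25 + 0.65 * parent + rnorm(200)

est_sim <- function(slope, intercept) intercept + slope * parent
sqe_sim <- function(slope, intercept) sum((est_sim(slope, intercept) - child)^2)

fit <- lm(child ~ parent)
ols.ic    <- coef(fit)[1]
ols.slope <- coef(fit)[2]

# Tweaked Q equals original Q plus the sum of squares of the tweak line
lhs <- sqe_sim(ols.slope + .02, ols.ic + .2)
rhs <- sqe_sim(ols.slope, ols.ic) + sum(est_sim(.02, .2)^2)
all.equal(lhs, rhs)  # TRUE
```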