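Overview
I have used logistic regression in R many times, and recently decided to dig into its source code to understand the algorithm better. This article records my reading of the source code behind logistic regression in R; if you spot any problems or mistakes, please point them out.
Note: this article focuses on the code implementation, with theory interspersed along the way. Readers may want a rough grasp of logistic regression theory before reading on.
Logistic regression in R is fitted by calling the glm function. R documents its usage and description as follows: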
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart, offset,
control = list(...), model = TRUE, method = "glm.fit",
x = FALSE, y = TRUE, contrasts = NULL, ...)
glm is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.
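As a quick illustration of the signature, the default family = gaussian means that a call without a family argument fits an ordinary linear model. A minimal sketch, assuming the built-in cars data set purely as a stand-in (it is not used elsewhere in this article):

```r
# Fit the same model with glm() (default gaussian family) and with lm().
# The cars data set ships with R; it is only an illustrative stand-in.
fit_glm <- glm(dist ~ speed, data = cars)
fit_lm  <- lm(dist ~ speed, data = cars)

# With the gaussian family and its identity link, the GLM reduces to
# least squares, so the coefficients agree up to numerical tolerance.
same_coef <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))
print(same_coef)
print(fit_glm$family$family)
```

Changing only the family argument is what turns the same call into a logistic regression.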
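As the description shows, glm fits generalized linear models, and logistic regression is obtained by specifying the right arguments (logistic regression is in fact one kind of generalized linear model). A brief introduction to the generalized linear model:
$$ g(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p, \qquad \mu = E(y) $$
Here g is called the link function; it is applied to the expected value of the response y. For logistic regression the link function is the logit:
$$ g(\mu) = \log\frac{\mu}{1 - \mu} $$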
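The logit link can be checked directly in R. The following is a minimal sketch, assuming a few arbitrary probabilities mu chosen only for illustration; it compares the linkfun component of the binomial family object with the hand-written formula log(mu / (1 - mu)) and confirms that linkinv maps the result back.

```r
# Build the family object used for logistic regression.
fam <- binomial(link = "logit")

# Arbitrary example probabilities (illustrative values only).
mu <- c(0.1, 0.5, 0.9)

# Link function: maps a mean in (0, 1) to the linear-predictor scale.
eta <- fam$linkfun(mu)

# The same quantity written out by hand.
eta_manual <- log(mu / (1 - mu))

# Inverse link: maps the linear predictor back to a probability.
mu_back <- fam$linkinv(eta)

print(all.equal(eta, eta_manual))
print(all.equal(mu_back, mu))
```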
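Example
Below is a code sample that fits a logistic regression; family = binomial(link = "logit") is the statement that specifies the link function. Setting the other arguments aside for now, we use this simple example to study how the fit is implemented.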
fit <- glm(label ~ ., family = binomial(link = "logit"), data = train.yx,
           control = list(maxit = 5000, epsilon = 1e-8))
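The call above depends on a data frame train.yx that is not shown in this article. To make it runnable, here is a minimal sketch that simulates a stand-in train.yx; the column names x1 and x2, the sample size, and the true coefficients are all assumptions made for illustration.

```r
# Simulate a hypothetical training set (not the author's data).
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-0.5 + 1.2 * x1 - 0.8 * x2)  # true success probabilities
train.yx <- data.frame(label = rbinom(n, 1, p), x1 = x1, x2 = x2)

# Same call as in the article: logit link, custom iteration controls.
fit <- glm(label ~ ., family = binomial(link = "logit"), data = train.yx,
           control = list(maxit = 5000, epsilon = 1e-8))

print(coef(fit))      # estimated intercept and slopes
print(fit$converged)  # whether the fitting loop met the epsilon criterion
print(fit$iter)       # number of iterations actually used
```

On well-behaved data like this, fit$iter is normally far below maxit; the large maxit only matters when convergence is slow.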
family | character: the family name.
link | character: the link name.
linkfun | function: the link.
linkinv | function: the inverse of the link function.
variance | function: the variance as a function of the mean.
dev.resids | function giving the deviance residuals as a function of
aic | function giving the AIC value if appropriate (but
mu.eta | function: derivative
initialize | expression. This needs to set up whatever data objects are needed for the family as well as
validmu | logical function. Returns
valideta | logical function. Returns
simulate | (optional) function
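These components can be inspected on the object returned by binomial(). The following is a minimal sketch with input values picked only for illustration; for the binomial family the variance function is mu * (1 - mu), and mu.eta is the derivative of the inverse link with respect to the linear predictor.

```r
# The family object passed to glm() via the family argument.
fam <- binomial(link = "logit")

print(names(fam))  # list the components of the family object
print(fam$family)  # family name
print(fam$link)    # link name

# Variance as a function of the mean: mu * (1 - mu) for the binomial family.
v <- fam$variance(0.3)

# Derivative of the mean with respect to the linear predictor, at eta = 0.
d <- fam$mu.eta(0)

# validmu checks whether a mean vector lies in the valid range (0, 1).
ok_mu  <- fam$validmu(c(0.2, 0.8))
bad_mu <- fam$validmu(c(0.2, 1.2))

print(c(variance = v, mu.eta = d))
print(c(ok_mu = ok_mu, bad_mu = bad_mu))
```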
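Debugging this line steps into the glm function. Code in the function that only assigns values or passes arguments is not discussed here; the focus is on the core code that performs the fit. The following statement calls the function named by “method”, and what comes after it are that function's arguments. R describes the method argument as follows: the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS). Since we did not specify method, method here is glm.fit. Note the second half of that sentence: the method uses iteratively reweighted least squares (IWLS), also known as IRLS. This is where R's glm implementation of logistic regression differs from the usual presentation. Books and articles generally describe solving logistic regression with the log-likelihood as the cost function, whereas glm.fit uses the IWLS method, in Chinese “迭代加权最小二乘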