1. Another algorithm for maximizing l(θ)
To get us started, let's consider Newton's method for finding a zero of a function. Specifically, suppose we have some function f : R → R, and we wish to find a value of θ so that f(θ) = 0. Here, θ ∈ R is a real number. Newton's method performs the following update:
$$\theta := \theta - \frac{f(\theta)}{f'(\theta)}$$
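As a quick illustration, here is a minimal sketch of this update in Python; the function f(θ) = θ² − 2 and the helper name newton_root are illustrative choices, not part of the notes:

```python
def newton_root(f, f_prime, theta0, tol=1e-10, max_iter=100):
    """Iterate theta := theta - f(theta) / f'(theta) until the step is tiny."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / f_prime(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Example: the positive zero of f(theta) = theta^2 - 2 is sqrt(2).
print(newton_root(lambda t: t**2 - 2, lambda t: 2 * t, theta0=1.0))
# ~1.4142135623730951
```

Each iteration fits a tangent line to f at the current θ and jumps to where that tangent crosses zero, which is what gives the method its fast local convergence.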
The same idea gives us a way to maximize l(θ): the maxima of l correspond to points where its first derivative l′(θ) is zero, so by letting f(θ) = l′(θ) we can use the same update to maximize l(θ):
$$\theta := \theta - \frac{l'(\theta)}{l''(\theta)}$$
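In code, maximization is the same sketch with f replaced by l′. The concave objective l(θ) = log θ − θ (maximized at θ = 1) is an illustrative choice, not from the notes:

```python
def newton_maximize(l_prime, l_double_prime, theta0, tol=1e-10, max_iter=100):
    """Iterate theta := theta - l'(theta) / l''(theta), i.e. root-find on l'."""
    theta = theta0
    for _ in range(max_iter):
        step = l_prime(theta) / l_double_prime(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# l(theta) = log(theta) - theta  =>  l'(theta) = 1/theta - 1,  l''(theta) = -1/theta^2
print(newton_maximize(lambda t: 1 / t - 1, lambda t: -1 / t**2, theta0=0.5))
# ~1.0, where l'(theta) = 0
```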
Lastly, in our logistic regression setting, θ is vector-valued, so we need to generalize Newton's method accordingly. The generalization to this multidimensional setting (also called the Newton-Raphson method) is given by:
$$\theta := \theta - H^{-1}\nabla_\theta l(\theta)$$
Here, ∇θl(θ) is, as usual, the vector of partial derivatives of l(θ) with respect to the θi's; and H is an n-by-n matrix (actually, (n+1)-by-(n+1), assuming that we include the intercept term) called the Hessian, whose entries are given by:
$$H_{ij} = \frac{\partial^2 l(\theta)}{\partial \theta_i \, \partial \theta_j}$$
Newton's method typically enjoys faster convergence than (batch) gradient descent, and requires many fewer iterations to get very close to the maximum. One iteration of Newton's method can, however, be more expensive than one iteration of gradient descent, since it requires finding and inverting an n-by-n Hessian; but so long as n is not too large, it is usually much faster overall. When Newton's method is applied to maximize the logistic regression log likelihood function l(θ), the resulting method is also called Fisher scoring.
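To make the vector-valued update concrete, below is a minimal NumPy sketch of Newton's method for the logistic regression log likelihood. It assumes a design matrix X whose first column is all ones (the intercept term) and 0/1 labels y, and it uses the standard gradient ∇θl(θ) = Xᵀ(y − h) and Hessian H = −XᵀSX, where h is the vector of predictions hθ(x) and S = diag(h(1 − h)). The synthetic data at the bottom is made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, n_iter=10):
    """Newton's method (Fisher scoring) for the logistic log likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = sigmoid(X @ theta)             # h_theta(x) for every example
        grad = X.T @ (y - h)               # gradient of l(theta)
        S = h * (1.0 - h)                  # per-example weights h(1 - h)
        H = -(X.T * S) @ X                 # Hessian: -X^T diag(S) X
        theta -= np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta

# Made-up data: 100 examples, intercept plus two features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
theta_true = np.array([0.5, 2.0, -1.0])
y = (rng.uniform(size=100) < sigmoid(X @ theta_true)).astype(float)
print(logistic_newton(X, y))  # should land near theta_true
```

The sketch solves the linear system H·step = ∇θl(θ) rather than explicitly inverting H, which is the usual numerically preferable way to apply the update. For logistic regression the observed and expected information matrices coincide, which is why this Newton update and Fisher scoring are the same algorithm here.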