Machine Learning Notes 1 - Supervised Learning

1. Another algorithm for maximizing $\ell(\theta)$
To get us started, let's consider Newton's method for finding a zero of a function. Specifically, suppose we have some function $f : \mathbb{R} \to \mathbb{R}$, and we wish to find a value of $\theta$ so that $f(\theta) = 0$. Here, $\theta \in \mathbb{R}$ is a real number. Newton's method performs the following update:

$$\theta := \theta - \frac{f(\theta)}{f'(\theta)}$$
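As a quick illustration (the function, derivative, starting point, and helper name below are made up purely for this example), here is a minimal Python sketch of that root-finding loop:

```python
def newton_root(f, f_prime, theta0, tol=1e-10, max_iter=50):
    """Newton's method: repeatedly apply theta := theta - f(theta)/f'(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / f_prime(theta)
        theta = theta - step
        if abs(step) < tol:  # stop once the updates become negligible
            break
    return theta

# Example: the positive zero of f(theta) = theta^2 - 2 is sqrt(2) ~= 1.41421356.
print(newton_root(lambda t: t**2 - 2, lambda t: 2 * t, theta0=1.0))
```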

The maxima of $\ell(\theta)$ correspond to points where its first derivative $\ell'(\theta)$ is zero. So, by letting $f(\theta) = \ell'(\theta)$, we can use the same algorithm to maximize $\ell$, and we obtain the update rule:

$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$$
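Continuing the sketch above with a toy concave function (chosen only for illustration, not the actual logistic regression likelihood), running the same loop on $\ell'$ instead of $f$ finds the maximizer:

```python
# Maximize l(theta) = log(theta) - theta, whose unique maximum is at theta = 1,
# by running Newton's method on the first derivative l'(theta).
l_prime = lambda t: 1.0 / t - 1.0          # l'(theta)
l_double_prime = lambda t: -1.0 / t ** 2   # l''(theta)
print(newton_root(l_prime, l_double_prime, theta0=0.5))  # converges to ~1.0
```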

Lastly, in our logistic regression setting, $\theta$ is vector-valued, so we need to generalize Newton's method accordingly. The generalization of Newton's method to this multidimensional setting (also called the Newton-Raphson method) is given by:

$$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$$

Here, $\nabla_\theta \ell(\theta)$ is, as usual, the vector of partial derivatives of $\ell(\theta)$ with respect to the $\theta_i$'s; and $H$ is an $n$-by-$n$ matrix (actually, $(n+1)$-by-$(n+1)$, assuming that we include the intercept term) called the Hessian, whose entries are given by:

$$H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}$$
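For concreteness, under the usual logistic regression setup (sigmoid hypothesis $h_\theta(x) = 1/(1 + e^{-\theta^T x})$ and $m$ training examples $(x^{(i)}, y^{(i)})$ with $y^{(i)} \in \{0, 1\}$, as defined earlier in these notes), the gradient and Hessian of $\ell(\theta)$ work out to:

$$\nabla_\theta \ell(\theta) = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}, \qquad H = -\sum_{i=1}^{m} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x^{(i)} (x^{(i)})^T$$

Since $h_\theta \in (0, 1)$, each term $h_\theta (1 - h_\theta)\, x x^T$ is positive semidefinite, so $H$ is negative semidefinite; $\ell$ is therefore concave, and Newton's method heads toward its global maximum.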

Newton's method typically enjoys faster convergence than (batch) gradient descent, and requires many fewer iterations to get very close to the optimum. One iteration of Newton's method can, however, be more expensive than one iteration of gradient descent, since it requires finding and inverting an $n$-by-$n$ Hessian; but so long as $n$ is not too large, it is usually much faster overall. When Newton's method is applied to maximize the logistic regression log likelihood function $\ell(\theta)$, the resulting method is also called Fisher scoring.
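Putting the pieces together, here is a minimal NumPy sketch of the whole procedure (a sketch under the assumptions above: `X` is an $m$-by-$(n+1)$ design matrix whose first column is all ones for the intercept, `y` holds 0/1 labels, and the function name is my own; note it solves a linear system rather than explicitly inverting $H$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, n_iter=10):
    """Maximize the logistic regression log likelihood via Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = sigmoid(X @ theta)                    # predicted probabilities h_theta(x)
        grad = X.T @ (y - h)                      # gradient of l(theta)
        H = -(X.T * (h * (1 - h))) @ X            # Hessian: -X^T diag(h(1-h)) X
        theta = theta - np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta

# Toy usage: recover (hypothetical) true parameters [0.5, -2.0] from synthetic data.
rng = np.random.default_rng(0)
X = np.c_[np.ones(500), rng.normal(size=(500, 1))]
y = (rng.random(500) < sigmoid(X @ np.array([0.5, -2.0]))).astype(float)
print(logistic_newton(X, y))  # roughly [0.5, -2.0]
```

Each iteration costs one linear solve in the $(n+1)$-dimensional parameter space, which is exactly the per-iteration expense noted above.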
