Andrew Ng · Machine Learning || Chap 7 Regularization Notes

7 Regularization

7-1 The problem of overfitting

underfitting——high bias

Just right

overfitting——high variance

Overfitting: If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples

Addressing overfitting

Options:

  1. Reduce number of features
  2. Regularization
  • Keep all the features, but reduce magnitude/values of parameters $\theta_j$
  • Works well when we have a lot of features, each of which contributes a bit to predicting y

7-2 Cost function

Intuition

If $\theta_0 + \theta_1 x + \theta_2 x^2$ fits the data just right, and $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$ overfits,

$\longrightarrow$ suppose we penalize and make $\theta_3, \theta_4$ really small, close to 0.
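For instance, the intuition can be made concrete by adding large penalty terms on $\theta_3$ and $\theta_4$ to the squared-error objective (the constant 1000 below is just some arbitrarily large number):

$$\min_\theta \; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$

Minimizing this forces $\theta_3 \approx 0$ and $\theta_4 \approx 0$, so the quartic hypothesis behaves almost like the quadratic one.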

Regularization

Small values for parameters $\theta_0, \theta_1, \cdots, \theta_n$

  • "Simpler" hypothesis
  • Less prone to overfitting

Housing:

  • Features: $x_1, x_2, \cdots, x_{100}$
  • Parameters: $\theta_0, \theta_1, \theta_2, \cdots, \theta_{100}$

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

($\lambda$: the regularization parameter)

In regularized linear regression, we choose $\theta$ to minimize $J(\theta)$.

What if $\lambda$ is set to an extremely large value (perhaps too large for our problem, say $\lambda = 10^{10}$)? Then all of $\theta_1, \cdots, \theta_n$ are penalized toward 0, leaving roughly $h_\theta(x) \approx \theta_0$, and the model underfits (high bias).
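As a minimal Octave sketch of this cost function (hypothetical names: X is the m x (n+1) design matrix with a leading column of ones, y the targets, theta the parameter vector, lambda the regularization parameter):

function J = regularizedCost(theta, X, y, lambda)
  m = length(y);
  err = X * theta - y;                          % h_theta(x^(i)) - y^(i) for every example
  J = (1 / (2 * m)) * (sum(err .^ 2) ...
      + lambda * sum(theta(2:end) .^ 2));       % theta_0 (stored as theta(1)) is not penalized
end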

7-3 Regularized linear regression

Regularized linear regression

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

(for $j = 1, 2, \cdots, n$; $\theta_0$ is updated without the regularization term)
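Vectorized in Octave, one pass of this update (for all $j$ simultaneously, leaving $\theta_0$ unpenalized) might look like the following sketch; alpha, lambda, and num_iters are assumed to be set by the caller:

m = length(y);
for iter = 1:num_iters
    grad = (1 / m) * (X' * (X * theta - y));                   % plain squared-error gradient
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);   % add (lambda/m)*theta_j for j >= 1
    theta = theta - alpha * grad;                              % same as theta_j*(1 - alpha*lambda/m) - ...
end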

Normal equation

$$X = \begin{bmatrix} (x^{(1)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \qquad y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Non-invertibility (optional/advanced)

Suppose $m \le n$ ($m$: number of examples, $n$: number of features). Then in

$$\theta = (X^TX)^{-1}X^Ty$$

the matrix $X^TX$ may be non-invertible (singular). However, if $\lambda > 0$,

$$\theta = \left( X^TX + \lambda \begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \right)^{-1} X^Ty$$
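With $\lambda > 0$, adding the bracketed diagonal matrix makes the quantity being inverted non-singular. A minimal Octave sketch of this regularized normal equation (variable names X, y, lambda assumed):

L = eye(size(X, 2));                       % (n+1) x (n+1) identity
L(1, 1) = 0;                               % do not regularize theta_0
theta = (X' * X + lambda * L) \ (X' * y);  % backslash solves the linear system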

7-4 Regularized logistic regression

Gradient descent

Repeat{

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad (j = 1, 2, \cdots, n)$$

}
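Here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ (the sigmoid), which is what distinguishes this from the linear-regression case. One vectorized Octave iteration of the update above could be sketched as (X, y, alpha, lambda assumed in scope):

m = length(y);
h = 1 ./ (1 + exp(-X * theta));                              % sigmoid hypothesis
grad = (1 / m) * (X' * (h - y));
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);     % theta_0 is not regularized
theta = theta - alpha * grad;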

Advanced optimization

function [jVal,gradient] = costFunction(theta)
	jVal = [code to compute J(θ)];
	gradient(1)= [code to compute ∂J(θ)/∂(θ_0) ] 
	gradient(2)= [code to compute ∂J(θ)/∂(θ_1) ]
	gradient(3)= [code to compute ∂J(θ)/∂(θ_2) ]
	...
	gradient(n+1)= [code to compute ∂J(θ)/∂(θ_n) ]
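
A concrete (hedged) version of this skeleton for regularized logistic regression, returning the gradient as one vector; costFunctionReg is a hypothetical name, and X, y, lambda are passed in through an anonymous function handed to fminunc:

function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                            % sigmoid hypothesis
  jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
         + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);      % theta_0 excluded from the penalty
  gradient = (1 / m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end

% Usage sketch with fminunc:
% options = optimset('GradObj', 'on', 'MaxIter', 400);
% initialTheta = zeros(size(X, 2), 1);
% [optTheta, cost] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initialTheta, options);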
	