Week 3: Logistic Regression and Regularization
Logistic Regression
Classification
$y \in \{0, 1\}$
0: "Negative Class", 1: "Positive Class"
Linear regression: $h_\theta(x)$ can be $> 1$ or $< 0$.
Logistic regression (classification): $0 \le h_\theta(x) \le 1$.
Hypothesis Representation
We now want $0 \le h_\theta(x) \le 1$. In linear regression, $h_\theta(x) = \theta^T x$; for logistic regression we instead pass $\theta^T x$ through the sigmoid function (i.e. logistic function):

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
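Since the code later in these notes calls a `sigmoid` helper, here is a minimal Octave/MATLAB sketch of it:

```matlab
function g = sigmoid(z)
  % Logistic function 1 / (1 + e^-z), applied element-wise,
  % so z may be a scalar, a vector, or a matrix.
  g = 1 ./ (1 + exp(-z));
end
```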
Interpretation of Hypothesis Output
$h_\theta(x)$ = estimated probability that $y = 1$ on input $x$
DEFINITION
$h_\theta(x) = P(y = 1 \mid x; \theta)$: the probability that $y = 1$, given $x$, parameterized by $\theta$. Since $y \in \{0, 1\}$, the complementary probability is $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$.
Decision boundary
If $h_\theta(x) \ge 0.5$, predict $y = 1$.
If $h_\theta(x) < 0.5$, predict $y = 0$.
DEFINITION
The decision boundary is the line that separates the region where the hypothesis predicts $y = 1$ from the region where it predicts $y = 0$.
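Because $g(z) \ge 0.5$ exactly when $z \ge 0$, the prediction rule reduces to checking the sign of $\theta^T x$. A minimal sketch, assuming `X` is an m x (n+1) design matrix whose first column is all ones:

```matlab
% Predict y = 1 wherever theta' * x >= 0 (equivalently, sigmoid >= 0.5).
predictions = (X * theta >= 0);  % m x 1 vector of 0/1 labels
```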
Cost function
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
This captures the intuition that if $h_\theta(x) = 0$ but $y = 1$, we penalize the learning algorithm by a very large cost: $-\log(h_\theta(x)) \to \infty$ as $h_\theta(x) \to 0$.
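A per-example sketch of this piecewise cost, assuming a single training example `x` (an (n+1) x 1 vector with a leading 1) and its label `y`:

```matlab
% Piecewise logistic cost for one training example.
h = sigmoid(theta' * x);  % scalar hypothesis value
if y == 1
  cost = -log(h);      % blows up as h -> 0
else
  cost = -log(1 - h);  % blows up as h -> 1
end
```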
Simplified cost function and gradient descent
Want $\min_\theta J(\theta)$, where the simplified cost function combines the two cases:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Gradient descent:

Repeat: $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ (simultaneously update all $\theta_j$)
The algorithm looks identical to linear regression! But the two use different hypotheses $h_\theta(x)$.
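A vectorized sketch of the loop, with illustrative names `alpha` (learning rate) and `num_iters`:

```matlab
% Batch gradient descent for logistic regression.
for iter = 1:num_iters
  h = sigmoid(X * theta);          % m x 1 vector of predictions
  grad = (1/m) * X' * (h - y);     % gradient of J(theta)
  theta = theta - alpha * grad;    % simultaneous update of all theta_j
end
```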
Advanced optimization
Given $\theta$, we have code that can compute:
- $J(\theta)$
- $\frac{\partial}{\partial \theta_j} J(\theta)$ (for $j = 0, 1, \ldots, n$)
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages of the last three:
- No need to manually pick $\alpha$
- Often faster than gradient descent

Disadvantage:
- More complex
In MATLAB/Octave:

```matlab
function [jVal, gradient] = costFunction(theta, X, y)
  % jVal: the value of J(theta); gradient: its partial derivatives
  jVal = ...
  gradient = ...
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(n+1, 1);  % one entry per parameter, including theta_0
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) costFunction(t, X, y), initialTheta, options);
```
Multi-class classification: One-vs-all
Train: train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.
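A minimal prediction sketch, assuming the $K$ trained parameter vectors are stacked as rows of a hypothetical `all_theta` matrix (K x (n+1)):

```matlab
% Evaluate all K classifiers on one input x ((n+1) x 1 with a leading 1)
% and pick the class whose classifier outputs the highest probability.
[~, prediction] = max(sigmoid(all_theta * x));
```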
Regularization
The problem of overfitting
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$) but fail to generalize to new examples (e.g., fail to predict prices on new examples).
Options:
Reduce the number of features.
- Manually select which features to keep.
- Model selection algorithm (later in the course).

Regularization.
- Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
- Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
Cost function
Penalize large parameters by adding a regularization term:

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

Small values for the parameters $\theta_1, \ldots, \theta_n$ give:
- a "simpler" hypothesis
- less prone to overfitting
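A sketch of this cost in Octave/MATLAB, where `theta(1)` holds the unpenalized intercept $\theta_0$:

```matlab
% Regularized squared-error cost; the intercept theta(1) is not penalized.
h = X * theta;
J = (1/(2*m)) * sum((h - y).^2) + (lambda/(2*m)) * sum(theta(2:end).^2);
```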
Regularized linear regression
Gradient descent

Repeat (keeping $\theta_0$ unregularized):

$$\theta_0 := \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$

$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 1, \ldots, n$$

Normal equation

$$\theta = \left( X^T X + \lambda \, \mathrm{diag}(0, 1, \ldots, 1) \right)^{-1} X^T y$$

If $\lambda > 0$, the regularized matrix $X^T X + \lambda \, \mathrm{diag}(0, 1, \ldots, 1)$ is invertible even when $X^T X$ itself is not (e.g., when $m \le n$).
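A sketch of the regularized normal equation, where `n` is the number of features:

```matlab
% Identity matrix with the top-left entry zeroed out,
% so theta_0 is excluded from regularization.
L = eye(n + 1);
L(1, 1) = 0;
theta = (X' * X + lambda * L) \ (X' * y);  % solve for the regularized theta
```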
Regularized logistic regression
Cost function

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

Gradient descent

The updates look the same as for regularized linear regression above, but with the logistic hypothesis $h_\theta(x) = g(\theta^T x)$.
Advanced optimization
Change the costFunction to include the regularization term:
```matlab
function [J, grad] = costFunctionReg(theta, X, y, lambda)
  m = length(y); % number of training examples

  % Cost: cross-entropy term plus the regularization term,
  % which skips the intercept theta(1).
  J = 1/m * (-y'*log(sigmoid(X*theta)) - (1-y)'*log(1-sigmoid(X*theta))) ...
      + lambda/(2*m) * sum((theta(2:end)).^2);

  % Gradient: regularize every component, then subtract the
  % regularization back out of the intercept component.
  grad = 1/m * X' * (sigmoid(X*theta) - y) + lambda/m * theta;
  grad(1) = grad(1) - lambda/m * theta(1);
end
```

The optimization call lives in the calling script rather than inside the function:

```matlab
% Set options and optimize
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
```