Machine Learning Week Three

Classification

The logistic function (also called the sigmoid function) has the form

g(z) = \frac{1}{1 + e^{-z}}

which maps any real number to the (0, 1) interval.
The new form of the hypothesis h_\theta(x) is

h_\theta(x) = g(\theta^T x)

and

z = \theta^T x

Now h_\theta(x) gives the probability that our output is 1:

h_\theta(x) = P(y = 1 \mid x; \theta) = 1 - P(y = 0 \mid x; \theta)

h_\theta(x) \ge 0.5 \rightarrow y = 1
h_\theta(x) < 0.5 \rightarrow y = 0
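
As a minimal sketch in MATLAB/Octave (x, theta, and the variable names here are illustrative assumptions, not code from the course), the sigmoid and the 0.5 decision rule look like:

g = @(z) 1 ./ (1 + exp(-z));   % sigmoid, applied element-wise
h = g(theta' * x);             % estimated probability that y = 1
y_pred = (h >= 0.5);           % decision rule: predict 1 iff h >= 0.5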

For example,

z = \theta_0 + \theta_1 x_1^2 + \theta_2 x_2^2

where

\vec{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}

Recall that

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})

Now, in classification problems, we use the cost function

\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \quad \text{if } y = 1
\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x)) \quad \text{if } y = 0

We can compress the two cases into a single expression:

\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

Our full cost function is as follows:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]

A vectorized implementation is

h = g(X\theta)

J(\theta) = \frac{1}{m} \left( -y^T \log(h) - (1 - y)^T \log(1 - h) \right)

The gradient descent algorithm is:

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

which is coincidentally identical in form to linear regression,
and the vectorized implementation is:

\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - y \right)
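
As a minimal sketch of this vectorized update (alpha, num_iters, X, and y are assumed to be defined; this is an illustration, not the course's exact code):

g = @(z) 1 ./ (1 + exp(-z));                      % sigmoid
m = length(y);                                    % number of training examples
for iter = 1:num_iters
    h = g(X * theta);                             % m x 1 vector of hypotheses
    theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all theta_j
end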

Advanced Optimization

We can use MATLAB's optimization library to apply more advanced algorithms than gradient descent. First, write a function that returns both the cost and the gradient:

function [jVal, gradient] = costFunction(theta)
    jVal = [...code to compute J(theta)...];
    gradient = [...code to compute the gradient of J(theta)...];
end

Then we use fminunc():

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction,
    initialTheta, options);
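
As a concrete sketch of the pieces above (assuming the design matrix X and label vector y exist in the calling workspace; passing them through an anonymous function is one common pattern):

function [jVal, gradient] = costFunction(theta, X, y)
    % Unregularized logistic-regression cost and gradient, vectorized.
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                        % sigmoid hypothesis
    jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)); % J(theta)
    gradient = (1 / m) * (X' * (h - y));                     % partial derivatives
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(t, X, y), ...
    initialTheta, options);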

Multiclass Classification: One-vs-all

We divide our problem into n + 1 binary classification problems; in each one, we predict the probability that y is a member of one of our classes.

y \in \{0, 1, \dots, n\}

h_\theta^{(0)}(x) = P(y = 0 \mid x; \theta)
\vdots
h_\theta^{(n)}(x) = P(y = n \mid x; \theta)

\mathrm{prediction} = \max_i \left( h_\theta^{(i)}(x) \right)
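
As a minimal sketch (all_theta is an assumed matrix whose k-th row holds the trained parameters of the k-th classifier):

g = @(z) 1 ./ (1 + exp(-z));           % sigmoid
probs = g(X * all_theta');             % probs(i, k) = probability example i belongs to class k
[~, predictions] = max(probs, [], 2);  % most confident classifier per example
% note: max returns 1-based indices; shift them if classes are labeled 0..n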

Overfitting

Underfitting occurs when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that uses too few features.

Overfitting occurs when a hypothesis function fits the available data well but does not generalize to new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

Regularization

For regularized linear regression, we minimize

\min_\theta J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

and for regularized logistic regression

\min_\theta J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

Gradient descent (for j \ge 1; the bias term \theta_0 is conventionally left unregularized):

\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]
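
As a minimal sketch of one regularized update step (alpha and lambda are assumed to be set; the first column of X is the bias column of ones, so theta(1) corresponds to theta_0):

g = @(z) 1 ./ (1 + exp(-z));                              % sigmoid
m = length(y);
h = g(X * theta);
grad = (1 / m) * (X' * (h - y));                          % unregularized gradient
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % regularize j >= 1 only
theta = theta - alpha * grad;                             % gradient descent step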
