Note on Machine Learning By Andrew Ng (3)
Click here for previous note.
Logistic Regression
—A classification algorithm.
Classification
Binary classification: $y \in \{0, 1\}$
0: “Negative Class”
1: “Positive Class”
If we want to fit a line to predict something (like whether a tumor is malignant), we could set the threshold for the classifier output $h_\theta(x)$ at 0.5.
If $h_\theta(x) \geq 0.5$, predict $y = 1$;
if $h_\theta(x) < 0.5$, predict $y = 0$.
You may think linear regression works for classification, but if the data changes a little, it can give a really bad prediction.
Also, classification requires $y = 0$ or $1$, but the $h_\theta(x)$ from linear regression can be $> 1$ or $< 0$.
So, by using logistic regression, we will generate an $h_\theta(x)$ that always lies in $[0, 1]$.
Hypothesis Representation
Find an $h_\theta(x)$ that fits our need, which maps $\mathbb{R}$ to $[0, 1]$.
$$
h_\theta(x) = g(\theta^T x) \\
g(z) = \frac{1}{1 + e^{-z}} \\
\rightarrow h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
$$
We call $g(z)$ the sigmoid function or logistic function.
$h_\theta(x) = P(y = 1 \mid x; \theta)$
Probability that $y = 1$, given $x$, parameterized by $\theta$.
$P(y = 1 \mid x; \theta) + P(y = 0 \mid x; \theta) = 1$
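A minimal sketch of this hypothesis in Python with NumPy (the function and variable names are my own illustration, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row of X.

    X: (m, n) design matrix whose first column is all ones.
    theta: (n,) parameter vector.
    Returns an (m,) vector of probabilities P(y = 1 | x; theta).
    """
    return sigmoid(X @ theta)
```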
Decision Boundary
The decision boundary is a property of the hypothesis and its parameters.
It separates the two classes by a line or curve.
If $h_\theta(x) \geq 0.5$, predict $y = 1$;
if $h_\theta(x) < 0.5$, predict $y = 0$.
When $z \geq 0$, $g(z) \geq 0.5$. So $h_\theta(x) = g(\theta^T x) \geq 0.5$ exactly when $\theta^T x \geq 0$.
Non-linear decision boundaries
Use polynomial terms: e.g. $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$ can give a circular decision boundary.
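A rough sketch of such a non-linear boundary (the feature mapping and parameter values here are my own illustration): with features $[1, x_1, x_2, x_1^2, x_2^2]$ and $\theta = [-1, 0, 0, 1, 1]$, the boundary $\theta^T x = 0$ is the unit circle $x_1^2 + x_2^2 = 1$.

```python
import numpy as np

def map_features(x1, x2):
    """Polynomial feature vector [1, x1, x2, x1^2, x2^2] for one example."""
    return np.array([1.0, x1, x2, x1**2, x2**2])

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # boundary: x1^2 + x2^2 = 1

def predict(x1, x2):
    """Predict y = 1 exactly when theta^T x >= 0 (here: outside the circle)."""
    return 1 if map_features(x1, x2) @ theta >= 0 else 0

print(predict(0.2, 0.3))  # inside the unit circle  -> 0
print(predict(1.5, 0.0))  # outside the unit circle -> 1
```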
Cost Function
How do we choose parameters $\theta$?
For linear regression, $J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. We can rewrite it as $J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{cost}(h_\theta(x^{(i)}), y^{(i)})$, where $\mathrm{cost}(h_\theta(x^{(i)}), y^{(i)}) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$.
But with the sigmoid hypothesis plugged in, this squared-error cost is non-convex, so we need a new cost function that is convex and has a single global minimum.
Logistic regression cost function
For $y = 1$, the cost is $-\log(h_\theta(x))$, so
$$\mathrm{cost} = 0 \quad \text{if} \quad y = 1,\ h_\theta(x) = 1$$
But as
$$h_\theta(x) \rightarrow 0, \quad \mathrm{cost} \rightarrow \infty$$
This captures the intuition that if $h_\theta(x) = 0$ (i.e. we predict $P(y = 1 \mid x; \theta) = 0$) but in fact $y = 1$, we penalize the learning algorithm by a very large cost.
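A quick numeric illustration of that penalty (the values of $h_\theta(x)$ below are made up for illustration); for $y = 1$ the per-example cost is $-\log(h_\theta(x))$:

```python
import numpy as np

# Per-example cost when y = 1; it blows up as the prediction approaches 0.
for h in [1.0, 0.5, 0.1, 1e-6]:
    print(f"h = {h:g}, cost = {-np.log(h):.2f}")
# h = 1      -> cost 0.00   (correct, confident prediction)
# h = 0.5    -> cost 0.69
# h = 0.1    -> cost 2.30
# h = 1e-06  -> cost 13.82  (confident but wrong: huge penalty)
```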
Simplified cost function and gradient descent
$$\mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
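A vectorized sketch of this cost in Python (names are my own; a small epsilon is added inside the logs to avoid $\log(0)$):

```python
import numpy as np

def compute_cost(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x^(i)) for all examples
    return -(1.0 / m) * np.sum(
        y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)
    )
```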
To find the parameters $\theta$, minimize the cost:
$$\min_\theta J(\theta)$$
To make a prediction given a new $x$: output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
Gradient Descent
Repeat{
$\theta_j := \theta_j - \alpha \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
Just like linear regression!
}
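A minimal batch gradient descent sketch implementing this update (the learning rate and iteration count are arbitrary choices of mine):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Repeat theta_j := theta_j - alpha * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i),
    updating all theta_j simultaneously."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predictions for all examples
        theta -= alpha * (X.T @ (h - y))        # vectorized update of every theta_j
    return theta
```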
Advanced optimization
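Instead of hand-written gradient descent, more sophisticated optimizers such as conjugate gradient, BFGS, and L-BFGS can minimize $J(\theta)$; they choose the step size automatically and often converge faster (the course demonstrates this with Octave's fminunc). A rough Python equivalent using SciPy, with made-up data just to show the call, might look like:

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient, as an off-the-shelf optimizer expects."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    J = -(1.0 / m) * np.sum(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12))
    grad = (1.0 / m) * (X.T @ (h - y))
    return J, grad

# Tiny made-up dataset, only to illustrate the call.
X = np.array([[1.0, 0.5], [1.0, 2.5], [1.0, 1.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]),
                  args=(X, y), jac=True, method="BFGS")
print(result.x)  # fitted theta
```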
Multi-class Classification (One-vs-all)
Multiclass classification
Email tagging: Work, Friends, Family, Hobby
Weather: Sunny, Cloudy, Rain, Snow
One-vs-all
Separate the problem into many two-class classification problems.
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$, i.e. $\max\limits_{i} h_\theta^{(i)}(x)$.
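A rough one-vs-all sketch in Python (the helper names and hyperparameters are my own; each class reuses the same binary logistic regression training loop):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.1, num_iters=2000):
    """Fit one binary classifier per class; row i of Theta holds theta^(i)."""
    m, n = X.shape
    Theta = np.zeros((num_classes, n))
    for c in range(num_classes):
        y_c = (y == c).astype(float)          # relabel: 1 for class c, 0 otherwise
        theta = np.zeros(n)
        for _ in range(num_iters):
            h = sigmoid(X @ theta)
            theta -= alpha * (X.T @ (h - y_c)) / m   # averaged gradient step
        Theta[c] = theta
    return Theta

def predict_one_vs_all(Theta, X):
    """Pick, for each row of X, the class i with the largest h_theta^(i)(x)."""
    return np.argmax(sigmoid(X @ Theta.T), axis=1)
```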
Please click here to see the next note.