Binary Classification
Yes or No
$(x,y):\ x\in\mathbb R^{n_x},\ y\in\{0,1\}$
m training examples:
$\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$
In matrix form:
$X\in\mathbb R^{n_x\times m},\quad Y\in\mathbb R^{1\times m}$
$$X=\begin{bmatrix} x_1^{(1)} & x_1^{(2)} & \dots & x_1^{(m)}\\ \vdots & \vdots & & \vdots\\ x_{n_x}^{(1)} & x_{n_x}^{(2)} & \dots & x_{n_x}^{(m)} \end{bmatrix}$$
$$Y=\begin{bmatrix} y^{(1)} & y^{(2)} & \dots & y^{(m)} \end{bmatrix}$$
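As a concrete illustration of this layout, here is a minimal NumPy sketch (the feature values and labels are made up) that stacks each training example as a column of $X$ and each label as an entry of $Y$:

```python
import numpy as np

# Hypothetical tiny training set: m = 3 examples, n_x = 2 features each.
x1 = np.array([0.5, 1.2])   # x^(1)
x2 = np.array([1.5, 0.3])   # x^(2)
x3 = np.array([0.7, 0.9])   # x^(3)
labels = [1, 0, 1]          # y^(1), y^(2), y^(3)

# Stack examples as columns so that X has shape (n_x, m) and Y has shape (1, m).
X = np.column_stack([x1, x2, x3])
Y = np.array(labels).reshape(1, -1)

print(X.shape)  # (2, 3)
print(Y.shape)  # (1, 3)
```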
Logistic Regression
A learning algorithm used when the output labels $y$ in a supervised learning problem are all either 0 or 1, i.e. for binary classification problems.
Given $x$, we want $\hat{y}=P(y=1\mid x)$, the probability that $y$ equals 1 given the input features $x$.
In Linear Regression:
Parameters:
$w\in \mathbb R^{n_x},\ b\in \mathbb R$
Output:
$\hat{y}=w^Tx+b$
But in Logistic Regression,
$0\leq \hat{y}\leq 1$
Output:
$\hat{y}=\sigma(w^Tx+b)$
Sigmoid function
$\sigma(z)=\frac{1}{1+e^{-z}}$
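A minimal NumPy version of the sigmoid (applied element-wise, so it works on scalars or whole arrays):

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid: maps any real value into (0, 1).
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                           # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # values squashed into (0, 1)
```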
Loss function (Error function)
Measures how good the output $\hat{y}$ is when the true label is $y$.
In Linear Regression:
$\mathcal L(\hat{y},y)=\frac{1}{2}(\hat{y}-y)^2$
But in Logistic Regression (the squared-error loss makes the optimization problem non-convex here, so the cross-entropy loss is used instead):
$\mathcal L(\hat{y},y)=-(y\log \hat{y}+(1-y)\log(1-\hat{y}))$
If $y=1$, then $\mathcal L(\hat{y},y)=-y\log \hat{y}=-\log \hat{y}$; when $\hat{y}\rightarrow 1$, $\mathcal L(\hat{y},y)\rightarrow 0$.
If $y=0$, then $\mathcal L(\hat{y},y)=-\log(1-\hat{y})$; when $\hat{y}\rightarrow 0$, $\mathcal L(\hat{y},y)\rightarrow 0$.
Cost function
The loss function measures how well you’re doing on a single training example.
The cost function measures how well you’re doing on an entire training set.
$$J(w,b)=\frac{1}{m}\sum_{i=1}^m\mathcal L(\hat{y}^{(i)},y^{(i)})=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log \hat{y}^{(i)}+(1-y^{(i)})\log(1-\hat{y}^{(i)})\right]$$
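As a sketch, the cost can be computed in NumPy from the predictions $\hat{y}^{(i)}$ (stored here in a hypothetical array `A` of shape (1, m)) and the labels `Y`; the variable names follow the notation above, not any particular library:

```python
import numpy as np

def cost(A, Y):
    # A: predictions \hat{y}^{(i)}, shape (1, m); Y: labels, shape (1, m).
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Example: three predictions against their labels.
A = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1,   0,   1  ]])
print(cost(A, Y))  # small value, since the predictions mostly agree with the labels
```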
Gradient Descent
Computation Graph
Chain rule (Backward Calculation)
These derivatives come from the example graph $u=bc$, $V=a+u$, $J=3V$:
$\frac{dJ}{dV}=3$
$\frac{dV}{da}=1$
$\frac{dJ}{da}= \frac{dJ}{dV}\frac{dV}{da}=3$
$\frac{dJ}{du}= \frac{dJ}{dV}\frac{dV}{du}=3$
$\frac{dJ}{db}= \frac{dJ}{du}\frac{du}{db}=3c$
Variable naming convention in code: the derivative of the final output variable with respect to some variable var is stored in a variable named dvar.
dvar : $\frac{d\,\mathrm{FinalOutputVar}}{d\,\mathrm{var}}$
dV : $\frac{dJ}{dV}$
da : $\frac{dJ}{da}$
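A tiny numeric check of the chain rule on the example graph above ($u=bc$, $V=a+u$, $J=3V$), using the dvar naming convention; the input values are arbitrary:

```python
# Forward pass through the example graph: u = b*c, V = a + u, J = 3*V
a, b, c = 5.0, 3.0, 2.0
u = b * c
V = a + u
J = 3 * V

# Backward pass (chain rule), with code names dvar = dJ/dvar
dV = 3.0        # dJ/dV
da = dV * 1.0   # dJ/da = dJ/dV * dV/da
du = dV * 1.0   # dJ/du = dJ/dV * dV/du
db = du * c     # dJ/db = dJ/du * du/db = 3c
dc = du * b     # dJ/dc = dJ/du * du/dc = 3b

print(J, da, db, dc)  # 33.0 3.0 6.0 9.0
```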
Computation Graph of Logistic Regression
- Forward (left-to-right) calculation to compute the cost function
- Backward (right-to-left) calculation to compute the derivatives (see the sketch below)
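A minimal sketch of both passes for a single example (the feature values, weights, and label below are arbitrary): the forward pass computes $z$, $a$, and the loss, and the backward pass applies the chain rule to get $dz$, $dw$, and $db$:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Arbitrary single example with n_x = 2 features.
x = np.array([1.0, 2.0])
y = 1
w = np.array([0.1, -0.2])
b = 0.0

# Forward (left to right): compute the loss.
z = np.dot(w, x) + b
a = sigmoid(z)
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Backward (right to left): compute the derivatives.
dz = a - y      # dL/dz
dw = x * dz     # dL/dw_j = x_j * dz
db = dz         # dL/db

print(loss, dw, db)
```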
Gradient Descent on $m$ Examples
Initialize:
$J=0,\ dw_1=0,\ \dots,\ dw_n=0,\ db=0$
Loop over the training set:
For $i=1$ to $m$:
  $z^{(i)} = w^Tx^{(i)}+b$
  $a^{(i)} = \sigma(z^{(i)})$
  $J += -[y^{(i)}\log a^{(i)} + (1-y^{(i)})\log (1-a^{(i)})]$
  $dz^{(i)} = a^{(i)} -y^{(i)}$
  $db += dz^{(i)}$
  For $j=1$ to $n$:
    $dw_j += x_j^{(i)}dz^{(i)}$
Average over the $m$ examples:
$J = \frac{1}{m}J$
$db = \frac{1}{m}db$
For $j=1$ to $n$:
  $dw_j = \frac{1}{m} dw_j$
$J,dw_1,\dots,dw_n,db$ now reflect the entire training set.
Gradient descent update:
For $j=1$ to $n$:
  $w_j = w_j-\alpha\, dw_j$
$b = b- \alpha\, db$
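A direct (non-vectorized) Python translation of this procedure, assuming `X` has shape (n_x, m) with one example per column and `Y` has shape (1, m) as above; the function name and the learning rate value are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step_loops(w, b, X, Y, alpha=0.01):
    # One gradient descent step over the m examples, with explicit for-loops.
    n, m = X.shape
    J, db = 0.0, 0.0
    dw = np.zeros(n)
    for i in range(m):
        z = np.dot(w, X[:, i]) + b
        a = sigmoid(z)
        J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]
        db += dz
        for j in range(n):
            dw[j] += X[j, i] * dz
    # Average over the training set.
    J, dw, db = J / m, dw / m, db / m
    # Gradient descent update.
    w = w - alpha * dw
    b = b - alpha * db
    return w, b, J
```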
Vectorization
import numpy as np
import time
a = np.random.rand(1000000)
b = np.random.rand(1000000)
tic = time.time()
c = np.dot(a,b)
toc = time.time()
print("Vectorized Version:"+ str(1000*(toc-tic)) + 'ms')
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i]*b[i]
toc = time.time()
print("For Loop:"+ str(1000*(toc-tic)) + 'ms')
The explicit for loop is roughly 300 times slower than the vectorized version.
Vectorization techniques allow you to get rid of these explicit for-loops in your code.
Vectorized over the whole training set:
$Z=w^TX+b$ (in code: np.dot(w.T, X) + b)
$A=\sigma(Z)$
$dZ=A-Y$
$dw=\frac{1}{m}X\,dZ^T$
$db=\frac{1}{m}\sum dZ$ (in code: np.sum(dZ) / m)
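Putting the vectorized formulas together, one gradient descent iteration over the whole training set becomes a few NumPy lines. This is a sketch assuming the same shapes as before: `X` is (n_x, m), `Y` is (1, m), `w` is (n_x, 1); the function name and learning rate are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step_vectorized(w, b, X, Y, alpha=0.01):
    # One gradient descent step over all m examples, with no explicit for-loops.
    m = X.shape[1]
    Z = np.dot(w.T, X) + b                                     # shape (1, m)
    A = sigmoid(Z)                                             # shape (1, m)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dZ = A - Y                                                 # shape (1, m)
    dw = np.dot(X, dZ.T) / m                                   # shape (n_x, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db
    return w, b, J
```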