Building a Logistic Regression (LR) Model with a Neural Network Mindset (Deep Learning Course, Week 2 Solutions)



LR Review


LR Computation Graph and Gradients


Algorithm Structure


Design a simple algorithm to determine whether an image is a cat.


We build an LR model with a neural network mindset; the figure below explains why LR is in fact a very simple neural network.


(Figure: logistic regression viewed as a single-neuron neural network)


Mathematical expression of the algorithm:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})\tag{3}$$

The cost is then computed by summing over all training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$
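
As a quick numeric sanity check of equations (1)-(3), here is a minimal sketch with made-up numbers (the values of w, b, x_i and y_i below are illustrative only, not from the original assignment):

# Minimal numeric sketch of equations (1)-(3) for one made-up example
import numpy as np

w = np.array([[0.5], [-0.25]])     # weights, shape (2, 1)
b = 0.1                            # bias
x_i = np.array([[1.0], [2.0]])     # one example with 2 features, shape (2, 1)
y_i = 1                            # true label

z_i = np.dot(w.T, x_i) + b                                   # eq. (1): z = w^T x + b  ->  0.1
a_i = 1 / (1 + np.exp(-z_i))                                 # eq. (2): sigmoid(z)     ->  ~0.525
loss = -(y_i * np.log(a_i) + (1 - y_i) * np.log(1 - a_i))    # eq. (3): cross-entropy  ->  ~0.645
print(z_i, a_i, loss)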


Building the Individual Parts of the Algorithm


The main steps for building a neural network are:


  1. Define the model structure (such as the number of input features)

  2. Initialize the model's parameters

  3. Loop:

               Compute the current loss (forward propagation)

               Compute the current gradients (backward propagation)

               Update the parameters (gradient descent)


You usually build steps 1-3 separately and then integrate them into a single function we call model().


01

Helper Functions


# GRADED FUNCTION: sigmoid

import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
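
A quick sanity check of the function above (the expected values follow directly from the definition of the sigmoid):

print("sigmoid([0, 2]) = " + str(sigmoid(np.array([0, 2]))))
# expected: sigmoid([0, 2]) = [0.5        0.88079708]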


02

Initializing Parameters


# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
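
A quick check, assuming the cells above (including the numpy import) have run; dim = 2 is an arbitrary illustrative size:

dim = 2
w, b = initialize_with_zeros(dim)
print("w = " + str(w))   # [[0.] [0.]]
print("b = " + str(b))   # 0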


03

Forward and Backward Propagation


Now that the parameters are initialized, you can perform the forward and backward propagation steps to learn the parameters.


Exercise: implement the propagate() function, which computes the cost function and its gradient.
Hints:

Forward Propagation:


  • You get X

  • You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m)})$

  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$


Here are the two formulas you will be using:

$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$


# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)                                 # compute activation
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m    # compute cost

    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
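
A toy check of propagate() with 2 features and 2 examples; the values of w, b, X and Y below are made up for illustration only:

w, b = np.array([[1.], [2.]]), 2.
X = np.array([[1., 2.], [3., 4.]])
Y = np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))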


04

Optimization


  • You have initialized your parameters.

  • You can also compute the cost function and its gradient.

  • Now you need to update the parameters using gradient descent.


The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.
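
Concretely, for logistic regression this update rule is applied to both parameters (as the loop in the code below does):

$$ w := w - \alpha \, dw, \qquad b := b - \alpha \, db $$

where $dw$ and $db$ are the gradients returned by propagate().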


# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
    1) Calculate the cost and the gradient for the current parameters. Use propagate().
    2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
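
Continuing the toy example from the propagate() check above (num_iterations and learning_rate below are arbitrary small values chosen only for illustration):

params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)
print("w = " + str(params["w"]))
print("b = " + str(params["b"]))
print("costs = " + str(costs))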


05

Prediction


The previous function outputs the learned w and b. We can use them to predict the labels of a dataset X by implementing the predict() function. Computing predictions takes two steps:


1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$


2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).


# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions Y_prediction[0,i]
        if A[0][i] > 0.5:
            Y_prediction[0][i] = 1

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
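
A toy check of predict(); the values of w, b and X below are made up for illustration only:

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print("predictions = " + str(predict(w, b, X)))   # expected: [[1. 1. 0.]]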


06

Merging All the Parts into a Model


Now we build the whole model by putting together all the building blocks (the functions implemented in the previous parts) in the right order.


# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the functions you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d
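
An illustrative call of model(). The variables train_set_x, train_set_y, test_set_x and test_set_y are assumptions here: they would come from your own data loading and preprocessing code (e.g. flattening and normalizing the cat/non-cat images), which is not shown in this post.

# Assumes train_set_x, train_set_y, test_set_x, test_set_y have already been loaded,
# flattened to shape (num_px * num_px * 3, m) and scaled to [0, 1] -- hypothetical names.
d = model(train_set_x, train_set_y, test_set_x, test_set_y,
          num_iterations=2000, learning_rate=0.005, print_cost=True)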


Original article: https://www.jianshu.com/p/8e269451795d
Author's blog: http://mcgrady.cn
GitHub: https://github.com/TracyMcgrady6

