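- Sigmoid function
math: for a single real number, e.g. x = 3, s = 1/(1+math.exp(-x)). math.exp only accepts scalars.
vectors/matrices: for an array, e.g. x = np.array([1, 2, 3]), s = 1/(1+np.exp(-x)). np.exp is applied element-wise.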
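To make the difference concrete, here is a minimal sketch that evaluates the sigmoid once with math.exp and once with np.exp. The helper names scalar_sigmoid and array_sigmoid are made up for this illustration and are not part of the original notes.

```python
import math
import numpy as np

def scalar_sigmoid(x):
    # math.exp works on a single real number only
    return 1 / (1 + math.exp(-x))

def array_sigmoid(x):
    # np.exp is applied element-wise, so x may be a scalar, vector or matrix
    return 1 / (1 + np.exp(-x))

print(scalar_sigmoid(3))
print(array_sigmoid(np.array([1, 2, 3])))

# Passing a multi-element array to math.exp raises a TypeError,
# which is why the numpy version is used in deep learning code
try:
    math.exp(np.array([1, 2, 3]))
except TypeError as err:
    print("math.exp rejects arrays:", err)
```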
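- Sigmoid gradient
1) Gradients are what we use to optimize a loss function.
2) Compute the gradient of the sigmoid function, which is: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)).
3) Steps:
Compute the sigmoid of x for the whole array at once and store it as s.
Compute the gradient from s: ds = s*(1-s)
Note: the gradient can usually be expressed in terms of the output y, so compute y first and then reuse it to get the gradient.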
import numpy as np # this means you can access numpy functions by writing np.function() instead of numpy.function()

def sigmoid(x):
    """
    Compute the sigmoid of x
    Arguments:
    x -- A scalar or numpy array of any size
    Return:
    s -- sigmoid(x)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+np.exp(-x))
    ### END CODE HERE ###
    return s

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    Arguments:
    x -- A scalar or numpy array
    Return:
    ds -- Your computed gradient.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    s = sigmoid(x)
    ds = s*(1-s)
    ### END CODE HERE ###
    return ds

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
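One way to convince yourself that ds = s*(1-s) is right is to compare it with a finite-difference estimate of the slope. The sketch below does that with a central difference. The step size eps and the variable names numeric and analytic are choices made for this example only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # step size, picked for this example

# Central difference approximation of d(sigmoid)/dx
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
# Closed form: s * (1 - s)
analytic = sigmoid_derivative(x)

print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The two results should agree to many decimal places. The small remaining difference comes from the approximation and floating-point rounding.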
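- Reshaping arrays
Two numpy functions used all the time in deep learning are np.shape and np.reshape().
np.shape: returns the dimensions (shape) of a vector or matrix.
np.reshape: rearranges a vector or matrix into another shape.
Example: in image recognition an image is usually a 3D array of shape (length, height, depth=3), but most algorithms expect a vector of shape (length*height*3, 1) as input. So the 3D array has to be reshaped into a single column vector.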
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    v = image.reshape((image.shape[0]*image.shape[1]*image.shape[2], 1))
    ### END CODE HERE ###
    return v

image = np.array([[[ 0.67826139, 0.29380381],
                   [ 0.90714982, 0.52835647],
                   [ 0.4215251 , 0.45017551]],
                  [[ 0.92814219, 0.96677647],
                   [ 0.85304703, 0.52351845],
                   [ 0.19981397, 0.27417313]],
                  [[ 0.60659855, 0.00533165],
                   [ 0.10820313, 0.49978937],
                   [ 0.34144279, 0.94630077]]])
#print(image.shape)
print("image2vector(image) = " + str(image2vector(image)))
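As a side note, reshape also accepts -1 for one dimension and lets numpy work out its size, which saves multiplying the three shape entries by hand. A small sketch, using a made-up stand-in array instead of a real image:

```python
import numpy as np

# Stand-in for an image of shape (length=3, height=3, depth=2)
image = np.arange(18, dtype=float).reshape(3, 3, 2)
print(image.shape)          # (3, 3, 2)

# -1 tells numpy to infer length*height*depth
v = image.reshape(-1, 1)
print(v.shape)              # (18, 1)

# Reshaping back with the original shape recovers the same array
restored = v.reshape(image.shape)
print(np.array_equal(restored, image))  # True
```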
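- Normalizing rows
1) Normalizing the data usually gives better results, because gradient descent converges faster.
2) Normalizing means computing x/||x||, where x is a matrix and the norm is taken row by row.
3) The norm is computed as follows:
np.linalg.norm(x, ord=2, axis=1, keepdims=True)
x: a vector or matrix.
ord: the type of norm. With the default ord=None, the result is the square root of the sum of squares of all elements (the 2-norm of a vector, the Frobenius norm of a matrix). For vector norms, ord=1 is the L1 norm, ord=2 is the L2 norm, and ord=np.inf is the infinity norm.
axis: defaults to None, which treats x as a whole (a vector norm for 1D input, a matrix norm for 2D input). axis=1 computes the norm of each row vector, axis=0 the norm of each column vector.
keepdims: whether to keep the reduced axis with size 1, so that the result stays two-dimensional, e.g. shape (n, 1), and broadcasts against x.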
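The notes give the norm call but no full function, so here is a minimal sketch of row normalization built on it. The function name normalize_rows and the sample matrix are assumptions for this example, and the sketch assumes no row is all zeros (that would divide by zero).

```python
import numpy as np

def normalize_rows(x):
    # L2 norm of each row; keepdims=True keeps shape (n, 1) so it broadcasts over x
    x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
    # Divide every row by its own norm
    return x / x_norm

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])

normalized = normalize_rows(x)
print(normalized)
# After normalization each row should have (approximately) unit length
print(np.linalg.norm(normalized, axis=1))
```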
- Broadcasting and the softmax function
Broadcasting: numpy's mechanism for doing arithmetic on arrays of different shapes.
def softmax(x):
    """Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).
    Argument:
    x -- A numpy matrix of shape (n,m)
    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    # x_exp is summed row by row, so x_sum has shape (x.shape[0], 1).
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp/x_sum
    ### END CODE HERE ###
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
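To see where broadcasting happens in the softmax, it helps to print the shapes at each step. The sketch below repeats the three lines of the function outside of it and then shows what goes wrong without keepdims=True. The name flat_sum is introduced only for this illustration.

```python
import numpy as np

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]], dtype=float)

x_exp = np.exp(x)                              # shape (2, 5)
x_sum = np.sum(x_exp, axis=1, keepdims=True)   # shape (2, 1)
s = x_exp / x_sum                              # (2, 5) / (2, 1) broadcasts to (2, 5)

print(x_exp.shape, x_sum.shape, s.shape)
print(s.sum(axis=1))  # each row of a softmax sums to 1

# Without keepdims the row sums have shape (2,), which cannot be
# broadcast against the last axis of a (2, 5) array
flat_sum = np.sum(x_exp, axis=1)
try:
    x_exp / flat_sum
except ValueError as err:
    print("broadcast failed:", err)
```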
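- Vectorization
Vectorization replaces explicit loops with array operations. For heavy computations it can make the code much faster.
The point here is the difference in running time between traditional for loops, used to compute the dot product of two arrays and the outer product w[i,j] = x[i]*y[j], and numpy's vectorized functions.
Note: np.dot and np.multiply() are different. np.dot computes the inner (or matrix) product, while np.multiply multiplies element by element. np.outer computes out[i,j] = x[i]*y[j].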
import time

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
# W is not defined in these notes; a random 3 x len(x1) matrix is assumed here
W = np.random.rand(3, len(x1))
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
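The notes only show the vectorized side of the comparison. For reference, a possible loop-based counterpart for the dot product and the outer product is sketched below, with a check that both agree with numpy. The names dot_loop and outer_loop are made up for this example. On vectors this short the timings are tiny and noisy, and the gap only becomes obvious on large arrays.

```python
import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

# Dot product with an explicit loop: sum of x1[i] * x2[i]
tic = time.process_time()
dot_loop = 0
for i in range(len(x1)):
    dot_loop += x1[i] * x2[i]
toc = time.process_time()
print("loop dot = " + str(dot_loop) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# Outer product with explicit loops: outer_loop[i, j] = x1[i] * x2[j]
tic = time.process_time()
outer_loop = np.zeros((len(x1), len(x2)))
for i in range(len(x1)):
    for j in range(len(x2)):
        outer_loop[i, j] = x1[i] * x2[j]
toc = time.process_time()
print("loop outer shape = " + str(outer_loop.shape) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# Both loop results agree with the vectorized numpy calls
print(dot_loop == np.dot(x1, x2))
print(np.array_equal(outer_loop, np.outer(x1, x2)))
```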
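- Applying the L1 and L2 losses
# L1 loss with an explicit loop: sum of |y[i] - yhat[i]|
loss = 0
for i in range(len(y)):
    loss += np.abs(y[i] - yhat[i])
# L2 loss, vectorized: sum((y - yhat)**2) expanded as y.y + yhat.yhat - 2*y.yhat
loss = np.dot(y, y) + np.dot(yhat, yhat) - 2 * np.dot(y, yhat)
Note how the L2 loss is computed with vector operations: the squared differences are never formed one by one, only three dot products are needed.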
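Both losses can be written without any loop. The sketch below is one way to do it, and it checks that the expanded dot-product form of the L2 loss matches the direct one. The function names l1_loss and l2_loss and the sample values of yhat and y are assumptions for this example, not data from the notes.

```python
import numpy as np

def l1_loss(yhat, y):
    # Sum of absolute differences, no loop needed
    return np.sum(np.abs(y - yhat))

def l2_loss(yhat, y):
    # Sum of squared differences, written as a dot product of the difference with itself
    diff = y - yhat
    return np.dot(diff, diff)

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])  # example predictions
y = np.array([1, 0, 0, 1, 1])               # example labels

print("L1 = " + str(l1_loss(yhat, y)))
print("L2 = " + str(l2_loss(yhat, y)))

# The expanded form y.y + yhat.yhat - 2*y.yhat gives the same L2 value
expanded = np.dot(y, y) + np.dot(yhat, yhat) - 2 * np.dot(y, yhat)
print(np.isclose(l2_loss(yhat, y), expanded))
```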