Notes on Andrew Ng's Deep Learning, Part 1: Key Functions
1. Cost function
Measures the model's performance over the entire training set.
J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}
Substituting the cross-entropy loss:
J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
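As a quick sanity check, here is a minimal sketch with made-up probabilities and labels (the values are assumptions for illustration, not from the course):

import numpy as np

A = np.array([[0.9, 0.2, 0.7]])   # predicted probabilities for 3 examples
Y = np.array([[1, 0, 1]])         # true labels
m = Y.shape[1]
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
print(cost)   # average cross-entropy over the 3 examples (about 0.228)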
2. Loss function
Measures how well the algorithm is doing on a single training example.
\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}
def L(A, Y):
    # per-example cross-entropy loss; note the leading minus sign from (3)
    loss = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return loss
L_1(\hat{y}, y) = \sum_{i=0}^{m} \left|y^{(i)} - \hat{y}^{(i)}\right|
def L1(yhat, y):
    loss = np.sum(np.abs(y - yhat))
    return loss
L_2(\hat{y}, y) = \sum_{i=0}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2
def L2(yhat, y):
    loss = np.dot((y - yhat), (y - yhat).T)
    return loss
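A quick way to exercise both losses, using small toy vectors (values assumed for illustration):

import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
print("L1 =", L1(yhat, y))   # 1.1, the sum of absolute errors
print("L2 =", L2(yhat, y))   # 0.43, the sum of squared errors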
3. y hat
The probability that the recognized object satisfies y = 1.
\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}
z^{(i)} = w^T x^{(i)} + b \tag{1}
A = sigmoid(np.dot(w.T, X) + b)
4. Parameter update rule
\theta = \theta - \alpha \, d\theta
where α (alpha) is the learning rate.
w = w - learning_rate * dw
b = b - learning_rate * db
5. Derivatives of w and b
\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T \tag{7}
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)}) \tag{8}
dw = 1 / m * np.dot(X, (A - Y).T)
db = 1 / m * np.sum(A - Y)
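A minimal sketch (random toy data assumed) that checks gradient (8) against a centered finite difference; sigmoid is defined inline so the snippet is self-contained:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
m = 4
X = np.random.randn(2, m)
Y = np.array([[1, 0, 1, 0]])
w = np.random.randn(2, 1)
b = 0.5

def cost_fn(w, b):
    A = sigmoid(np.dot(w.T, X) + b)
    return -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

A = sigmoid(np.dot(w.T, X) + b)
db = 1 / m * np.sum(A - Y)                                         # analytic, formula (8)
eps = 1e-7
db_num = (cost_fn(w, b + eps) - cost_fn(w, b - eps)) / (2 * eps)   # numerical estimate
print(db, db_num)   # the two values should agree to several decimal places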
6. Vectorized logistic regression
A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \ldots, a^{(m-1)}, a^{(m)})
J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]
A = sigmoid(np.dot(w.T, X) + b)
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
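The point of vectorization is that one matrix product replaces the per-example loop. A minimal sketch on random data (shapes assumed) confirming both give the same activations:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(1)
X = np.random.randn(3, 5)   # 3 features, 5 examples
w = np.random.randn(3, 1)
b = 0.2

A_vec = sigmoid(np.dot(w.T, X) + b)                # all m examples at once

A_loop = np.zeros((1, X.shape[1]))
for i in range(X.shape[1]):                        # one example at a time
    A_loop[0, i] = sigmoid(w[:, 0] @ X[:, i] + b)
print(np.allclose(A_vec, A_loop))                  # True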
7. Activation functions
1. sigmoid function
\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}
def sigmoid(x):
    # x -- a scalar or a numpy array of any size
    s = 1 / (1 + np.exp(-x))
    return s
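Because np.exp broadcasts, the same function handles a scalar and an array (toy inputs assumed, using the sigmoid defined above):

import numpy as np

print(sigmoid(0))                      # 0.5
print(sigmoid(np.array([1, 2, 3])))    # [0.73105858 0.88079708 0.95257413]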
1.1 sigmoid derivative
\sigma'(x) = s(1-s), \quad s = \sigma(x)
def sigmoid_derivative(x):
    # ds -- the gradient of sigmoid evaluated at x
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds
2. tanh function
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
def tanh(x):
    t = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
    return t
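numpy already ships a built-in tanh; a quick check (toy input assumed) that the hand-rolled version matches it:

import numpy as np

x = np.array([-2.0, 0.0, 2.0])
t = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(np.allclose(t, np.tanh(x)))   # True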
3. ReLU function (max(0, x))
def ReLU(x):
    # element-wise max(0, x); unlike a scalar if/else, this also works on arrays
    return np.maximum(0, x)
4. Leaky ReLU function (max(0.01x, x))
def leakyReLU(x):
    # element-wise max(0.01*x, x); a small negative slope instead of zero
    return np.maximum(0.01 * x, x)
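A quick check of both rectifiers on a toy array (values assumed):

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(np.maximum(0, x))           # ReLU:       [0.  0.  0.  1.5]
print(np.maximum(0.01 * x, x))    # leaky ReLU: [-0.02  -0.005  0.  1.5]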
8. Reshaping image arrays
Reshape a loaded image (a 3-D array) into a column vector.
def image2vector(image):
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    return v
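For example, a fake 2x2 "image" with 3 channels (shape assumed for illustration) flattens to a (12, 1) column vector:

import numpy as np

image = np.random.rand(2, 2, 3)   # height x width x channels
v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
print(v.shape)   # (12, 1)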
9. Normalizing data (1)
Divide each row vector by its L2 norm, so that every row has unit length (each entry then lies in [-1, 1]).
def normalizeRows(x):
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / x_norm
    return x
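On a small toy matrix (values assumed), every row ends up with L2 norm 1:

import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [1.0, 6.0, 4.0]])
x_norm = np.linalg.norm(x, axis=1, keepdims=True)   # per-row norms: [[5.], [7.28...]]
print(x / x_norm)                                   # rows scaled to unit length
print(np.linalg.norm(x / x_norm, axis=1))           # [1. 1.]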
10. Normalizing data (2): softmax
When the algorithm needs to classify two or more classes, softmax can be viewed as a normalizing function.
def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    s = x_exp / x_sum
    return s
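One caveat not covered in the notes: np.exp overflows for large inputs. A common, numerically safer variant (an addition of mine, not from the original) subtracts the row-wise max first; softmax is unchanged by adding a constant to every entry of a row:

import numpy as np

def softmax_stable(x):
    # shifting by the row max avoids overflow in np.exp;
    # the shift cancels out in the final ratio
    x_shift = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(x_shift)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[1000.0, 2.0, 5.0],
              [7.0, 5.0, 0.0]])
print(softmax_stable(x).sum(axis=1))   # [1. 1.], with no overflow warnings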
11. Initializing w and b
def initialize(dim):
    """
    Creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    dim -- size of the w vector we want (the number of parameters in this case)
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (the bias)
    """
    w = np.zeros((dim, 1))
    b = 0
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return w, b
12. Learning the parameters
def propagate(w, b, X, Y):
    """
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (0 if non-cat, 1 if cat) of shape (1, number of examples)
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, hence the same shape as w
    db -- gradient of the loss with respect to b, hence the same shape as b
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                                  # forward pass, formulas (1)-(2)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cost, formula (6)
    dw = 1 / m * np.dot(X, (A - Y).T)                                # gradient, formula (7)
    db = 1 / m * np.sum(A - Y)                                       # gradient, formula (8)
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost
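A minimal smoke test with toy values (assumed for illustration, relying on the sigmoid and propagate defined above):

import numpy as np

w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"].shape)   # (2, 1), same shape as w
print(cost)                # a single scalar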
13. Optimization (updating the parameters)
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    Optimizes w and b by running gradient descent.
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (0 if non-cat, 1 if cat) of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- if True, print the cost every 100 iterations
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization; used to plot the learning curve
    """
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw   # update rule from section 4
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
14. Predicting labels for a dataset
def predict(w, b, X):
    """
    Predicts whether the label is 0 or 1 using the learned logistic regression parameters (w, b).
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    """
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)   # predicted probabilities
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:            # threshold at 0.5
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
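A toy check (parameter values assumed): probabilities above 0.5 become label 1, the rest 0:

import numpy as np

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1.0, -1.1, -3.2],
              [1.2, 2.0, 0.1]])
print(predict(w, b, X))   # [[1. 1. 0.]]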
15. Building the model function
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter: number of iterations to optimize the parameters
    learning_rate -- hyperparameter: learning rate used in the update rule of optimize()
    print_cost -- set to True to print the cost every 100 iterations
    d -- dictionary containing information about the model
    """
    w, b = initialize(X_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
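An end-to-end smoke test on synthetic data (everything here is assumed: random linearly separable blobs stand in for the course's cat/non-cat images):

import numpy as np

np.random.seed(2)
n, m_train, m_test = 4, 200, 50
X_train = np.random.randn(n, m_train)
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(float)   # label = sign of feature sum
X_test = np.random.randn(n, m_test)
Y_test = (X_test.sum(axis=0, keepdims=True) > 0).astype(float)

d = model(X_train, Y_train, X_test, Y_test, num_iterations=1000, learning_rate=0.05)
train_acc = 100 - np.mean(np.abs(d["Y_prediction_train"] - Y_train)) * 100
print("train accuracy: %.1f%%" % train_acc)   # should be well above chance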
For the numpy functions used above, see www.numpy.org.