1 The implementation of logistic regression
1.1 Mathematical formulas
Figure 1.1 The recognition process
sigmoid(w^Tx+b)=\frac{1}{1+e^{-(w^Tx+b)}}

L(a^{(i)},y^{(i)})=-y^{(i)}\log(a^{(i)})-(1-y^{(i)})\log(1-a^{(i)})

J=\frac{1}{m}\sum_{i=1}^{m}L(a^{(i)},y^{(i)})
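As a quick numeric check of these formulas, here is a minimal numpy sketch; the values of w, b, x and y below are made up purely for illustration:

import numpy as np

# made-up toy values, only to illustrate the formulas above
w = np.array([[0.5], [-0.3]])                    # weights, shape (2, 1)
b = 0.1
x = np.array([[1.0], [2.0]])                     # one example with 2 features
y = 1                                            # its label

a = 1 / (1 + np.exp(-(np.dot(w.T, x) + b)))      # sigmoid(w^T x + b)
loss = -y * np.log(a) - (1 - y) * np.log(1 - a)  # per-example loss L(a, y)
print(a, loss)                                   # here a = 0.5, loss = log(2) ≈ 0.693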
1.2 Build steps
data pre-processing
define the model structure (to receive and process the data)
initialize the parameters
iterative update loop (a compact sketch follows this list):
- compute the loss function L (forward propagation)
- compute the gradient grad (backward propagation)
- update the parameters (w, b)
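An end-to-end sketch of these steps on a tiny made-up dataset (the numbers are purely illustrative; each piece is implemented properly in section 1.3):

import numpy as np

# tiny made-up dataset: 2 features, 4 examples (illustrative only)
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.5, 2.5, 3.5]])                  # shape (2, 4)
Y = np.array([[0, 0, 1, 1]])                          # labels, shape (1, 4)

w = np.zeros((2, 1))                                  # initialize parameters
b = 0.0
learning_rate = 0.1

for i in range(1000):                                 # iterative updates
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))       # forward propagation
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = np.dot(X, (A - Y).T) / X.shape[1]            # backward propagation
    db = np.mean(A - Y)
    w = w - learning_rate * dw                        # update parameters
    b = b - learning_rate * db

print(cost)                                           # the cost should have decreased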
1.3 Algorithm Code
1. Data pre-processing
The shape of the train_set_x_orig data (pictures) is (m_train, num_px, num_px, 3).
import numpy as np

m_train = train_set_x_orig.shape[0]   # m_train: the number of training pictures
m_test = test_set_x_orig.shape[0]     # m_test: the number of test pictures
num_px = train_set_x_orig.shape[1]    # num_px: height/width of each picture
Each picture therefore has num_px * num_px * 3 features (width * height * 3 color channels).
Then flatten the data: reshape each picture from (a, b, c, d) into a matrix of shape (b * c * d, a), one column per picture.
X_flatten = X.reshape(X.shape[0], -1).T   # -1 lets numpy infer that dimension automatically
Finally, standardize the data by dividing by 255 (the maximum pixel value):
train_set_x = train_set_x_flatten/255.
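To convince yourself of the reshape, here is a quick check on random data; the picture size used below is hypothetical, not the real dataset:

X = np.random.rand(10, 64, 64, 3)         # 10 pictures of 64 x 64 x 3 (made-up size)
X_flatten = X.reshape(X.shape[0], -1).T   # flatten each picture into one column
print(X_flatten.shape)                    # (12288, 10), i.e. (64 * 64 * 3, 10)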
2. Initialize parameters
We need a weight vector w and a scalar bias b to compute the sigmoid activation.
First, define the sigmoid function:
def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s
Then initialize the parameters w and b:
def initialize_with_zeros(dim):   # dim is the length of w; it must match the number of features (num_px * num_px * 3)
    w = np.zeros((dim, 1))
    b = 0
    return w, b
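A quick usage check of the two helpers (toy input values):

print(sigmoid(np.array([0, 2])))   # approximately [0.5, 0.88]
w, b = initialize_with_zeros(3)
print(w.shape, b)                  # (3, 1) 0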
3. Forward and backward propagation
Forward propagation computes the activations and the value of the loss function; backward propagation computes the gradients dw and db, which are then used to update the parameters.
Build the loss function:
sigmoid(w^Tx+b)=\frac{1}{1+e^{-(w^Tx+b)}}

L(a^{(i)},y^{(i)})=-y^{(i)}\log(a^{(i)})-(1-y^{(i)})\log(1-a^{(i)})

J=\frac{1}{m}\sum_{i=1}^{m}L(a^{(i)},y^{(i)})
The gradients dw and db are:
\frac{\partial J}{\partial w}=\frac{1}{m}X(A-Y)^T

\frac{\partial J}{\partial b}=\frac{1}{m}\sum_{i=1}^m(a^{(i)}-y^{(i)})
def propagate(w, b, X, Y):
    m = X.shape[1]                      # number of training examples
    A = sigmoid(np.dot(w.T, X) + b)     # forward propagation: activations
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # cross-entropy loss
    dw = 1/m * np.dot(X, (A - Y).T)     # gradient of the loss with respect to w
    db = 1/m * np.sum(A - Y)            # gradient of the loss with respect to b
    cost = np.squeeze(cost)             # remove redundant dimensions
    grads = {"dw": dw, "db": db}        # store gradients in a dictionary for convenience
    return grads, cost
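A small sanity check of propagate with made-up values:

w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"], grads["db"], cost)   # dw ≈ [[0.998], [2.395]], db ≈ 0.0015, cost ≈ 5.80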
Use dw and db to optimize the parameters:
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    costs = []                                  # record the cost every 100 iterations
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)     # forward and backward propagation
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw              # gradient descent update
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("cost after iteration %i: %f" % (i, cost))
    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    return params, grads, costs
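Continuing with the toy w, b, X, Y from the propagate check above, a few optimization steps look like this (the iteration count and learning rate are arbitrary example values):

params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009)
print(params["w"])   # updated weights after 100 gradient descent steps
print(params["b"])   # updated bias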
4. Predict the data (w, b, X)
Use the trained parameters w and b to predict labels for the data X:
def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))      # matrix to store the predictions
    A = sigmoid(np.dot(w.T, X) + b)      # compute the activations
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:               # threshold the probability at 0.5
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    return Y_prediction
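A quick check of predict with some made-up parameters and inputs:

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1.0, -1.1, -3.2],
              [1.2, 2.0, 0.1]])
print(predict(w, b, X))   # a (1, 3) array of 0/1 predictions: [[1. 1. 0.]]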
5. Combine everything into one model
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    w, b = initialize_with_zeros(X_train.shape[0])   # initialize parameters to zeros
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
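With the pre-processed data from step 1 (train_set_x, train_set_y, test_set_x, test_set_y), the whole pipeline can then be trained in one call; the learning rate of 0.005 below is just an example choice:

d = model(train_set_x, train_set_y, test_set_x, test_set_y,
          num_iterations=2000, learning_rate=0.005, print_cost=True)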
1.4 Result analysis
1. Plot the loss function (cost) over iterations
import matplotlib.pyplot as plt

costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()
The decreasing loss shows that the model is learning its parameters. If we increase the number of iterations, the training-set accuracy will probably keep improving while the test-set accuracy starts to drop; this is overfitting.
2. Choose the learning rate
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=1500, learning_rate=i, print_cost=False)
    print('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label=str(models[str(i)]["learning_rate"]))
plt.ylabel('cost')
plt.xlabel('iterations')
legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %
-------------------------------------------------------
learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %
-------------------------------------------------------
learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %
-------------------------------------------------------
Explanation:
- Different learning rates give different costs and therefore different prediction results.
- If the learning rate is too large (0.01), the cost may oscillate up and down. It may even diverge (although in this example, using 0.01 still ends up at a reasonable cost value in the end).
- A lower cost does not necessarily mean a better model. Overfitting occurs when the training accuracy is much higher than the test accuracy.
- In deep learning, we usually recommend that you:
  - choose a learning rate that better minimizes the cost function;
  - if the model overfits, use other techniques to reduce overfitting (discussed in later tutorials).
Reference: https://www.kesci.com/mw/project/5dd7a246f41512002ceb3d6b