It has been a while since I last wrote a blog post. Today I suddenly had the idea of implementing a neural network with numpy, so let's get right to it. I'll assume you already have some understanding of neural networks, so I'll stick mostly to code here. It may be a little dry, but I believe it will help some beginners.
First we define a list that describes each layer of the network, something like this:
lst_nn_layers = [
    {"input_dim": 324, "output_dim": 512, "activation": "relu"},
    {"input_dim": 512, "output_dim": 256, "activation": "relu"},
    {"input_dim": 256, "output_dim": 128, "activation": "relu"},
    {"input_dim": 128, "output_dim": 32, "activation": "relu"},
    {"input_dim": 32, "output_dim": 1, "activation": "sigmoid"},
]
As anyone familiar with neural networks knows, the output vector of each layer is the input vector of the next layer, which is why this list contains a lot of repeated numbers.
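If you want to guard against typos in that list, a tiny sanity check (my own addition, not part of the original post) can verify that consecutive layers line up:

for prev_layer, curr_layer in zip(lst_nn_layers[:-1], lst_nn_layers[1:]):
    assert prev_layer["output_dim"] == curr_layer["input_dim"]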
Next, the network initialization:
import numpy as np

def init_layers(lst_nn_layers, seed=1):
    np.random.seed(seed)
    dic_params = {}
    for id, layer in enumerate(lst_nn_layers):
        # initialize w and b for layer id+1 with small random values
        dic_params['w' + str(id + 1)] = np.random.randn(layer['output_dim'], layer['input_dim']) * 0.1
        dic_params['b' + str(id + 1)] = np.random.randn(layer['output_dim'], 1) * 0.1
    return dic_params
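As a quick way to confirm the initialization, here is a hypothetical usage snippet (not in the original post) that prints every parameter's shape:

dic_params = init_layers(lst_nn_layers)
for key, value in dic_params.items():
    print(key, value.shape)
# w1 (512, 324), b1 (512, 1), w2 (256, 512), b2 (256, 1), ...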
Next we define the activation functions. We use two of them, sigmoid and relu, and since the network needs to support both forward and backward propagation, we also define their derivatives.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def sigmoid_back(dx, z):
    # derivative of sigmoid, chained with the incoming gradient dx
    sig = sigmoid(z)
    return dx * sig * (1 - sig)

def relu_back(dx, z):
    # the gradient passes through only where z > 0
    dZ = np.array(dx, copy=True)
    dZ[z <= 0] = 0
    return dZ
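If you want to convince yourself the derivatives are right, a rough finite-difference check (my own sketch, not part of the original code) works well; passing dx=1 to sigmoid_back returns the plain derivative:

z = np.linspace(-3, 3, 7)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_back(np.ones_like(z), z)
print(np.max(np.abs(numeric - analytic)))   # should be close to 0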
Next we implement forward propagation, where data flows one way from input to output. I'll write one function that encapsulates propagation through a single layer, then call it from the full forward pass.
def forward_layer_one(curr_x, curr_w, curr_b, activation):
    # affine transform followed by the layer's activation
    curr_y = np.dot(curr_w, curr_x) + curr_b
    if activation == 'relu':
        return relu(curr_y), curr_y
    else:
        return sigmoid(curr_y), curr_y

def forward_layer_full(X, dic_params, lst_nn_layers):
    dic_m = {}   # cache of layer inputs and pre-activations, needed for backprop
    curr_x = X
    for id, layer in enumerate(lst_nn_layers):
        prev_x = curr_x
        curr_w = dic_params['w' + str(id + 1)]
        curr_b = dic_params['b' + str(id + 1)]
        curr_x, curr_y = forward_layer_one(prev_x, curr_w, curr_b, layer['activation'])
        dic_m['x' + str(id)] = prev_x         # input to layer id+1
        dic_m['y' + str(id + 1)] = curr_y     # pre-activation of layer id+1
    return curr_x, dic_m
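One detail worth calling out: because the weights multiply from the left, the network expects data of shape (features, samples). A quick smoke test with hypothetical random data (my own addition) looks like this:

X = np.random.randn(324, 16)                       # 16 samples, 324 features each
dic_params = init_layers(lst_nn_layers)
y_hat, dic_m = forward_layer_full(X, dic_params, lst_nn_layers)
print(y_hat.shape)                                 # (1, 16), one sigmoid output per sample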
Next we write the loss function, which gives us a good measure of how well the model is doing.
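Written out, the cost we are about to implement is the binary cross-entropy over the m samples, where $\hat{y}$ is the network output y1 and $y$ is the label y2:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log \hat{y}^{(i)} + \left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$$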
def loss(y1, y2):
    # binary cross-entropy between the predictions y1 and the labels y2
    m = y1.shape[1]
    cost = -1 / m * (np.dot(y2, np.log(y1).T) + np.dot(1 - y2, np.log(1 - y1).T))
    return np.squeeze(cost)
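One practical caveat: if the sigmoid output ever saturates to exactly 0 or 1, np.log returns -inf and the cost becomes nan. A common safeguard (my own suggestion, not in the original code) is to clip the predictions first, e.g. y1 = np.clip(y1, 1e-12, 1 - 1e-12).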
Next comes the harder part: backpropagation. It can be difficult to grasp at first, since it involves calculus, derivatives, and linear algebra, but if you work through it carefully it becomes fairly clear.
The formulas are as follows:
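$$dA^{[L]} = -\left(\frac{Y}{\hat{Y}} - \frac{1-Y}{1-\hat{Y}}\right)$$
$$dZ^{[l]} = dA^{[l]} \odot g'\!\left(Z^{[l]}\right)$$
$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]}\, A^{[l-1]\,T}$$
$$db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)}$$
$$dA^{[l-1]} = W^{[l]\,T}\, dZ^{[l]}$$

The first line is the derivative of the cross-entropy loss with respect to the network output, which seeds the backward loop. In the code below, $dA$ corresponds to curr_dx/prev_dx, $dZ$ to curr_dy, $A^{[l-1]}$ to prev_x, and $dW$, $db$ to curr_dw and curr_db.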
def backward_layer_one(curr_dx, curr_w, curr_b, curr_y, prev_x, activation):
    m = prev_x.shape[1]
    if activation == "relu":
        backward_activation_func = relu_back
    else:
        backward_activation_func = sigmoid_back
    # gradient w.r.t. the pre-activation of the current layer
    curr_dy = backward_activation_func(curr_dx, curr_y)
    # gradients w.r.t. this layer's parameters, averaged over the batch
    curr_dw = np.dot(curr_dy, prev_x.T) / m
    curr_db = np.sum(curr_dy, axis=1, keepdims=True) / m
    # gradient passed back to the previous layer's activation
    prev_dx = np.dot(curr_w.T, curr_dy)
    return prev_dx, curr_dw, curr_db
def backward_layer_full(y1, y2, dic_m, dic_params, lst_nn_layers):
    dic_grads = {}
    m = y2.shape[1]
    y2 = y2.reshape(y1.shape)
    # derivative of the cross-entropy w.r.t. the predictions (the 1/m factor is applied inside backward_layer_one)
    prev_dx = -(np.divide(y2, y1) - np.divide(1 - y2, 1 - y1))
    for id, layer in reversed(list(enumerate(lst_nn_layers))):
        curr_dx = prev_dx
        prev_x = dic_m['x' + str(id)]
        curr_y = dic_m['y' + str(id + 1)]
        curr_w = dic_params['w' + str(id + 1)]
        curr_b = dic_params['b' + str(id + 1)]
        prev_dx, curr_dw, curr_db = backward_layer_one(curr_dx, curr_w, curr_b, curr_y, prev_x, layer['activation'])
        dic_grads['dw' + str(id + 1)] = curr_dw
        dic_grads['db' + str(id + 1)] = curr_db
    return dic_grads
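If you ever doubt the backward pass, a crude numerical gradient check can catch most bugs. The sketch below (my own addition, not part of the original post) compares the analytic gradient of a few random entries of one weight matrix against central differences; run it on a small architecture and a handful of samples so it stays fast:

def grad_check(x, y, lst_nn_layers, key='w1', eps=1e-6, n_checks=5):
    dic_params = init_layers(lst_nn_layers)
    y1, dic_m = forward_layer_full(x, dic_params, lst_nn_layers)
    dic_grads = backward_layer_full(y1, y, dic_m, dic_params, lst_nn_layers)
    for _ in range(n_checks):
        # pick a random entry of the chosen parameter matrix
        i = np.random.randint(dic_params[key].shape[0])
        j = np.random.randint(dic_params[key].shape[1])
        orig = dic_params[key][i, j]
        # nudge it up and down and recompute the cost
        dic_params[key][i, j] = orig + eps
        cost_plus = loss(forward_layer_full(x, dic_params, lst_nn_layers)[0], y)
        dic_params[key][i, j] = orig - eps
        cost_minus = loss(forward_layer_full(x, dic_params, lst_nn_layers)[0], y)
        dic_params[key][i, j] = orig
        numeric = (cost_plus - cost_minus) / (2 * eps)
        # the two numbers should agree to several decimal places
        print(numeric, dic_grads['d' + key][i, j])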
Backpropagation gives us the gradients of the network; we still need to use those gradients to optimize, i.e. to update the network parameters.
def update(dic_params, dic_grads, lst_nn_layers, rate):
    # gradient descent step: move each parameter against its gradient
    for id, layer in enumerate(lst_nn_layers):
        dic_params['w' + str(id + 1)] -= rate * dic_grads['dw' + str(id + 1)]
        dic_params['b' + str(id + 1)] -= rate * dic_grads['db' + str(id + 1)]
    return dic_params
All the preparation is done; the final step is training.
def train(x, y, lst_nn_layers, epochs, rate):
    dic_params = init_layers(lst_nn_layers)
    lst_cost = []
    for i in range(epochs):
        y1, dic_m = forward_layer_full(x, dic_params, lst_nn_layers)
        cost = loss(y1, y)
        lst_cost.append(cost)
        dic_grads = backward_layer_full(y1, y, dic_m, dic_params, lst_nn_layers)
        dic_params = update(dic_params, dic_grads, lst_nn_layers, rate)
    return dic_params, lst_cost
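To wrap up, here is a small smoke test on synthetic data. The two-layer architecture and the data are my own toy example (not the network defined at the top of the post), just to show that the cost goes down and the model learns a simple rule:

np.random.seed(0)
toy_layers = [
    {"input_dim": 2, "output_dim": 8, "activation": "relu"},
    {"input_dim": 8, "output_dim": 1, "activation": "sigmoid"},
]
# 200 points in 2D; the label is 1 when the two coordinates sum to a positive number
x_train = np.random.randn(2, 200)
y_train = (x_train.sum(axis=0, keepdims=True) > 0).astype(float)

dic_params, lst_cost = train(x_train, y_train, toy_layers, epochs=2000, rate=0.1)
print(lst_cost[0], lst_cost[-1])                   # the cost should drop noticeably

y_pred, _ = forward_layer_full(x_train, dic_params, toy_layers)
print("train accuracy:", np.mean((y_pred > 0.5) == (y_train > 0.5)))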