Computer programming of a one-hidden-layer neural network (no neural network packages used)
- 1. Generate $n$ equidistant data points within the interval $[-1,1]$
- 2. Calculate the loss function
- 3. Calculate the gradient
1. Generate $n$ equidistant data points within the interval $[-1,1]$ (these points lie on the Runge function)
# You only need this line if you run this code in a Jupyter notebook
%matplotlib inline
############################################
# You only need this part
import numpy as np
# n is the number of data points
n=21
# Choose x to be equidistant points
x=np.linspace(-1,1,n)
# Calculate the corresponding y
y=1/(1+25*x**2)
############################################
# For visualization
import matplotlib.pyplot as plt
fig1=plt.figure()
ax1=fig1.add_subplot(111)
ax1.plot(x,y,"or")
plt.show()
2. Calculate the loss function ($y_j$ denotes the true value, $\hat{y}_j$ the approximation)

Given a dataset $\{(x_j,y_j)\}_{j=1}^n$, a neural network with one hidden layer and $m$ nodes approximates $y_j$ with $\hat{y}_j$:

$$\hat{y}_j=\sum_{i=1}^m c_i f(w_i x_j+b_i)$$

where $w,b,c\in\mathbb{R}^m$ are the parameters and $f$ is the activation function.

The loss function is defined as:

$$\begin{aligned} L(w,b,c)&=\frac{1}{2}\sum_{j=1}^n \left\|y_j-\hat{y}_j\right\|^2\\ &=\frac{1}{2}\sum_{j=1}^n \left\|y_j-\sum_{i=1}^m c_i f(w_i x_j+b_i)\right\|^2 \end{aligned}$$
##############################################################################
# You only need this part
# The relu function
def my_relu(x):
    return (abs(x) + x) / 2

# The sigmoid function
def my_sigmoid(x):
    return 1/(1+np.exp(-x))

# This function calculates \hat{y}_j given x_j, w, b, c and the activation
def my_predict(x,w,b,c,method):
    yhat=0
    m=w.shape[0]
    if method=="relu":
        for i in range(m):
            z=w[i]*x+b[i]
            yhat=yhat+c[i]*my_relu(z)
    elif method=="sigmoid":
        for i in range(m):
            z=w[i]*x+b[i]
            yhat=yhat+c[i]*my_sigmoid(z)
    return yhat

# This function calculates the value of the loss function
def my_loss(x,y,w,b,c,method):
    loss=0
    for j in range(y.shape[0]):
        yhat=my_predict(x[j],w,b,c,method)
        loss=loss+(yhat-y[j])**2
    return 0.5*loss
###############################################################################
###############################################################################
# m is the number of nodes you use
m=2
# w,b,c are the decision variables to be optimized; here we just draw random initial values
w=np.random.randn(m)
b=np.random.randn(m)
c=np.random.randn(m)
# Try the relu and sigmoid activation functions
print(my_loss(x,y,w,b,c,"relu"))
print(my_loss(x,y,w,b,c,"sigmoid"))
11.637683397902427
8.748887143303836
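As an optional sanity check (not part of the original post), the same sigmoid loss can be computed in vectorized NumPy; it should print the same value as my_loss(x,y,w,b,c,"sigmoid") above:

############################################################################
# Added sanity check: vectorized loss, should match my_loss(...,"sigmoid")
Z=np.outer(w,x)+b[:,None]           # pre-activations w_i*x_j+b_i, shape (m,n)
yhat_all=c@my_sigmoid(Z)            # predictions for all x_j, shape (n,)
print(0.5*np.sum((y-yhat_all)**2))
############################################################################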
3. Calculate the gradient

We then calculate the gradient of the loss with respect to each element of $w$, $b$ and $c$; each line follows from applying the chain rule to the squared residuals:

$$\begin{aligned} \frac{\partial L}{\partial w_i}&=\sum_{j=1}^n \left(y_j-\sum_{k=1}^m c_k f(w_k x_j+b_k)\right)\left(-c_i\,\nabla f(w_i x_j+b_i)\right)x_j\\ \frac{\partial L}{\partial b_i}&=\sum_{j=1}^n \left(y_j-\sum_{k=1}^m c_k f(w_k x_j+b_k)\right)\left(-c_i\,\nabla f(w_i x_j+b_i)\right)\\ \frac{\partial L}{\partial c_i}&=\sum_{j=1}^n \left(y_j-\sum_{k=1}^m c_k f(w_k x_j+b_k)\right)\left(-f(w_i x_j+b_i)\right) \end{aligned}$$
############################################################################
# You only need this part
# This function calculates the gradient of the relu function
def my_gradrelu(x):
    return np.sign(x+abs(x))

# This function calculates the gradient of the sigmoid function
def my_gradsigmoid(x):
    return my_sigmoid(x)*(1-my_sigmoid(x))

# This function calculates the gradient with respect to c
def my_gradc(x,y,w,b,c,method):
    gc=np.zeros(w.shape[0])
    for i in range(w.shape[0]):
        if method=="sigmoid":
            for j in range(x.shape[0]):
                gc[i]=gc[i]+(y[j]-my_predict(x[j],w,b,c,method))*(-my_sigmoid(w[i]*x[j]+b[i]))
        elif method=="relu":
            for j in range(x.shape[0]):
                gc[i]=gc[i]+(y[j]-my_predict(x[j],w,b,c,method))*(-my_relu(w[i]*x[j]+b[i]))
    return gc

# This function calculates the gradient with respect to b
def my_gradb(x,y,w,b,c,method):
    gb=np.zeros(w.shape[0])
    for i in range(w.shape[0]):
        if method=="sigmoid":
            for j in range(x.shape[0]):
                gb[i]=gb[i]+(y[j]-my_predict(x[j],w,b,c,method))*(-c[i]*my_gradsigmoid(w[i]*x[j]+b[i]))
        elif method=="relu":
            for j in range(x.shape[0]):
                gb[i]=gb[i]+(y[j]-my_predict(x[j],w,b,c,method))*(-c[i]*my_gradrelu(w[i]*x[j]+b[i]))
    return gb

# This function calculates the gradient with respect to w
def my_gradw(x,y,w,b,c,method):
    gw=np.zeros(w.shape[0])
    for i in range(w.shape[0]):
        if method=="sigmoid":
            for j in range(x.shape[0]):
                gw[i]=gw[i]+(y[j]-my_predict(x[j],w,b,c,method))*(-c[i]*my_gradsigmoid(w[i]*x[j]+b[i]))*x[j]
        elif method=="relu":
            for j in range(x.shape[0]):
                gw[i]=gw[i]+(y[j]-my_predict(x[j],w,b,c,method))*(-c[i]*my_gradrelu(w[i]*x[j]+b[i]))*x[j]
    return gw
###############################################################################
###############################################################################
# m is the number of nodes you use
m=2
# As mentioned before, w,b,c are the decision variables to be optimized; here we just draw random values
w=np.random.randn(m)
b=np.random.randn(m)
c=np.random.randn(m)
# Try the gradient calculation functions
print(my_gradc(x,y,w,b,c,"relu"))
print(my_gradb(x,y,w,b,c,"relu"))
print(my_gradw(x,y,w,b,c,"sigmoid"))
[0. 0.]
[0. 0.]
[-0.35377667 0.08969916]
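Note that the relu gradients with respect to $c$ and $b$ print as exactly zero here: for this particular random draw, $w_i x_j+b_i\le 0$ for every data point, so every relu unit is inactive and contributes nothing to the gradient (the "dead relu" effect). With the gradients in hand, the parameters can be trained by plain gradient descent. The sketch below is not part of the original post, and the learning rate and iteration count are illustrative choices:

############################################################################
# Added sketch (not in the original): plain gradient descent on w, b, c
lr=0.05              # learning rate (illustrative choice)
w=np.random.randn(m)
b=np.random.randn(m)
c=np.random.randn(m)
for it in range(5000):
    # Compute all gradients at the current point before updating anything
    gw=my_gradw(x,y,w,b,c,"sigmoid")
    gb=my_gradb(x,y,w,b,c,"sigmoid")
    gc=my_gradc(x,y,w,b,c,"sigmoid")
    w=w-lr*gw
    b=b-lr*gb
    c=c-lr*gc
    if it%1000==0:
        print(my_loss(x,y,w,b,c,"sigmoid"))
############################################################################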