1. Problem Description
Let the training set be $(X, y)$, satisfying $y = WX + \varepsilon$, where $\varepsilon$ is a very small random perturbation. $X$ has dimension 100 (matching `input_dim = 100` in the code below), each variable appears only to the first power, and the sample size is $N$. The goal is to recover an equivalent $W$ with a single-hidden-layer neural network.
2. Model
As shown in the figures, we convert a problem of this kind into a neural-network fitting problem, with the relationship below holding between the input-layer and hidden-layer nodes.
Generating the training data:
import numpy as np
# dimension and size parameters
batch_size = 600
input_dim = 100
output_dim = 1
hidden_dim = 200
# synthesize the data (x, y)
X = np.random.randn(input_dim, batch_size)
W = np.random.randn(output_dim, input_dim)
y = np.dot(W, X) + np.random.randn(1, batch_size) / 10000  # add a small noise term
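As a quick sanity check (a sketch that just re-runs the generation above with the same dimensions), the arrays have the shapes below, and the added noise is tiny compared with $WX$:

```python
import numpy as np

batch_size, input_dim, output_dim = 600, 100, 1
X = np.random.randn(input_dim, batch_size)
W = np.random.randn(output_dim, input_dim)
y = np.dot(W, X) + np.random.randn(1, batch_size) / 10000

print(X.shape, W.shape, y.shape)  # (100, 600) (1, 100) (1, 600)
# The noise term has magnitude ~1e-4, negligible next to W X:
print(np.abs(y - np.dot(W, X)).max())
```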
Sigmoid function and data normalization:
# Sigmoid function (clipped for numerical stability)
def Sigm(X):
    return np.clip(1 / (1.0 + np.exp(-X)), 1e-8, 1 - 1e-8)

# Data normalization: zero mean and unit std per feature (row)
def _normalize(X):
    X_nor = np.zeros((X.shape[0], X.shape[1]), dtype=float)
    X_mean = np.mean(X, 1)
    X_std = np.std(X, 1)
    for i in range(X.shape[1]):
        X_nor[:, i] = (X[:, i] - X_mean) / X_std
    return X_nor
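A quick check (a sketch reusing `_normalize` from above on arbitrary data) that every feature row ends up with zero mean and unit standard deviation:

```python
import numpy as np

def _normalize(X):
    # Normalize each feature (row) using its own mean and std
    # computed along the sample axis.
    X_nor = np.zeros((X.shape[0], X.shape[1]), dtype=float)
    X_mean = np.mean(X, 1)
    X_std = np.std(X, 1)
    for i in range(X.shape[1]):
        X_nor[:, i] = (X[:, i] - X_mean) / X_std
    return X_nor

X = np.random.randn(5, 1000) * 3.0 + 7.0   # arbitrary scale and offset
X_nor = _normalize(X)
print(np.allclose(X_nor.mean(axis=1), 0))  # True
print(np.allclose(X_nor.std(axis=1), 1))   # True
```

The per-column loop can equivalently be replaced by broadcasting: `(X - X_mean[:, None]) / X_std[:, None]`.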
Splitting the data into training and validation sets:
data_to_t = 0.8  # fraction of samples used for training
k = int(data_to_t * batch_size)
X = _normalize(X)
y = _normalize(y)
X_train = X[:, :k]
y_train = y[:, :k]
X_test = X[:, k:batch_size]
y_test = y[:, k:batch_size]
To simplify the bookkeeping, we absorb the bias terms into the weight matrices by appending a row of ones to the inputs.
The input-layer-to-hidden-layer relation is:
$$
z^{\left(1\right)} = \left(W_{h}^{\left(1\right)}\right)^T X_h,\qquad A = \sigma\left(z^{\left(1\right)}\right) = \frac{1}{1 + e^{-z^{\left(1\right)}}}
$$
The hidden-layer-to-output-layer relation is:
$$
y = \left(W_{h}^{\left(2\right)}\right)^T A_h
$$
From these formulas, construct the network:
w1 = np.random.randn(input_dim, hidden_dim)
w2 = np.random.randn(hidden_dim, output_dim)
b1 = np.random.randn(1, hidden_dim)
b2 = np.random.randn(1, output_dim)
x1 = np.ones((1, X_train.shape[1]))
x2 = np.ones((1, X_test.shape[1]))
# augmented weights: bias row appended to each weight matrix
w1_h = np.concatenate((w1, b1), axis=0)
w2_h = np.concatenate((w2, b2), axis=0)
# augmented inputs: a row of ones appended to feed the biases
X_train = np.concatenate((X_train, x1), axis=0)
X_test = np.concatenate((X_test, x2), axis=0)
w1.shape, w2.shape, w1_h.shape, w2_h.shape, X_train.shape, X_test.shape
This gives the dimensions of each layer and the sizes of the weight matrices:
((100, 200), (200, 1), (101, 200), (201, 1), (101, 480), (101, 120))
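With the augmented matrices in place, the two layer equations above can be sketched as a forward pass (a sketch with freshly generated weights; `Sigm` is the clipped sigmoid defined earlier, and the appended row of ones feeds the biases):

```python
import numpy as np

def Sigm(X):
    return np.clip(1 / (1.0 + np.exp(-X)), 1e-8, 1 - 1e-8)

input_dim, hidden_dim, output_dim, n = 100, 200, 1, 480
w1_h = np.random.randn(input_dim + 1, hidden_dim)   # (101, 200), bias row included
w2_h = np.random.randn(hidden_dim + 1, output_dim)  # (201, 1)
X_h = np.concatenate((np.random.randn(input_dim, n), np.ones((1, n))), axis=0)

z1 = np.dot(w1_h.T, X_h)                            # z^(1) = (W_h^(1))^T X_h
A = Sigm(z1)                                        # A = sigmoid(z^(1))
A_h = np.concatenate((A, np.ones((1, n))), axis=0)  # append the bias row
y_pred = np.dot(w2_h.T, A_h)                        # y = (W_h^(2))^T A_h
print(z1.shape, A_h.shape, y_pred.shape)            # (200, 480) (201, 480) (1, 480)
```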
3. Loss Function
3.1 Definition of the Loss Function
As shown in the figure, we take the squared loss as the loss function: $L = \frac{1}{2}\sum_i \left(\hat{y}_i - y_i\right)^2$.
3.2 Minimizing the Loss Function
We find the best-fitting network by gradient descent on the weights.
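Putting the pieces together, the gradient-descent loop can be sketched end to end. This is a sketch with small, hypothetical dimensions so it runs quickly; the backward-pass lines follow from the chain rule and the identity σ'(z) = σ(z)(1 − σ(z)), anticipating the derivation in the subsections below.

```python
import numpy as np

def Sigm(X):
    return np.clip(1 / (1.0 + np.exp(-X)), 1e-8, 1 - 1e-8)

rng = np.random.RandomState(1)
input_dim, hidden_dim, n = 10, 20, 200    # small sizes so the sketch runs quickly
W_true = rng.randn(1, input_dim)
X_h = np.concatenate((rng.randn(input_dim, n), np.ones((1, n))), axis=0)
y = np.dot(W_true, X_h[:-1, :])           # linear target, as in the problem setup

w1_h = rng.randn(input_dim + 1, hidden_dim) * 0.1
w2_h = rng.randn(hidden_dim + 1, 1) * 0.1
lr = 1e-4
losses = []
for step in range(2000):
    # forward pass
    z1 = np.dot(w1_h.T, X_h)
    A = Sigm(z1)
    A_h = np.concatenate((A, np.ones((1, n))), axis=0)
    y_pred = np.dot(w2_h.T, A_h)
    losses.append(0.5 * np.sum((y_pred - y) ** 2))
    # backward pass (chain rule through both layers)
    dY = y_pred - y                       # dL/dy_pred for the squared loss
    dw2 = np.dot(A_h, dY.T)               # dL/dw2_h
    dA = np.dot(w2_h[:-1, :], dY)         # dL/dA, bias row of w2_h excluded
    dz1 = dA * A * (1 - A)                # sigmoid derivative: s'(z) = s(z)(1 - s(z))
    dw1 = np.dot(X_h, dz1.T)              # dL/dw1_h
    w2_h -= lr * dw2
    w1_h -= lr * dw1

print(losses[0], losses[-1])              # the loss decreases over training
```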
3.2.1 Derivative of $L$ with respect to $W_{h}^{\left(2\right)}$
The derivative of the loss function with respect to an arbitrary weight $w_{i}^{\left(2\right)}$ between the hidden layer and the output layer