To write a neural network from scratch, the usual approach is:
- define the network structure (the sizes of the input, hidden, and output layers)
- initialize the model parameters
- loop: forward propagation – compute the loss – backward propagation – update the weights
This post implements the simplest perceptron-style model, so the network structure needs no special definition. First we define an activation function; here we use the sigmoid function:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
The second step is to initialize the model parameters: the weights w and the bias b.
def initialize_paramters(dims):
    w = np.zeros((dims, 1))
    b = 0.0
    return w, b
The third step is the training loop. Since

\phi(x)=\frac{1}{1+e^{-(wx+b)}}

has range (0, 1),
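As a quick sanity check (a standalone snippet, separate from the model code above), the values of sigmoid always fall strictly between 0 and 1, with sigmoid(0) = 0.5, which is what lets us read its output as a probability:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# even extreme inputs stay inside the open interval (0, 1)
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p = sigmoid(z)
print(p)  # midpoint p[2] is exactly 0.5
```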
and this post considers the simplest binary classification problem, we can assume

P(y=1|x,w,b)=\phi(x),
P(y=0|x,w,b)=1-\phi(x).
Combining the two gives

P(y|x,w,b)=\phi(x)^y(1-\phi(x))^{1-y}.
The likelihood function to be maximized is

L(w,x,b)=\prod_{i=1}^n P(y^{(i)}|x^{(i)},w,b).
We want L(w,x,b) to be as large as possible, so we take the negative of its logarithm as the cost function (the log turns the product into a sum, and the minus sign turns maximization into minimization):

J(w,x,b)=-\sum_{i=1}^n\left[y^{(i)}\log(\phi(x^{(i)}))+(1-y^{(i)})\log(1-\phi(x^{(i)}))\right].
By a property of the sigmoid function,

\phi'(x)=\phi(x)(1-\phi(x)),
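This identity is easy to verify numerically (a standalone check, assuming the same sigmoid as above): a central finite difference of \phi should agree with \phi(x)(1-\phi(x)) to high precision.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
h = 1e-6
# central finite difference approximation of the derivative
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
# the closed-form derivative from the identity above
analytic = sigmoid(x) * (1 - sigmoid(x))
print(np.max(np.abs(numeric - analytic)))  # tiny discrepancy
```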
so we obtain

\frac{\partial J(x,w,b)}{\partial w}=\sum_{i=1}^n(\phi(x^{(i)})-y^{(i)})x^{(i)},
\frac{\partial J(x,w,b)}{\partial b}=\sum_{i=1}^n(\phi(x^{(i)})-y^{(i)}).
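These gradient formulas can be checked against finite differences of J on random data (a standalone sketch; the X, Y, w, b here are synthetic placeholders, not the iris data used later):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost(w, b, X, Y):
    y_hat = sigmoid(X.dot(w) + b)
    return -np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = (rng.random((20, 1)) > 0.5).astype(float)
w = rng.normal(size=(3, 1)) * 0.1
b = 0.0

# analytic gradients from the formulas above
y_hat = sigmoid(X.dot(w) + b)
dw = X.T.dot(y_hat - Y)
db = np.sum(y_hat - Y)

# numerical gradients by central finite differences
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    e = np.zeros_like(w)
    e[j] = eps
    dw_num[j] = (cost(w + e, b, X, Y) - cost(w - e, b, X, Y)) / (2 * eps)
db_num = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))  # both near zero
```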
# forward propagation: compute predictions, cost, and gradients
def propagate(w, b, X, Y):
    # predictions
    y_hat = sigmoid(X.dot(w) + b)
    # cross-entropy cost
    cost = -np.sum(Y*np.log(y_hat) + (1-Y)*np.log(1-y_hat))
    # gradients
    dw = X.T.dot(y_hat - Y)
    db = np.sum(y_hat - Y)
    grads = {'dw': dw, 'db': db}
    return grads, cost
# backward propagation: gradient-descent weight updates
def backward_propagation(w, b, X, Y, iterations=2000, learning_rate=0.01):
    # record the cost every 100 iterations
    costs = []
    for i in range(iterations):
        grads, cost = propagate(w, b, X, Y)
        # weight update
        w -= learning_rate * grads['dw']
        b -= learning_rate * grads['db']
        if i % 100 == 0:
            costs.append(cost)
    params = {'w': w, 'b': b}
    return params, costs
Prediction: convert the model's probabilities into 0/1 labels.
def predict(w, b, X):
    y_pred = sigmoid(X.dot(w) + b)
    y_pred[y_pred >= 0.5] = 1
    y_pred[y_pred < 0.5] = 0
    return y_pred
Wrapping everything into a model:
def model(x_train, y_train, x_test, y_test, iterations=2000, learning_rate=0.01):
    # initialize from the training set's feature count, not a global X
    w, b = initialize_paramters(x_train.shape[1])
    params, costs = backward_propagation(w, b, x_train, y_train,
                                         iterations, learning_rate)
    # predict with the trained parameters
    y_train_pred = predict(params['w'], params['b'], x_train)
    y_test_pred = predict(params['w'], params['b'], x_test)
    print('train accuracy: ', np.mean(y_train_pred == y_train))
    print('test accuracy: ', np.mean(y_test_pred == y_test))
    rs = {
        'costs': costs,
        'w': params['w'],
        'b': params['b'],
        'iterations': iterations,
        'learning_rate': learning_rate,
        'y_train_pred': y_train_pred,
        'y_test_pred': y_test_pred
    }
    return rs
We test with sklearn's built-in iris dataset. Since this is binary classification, we take only the first 100 rows (classes 0 and 1). The training and test labels must be reshaped to an explicit (n, 1) column vector; otherwise the second dimension is left undefined, and NumPy broadcasting produces wrongly shaped arrays during forward propagation.
from sklearn import datasets
from sklearn.model_selection import train_test_split

df = datasets.load_iris()
X = df.data[:100]
Y = df.target[:100]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))
model(x_train, y_train, x_test, y_test)
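To see concretely why the reshape matters (a minimal standalone illustration): if the labels are left 1-D, NumPy broadcasting silently turns the residual y_hat - Y into an (n, n) matrix instead of an (n, 1) column, corrupting the gradients without raising an error.

```python
import numpy as np

y_hat = np.zeros((5, 1))   # predictions come out as a column vector
Y = np.zeros(5)            # labels left 1-D, shape (5,)

# (5, 1) minus (5,) broadcasts to (5, 5) -- silently wrong
print((y_hat - Y).shape)
# after reshaping the labels to a column, shapes line up correctly
print((y_hat - Y.reshape(-1, 1)).shape)
```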