Loss function (loss)
The loss function measures the gap between the network's prediction (y) and the known answer (y_). A common choice is the mean squared error (MSE):

MSE(y_, y) = (1/n) * Σ (y_i - y_i')²

where y_i is the ground-truth value of the i-th sample in a batch and y_i' is the NN's prediction.
Usage example:
import tensorflow as tf

y_true = tf.constant([0.5, 0.8])
y_pred = tf.constant([1.0, 1.0])
print(tf.keras.losses.MSE(y_true, y_pred))
Output:
>>> tf.Tensor(0.145, shape=(), dtype=float32)
Equivalent implementation:
print(tf.reduce_mean(tf.square(y_true - y_pred)))
Output:
>>> tf.Tensor(0.145, shape=(), dtype=float32)
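Spelled out: (0.5 - 1.0)² = 0.25 and (0.8 - 1.0)² = 0.04, and their mean is (0.25 + 0.04) / 2 = 0.145, matching the tensor above.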
Example: predicting daily yogurt sales
Collected data: daily features x1, x2 and sales y_ (y_ = x1 + x2), with noise in [-0.05, 0.05); the goal is to fit a function that predicts sales.
Feed this dataset into a single-layer neural network to predict daily yogurt sales.
import tensorflow as tf
import numpy as np

SEED = 23455  # random seed for reproducibility
rdm = np.random.RandomState(seed=SEED)  # random number generator producing values in [0, 1)
x = rdm.rand(32, 2)  # 32 rows, 2 columns: 32 samples of input features x1, x2, each in [0, 1)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]
# noise: [0, 1) / 10 = [0, 0.1); [0, 0.1) - 0.05 = [-0.05, 0.05)
x = tf.cast(x, dtype=tf.float32)
w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))
# randomly initialize the parameter w1 with shape (2, 1)
Training:
epochs = 15000  # number of passes over the dataset
lr = 0.002  # learning rate

for epoch in range(epochs):
    # compute the forward-pass result y inside a with tf.GradientTape() block
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        loss_mse = tf.reduce_mean(tf.square(y_ - y))  # mean squared error loss
    grads = tape.gradient(loss_mse, w1)  # gradient of the loss w.r.t. the trainable parameter w1
    w1.assign_sub(lr * grads)  # update w1

    # print w1 every 500 iterations
    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(), "\n")

print("Final w1 is: ", w1.numpy())
Output:
After 0 training steps,w1 is
[[-0.8096241]
[ 1.4855157]]
After 500 training steps,w1 is
[[-0.21934733]
[ 1.6984866 ]]
After 1000 training steps,w1 is
[[0.0893971]
[1.673225 ]]
After 1500 training steps,w1 is
[[0.28368822]
[1.5853055 ]]
After 2000 training steps,w1 is
[[0.423243 ]
[1.4906037]]
After 2500 training steps,w1 is
[[0.531055 ]
[1.4053345]]
After 3000 training steps,w1 is
[[0.61725086]
[1.332841 ]]
After 3500 training steps,w1 is
[[0.687201 ]
[1.2725208]]
After 4000 training steps,w1 is
[[0.7443262]
[1.2227542]]
After 4500 training steps,w1 is
[[0.7910986]
[1.1818361]]
After 5000 training steps,w1 is
[[0.82943517]
[1.1482395 ]]
After 5500 training steps,w1 is
[[0.860872 ]
[1.1206709]]
After 6000 training steps,w1 is
[[0.88665503]
[1.098054 ]]
After 6500 training steps,w1 is
[[0.90780276]
[1.0795006 ]]
After 7000 training steps,w1 is
[[0.92514884]
[1.0642821 ]]
After 7500 training steps,w1 is
[[0.93937725]
[1.0517985 ]]
After 8000 training steps,w1 is
[[0.951048]
[1.041559]]
After 8500 training steps,w1 is
[[0.96062106]
[1.0331597 ]]
After 9000 training steps,w1 is
[[0.9684733]
[1.0262702]]
After 9500 training steps,w1 is
[[0.97491425]
[1.0206193 ]]
After 10000 training steps,w1 is
[[0.9801975]
[1.0159837]]
After 10500 training steps,w1 is
[[0.9845312]
[1.0121814]]
After 11000 training steps,w1 is
[[0.9880858]
[1.0090628]]
After 11500 training steps,w1 is
[[0.99100184]
[1.0065047 ]]
After 12000 training steps,w1 is
[[0.9933934]
[1.0044063]]
After 12500 training steps,w1 is
[[0.9953551]
[1.0026854]]
After 13000 training steps,w1 is
[[0.99696386]
[1.0012728 ]]
After 13500 training steps,w1 is
[[0.9982835]
[1.0001147]]
After 14000 training steps,w1 is
[[0.9993659]
[0.999166 ]]
After 14500 training steps,w1 is
[[1.0002553 ]
[0.99838644]]
Final w1 is: [[1.0009792]
[0.9977485]]
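With the fitted weights, predicting sales for a new day is just the same forward pass. A minimal sketch (assuming the training code above has run, so tf and w1 are in scope; the sample values are made up for illustration):

x_new = tf.constant([[0.3, 0.6]], dtype=tf.float32)  # a hypothetical new sample (x1, x2)
y_new = tf.matmul(x_new, w1)  # forward pass with the trained weights
print(y_new.numpy())  # since the trained w1 ≈ [[1], [1]], this should be close to x1 + x2 = 0.9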
Cross-entropy loss CE (Cross Entropy)
Cross entropy (Cross Entropy) measures the distance between two probability distributions: the larger the cross entropy, the farther apart the two distributions; the smaller it is, the closer they are. It is one of the most widely used loss functions for classification problems:

H(y_, y) = -Σ y_i * ln(y_i')

where y_i is the ground-truth value and y_i' is the neural network's prediction. For multi-class problems, the network's output is generally not a probability distribution, so a softmax layer is introduced to turn the output into one.
tf.keras.losses.categorical_crossentropy
Purpose: compute the cross entropy.
Equivalent API: tf.losses.categorical_crossentropy
Example:
y_true = [1, 0, 0]
y_pred1 = [0.5, 0.4, 0.1]
y_pred2 = [0.8, 0.1, 0.1]
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred1))
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred2))
Output:
>>> tf.Tensor(0.6931472, shape=(), dtype=float32)
tf.Tensor(0.22314353, shape=(), dtype=float32)
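Here 0.6931 ≈ -ln(0.5) and 0.2231 ≈ -ln(0.8): y_pred2 assigns a higher probability to the true class, so its cross entropy is smaller and its distribution is closer to y_true.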
Equivalent implementation:
print(-tf.reduce_sum(y_true * tf.math.log(y_pred1)))
print(-tf.reduce_sum(y_true * tf.math.log(y_pred2)))
Output:
>>> tf.Tensor(0.6931472, shape=(), dtype=float32)
tf.Tensor(0.22314353, shape=(), dtype=float32)
Computing softmax and cross entropy together
tf.nn.softmax_cross_entropy_with_logits(labels, logits, axis=-1, name=None)
Purpose: apply softmax to logits, then compute the cross entropy against labels.
In machine learning, for multi-class problems, the vector of unnormalized scores (i.e., before softmax) is called the logits. After passing through a softmax layer, the logits become a vector that forms a probability distribution.
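As a quick illustration of this normalization (a minimal sketch; the logits values are arbitrary):

logits_demo = tf.constant([4.0, 2.0, 1.0])  # unnormalized scores
print(tf.nn.softmax(logits_demo))  # ≈ [0.844, 0.114, 0.042], which sums to 1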
Parameters:
labels: along the class dimension, each vector should be a valid probability distribution. For example, if labels has shape
[batch_size, num_classes], each labels[i] should be a probability distribution.
logits: the unnormalized activation for each class, typically the output of a linear layer; the function applies the softmax normalization internally, so do not pass probabilities.
axis: the dimension the classes live on; defaults to -1, i.e., the last dimension.
Returns: the softmax cross-entropy loss values.
Example:
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
logits = [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]]
print(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
>>> tf.Tensor([0.16984604 0.02474492], shape=(2,), dtype=float32)
Equivalent implementation:
print(-tf.reduce_sum(labels * tf.math.log(tf.nn.softmax(logits)), axis=1))
>>> tf.Tensor([0.16984606 0.02474495], shape=(2,), dtype=float32)
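For reference, the Keras loss can compute the same thing in one call via its from_logits flag (from_logits is a standard parameter of tf.keras.losses.categorical_crossentropy; when True, softmax is applied internally):

print(tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True))

This should print the same two loss values as above.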