When we train two-tower models such as DSSM, true negative samples are often hard to collect. In that case we can construct negatives inside each batch (in-batch negative sampling), so the training data only needs to contain positive pairs.

Let's first walk through a small example to understand how the code works:
```python
import tensorflow as tf
import random

batchSize = 4
NEG = 2  # number of in-batch negatives per positive

# L2-normalize embeddings so the dot product equals cosine similarity
normalize = tf.keras.layers.Lambda(lambda x: tf.keras.backend.l2_normalize(x, axis=1))

item_y = tf.constant([[2, 3], [5, 6], [7, 1], [3, 4]], dtype=tf.float32)
item_y = normalize(item_y)
item_y_temp = item_y  # keep the original item embeddings for slicing

user_y = tf.constant([[2, 3], [5, 6], [7, 1], [3, 4]], dtype=tf.float32)
user_y = normalize(user_y)

# Build negatives by rotating the batch: in each round, pick a random
# offset `rand` and append the item matrix rotated upward by `rand` rows,
# so every user gets paired with another user's item.
for i in range(NEG):
    rand = int((random.random() + i) * batchSize / NEG)
    print(rand)
    if rand == 0:          # offset 0 would pair every user with its own item
        rand += 1
    if rand == batchSize:  # a full rotation would do the same
        rand -= 1
    item_y = tf.concat([item_y,
                        tf.slice(item_y_temp, [rand, 0], [batchSize - rand, -1]),
                        tf.slice(item_y_temp, [0, 0], [rand, -1])], 0)
    print(item_y)

# Repeat the user matrix NEG + 1 times so it lines up row by row with
# [positives; negatives_1; negatives_2] stacked in item_y
user_y_test = tf.tile(user_y, [NEG + 1, 1])
print(user_y_test)

# Row-wise dot products: one similarity score per (user, item) pair
prod_raw = tf.reduce_sum(tf.multiply(tf.tile(user_y, [NEG + 1, 1]), item_y), 1, True)
print(prod_raw)

# Reshape to [batchSize, NEG + 1]: column 0 holds each user's positive
# score, the remaining columns hold its negative scores
prod = tf.transpose(tf.reshape(tf.transpose(prod_raw), [NEG + 1, batchSize]))
print(prod)
```
Running this code prints:

```
0
tf.Tensor(
[[0.55470014 0.8320502 ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.55470014 0.8320502 ]], shape=(8, 2), dtype=float32)
2
tf.Tensor(
[[0.55470014 0.8320502 ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.55470014 0.8320502 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.55470014 0.8320502 ]
 [0.6401844  0.7682213 ]], shape=(12, 2), dtype=float32)
tf.Tensor(
[[0.55470014 0.8320502 ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.55470014 0.8320502 ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]
 [0.55470014 0.8320502 ]
 [0.6401844  0.7682213 ]
 [0.98994946 0.14142135]
 [0.6        0.8       ]], shape=(12, 2), dtype=float32)
tf.Tensor(
[[0.99999976]
 [1.0000001 ]
 [0.99999994]
 [1.        ]
 [0.99430907]
 [0.7423931 ]
 [0.70710677]
 [0.9984603 ]
 [0.6667947 ]
 [0.99868774]
 [0.6667947 ]
 [0.99868774]], shape=(12, 1), dtype=float32)
tf.Tensor(
[[0.99999976 0.99430907 0.6667947 ]
 [1.0000001  0.7423931  0.99868774]
 [0.99999994 0.70710677 0.6667947 ]
 [1.         0.9984603  0.99868774]], shape=(4, 3), dtype=float32)
```
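The final `prod` matrix is exactly what a softmax loss needs: each row holds one positive score (column 0, close to 1.0 since user and item embeddings are identical here) followed by NEG negative scores for the same user. As a minimal NumPy sketch of the loss computation, using the `prod` values printed above (an illustration of the math, not the actual model code):

```python
import numpy as np

# `prod` from the run above: rows = users, column 0 = positive score,
# columns 1..NEG = scores against the rotated in-batch negatives
prod = np.array([[0.99999976, 0.99430907, 0.6667947],
                 [1.0000001,  0.7423931,  0.99868774],
                 [0.99999994, 0.70710677, 0.6667947],
                 [1.0,        0.9984603,  0.99868774]])

# Softmax over each row turns the scores into probabilities
exp = np.exp(prod)
probs = exp / exp.sum(axis=1, keepdims=True)

# Cross-entropy with label 0: every row's correct class is the positive
loss = -np.log(probs[:, 0]).mean()
print(loss)
```

Minimizing this loss pushes each user's positive score above its in-batch negative scores, which is what training with categorical cross-entropy against a constant label of 0 achieves.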
Now we plug this code into our model, add a softmax on top of `prod`, and switch the loss to categorical cross-entropy. Training starts fine, but just before the end of an epoch it crashes with:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected size[0] in [0, 252], but got 906
	 [[node dssm/Slice (defined at /data1/starfm/starfmRec/model/dssm/dssm.py:70) ]] [Op:__inference_train_function_55102]
Errors may have originated from an input operation.
Input Source operations connected to node dssm/Slice:
 dssm/lambda_1/l2_normalize (defined at /data1/starfm/starfmRec/model/dssm/dssm.py:42)
```
The cause: the data is fed to the model one batch at a time, and the final batch of an epoch is usually smaller than batch_size. That short batch breaks every shape computation above that assumes exactly batch_size rows (the `tf.slice` offsets in particular). My fix is to drop the incomplete final batch:

```python
dataset = dataset.batch(batch_size, drop_remainder=True)
```

With drop_remainder=True passed to batch(), every batch the model sees has exactly batch_size samples, and training runs to completion.
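A quick sanity check of what drop_remainder does, on a toy dataset (not the actual training data):

```python
import tensorflow as tf

# 10 samples with batch_size = 4: the last batch would only have 2 samples
samples = tf.data.Dataset.range(10)

kept = list(samples.batch(4))                          # batches of 4, 4, 2
dropped = list(samples.batch(4, drop_remainder=True))  # only the full batches

print([int(b.shape[0]) for b in kept])     # [4, 4, 2]
print([int(b.shape[0]) for b in dropped])  # [4, 4]
```

A side benefit is that with drop_remainder=True the batch dimension becomes statically known, so ops like the `tf.slice` calls above can rely on a fixed batch_size at graph-construction time.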