TensorFlow problem notes
1. Computing gradients
tf.gradients(ys, xs)
- symbolic derivatives of the sum of ys w.r.t. each x in xs
- ys, xs: tensors (or lists of tensors)
- return: a list containing sum(dy/dx) for each x in xs
- Note: what is returned is the gradient of the entire tensor ys (summed) with respect to every element of x, so each returned gradient has the same shape as the corresponding x; see the sketch below
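A minimal sketch (TF 1.x graph mode; the toy tensors below are my own illustration):
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
y = x * x                      # y has the same shape as x
grads = tf.gradients(y, [x])   # d(sum(y))/dx, one tensor per x in xs
with tf.Session() as sess:
    print(sess.run(grads))     # [array([2., 4., 6.], dtype=float32)]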
2. cross-entropy loss
1. tf.nn.sigmoid_cross_entropy_with_logits
For the case where the classes are independent but not mutually exclusive (multi-label classification).
# z: labels, x: logits
loss = z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
- the negative sign is already included in the loss
- Although the expression above has two terms, for a given label only one of them is active: when z = 1 the loss is -log(sigmoid(x)); when z = 0 it is -log(1 - sigmoid(x))
So in the commonly seen form -E_x[log p(x)], the desired label is implicitly 1; if the desired label is 0, the loss should be -E_x[log(1 - p(x))]. The sketch below checks the formula against the op numerically.
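A hedged sketch (TF 1.x; the toy logits and labels are my own) verifying that the formula above matches the op:
import numpy as np
import tensorflow as tf

x = tf.constant([0.5, -1.0, 2.0])   # logits
z = tf.constant([1.0, 0.0, 1.0])    # labels
loss_op = tf.nn.sigmoid_cross_entropy_with_logits(labels=z, logits=x)
manual = z * -tf.log(tf.sigmoid(x)) + (1 - z) * -tf.log(1 - tf.sigmoid(x))
with tf.Session() as sess:
    print(np.allclose(*sess.run([loss_op, manual])))  # True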
For the discriminator loss commonly seen in GANs (real samples labeled 1, fake samples labeled 0):
D_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
             labels=tf.ones_like(logits_real), logits=logits_real), axis=0) \
       + tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
             labels=tf.zeros_like(logits_fake), logits=logits_fake), axis=0)
and the generator loss flips the labels on the fake logits, since the generator wants the discriminator to classify its samples as real:
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
             labels=tf.ones_like(logits_fake), logits=logits_fake), axis=0)
2. tf.nn.sparse_softmax_cross_entropy_with_logits
Classes are mutually exclusive, so the outputs must form a single probability distribution (this is what softmax actually gives you). A minimal example follows.
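A minimal sketch (TF 1.x; the toy values are my own). Note that the "sparse" variant takes integer class ids rather than one-hot labels:
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 3.0, 0.2]])  # shape [batch, num_classes]
labels = tf.constant([0, 1])             # integer class ids, not one-hot
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                      logits=logits)
with tf.Session() as sess:
    print(sess.run(loss))                # one loss value per example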
3. Optimizer
Details: Tensorflow 学习笔记(六)Optimizer
3.1 gradient clipping
- prevents exploding gradients
# compute_gradients returns (gradient, variable) pairs; the gradient is None
# for variables that do not affect the loss, so skip those before clipping.
gvs = optimizer.compute_gradients(loss)
gvs = [(tf.clip_by_value(grad, -10., 10.), var)
       for grad, var in gvs if grad is not None]
train_step = optimizer.apply_gradients(gvs)
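Note that clipping element-wise with tf.clip_by_value can change the direction of the gradient; tf.clip_by_global_norm instead rescales all gradients jointly by their global norm, which preserves direction. The snippet in 3.2 uses that variant.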
3.2 learning rate decay
# global_step tracks which batch we are on; mark it non-trainable so the
# optimizer does not try to update it.
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    3.0, global_step, 3, 0.3, staircase=True)
optimizer2 = tf.train.GradientDescentOptimizer(learning_rate)
gradients, variables = zip(*optimizer2.compute_gradients(goal))
gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
train_step = optimizer2.apply_gradients(zip(gradients, variables),
                                        global_step=global_step)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step)  # each run advances global_step and decays the lr
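With staircase=True, this exponential_decay call evaluates to learning_rate = 3.0 * 0.3 ** (global_step // 3): the initial rate 3.0 is multiplied by the decay rate 0.3 once every decay_steps = 3 batches. Without staircase=True the exponent would be the continuous value global_step / 3.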
4. python zip
zip wants a bunch of arguments to zip together, but what you have is a single argument (a list whose elements are also lists). The * in a function call "unpacks" a list (or other iterable), making each of its elements a separate argument. Given p = [[1,2,3],[4,5,6]], without the * you're doing zip([[1,2,3],[4,5,6]]); with the *, you're doing zip([1,2,3], [4,5,6]).
a = [1, 2]
b = [3, 4]
c = list(zip(a, b))  # in Python 3, zip returns a one-shot iterator, so materialize it first
> [(1, 3), (2, 4)]
d = dict(c)
> {1: 3, 2: 4}
e = list(zip(*c))  # equivalent to zip((1, 3), (2, 4))
> [(1, 2), (3, 4)]
5. keras
Multi-GPU training with Keras
tf.keras.utils.multi_gpu_model
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.utils import multi_gpu_model

# Example sizes; substitute your own.
num_samples, height, width, num_classes = 1000, 224, 224, 10

# Instantiate the base model (or "template" model).
# We recommend doing this under a CPU device scope,
# so that the model's weights are hosted on CPU memory.
# Otherwise they may end up hosted on a GPU, which would
# complicate weight sharing.
with tf.device('/cpu:0'):
    model = Xception(weights=None,
                     input_shape=(height, width, 3),
                     classes=num_classes)

# Replicates the model on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# Generate dummy data.
x = np.random.random((num_samples, height, width, 3))
y = np.random.random((num_samples, num_classes))

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)

# Save model via the template model (which shares the same weights):
model.save('my_model.h5')
See the official docs for details.
Multi-process data loading with Keras
fit_generator(self, generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True, initial_epoch=0)
- Trains the model on data generated batch-by-batch by a Python generator or an instance of Sequence.
- The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time data augmentation on images on CPU in parallel to training your model on GPU.
Configuration: set use_multiprocessing=True and workers=n (workers defaults to 1); see the sketch below.
See the official docs for details.
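A minimal sketch of the pattern (the RandomBatches Sequence, the toy model, and all shapes here are my own placeholders):
import numpy as np
from tensorflow import keras

class RandomBatches(keras.utils.Sequence):
    # Each __getitem__ call returns one (x, y) batch; __len__ is batches per epoch.
    def __init__(self, n_batches=100, batch_size=32):
        self.n_batches = n_batches
        self.batch_size = batch_size

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        x = np.random.random((self.batch_size, 10))
        y = np.random.randint(0, 2, size=(self.batch_size, 1))
        return x, y

model = keras.Sequential(
    [keras.layers.Dense(1, activation='sigmoid', input_shape=(10,))])
model.compile(loss='binary_crossentropy', optimizer='rmsprop')

# Four worker processes fill the batch queue in parallel with training.
model.fit_generator(RandomBatches(), epochs=2,
                    workers=4, use_multiprocessing=True)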