Candidate Sampling
@(Machine Learning)
For a multi-class or multi-label classification problem, each training example is (x_i, T_i), where the target set T_i is a tiny subset of the full label set L.
“Exhaustive” training methods such as softmax and logistic regression require us to compute F(x, y) for every class y ∈ L for every training example. When |L| is very large, this can be prohibitively expensive.
"Candidate Sampling": sample a small subset of the full label set and compute F(x, y) only for those classes.
Q(y|x): the sampling function — the probability of sampling class y given the example x.
NCE and Negative Sampling generalize to the case where T_i is a multiset. In this case, P(y|x) denotes the expected count of y in T_i. Similarly, NCE, Negative Sampling, and Sampled Logistic generalize to the case where S_i is a multiset. In this case Q(y|x) denotes the expected count of y in S_i.
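As a concrete example of Q(y|x): TensorFlow's default sampler, `log_uniform_candidate_sampler`, uses an input-independent log-uniform (Zipfian) distribution over class ids, which matches word frequencies well when ids are sorted from most to least frequent. A minimal Python sketch of that distribution and an inverse-transform sampler for it (the helper names here are mine, not TF's):

```python
import math
import random

def log_uniform_prob(class_id, range_max):
    # TF's log-uniform distribution:
    # P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1)
    return (math.log(class_id + 2) - math.log(class_id + 1)) / math.log(range_max + 1)

def sample_candidates(num_sampled, range_max, rng=random):
    # Inverse-transform sampling: u ~ Uniform[0, 1) maps to
    # class id floor(exp(u * log(range_max + 1))) - 1 in [0, range_max).
    log_range = math.log(range_max + 1)
    return [
        min(int(math.exp(rng.random() * log_range)) - 1, range_max - 1)
        for _ in range(num_sampled)
    ]
```

Because this Q does not depend on x, the sampler is cheap; the correction for the bias it introduces is the log Q(y|x) subtraction discussed below.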
Sampled Softmax
During full softmax training, for every example (x_i, t_i) we must compute logits over all classes in L; when |L| is very large, this is prohibitively expensive.
In sampled softmax, for each training example we draw a subset S_i ⊂ L using the sampling function Q(y|x); each class y is sampled with probability Q(y|x_i):
Sampled classes:
Goal: given a candidate set C_i (the union of the sampled classes S_i and the true class t_i), determine which class in C_i is the target class.
For each class y ∈ C_i, we want the posterior probability that y is the target class given x_i and C_i, i.e. P(t_i = y | x_i, C_i).
By Bayes' rule:
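Completing the Bayes'-rule step (this follows the TF Candidate Sampling reference; K and K' denote quantities that do not depend on y):

```latex
% Bayes' rule applied to the posterior over the candidate set:
P(t_i = y \mid x_i, C_i)
  = \frac{P(t_i = y, C_i \mid x_i)}{P(C_i \mid x_i)}
  = \frac{P(y \mid x_i)\, P(C_i \mid t_i = y, x_i)}{P(C_i \mid x_i)}

% If each class enters S_i independently with probability Q(y'|x_i):
P(C_i \mid t_i = y, x_i)
  = \prod_{y' \in C_i \setminus \{y\}} Q(y' \mid x_i)
    \prod_{y' \notin C_i} \bigl(1 - Q(y' \mid x_i)\bigr)
  = \frac{1}{Q(y \mid x_i)} \prod_{y' \in C_i} Q(y' \mid x_i)
    \prod_{y' \notin C_i} \bigl(1 - Q(y' \mid x_i)\bigr)

% Everything except P(y|x_i)/Q(y|x_i) is independent of y, so:
P(t_i = y \mid x_i, C_i)
  = \frac{P(y \mid x_i)}{Q(y \mid x_i)} \cdot \frac{1}{K(x_i, C_i)}
\quad\Longrightarrow\quad
\log P(t_i = y \mid x_i, C_i) = F(x_i, y) - \log Q(y \mid x_i) + K'(x_i, C_i)
```

Here F(x_i, y) = log P(y|x_i) up to an additive constant, so training an ordinary softmax over C_i with the adjusted logits F(x_i, y) − log Q(y|x_i) recovers the full-softmax posterior. This is exactly the `subtract_log_q=True` adjustment in the TF code below.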
def sampled_softmax_loss(weights, biases, inputs, labels, num_sampled,
num_classes, num_true=1,
sampled_values=None,
remove_accidental_hits=True,
partition_strategy="mod",
name="sampled_softmax_loss"):
"""Computes and returns the sampled softmax training loss.
This is a faster way to train a softmax classifier over a huge number of
classes.
This operation is for training only. It is generally an underestimate of
the full softmax loss.
  (Use this only during training; at inference time, use the full softmax.)
At inference time, you can compute full softmax probabilities with the
expression `tf.nn.softmax(tf.matmul(inputs, weights) + biases)`.
See our [Candidate Sampling Algorithms Reference]
(../../extras/candidate_sampling.pdf)
Also see Section 3 of [Jean et al., 2014](http://arxiv.org/abs/1412.2007)
([pdf](http://arxiv.org/pdf/1412.2007.pdf)) for the math.
Args:
weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor`
objects whose concatenation along dimension 0 has shape
[num_classes, dim]. The (possibly-sharded) class embeddings.
biases: A `Tensor` of shape `[num_classes]`. The class biases.
inputs: A `Tensor` of shape `[batch_size, dim]`. The forward
activations of the input network.
labels: A `Tensor` of type `int64` and shape `[batch_size,
num_true]`. The target classes. Note that this format differs from
the `labels` argument of `nn.softmax_cross_entropy_with_logits`.
num_sampled: An `int`. The number of classes to randomly sample per batch.
num_classes: An `int`. The number of possible classes.
num_true: An `int`. The number of target classes per training example.
sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`,
`sampled_expected_count`) returned by a `*_candidate_sampler` function.
(if None, we default to `log_uniform_candidate_sampler`)
remove_accidental_hits: A `bool`. whether to remove "accidental hits"
where a sampled class equals one of the target classes. Default is
True.
partition_strategy: A string specifying the partitioning strategy, relevant
if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported.
Default is `"mod"`. See `tf.nn.embedding_lookup` for more details.
name: A name for the operation (optional).
Returns:
A `batch_size` 1-D tensor of per-example sampled softmax losses.
"""
  logits, labels = _compute_sampled_logits(
      weights, biases, inputs, labels, num_sampled, num_classes,
      num_true=num_true,
      sampled_values=sampled_values,
      subtract_log_q=True,
      remove_accidental_hits=remove_accidental_hits,
      partition_strategy=partition_strategy,
      name=name)
  sampled_losses = nn_ops.softmax_cross_entropy_with_logits(labels=labels,
                                                            logits=logits)
  # sampled_losses is a [batch_size] tensor of per-example losses.
  return sampled_losses
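For intuition, here is a NumPy sketch of what the loss above computes for `num_true=1`, ignoring accidental-hit removal and weight sharding (the helper is mine, not the TF internals):

```python
import numpy as np

def sampled_softmax_loss_np(weights, biases, inputs, labels, sampled,
                            q_true, q_sampled):
    """Per-example sampled softmax loss, one true class per example.

    weights: [num_classes, dim]    biases: [num_classes]
    inputs: [batch, dim]           labels: [batch] true class ids
    sampled: [num_sampled] sampled class ids
    q_true: [batch] Q(true | x)    q_sampled: [num_sampled] Q(sampled | x)
    """
    # Logits for the true and sampled classes, with log Q(y|x)
    # subtracted (the subtract_log_q=True adjustment).
    true_logits = (np.sum(inputs * weights[labels], axis=1)
                   + biases[labels] - np.log(q_true))
    sampled_logits = inputs @ weights[sampled].T + biases[sampled] - np.log(q_sampled)
    # Softmax cross-entropy over the candidate set; column 0 is the true class.
    logits = np.concatenate([true_logits[:, None], sampled_logits], axis=1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_z = np.log(np.exp(logits).sum(axis=1))
    return log_z - logits[:, 0]                  # -log softmax(true class)
```

Note the − log Q(y|x) term: without it, frequently sampled classes would be systematically over-represented in the partition function, biasing the learned distribution.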