Two contributions
1. Jointly optimize the classification loss (softmax) and the similarity loss (triplet) in a CNN, which can generate both categorization results
and discriminative feature representations.
2. Embed label structures (make, model, and year of cars) or attributes (ingredients of food)
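A minimal sketch of the joint objective in contribution 1, in plain Python. The convex weighting via `weight`, the default `margin`, and all function names are assumptions for illustration, not values taken from these notes.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge that is zero once d(anchor, positive) + margin <= d(anchor, negative)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)

def joint_loss(logits, label, anchor, positive, negative, weight=0.5, margin=0.2):
    """Weighted sum of the classification and similarity terms (weight is hypothetical)."""
    classification = softmax_cross_entropy(logits, label)
    similarity = triplet_loss(anchor, positive, negative, margin)
    return weight * classification + (1.0 - weight) * similarity
```

In a real CNN both terms would be computed from the same backbone, with the logits coming from the softmax branch and the three feature vectors from the embedding branch.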
Siamese network
combines softmax and contrastive loss in a CNN via joint optimization
Triplet: it is enough for the intra-class distance to be smaller than the inter-class distance, so the triplet loss can preserve intra-class variation
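For comparison, the two losses in their commonly used forms. The notation is an assumption: f is the embedding, m a margin, y = 1 for a same-class pair and 0 otherwise, and (a, p, n) are anchor, positive, and negative.

```latex
% Contrastive loss on a pair (x_i, x_j), with d = \lVert f(x_i) - f(x_j) \rVert_2
L_{\text{contrastive}} = y \, d^{2} + (1 - y) \, \max(0,\; m - d)^{2}

% Triplet loss on (a, p, n)
L_{\text{triplet}} = \max\bigl(0,\; \lVert f(a) - f(p) \rVert_2^{2}
                     - \lVert f(a) - f(n) \rVert_2^{2} + m \bigr)
```

The contrastive term keeps pulling every same-class pair toward zero distance, while the triplet term vanishes as soon as the positive is closer than the negative by the margin, which leaves room for spread within a class.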
Regarding the sampling strategy, one can follow the methods in FaceNet (A Unified Embedding for Face Recognition and Clustering), or employ hard mining approaches to explore challenging examples in the training data
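One generic way to do hard mining inside a batch is sketched below. It is an illustration only, not the exact procedure of FaceNet or of the paper: for each anchor it picks the farthest same-class sample and the closest other-class sample. The function name `mine_hard_triplets` is hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mine_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets using the hardest examples."""
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        positives = [i for i, lab in enumerate(labels) if lab == lab_a and i != a]
        negatives = [i for i, lab in enumerate(labels) if lab != lab_a]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet in the batch
        # Hardest positive: same class, largest distance to the anchor.
        hardest_pos = max(positives, key=lambda i: squared_distance(emb_a, embeddings[i]))
        # Hardest negative: different class, smallest distance to the anchor.
        hardest_neg = min(negatives, key=lambda i: squared_distance(emb_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets
```

Usage: `mine_hard_triplets([[0.0], [1.0], [1.5], [4.0]], [0, 0, 1, 1])` pairs each anchor with its most confusing neighbors in that toy batch.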
Testing stage: the network generates the classification result through the softmax layer, or the fine-grained feature representation after L2 normalization
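A small sketch of the two test-time outputs, in plain Python. The helper names and the nearest-neighbor lookup over a `gallery` of features are assumptions added for illustration; the notes only state that the feature is L2-normalized.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def predict_class(logits):
    """Softmax is monotonic, so the argmax of the logits is the predicted class."""
    return max(range(len(logits)), key=lambda i: logits[i])

def nearest_neighbor(query, gallery):
    """Index of the gallery feature closest to the query after L2 normalization."""
    q = l2_normalize(query)
    normalized = [l2_normalize(g) for g in gallery]
    return min(
        range(len(normalized)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(q, normalized[i])),
    )
```

After normalization only the direction of a feature matters, so ranking by Euclidean distance gives the same order as ranking by cosine similarity.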
Embed label structures
hierarchical labels