Notes on Reading Efficient Large-Scale Multi-Modal Classification
Introduction
Efficient Large-Scale Multi-Modal Classification is an AAAI 2018 paper by the Facebook AI Research group (arXiv:1802.02892v1). The paper presents three main contributions to efficient large-scale multi-modal classification:
1. Fusing text with continuous features outperformed text-only classification.
2. Discretizing continuous features sped up and simplified the fusion process.
3. The learned representations for discretized features improved interpretability.
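Contribution 2 turns each continuous feature into a discrete token that a text classifier can consume alongside words. A minimal sketch of one such scheme, quantile bucketing (the function name, bucket-token format, and choice of quantile bins are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

def discretize(values, num_buckets=10, prefix="feat"):
    """Map continuous feature values to discrete bucket tokens.

    Hypothetical quantile-based bucketing: bin edges are chosen so each
    bucket covers roughly the same fraction of the training data, and
    each value is replaced by a token naming its bucket.
    """
    # Interior quantile edges, e.g. [0.25, 0.5, 0.75] for 4 buckets.
    edges = np.quantile(values, np.linspace(0, 1, num_buckets + 1)[1:-1])
    # digitize assigns bucket ids 0 .. num_buckets-1.
    bucket_ids = np.digitize(values, edges)
    return [f"{prefix}_bucket_{b}" for b in bucket_ids]

# Example: a continuous feature becomes word-like tokens.
tokens = discretize(np.linspace(0.0, 1.0, 8), num_buckets=4)
```

Once discretized, the bucket tokens can simply be appended to the text input, so the same bag-of-words model handles both modalities.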
Evaluation datasets and approaches
1. Four evaluation datasets
Table 1. Characteristics of four evaluation datasets
| Dataset     | Train instances | Train words   | Valid instances | Valid words | Test instances | Test words |
|-------------|-----------------|---------------|-----------------|-------------|----------------|------------|
| Food101     | 58,131          | 98,365,392    | 6,452           | 10,893,597  | 21,519         | 36,955,182 |
| MM-IMDB     | 15,552          | 2,564,734     | 2,608           | 425,863     | 7,799          | 1,266,681  |
| FlickrTag   | 70,243,104      | 1,134,118,808 | 656,687         | 10,100,945  | 621,444        | 9,913,566  |
| FlickrTag-1 | 7,166,110       | 92,651,036    | 48,048          | 682,663     | 48,471         | 672,900    |
2. Objective of model training: minimizing the negative log-likelihood over the classes,
$$-\frac{1}{N}\sum_{n=1}^{N} \log\big(\mathrm{softmax}(o_n)_{y_n}\big),$$
where $o_n$ is the network's output for the $n$-th of $N$ examples and $y_n$ is its true class.
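The negative log-likelihood objective above can be computed as follows (a self-contained NumPy sketch; the function name and array layout are illustrative assumptions):

```python
import numpy as np

def nll_loss(outputs, labels):
    """Mean negative log-likelihood over a batch.

    outputs: (N, C) array of raw network outputs o.
    labels:  (N,) array of true class indices y.
    """
    # Numerically stable log-softmax: subtract the row max first.
    shifted = outputs - outputs.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of each true class and average.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Example: two equally likely classes give loss log(2) ~ 0.693.
loss = nll_loss(np.array([[0.0, 0.0]]), np.array([0]))
```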