Tricks for ML in real-world applications:
A brief summary:
lesson1: value of pre-training
problem: in the real world, precise and well-labeled data can be expensive and hard to get (e.g. medical vision, credit scoring)
solution:
- get a pre-trained model from a model zoo
- get data from public datasets or by crawling
- pre-train the model on a large dataset that is cheap and related to the task, even if it is noisy
- fine-tune with precise data afterwards
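
A minimal sketch of the pre-train then fine-tune idea, using only the Python standard library. Everything here is an assumption made for illustration: the toy task (label is 1 when x0 + x1 > 0), the 20% label-flip rate, the dataset sizes, the learning rates, and the helper names `train`, `accuracy`, and `make_data`. A tiny logistic regression stands in for a real network.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train(weights, data, lr, epochs):
    """Plain SGD on the logistic loss, starting from `weights`; returns new weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
    return w

def accuracy(w, data):
    hits = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)

def make_data(n, noise, rng):
    # Toy task (assumed): label is 1 when x0 + x1 > 0; `noise` is the label-flip probability.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry acts as a bias term
        y = 1 if x[0] + x[1] > 0 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
noisy_large = make_data(2000, noise=0.2, rng=rng)  # cheap, noisy, but large
clean_small = make_data(30, noise=0.0, rng=rng)    # expensive, precise, but small
clean_test = make_data(500, noise=0.0, rng=rng)

# Step 1: pre-train from scratch on the large noisy set.
pretrained = train([0.0, 0.0, 0.0], noisy_large, lr=0.02, epochs=5)
# Step 2: fine-tune from the pre-trained weights with a smaller learning rate.
finetuned = train(pretrained, clean_small, lr=0.01, epochs=20)

print("pre-trained accuracy:", accuracy(pretrained, clean_test))
print("fine-tuned accuracy:", accuracy(finetuned, clean_test))
```

The point of the sketch is the second `train` call: it starts from `pretrained` rather than from zeros, and uses a lower learning rate so the small clean set adjusts the model instead of overwriting what was learned earlier.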
lesson2: caveats of real-world label distributions
problem: unbalanced data distribution / unbalanced cost of misclassification
solution:
- more data
- change the labeling (group rare classes into a single class)
- sampling (ignoring samples, over- or undersampling, negative mining)
- weighting the loss
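
Two of the options above, oversampling and loss weighting, can be sketched in a few lines of plain Python. The 90/10 class split and the helper names `class_weights`, `oversample`, and `weighted_log_loss` are assumptions for illustration. The weight formula `n_samples / (n_classes * count)` is one common inverse-frequency heuristic, not the only choice.

```python
import math
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def oversample(samples, labels, rng):
    """Randomly duplicate minority samples until every class matches the largest one."""
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out = []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out.extend((s, y) for s in group + extra)
    rng.shuffle(out)
    return out

def weighted_log_loss(probs, labels, weights):
    """Binary log loss where each sample counts in proportion to its class weight."""
    total = sum(weights[y] for y in labels)
    loss = 0.0
    for p, y in zip(probs, labels):
        loss -= weights[y] * math.log(p if y == 1 else 1.0 - p)
    return loss / total

# Assumed toy data: 90 samples of class 0, 10 samples of class 1.
labels = [0] * 90 + [1] * 10
samples = list(range(100))

weights = class_weights(labels)  # class 1 gets weight 5.0, class 0 about 0.56
balanced = oversample(samples, labels, random.Random(0))

print(weights)
print(Counter(y for _, y in balanced))
```

Oversampling changes what the model sees, while weighting changes how much each mistake costs. Weighting is also the natural place to encode an unbalanced cost of misclassification: the weights do not have to come from class frequencies.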
lesson3: understanding black box models
problem: understanding why and how a model makes wrong predictions, and making sure the model cannot be tricked (deliberate attacks), e.g. the Tesla accident, AI turning racist
solution:
- visualization tools
- still a problem to be solved
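
As one example of what a visualization tool can do, here is a rough sketch of occlusion sensitivity: replace one input feature at a time with a baseline value and record how much the model's output changes. The `toy_model` function is a hypothetical stand-in for a black-box model, and `occlusion_sensitivity` is a name chosen for this sketch. Real tools apply the same idea to image patches and draw the result as a heatmap.

```python
def occlusion_sensitivity(predict, x, baseline=0.0):
    """Change in the model's score when each input feature is replaced by `baseline`."""
    base_score = predict(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(base_score - predict(occluded))
    return scores

def toy_model(x):
    # Hypothetical black box: depends strongly on x[0], weakly on x[1], not at all on x[2].
    return 3.0 * x[0] + 0.5 * x[1]

scores = occlusion_sensitivity(toy_model, [1.0, 1.0, 1.0])

# Crude text "heatmap": longer bars mean the feature mattered more for this input.
for i, s in enumerate(scores):
    print(f"feature {i}: {'#' * int(round(abs(s) * 4))} {s:+.2f}")
```

This only explains one prediction for one input, which is part of why interpretability remains an open problem.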
Hope it helps!