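When the classes in the training data are imbalanced, for example when training a cancer classifier where positive (cancer) samples are rare, see: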
http://www.deepideas.net/unbalanced-classes-machine-learning/
There are two main things to address:
1. Class weights in the loss function
Set the class weights at training time:
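# class 0 is listed explicitly with weight 1 (some Keras versions require every class to appear)
class_weight = {0: 1.0, 1: n_non_cancer_samples / n_cancer_samples * t}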
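As a rough illustration of the weighting formula above, the sketch below derives the dictionary from a list of binary labels. The helper name cancer_class_weight, the made-up label list, and the explicit entry for class 0 are assumptions for this example. The note does not define t, so it is treated here as a user-chosen multiplier.

```python
from collections import Counter


def cancer_class_weight(labels, t=1.0):
    """Build a Keras-style class_weight dict for binary labels (1 = cancer).

    `t` is assumed to be a user-chosen multiplier on the positive-class weight.
    """
    counts = Counter(labels)
    n_cancer_samples = counts[1]
    n_non_cancer_samples = counts[0]
    if n_cancer_samples == 0:
        raise ValueError("no positive (cancer) samples in labels")
    # Class 0 keeps weight 1; class 1 is scaled up by the imbalance ratio.
    return {0: 1.0, 1: n_non_cancer_samples / n_cancer_samples * t}


# Made-up labels: 95 healthy samples and 5 cancer samples.
labels = [0] * 95 + [1] * 5
class_weight = cancer_class_weight(labels, t=1.0)
print(class_weight)  # {0: 1.0, 1: 19.0}
```

The resulting dictionary would then be passed as the class_weight argument of model.fit.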
2. Set the model's metrics at compile time
from keras import backend as K  # Keras 2.x style backend API

def sensitivity(y_true, y_pred):
    # Share of actual positives predicted as positive (true positive rate).
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def specificity(y_true, y_pred):
    # Share of actual negatives predicted as negative (true negative rate).
    true_negatives = K.sum(K.round(K.clip((1 - y_true) * (1 - y_pred), 0, 1)))
    possible_negatives = K.sum(K.round(K.clip(1 - y_true, 0, 1)))
    return true_negatives / (possible_negatives + K.epsilon())
from keras.optimizers import RMSprop

model.compile(
    loss='binary_crossentropy',
    optimizer=RMSprop(0.001),
    metrics=[sensitivity, specificity],
)
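To sanity-check what the two metrics compute, here is a framework-free sketch of the same arithmetic in plain Python. It is not the Keras implementation. The EPSILON constant stands in for K.epsilon(), the helper _round_clip is hypothetical, and the labels and predictions are made up.

```python
EPSILON = 1e-7  # stand-in for K.epsilon(), whose default is 1e-7


def _round_clip(x):
    # Mirrors K.round(K.clip(x, 0, 1)) for a single value.
    return round(min(max(x, 0.0), 1.0))


def sensitivity(y_true, y_pred):
    # True positives divided by all actual positives.
    true_positives = sum(_round_clip(t * p) for t, p in zip(y_true, y_pred))
    possible_positives = sum(_round_clip(t) for t in y_true)
    return true_positives / (possible_positives + EPSILON)


def specificity(y_true, y_pred):
    # True negatives divided by all actual negatives.
    true_negatives = sum(
        _round_clip((1 - t) * (1 - p)) for t, p in zip(y_true, y_pred)
    )
    possible_negatives = sum(_round_clip(1 - t) for t in y_true)
    return true_negatives / (possible_negatives + EPSILON)


# Made-up example: 2 cancer samples, 4 healthy samples.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [0.9, 0.2, 0.1, 0.3, 0.8, 0.4]

print(round(sensitivity(y_true, y_pred), 3))  # 0.5  (1 of 2 positives found)
print(round(specificity(y_true, y_pred), 3))  # 0.75 (3 of 4 negatives found)
```

Keras averages function metrics like these over batches, so the values reported during training can differ slightly from the same metric computed once over the whole dataset.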