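When training a classifier on a time-series dataset, if you normalize/scale using the min/max of the whole dataset, you also take future values into account, and in a real scenario you would not have that information, right? OK, so then you should build the scaler from your training data only:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)  # statistics come from the training rows only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)       # the test rows reuse the training statistics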
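The same train-only fitting can be sketched without scikit-learn. This minimal version assumes the rows of `X` are already in time order, and `chronological_split`, `fit_standardizer` and `transform` are hypothetical helper names. The per-column mean and population standard deviation (`ddof=0`) are what `StandardScaler` computes, and only the earlier rows feed them:

```python
import numpy as np

def chronological_split(X, train_fraction=0.8):
    """Hypothetical helper: earlier rows go to train, later rows to test (no shuffling)."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:]

def fit_standardizer(X_train):
    """Per-column mean and population std, computed from the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # ddof=0, the same convention as StandardScaler
    std[std == 0] = 1.0        # constant columns: avoid dividing by zero
    return mean, std

def transform(X, mean, std):
    return (X - mean) / std

# Toy series with an upward trend, so the later (test) rows hold larger values.
X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = chronological_split(X)
mean, std = fit_standardizer(X_train)
X_train_scaled = transform(X_train, mean, std)
X_test_scaled = transform(X_test, mean, std)  # reuses the training statistics
```

With a trend like this, every scaled test value ends up above every scaled training value. That is what "no look-ahead" looks like in practice.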
But what if the new values are slightly different from the training values?
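Mechanically, a scaler fitted on the training data does not fail on unseen values. A value outside the training range just maps outside [0, 1] under min-max scaling, or to a larger |z| under standardization. scikit-learn's `MinMaxScaler` also does not clip by default. Here is a minimal numpy sketch with a made-up one-column example, where `fit_minmax` and `minmax_transform` are hypothetical names:

```python
import numpy as np

def fit_minmax(X_train):
    """Per-column min and max from the training rows only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def minmax_transform(X, lo, hi):
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span

X_train = np.array([[0.0], [5.0], [10.0]])
X_new = np.array([[-2.0], [12.0]])  # slightly outside the training range
lo, hi = fit_minmax(X_train)
X_new_scaled = minmax_transform(X_new, lo, hi)
print(X_new_scaled)  # -0.2 and 1.2: below 0 and above 1, but still well defined
```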
With this in mind, I thought of:
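Or just take the mean and standard deviation of both train and test, and use them to normalize the test input:
import numpy as np

# per-feature statistics of the training data
X_train_mean = np.mean(X_train, axis=0)
X_train_std = np.std(X_train, axis=0)
X_train_normalized = (X_train - X_train_mean) / X_train_std

# per-feature statistics of the test features (X_test, not the labels y_test)
X_test_mean = np.mean(X_test, axis=0)
X_test_std = np.std(X_test, axis=0)

# average the train and test statistics and normalize the test input with them
new_mean = (X_train_mean + X_test_mean) / 2
new_std = (X_train_std + X_test_std) / 2
X_test_normalized = (X_test - new_mean) / new_std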
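The averaging idea above can be wrapped in a single function for experimentation. This is a sketch, and `blended_test_normalize` is a hypothetical name. Note that it reads `X_test.mean` and `X_test.std`, which are statistics of the very rows being normalized. That is the same kind of information the first paragraph says you would not have in a real scenario. The example also shows that the average of two standard deviations is generally not the standard deviation of the pooled data:

```python
import numpy as np

def blended_test_normalize(X_train, X_test):
    """Normalize X_test with the average of the train and test per-column statistics."""
    new_mean = (X_train.mean(axis=0) + X_test.mean(axis=0)) / 2
    new_std = (X_train.std(axis=0) + X_test.std(axis=0)) / 2
    return (X_test - new_mean) / new_std

X_train = np.array([[0.0], [2.0]])    # mean 1, std 1
X_test = np.array([[10.0], [14.0]])   # mean 12, std 2
X_test_blended = blended_test_normalize(X_train, X_test)

# Compare the averaged std with the std of the pooled (train + test) rows.
avg_std = (X_train.std(axis=0) + X_test.std(axis=0)) / 2
pooled_std = np.vstack([X_train, X_test]).std(axis=0)
print(avg_std, pooled_std)
```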
What about this log1p solution? It is the same as log(1 + x), so does it work on (-1; ∞)? Or what about exp normalization?
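For the domain question: `np.log1p` computes log(1 + x) elementwise, so it is finite only for x > -1. At exactly -1 it returns -inf, and below that it returns nan. Unlike a scaler, it is a fixed transform with no fitted statistics, so applying it uses no information from the test period. A small numpy check of the domain, with `np.expm1` as the inverse:

```python
import numpy as np

x = np.array([-0.5, 0.0, 1e-10, 9.0])
y = np.log1p(x)       # log(1 + x), computed accurately even for tiny x
x_back = np.expm1(y)  # inverse transform: exp(y) - 1

# Outside the domain: -1 maps to -inf and anything below -1 to nan.
# numpy normally emits RuntimeWarnings here; errstate silences them.
with np.errstate(divide="ignore", invalid="ignore"):
    edge = np.log1p(np.array([-1.0, -2.0]))
```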
What is the best practice for handling this situation?