Adversarial Validation in Machine Learning

猫爱吃鱼the

Category: Machine Learning

Article link: https://blog.csdn.net/qq_39783265/article/details/104848263

Adversarial Validation

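Cross-validation is a commonly used method for evaluating how well a model performs.

When the sample distribution differs between the training set and the test set, cross-validation cannot accurately estimate the model's performance on the test set. As a result, the model scores far lower on the test set than it does on the training set.

Through a Kaggle competition example, this article shows how a change in sample distribution affects modeling, how adversarial validation can detect such a change, and what can be done about it.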
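
The core idea can be tried on synthetic data before turning to the real example. The following is a minimal sketch, not the author's code: the function name `adversarial_auc`, the Gaussian toy data, and the use of scikit-learn's `LogisticRegression` in place of LightGBM are all assumptions made for illustration. It labels training rows 0 and test rows 1, then measures how well a classifier can tell them apart.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def adversarial_auc(train_X, test_X):
    """Mean 5-fold AUC of a classifier that predicts train (0) vs. test (1)."""
    X = np.vstack([train_X, test_X])
    y = np.concatenate([np.zeros(len(train_X)), np.ones(len(test_X))])
    clf = LogisticRegression(max_iter=1000)
    # For classifiers, an integer cv uses stratified folds, so each fold
    # contains both training rows and test rows.
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()


# Hypothetical toy data: three standard-normal features.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 3))
test_same = rng.normal(0.0, 1.0, size=(1000, 3))     # same distribution as train
test_shifted = rng.normal(0.0, 1.0, size=(1000, 3))
test_shifted[:, 0] += 1.5                            # shift the first feature only

auc_same = adversarial_auc(train, test_same)
auc_shifted = adversarial_auc(train, test_shifted)
print("same distribution:", round(auc_same, 3))      # expected to be near 0.5
print("shifted distribution:", round(auc_shifted, 3))  # expected to be clearly above 0.5
```

When the two sets come from the same distribution the classifier can do little better than guessing, so the AUC stays near 0.5. A clearly higher AUC signals that some feature separates training rows from test rows.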

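Here is the link to the full article: https://zhuanlan.zhihu.com/p/93842847

Before reading it, if you are not familiar with AUC, please study the following link first:
https://www.zhihu.com/question/39840928/answer/241440370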
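
As a quick intuition, AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it exactly that way on four made-up samples; the helper name `pairwise_auc` and the data are assumptions for illustration, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(labels, scores)
print(auc)  # prints 0.75: three of the four positive/negative pairs are ordered correctly
```

This pairwise loop is quadratic in the number of samples, so it is only meant to show the definition, not to be used on large datasets.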

My own code is included below to make this easier to follow:

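import pandas as pd

# df_train: the provided training set
# df_test: the test set, used here as the validation set
# copy() keeps the original DataFrames free of the extra column
df_train = train_lables.copy()
df_test = test_lables.copy()
# Define the new target: 0 for training rows, 1 for test rows
df_train['Is_Test'] = 0
df_test['Is_Test'] = 1
# Combine Train and Test into one dataset; ignore_index gives every row a unique label
df_adv = pd.concat([df_train, df_test], ignore_index=True)
# features: the feature columns used for training
X = df_adv[features]
y = df_adv['Is_Test']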

Model training

import lightgbm as lgb
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Model parameters
params = {
    'boosting_type': 'gbdt',
    'colsample_bytree': 1,
    'learning_rate': 0.1,
    'max_depth': 5,
    'min_child_samples': 100,
    'min_child_weight': 1,
    'min_split_gain': 0.0,
    'num_leaves': 20,
    'objective': 'binary',
    'random_state': 50,
    'subsample': 1.0,
    'subsample_freq': 0,
    'metric': 'auc',
    'num_threads': 8
}
best_loss = []  # validation AUC of each fold (higher means train and test are easier to tell apart)
n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
for index, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    lgb_model = lgb.LGBMClassifier(**params)
    # split() returns positional indices, so select rows with iloc
    train_x, test_x = X.iloc[train_idx], X.iloc[test_idx]
    train_y, test_y = y.iloc[train_idx], y.iloc[test_idx]
    eval_set = [(test_x, test_y)]
    # LightGBM 4.0 and later take early stopping and logging as callbacks;
    # older versions accepted early_stopping_rounds=100 and verbose=None in fit()
    lgb_model.fit(train_x, train_y, eval_set=eval_set, eval_metric='auc',
                  callbacks=[lgb.early_stopping(100, verbose=False), lgb.log_evaluation(0)])
    best_loss.append(lgb_model.best_score_['valid_0']['auc'])
    print(best_loss, np.mean(best_loss))

AUC results:
[0.5548410714285714] 0.5548410714285714
[0.5548410714285714, 0.5788339285714286] 0.5668375
[0.5548410714285714, 0.5788339285714286, 0.5695142857142858] 0.5677297619047619
[0.5548410714285714, 0.5788339285714286, 0.5695142857142858, 0.5460357142857143] 0.56230625
[0.5548410714285714, 0.5788339285714286, 0.5695142857142858, 0.5460357142857143, 0.5811589285714286] 0.5660767857142857
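
One reasonable way to read these numbers is to compare them with 0.5, the AUC of a classifier that cannot tell training rows from test rows at all. The short sketch below summarizes the five fold scores printed above; the variable names are assumptions, and no fixed cutoff for "too much drift" is implied.

```python
from statistics import mean, pstdev

# Fold AUCs copied from the output above
fold_auc = [
    0.5548410714285714,
    0.5788339285714286,
    0.5695142857142858,
    0.5460357142857143,
    0.5811589285714286,
]

mean_auc = mean(fold_auc)
spread = pstdev(fold_auc)      # population standard deviation across folds
gap = mean_auc - 0.5           # distance from a classifier that guesses at random

print(f"mean AUC = {mean_auc:.4f}, std = {spread:.4f}, gap from 0.5 = {gap:.4f}")
```

A mean AUC of about 0.57 is only slightly above 0.5, which suggests that, for the features used here, the training and test sets are hard to distinguish.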