- Check the Data: Ensure that your data is clean, properly preprocessed, and balanced. Imbalanced datasets can bias the model towards the majority class, leading to poor convergence.
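For instance, a quick class-balance check (a minimal sketch; train_dataset and its "label" column are assumptions, substitute your own dataset):

from collections import Counter

label_counts = Counter(train_dataset["label"])  # train_dataset is a placeholder for your dataset
print(label_counts)  # e.g. Counter({0: 9000, 1: 1000}) would signal a 9:1 imbalance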
- Learning Rate: The learning rate you’ve set (5e-3) is likely too high for fine-tuning. Try reducing it (e.g., to 2e-5) to see if it helps the loss converge. You can also use a learning rate scheduler that adjusts the learning rate during training.
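Both can be set through TrainingArguments, for example (a sketch; the values are illustrative starting points):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",          # hypothetical output directory
    learning_rate=2e-5,           # much lower than 5e-3
    lr_scheduler_type="linear",   # decay the learning rate linearly after warmup
    warmup_ratio=0.1,             # warm up over the first 10% of training steps
)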
- Batch Size: Experiment with different batch sizes. Smaller batch sizes can sometimes help with convergence, as they provide noisier gradient estimates that can help escape local minima.
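The batch size is also controlled via TrainingArguments (sketch; the values are just starting points to try):

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,   # try e.g. 8, 16, 32 and compare loss curves
    gradient_accumulation_steps=2,   # optional: keeps the effective batch at 16 while stepping on smaller batches
)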
- Model Complexity: If your model is too complex for the task, it might overfit and not converge well. Try simplifying the model or using regularization techniques like dropout.
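With BERT-style models, dropout can be raised through the model config (a sketch; bert-base-uncased is just an example checkpoint, and the dropout argument names vary by architecture):

from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
    "bert-base-uncased",
    hidden_dropout_prob=0.2,            # up from the usual 0.1 default
    attention_probs_dropout_prob=0.2,
    num_labels=1,                       # single logit, matching the BCE setup below
)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", config=config)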
- Loss Function: Make sure you are using the appropriate loss function for your task. For binary classification, Binary Cross-Entropy is commonly used.
Note that Trainer has no compute_loss keyword argument; the supported way to customize the loss is to subclass Trainer and override compute_loss:

class BCETrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # BCE-with-logits expects float labels with the same shape as the logits
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            outputs.logits, labels.float().unsqueeze(1))
        return (loss, outputs) if return_outputs else loss

Then instantiate BCETrainer wherever you would have used Trainer (this assumes import torch and from transformers import Trainer at the top of your script).
- Early Stopping: Use early stopping to prevent overfitting. This will stop the training process if the model’s performance on the validation set doesn’t improve for a specified number of epochs.
trainer = Trainer(
    ...
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]  # stop after 3 evaluations without improvement
)
Remember to import EarlyStoppingCallback from transformers if you decide to use early stopping. Also note that EarlyStoppingCallback requires load_best_model_at_end=True, a metric_for_best_model, and an evaluation strategy to be set in your TrainingArguments.
- Gradient Clipping: Gradient clipping can prevent exploding gradients, which can cause the loss to diverge.
training_args = TrainingArguments(
    ...
    max_grad_norm=1.0,  # clip the gradient norm to 1.0 (also the Trainer default)
)