Siamese (two pretrained weights) - Whale Competition - Kernel Walkthrough 2

Latest recommended article published on 2022-03-26 17:09:53

368chen

Category: Image Classification. Tags: whale competition, kaggle, kernel

Article link: https://blog.csdn.net/qq_16236875/article/details/88074540

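The method computes a linear combination of the score matrices produced by the standard model and the bootstrap model. Trial and error showed that a weight of 0.45 for the standard model and 0.55 for the bootstrap model works well. The kernel uses the pretrained models from https://kaggle.com/martinpiotte. An important step toward reaching a score of 0.9 was to remove the lapjv dependency (it slows down training), load the images as RGB, and train again; this can raise the model's score by about 0.1.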
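The blending step can be sketched as follows. This is a minimal illustration that assumes both models produce score matrices of the same shape (one row per test image, one column per known training image); the function name `blend_scores` and the toy arrays are hypothetical, and only the weights 0.45 and 0.55 come from the text.

```python
import numpy as np


def blend_scores(score_standard, score_bootstrap, w_standard=0.45, w_bootstrap=0.55):
    """Return the linear combination of two score matrices of identical shape."""
    score_standard = np.asarray(score_standard, dtype=float)
    score_bootstrap = np.asarray(score_bootstrap, dtype=float)
    if score_standard.shape != score_bootstrap.shape:
        raise ValueError("score matrices must have the same shape")
    return w_standard * score_standard + w_bootstrap * score_bootstrap


# Toy data (made up): 2 test images x 3 known training images.
a = np.array([[0.2, 0.8, 0.1], [0.6, 0.3, 0.9]])
b = np.array([[0.4, 0.6, 0.3], [0.2, 0.5, 0.7]])
blended = blend_scores(a, b)
```

Because the two weights sum to 1, the blended scores stay in the same range as the inputs, so a threshold tuned on one model remains roughly comparable.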

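Note: more epochs can improve the score, possibly as many as 500 or more.

Training time can be reduced with this technique: https://www.kaggle.com/c/humpback-whale-identification/discussion/74402#444476

Consider using pretrained models; they are also useful for blending.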

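(1) Remove duplicate images:

Compute the size of each image (pil_image.open(img).size)
Compute the phash value of each image (phash(img))
If two phash values are similar, replace both with the smaller of the two
Group the images that share a phash value and keep the one with the highest resolution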
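The deduplication steps above can be sketched roughly as follows. To keep the example self-contained, it works on precomputed values: `p2h` maps a picture name to its phash as an integer, and `p2size` maps it to `(width, height)`. The helper names, the Hamming-distance criterion for "similar", and the `max_distance` value are assumptions for illustration; the original kernel may apply additional checks before merging two hashes.

```python
from collections import defaultdict


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(h1 ^ h2).count("1")


def merge_similar_hashes(p2h, max_distance):
    """Replace each hash by the smallest earlier hash within max_distance bits."""
    reps = []       # representative hashes, kept in ascending order
    canonical = {}  # hash -> representative hash
    for h in sorted(set(p2h.values())):
        rep = next((r for r in reps if hamming(h, r) <= max_distance), None)
        if rep is None:
            reps.append(h)
            rep = h
        canonical[h] = rep
    return {p: canonical[h] for p, h in p2h.items()}


def pick_best_per_hash(p2h, p2size):
    """For each hash, keep the picture with the largest pixel count."""
    groups = defaultdict(list)
    for p, h in p2h.items():
        groups[h].append(p)
    return {h: max(ps, key=lambda p: p2size[p][0] * p2size[p][1])
            for h, ps in groups.items()}


# Toy data (made up): a.jpg and b.jpg differ by a single hash bit.
p2h = {"a.jpg": 0b11110000, "b.jpg": 0b11110001, "c.jpg": 0b00001111}
p2size = {"a.jpg": (1050, 700), "b.jpg": (2100, 1400), "c.jpg": (800, 600)}
merged = merge_similar_hashes(p2h, max_distance=2)
best = pick_best_per_hash(merged, p2size)
```

Here `a.jpg` and `b.jpg` collapse onto the same hash, and `b.jpg` is kept because it has more pixels.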

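(2) Prepare the bounding boxes

Read the bounding box of each image.

Set the image shape used by the model, the horizontal compression ratio, and the margin added around the bounding box to compensate for bounding box inaccuracy: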

img_shape = (384, 384, 1)  # The image shape used by the model
anisotropy = 2.15  # The horizontal compression ratio
crop_margin = 0.05  # The margin added around the bounding box to compensate for bounding box inaccuracy
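One way these three settings might be applied to a bounding box is sketched below. This is an assumption about how the parameters interact, not a copy of the kernel's code: the box is first enlarged by `crop_margin` on every side, clipped to the image, and then the shorter side is padded so that width / height equals `anisotropy`. The function name `expand_box` is hypothetical.

```python
img_shape = (384, 384, 1)  # The image shape used by the model
anisotropy = 2.15          # The horizontal compression ratio
crop_margin = 0.05         # Margin added around the bounding box


def expand_box(box, image_size, margin=crop_margin, ratio=anisotropy):
    """box = (x0, y0, x1, y1); image_size = (width, height)."""
    x0, y0, x1, y1 = box
    width, height = image_size
    dx, dy = x1 - x0, y1 - y0
    # Add the safety margin on every side.
    x0 -= dx * margin
    x1 += dx * margin
    y0 -= dy * margin
    y1 += dy * margin
    # Clip the enlarged box to the image borders.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width), min(y1, height)
    # Pad the shorter side so that width / height == ratio.
    # Note: after this step the box may extend past the image border.
    dx, dy = x1 - x0, y1 - y0
    if dx > dy * ratio:
        pad = 0.5 * (dx / ratio - dy)
        y0 -= pad
        y1 += pad
    else:
        pad = 0.5 * (dy * ratio - dx)
        x0 -= pad
        x1 += pad
    return x0, y0, x1, y1


# Toy example (made-up numbers): a 400x200 box inside a 1000x600 image.
bx0, by0, bx1, by1 = expand_box((100, 100, 500, 300), (1000, 600))
```

Resampling a crop with this aspect ratio into the square `img_shape` compresses it horizontally by the factor `anisotropy`, which suits wide fluke images.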

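(3) Drop new_whale and use only training data with a known id

Find the phash values of these images and group together the ids that share a phash
Keep only the phash values that belong to a single class
Group these single-class phash values by id (the key is the id); there are 2931 such ids
Put the single-class phash values into one list to serve as the training set; there are 13623 such phash values, and the position of each phash in the list is stored as well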
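The filtering and grouping described above could look roughly like this. The sketch assumes two plain dictionaries as input: `tagged` (picture name to whale id) and `p2h` (picture name to phash). The names `build_training_index`, `h2ws`, `w2hs`, and `h2i` are chosen for illustration, and any further filtering the kernel applies is not shown.

```python
from collections import defaultdict


def build_training_index(tagged, p2h):
    """Return (id -> hashes, flat training list, hash -> position in that list)."""
    # 1. Collect the set of ids seen for each hash, skipping new_whale.
    h2ws = defaultdict(set)
    for p, w in tagged.items():
        if w != "new_whale":
            h2ws[p2h[p]].add(w)
    # 2. Keep only hashes that map to exactly one id, grouped by that id.
    w2hs = defaultdict(list)
    for h, ws in h2ws.items():
        if len(ws) == 1:
            w2hs[next(iter(ws))].append(h)
    # 3. Flatten into the training list and remember each hash's position.
    train = sorted(h for hs in w2hs.values() for h in hs)
    h2i = {h: i for i, h in enumerate(train)}
    return dict(w2hs), train, h2i


# Toy data (made up): hash 13 is ambiguous because it carries two ids.
tagged = {"a": "w_1", "b": "w_1", "c": "new_whale", "d": "w_2", "e": "w_3"}
p2h = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 13}
w2hs, train, h2i = build_training_index(tagged, p2h)
```

In the toy data, hash 12 is dropped because it belongs to new_whale, and hash 13 is dropped because two different ids claim it.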

(4) Training class:

Build the TrainingData class
Image transformation is applied when an item is fetched (get_item); build_transform builds the rotation, zoom, shift, and shear matrices
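As an illustration of what a function like build_transform might compute, the sketch below composes rotation, shear, zoom, and shift into a single 3x3 homogeneous matrix. The parameter names, the sign conventions, and the order of composition are assumptions; the kernel's own function may differ in these details.

```python
import numpy as np


def build_transform(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    """Compose rotation, shear, zoom and shift into one 3x3 affine matrix.

    Angles are given in degrees; zoom factors must be non-zero.
    """
    rotation = np.deg2rad(rotation)
    shear = np.deg2rad(shear)
    rotation_matrix = np.array([[np.cos(rotation), np.sin(rotation), 0],
                                [-np.sin(rotation), np.cos(rotation), 0],
                                [0, 0, 1]])
    shear_matrix = np.array([[1, np.sin(shear), 0],
                             [0, np.cos(shear), 0],
                             [0, 0, 1]])
    zoom_matrix = np.array([[1.0 / height_zoom, 0, 0],
                            [0, 1.0 / width_zoom, 0],
                            [0, 0, 1]])
    shift_matrix = np.array([[1, 0, -height_shift],
                             [0, 1, -width_shift],
                             [0, 0, 1]])
    # Applied right to left: shift, then zoom, then shear, then rotation.
    return rotation_matrix @ shear_matrix @ zoom_matrix @ shift_matrix


identity = build_transform(0, 0, 1, 1, 0, 0)
shifted = build_transform(0, 0, 1, 1, 5, -3) @ np.array([0, 0, 1])
```

Random augmentation then amounts to drawing the six parameters from small ranges for each training sample and resampling the image through the resulting matrix.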

(5) Load the weights of the two models:

The two weight files are ../input/piotte/mpiotte-standard.model and ../input/piotte/mpiotte-bootstrap.model (https://www.kaggle.com/martinpiotte/whale-recognition-model-with-score-0-78563/output)
branch_model is trained and used for prediction to obtain the scores, which are then weighted

(6) Result file (the prepare_submission function)
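A submission step of this kind can be sketched as below. This is a simplified illustration and not the kernel's prepare_submission itself: it assumes a score matrix with one row per test image and one column per known training image, a list `known_ids` giving the whale id of each column, and a `threshold` below which new_whale is inserted into the ranking. The function names and the threshold value in the toy example are hypothetical.

```python
import numpy as np

NEW_WHALE = "new_whale"


def top5(scores_row, known_ids, threshold):
    """Pick up to 5 distinct ids by descending score, inserting new_whale
    once, just before the first candidate whose score is below threshold."""
    picks = []
    for j in np.argsort(scores_row)[::-1]:
        if scores_row[j] < threshold and NEW_WHALE not in picks:
            picks.append(NEW_WHALE)
            if len(picks) == 5:
                break
        if known_ids[j] not in picks:
            picks.append(known_ids[j])
            if len(picks) == 5:
                break
    return picks


def submission_lines(image_names, score, known_ids, threshold):
    """Build the CSV lines: a header plus one line per test image."""
    lines = ["Image,Id"]
    for name, row in zip(image_names, score):
        lines.append(f"{name},{' '.join(top5(row, known_ids, threshold))}")
    return lines


# Toy data (made up): one test image scored against four training images.
known_ids = ["w_a", "w_b", "w_a", "w_c"]
score = np.array([[0.9, 0.2, 0.8, 0.6]])
lines = submission_lines(["img.jpg"], score, known_ids, threshold=0.5)
```

Raising the threshold moves new_whale earlier in the ranking, so the value is worth tuning against the expected share of unseen whales in the test set.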