Reproducing the paper's code: the "real" AutoSAM (Tal Shaharabany version)

车同学CHE

Last modified: 2024-04-04 10:20:46

Tags: linux, python, deep learning, image processing, healthcare

First published: 2024-02-29 10:45:27

Article link: https://blog.csdn.net/Sniper_Che/article/details/136364281

Paper title: AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder
Authors: Tal Shaharabany (Tel Aviv University),* Aviad Dahan (Tel Aviv University), Raja Giryes (Tel Aviv University), Lior Wolf (Tel Aviv University, Israel)
Abstract: The recently introduced Segment Anything Model (SAM) combines a clever architecture and large quantities of training data to obtain remarkable image segmentation capabilities. However, it fails to reproduce such results for Out-Of-Distribution (OOD) domains such as medical images. Moreover, while SAM is conditioned on either a mask or a set of points, it may be desirable to have a fully automatic solution. In this work, we replace SAM’s conditioning with an encoder that operates on the same input image. By adding this encoder and without further fine-tuning SAM, we obtain state-of-the-art results on multiple medical images and video benchmarks. This new encoder is trained via gradients provided by a frozen SAM. For inspecting the knowledge within it, and providing a lightweight segmentation solution, we also learn to decode it into a mask by a shallow deconvolution network. Our code is publicly available at https://github.com/talshaharabany/AutoSAM
Code: https://github.com/talshaharabany/AutoSAM
Video: https://bmvc2022.mpi-inf.mpg.de/BMVC2023/0530_video.mp4
Poster: https://bmvc2022.mpi-inf.mpg.de/BMVC2023/0530_poster.pdf
Conference PDF: https://papers.bmvc2023.org/0530.pdf
arXiv PDF: http://arxiv.org/abs/2306.06370v1
Conference link: http://bmvc2022.mpi-inf.mpg.de/BMVC/ (British Machine Vision Conference)
Conference GitHub page: https://britishmachinevisionassociation.github.io/
About the conference: BMVC is run by the British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
Citation:
@inproceedings{Shaharbany_2023_BMVC,
author    = {Tal Shaharabany and Aviad Dahan and Raja Giryes and Lior Wolf},
title     = {AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {https://papers.bmvc2023.org/0530.pdf}
}

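Of the approaches for adapting SAM that I have looked at, this is the one I find most promising, so here is my attempt to reproduce it: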

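1. Download the SAM checkpoints.

The repository points to three files on Google Drive: SAM_base, SAM_large, and SAM_huge. Depending on your network, reaching Google Drive may require a proxy.

In practice, these weight files are identical to the ones linked from the official SAM GitHub page: the file sizes match exactly, and the hash values I computed for both sets of files are the same as well. So there is no need to download them from Google Drive specifically.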
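For anyone who wants to repeat the comparison, here is a minimal sketch of one way to do it. The post does not say which hash algorithm was used, so SHA-256 is an assumption, and the function names are my own rather than anything from the AutoSAM repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_content(path_a, path_b):
    """True if both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    # Cheap check first: files of different size cannot be identical.
    if a.stat().st_size != b.stat().st_size:
        return False
    return sha256_of(a) == sha256_of(b)
```

Call `same_file_content(google_drive_copy, official_copy)` with the paths of the two downloads; a `True` result means the official checkpoint can be used in place of the Google Drive one.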

2. Get the code.

git clone https://github.com/talshaharabany/AutoSAM.git

cd AutoSAM

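3. Create a new conda environment.

conda create --name autosam python=3.10

conda activate autosam

pip install -r requirements.txt

In my experience, the requirements.txt in this step lists many packages that cannot be installed with pip, and they are hard to pick out one by one. Rather than running the pip install -r command above, install each package when the code first needs it.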

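4. Start training.

python train.py

Before this will run, install the Python 3.10, cu102 builds of torch 1.11.0 and torchvision 0.12.0 from https://download.pytorch.org/whl/torch_stable.html, then install tqdm, opencv-python, and pandas. (For opencv-python, requirements.txt installs the latest version, 4.9.0.80; it needs no compilation and installs quickly.)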
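A quick way to see which of these packages are already present is to query the installed distributions. The sketch below uses only the standard library; the `EXPECTED` table and the `check_packages` helper are my own names, and the pinned versions are just the ones mentioned above.

```python
from importlib import metadata

# Versions mentioned in the text; None means "any version is fine".
EXPECTED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "tqdm": None,
    "opencv-python": None,
    "pandas": None,
}


def check_packages(expected):
    """Return {package: (installed_version_or_None, ok)} for each requested package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Local build tags such as "+cu102" are ignored when comparing.
        ok = wanted is None or installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_packages(EXPECTED).items():
        status = "OK" if ok else "CHECK"
        print(f"{name:15s} {installed or 'missing':15s} {status}")
```

This only reports what is installed; it does not verify that the torch build actually matches the local CUDA driver.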

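According to the code, the downloaded weight files need to be linked to the ./cp folder:

ln -s ../downloads-tal-AutoSAM cp

Likewise, according to the code, the GlaS dataset needs to be linked to the ./Warwick folder.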
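Both links can also be created from Python, which is handy if the setup is scripted. This is a small sketch with a helper name of my own; the dataset location in the usage note is a placeholder, not a path from the post.

```python
import os


def link_into_repo(target, link_name):
    """Create link_name -> target as a symlink, unless link_name already exists."""
    if os.path.lexists(link_name):
        return False  # leave any existing file, folder, or link untouched
    os.symlink(target, link_name, target_is_directory=True)
    return True
```

Run from inside the AutoSAM folder, `link_into_repo("../downloads-tal-AutoSAM", "cp")` mirrors the `ln -s` command above, and `link_into_repo("/path/to/GlaS", "Warwick")` would do the same for the dataset once the placeholder is replaced with the real location.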

According to the code, the default model is vit_h. If vit_h fails with "CUDA out of memory", try changing vit_h to vit_b in both the sam_checkpoint and model_type entries of sam_args; that should let it run.
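The change amounts to editing two entries of one dictionary. The sketch below illustrates it with a helper of my own; the checkpoint filename is a hypothetical example, so check train.py for the real values of sam_args before copying anything.

```python
def switch_sam_variant(sam_args, old="vit_h", new="vit_b"):
    """Return a copy of sam_args with `old` replaced by `new` in the two relevant entries."""
    updated = dict(sam_args)
    for key in ("sam_checkpoint", "model_type"):
        updated[key] = updated[key].replace(old, new)
    return updated


# Hypothetical example values, not taken from the repository.
sam_args = {"sam_checkpoint": "cp/sam_vit_h.pth", "model_type": "vit_h"}
print(switch_sam_variant(sam_args))
```

The vit_b checkpoint file must of course exist under ./cp under whatever name the edited path refers to.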

Only single-GPU training works here; multi-GPU training is not supported for now.

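Training results are written to a subfolder of ./results, which contains: (1) the best weights, net_best.pth; (2) a table of best results, best.csv (according to the code, "best results" means the mean IoU, i.e. mIoU, over the test set after inference for that epoch); (3) a vis folder (empty: the code only creates the folder and saves nothing into it, so you can add your own visualization of the test images later).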

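The naming of the subfolders under results is rather odd. My suggestion was to change the mkdir call in the open_folder function (line 51) to: import time; os.mkdir(path + '/' + str(time.strftime('%Y-%m-%d-%H'+'h'+'%M'+'m'+'%S'+'s'))), so that a folder would be named, for example, 2024-02-29-13h53m58s. (A nice idea, but the code stopped working after the change, so I am leaving it as is for now.)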
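As a standalone illustration of the naming scheme only, here is a sketch of a helper that creates such a folder and returns its full path. It is my own function, not a drop-in replacement for open_folder: I have not checked what the rest of train.py expects open_folder to return, which may well be why the one-line change broke things.

```python
import os
import time


def make_timestamped_dir(base):
    """Create base/<YYYY-MM-DD-HHhMMmSSs> and return its full path."""
    name = time.strftime("%Y-%m-%d-%Hh%Mm%Ss")
    run_dir = os.path.join(base, name)
    os.makedirs(run_dir, exist_ok=True)  # also creates `base` if it is missing
    return run_dir
```

Returning the full path means the caller never has to rebuild the folder name from a counter or prefix.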

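Also, before training, I recommend logging the training loss and the inference Dice/mIoU to a CSV file, or writing them to a TensorBoard summary and plotting the curves; this makes for very nice figures.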
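For the CSV option, a minimal sketch using only the standard library might look like the following. The class name, the column names, and where it would be called from inside train.py are all assumptions of mine; the repository does not ship this logger.

```python
import csv
import os


class CsvLogger:
    """Append one row of metrics per epoch to a CSV file, writing the header once."""

    def __init__(self, path, fieldnames):
        self.path = path
        self.fieldnames = list(fieldnames)
        # Write the header only for a new or empty file, so reruns keep old rows.
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            with open(path, "w", newline="") as f:
                csv.DictWriter(f, fieldnames=self.fieldnames).writeheader()

    def log(self, **row):
        """Append a single row; keys must be a subset of the declared fieldnames."""
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(row)
```

Typical use would be to create `CsvLogger("train_log.csv", ["epoch", "train_loss", "dice", "miou"])` once before the epoch loop and call `log(epoch=..., train_loss=..., dice=..., miou=...)` after each evaluation. The resulting file loads directly with pandas for plotting.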

The default number of epochs is 5000, but with vit_b on a single GPU, 11 epochs took about 38 minutes and mIoU had already reached 0.8379. After 29 epochs (1 hour 43 minutes), mIoU reached 0.8658.
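These two timings give a rough idea of what the full default schedule would cost. The sketch below simply extrapolates linearly from the numbers above, assuming the time per epoch stays constant, which is my assumption and not something measured.

```python
def minutes_per_epoch(total_minutes, epochs):
    """Average wall-clock minutes per epoch."""
    return total_minutes / epochs


# (elapsed minutes, completed epochs) from the two measurements above.
runs = [(38, 11), (103, 29)]
for minutes, epochs in runs:
    rate = minutes_per_epoch(minutes, epochs)
    total_hours = rate * 5000 / 60
    print(f"{epochs} epochs: {rate:.2f} min/epoch, 5000 epochs would take about {total_hours:.0f} h")
```

Both measurements work out to roughly three and a half minutes per epoch, so 5000 epochs would be on the order of 12 days on this setup, which is why the early mIoU values are the interesting part.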

To be continued.