Paper title: AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder
Authors: Tal Shaharbany (Tel Aviv University), Aviad Dahan (Tel Aviv University), Raja Giryes (Tel Aviv University), Lior Wolf (Tel Aviv University, Israel)
Abstract: The recently introduced Segment Anything Model (SAM) combines a clever architecture and large quantities of training data to obtain remarkable image segmentation capabilities. However, it fails to reproduce such results for Out-Of-Distribution (OOD) domains such as medical images. Moreover, while SAM is conditioned on either a mask or a set of points, it may be desirable to have a fully automatic solution. In this work, we replace SAM’s conditioning with an encoder that operates on the same input image. By adding this encoder and without further fine-tuning SAM, we obtain state-of-the-art results on multiple medical images and video benchmarks. This new encoder is trained via gradients provided by a frozen SAM. For inspecting the knowledge within it, and providing a lightweight segmentation solution, we also learn to decode it into a mask by a shallow deconvolution network. Our code is publicly available at https://github.com/talshaharabany/AutoSAM.
Code: https://github.com/talshaharabany/AutoSAM
Video: https://bmvc2022.mpi-inf.mpg.de/BMVC2023/0530_video.mp4
Poster: https://bmvc2022.mpi-inf.mpg.de/BMVC2023/0530_poster.pdf
Conference PDF: https://papers.bmvc2023.org/0530.pdf
arXiv: http://arxiv.org/abs/2306.06370v1
Conference website: http://bmvc2022.mpi-inf.mpg.de/BMVC/ (British Machine Vision Conference)
Conference GitHub page: https://britishmachinevisionassociation.github.io/
About the conference: The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
Citation (BibTeX):
@inproceedings{Shaharbany_2023_BMVC,
author = {Tal Shaharbany and Aviad Dahan and Raja Giryes and Lior Wolf},
title = {AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year = {2023},
url = {https://papers.bmvc2023.org/0530.pdf}
}
Of all the SAM fine-tuning approaches, this is the one I find most promising, so I tried to reproduce it:
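1. Download the SAM checkpoints.
Three files have to be downloaded from Google Drive (which may require a proxy, depending on your network): SAM_base, SAM_large, SAM_huge.
In fact, these weight files are identical to the ones linked from the SAM GitHub repository (facebookresearch/segment-anything): the file sizes match exactly, and the hashes I computed match as well, so you don't have to download them from Google Drive.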
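To repeat the comparison yourself, here is a minimal sketch using only Python's standard hashlib and pathlib. It is not part of AutoSAM: the function names are hypothetical, and since the post does not say which hash algorithm was used, SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading 1 MiB chunks so
    multi-GB checkpoints never have to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True if both files have the same size and the same SHA-256 hash."""
    a, b = Path(path_a), Path(path_b)
    # Cheap size check first: differing sizes mean differing files.
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_sha256(a) == file_sha256(b)
```

Call same_file with the Google Drive copy and the official copy of the same model, using your own paths. Comparing sizes first avoids hashing several gigabytes when the files obviously differ.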
2. Get the code.
git clone https://github.com/talshaharabany/AutoSAM.git
cd AutoSAM
3. Create a conda environment.
conda create --name autosam python=3.10
conda activate autosam
pip install -r requirements.txt
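From my own testing: this requirements.txt lists many packages that cannot be installed with pip, and the installable ones are hard to pick out. Skip the pip install step and simply install each package when the code turns out to need it.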
4. Start training.
python train.py
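Before that, install torch 1.11.0 and torchvision 0.12.0 built for CUDA 10.2 (cu102) and Python 3.10 from https://download.pytorch.org/whl/torch_stable.html, then install tqdm, opencv-python (requirements.txt installs the latest version, 4.9.0.80, which needs no compilation and installs quickly), and pandas.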
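Since the advice is to install packages on demand, a quick check of what is still missing can save a few failed runs of train.py. This is a hypothetical helper, not part of the repo. The import-name to pip-name mapping covers only the packages listed above; note that opencv-python is imported as cv2.

```python
import importlib.util

# Hypothetical mapping: import name -> pip package name, covering only the
# packages mentioned in this post. Extend it as train.py asks for more.
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "tqdm": "tqdm",
    "cv2": "opencv-python",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    # torch/torchvision should still come from the cu102 wheels, not plain PyPI.
    print("missing:", ", ".join(missing) if missing else "nothing")
```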
According to the code, the downloaded weight files must be linked to a ./cp folder:
ln -s ../downloads-tal-AutoSAM cp
Likewise, according to the code, the GlaS dataset must be linked to a ./Warwick folder.
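If you would rather set up both links from a Python setup script, os.symlink does the same job as ln -s, including resolving a relative target against the link's own directory. This is a minimal sketch. The helper name is hypothetical, and the GlaS source path in the usage note is a placeholder for wherever you unpacked the dataset.

```python
import os
from pathlib import Path


def ensure_link(target, link_name):
    """Create symlink link_name -> target, like `ln -s target link_name`.

    Does nothing if link_name already exists (including a dangling link).
    Returns True if a new link was created.
    """
    link = Path(link_name)
    # exists() is False for a dangling symlink, so also check is_symlink().
    if link.exists() or link.is_symlink():
        return False
    os.symlink(target, link, target_is_directory=True)
    return True
```

Run from the AutoSAM directory: ensure_link("../downloads-tal-AutoSAM", "cp"), then ensure_link("<path to GlaS>", "Warwick").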
According to the code, vit_h is used by default. If vit_h fails with "CUDA out of memory", try replacing vit_h with vit_b in both the sam_checkpoint and model_type entries of sam_args; it should then run.
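The change amounts to swapping vit_h for vit_b in two entries of sam_args. The sketch below assumes sam_args is a plain dict with those two keys. The helper is hypothetical, and so is the checkpoint path; check train.py for the real one.

```python
def use_vit_b(sam_args):
    """Return a copy of sam_args switched from vit_h to vit_b.

    Only the two entries named in this post are changed; all other keys
    are copied as they are.
    """
    patched = dict(sam_args)
    patched["sam_checkpoint"] = patched["sam_checkpoint"].replace("vit_h", "vit_b")
    patched["model_type"] = patched["model_type"].replace("vit_h", "vit_b")
    return patched


# Hypothetical values for illustration only.
sam_args = {"sam_checkpoint": "cp/sam_vit_h.pth", "model_type": "vit_h"}
print(use_vit_b(sam_args))
# → {'sam_checkpoint': 'cp/sam_vit_b.pth', 'model_type': 'vit_b'}
```

A plain string replacement only works if the vit_h and vit_b files in ./cp follow the same naming pattern. The official SAM downloads carry different hash suffixes (sam_vit_h_4b8939.pth versus sam_vit_b_01ec64.pth), so with those names, edit the path by hand.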
Only single-GPU training is supported; multi-GPU training is not available for now.
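On a machine with several GPUs, you can still choose which single card the run uses through the standard CUDA_VISIBLE_DEVICES environment variable. From a shell, that is just CUDA_VISIBLE_DEVICES=1 python train.py. The Python launcher below is a hypothetical equivalent, not something the repo provides.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_id=0, base_env=None):
    """Copy of the environment that exposes only one GPU to a child process.

    CUDA_VISIBLE_DEVICES must be set before CUDA is initialised, which is why
    it is handed to a fresh process rather than changed inside train.py.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_training(gpu_id=0):
    """Run `python train.py` with only GPU gpu_id visible (sketch)."""
    return subprocess.run([sys.executable, "train.py"],
                          env=single_gpu_env(gpu_id), check=True)
```

Inside train.py the chosen card then appears as device 0, because CUDA renumbers the visible devices.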
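Training results are written to a subfolder of ./results, containing: ① the best weights, net_best.pth; ② a table of the best results, best.csv (according to the code, "best results" means the mean IoU, i.e. mIoU, from running inference on the test set in that epoch); ③ a vis folder, which stays empty because the code only creates it and saves nothing there, so you can visualize the test images into it yourself later.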
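Because only mIoU ends up in best.csv, you may want to compute metrics on your own predictions. The sketch below uses the common definitions of per-image IoU and Dice for binary masks, averaged over the test set. Whether AutoSAM's own evaluation matches this exactly is an assumption; thresholding and empty-mask handling in particular may differ.

```python
import numpy as np


def iou(pred, gt, eps=1e-7):
    """IoU of two binary masks; eps makes two empty masks score 1.0 instead of 0/0."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float((inter + eps) / (union + eps))


def dice(pred, gt, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2 * inter + eps) / (pred.sum() + gt.sum() + eps))


def mean_iou(preds, gts):
    """mIoU: average of the per-image IoU over the whole test set."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))
```

Averaging per image (rather than pooling all pixels first) keeps small glands from being drowned out by large ones; if the repo pools pixels instead, its numbers will differ slightly.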
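The naming of the subfolders under results is rather odd, so I'd suggest the following change: in the open_folder function at line 51, replace the mkdir call with import time; os.mkdir(path + '/' + str(time.strftime('%Y-%m-%d-%H'+'h'+'%M'+'m'+'%S'+'s'))), so that folders are named like 2024-02-29-13h53m58s. (Nice idea, but after this change the code no longer ran, so I have left it as it is for now.)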
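One plausible (unverified) reason the timestamp change breaks things is that other code in train.py rebuilds the run-folder path from whatever open_folder returns instead of from the folder actually created. A hypothetical timestamped version that returns the full path, so callers can use it directly, might look like this. Any caller that assembles the path on its own would still have to be updated.

```python
import os
import time


def open_folder(root="results"):
    """Create root/<timestamp> and return its full path (hypothetical sketch).

    Callers should use the returned path as-is rather than rebuilding it.
    """
    stamp = time.strftime("%Y-%m-%d-%Hh%Mm%Ss")  # e.g. 2024-02-29-13h53m58s
    path = os.path.join(root, stamp)
    # Creates root if needed; raises FileExistsError if two runs start in the same second.
    os.makedirs(path, exist_ok=False)
    return path
```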
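Also, before training, I recommend logging the training loss and the inference Dice/mIoU to a CSV file, or writing them directly to a TensorBoard summary to plot curves; this makes for much nicer figures.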
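For the CSV option, a minimal logger built on Python's standard csv module is enough. The class and column names are hypothetical. For TensorBoard, torch.utils.tensorboard.SummaryWriter and its add_scalar(tag, value, step) method serve the same purpose.

```python
import csv
import os


class CsvLogger:
    """Append one row of metrics per epoch to a CSV file (sketch)."""

    FIELDS = ["epoch", "train_loss", "test_dice", "test_miou"]

    def __init__(self, path):
        self.path = path
        # Write the header only when starting a new file, so resumed runs append cleanly.
        if not os.path.exists(path):
            with open(path, "w", newline="") as f:
                csv.DictWriter(f, fieldnames=self.FIELDS).writeheader()

    def log(self, **row):
        """Append one row; columns not given are left blank, unknown ones raise ValueError."""
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.FIELDS).writerow(row)
```

Create one CsvLogger per run (for example inside the results subfolder) and call logger.log(epoch=epoch, train_loss=..., test_dice=..., test_miou=...) once per epoch with your own variables; the resulting file loads straight into pandas for plotting.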
Although the default number of epochs is 5000, with vit_b on a single GPU, 11 epochs took 38 minutes and already reached an mIoU of 0.8379; 29 epochs took 1 hour 43 minutes and reached an mIoU of 0.8658.
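Extrapolating these timings shows why there is little point in waiting for the default 5000 epochs. At roughly 3.5 minutes per epoch, a full run would take about 296 hours, i.e. more than 12 days. The arithmetic, assuming the per-epoch time stays constant:

```python
def minutes_per_epoch(minutes, epochs):
    """Average wall-clock minutes per epoch from an observed run."""
    return minutes / epochs


def estimate_total_hours(minutes, epochs, total_epochs=5000):
    """Extrapolate the wall-clock time for total_epochs, in hours."""
    return minutes_per_epoch(minutes, epochs) * total_epochs / 60


# Timings from this post (vit_b, single GPU).
print(round(minutes_per_epoch(38, 11), 2))   # 3.45
print(round(minutes_per_epoch(103, 29), 2))  # 3.55
print(round(estimate_total_hours(103, 29)))  # 296 (hours, about 12 days)
```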
To be continued.