Transformer in Practice - Tutorial Series 3: Vision Transformer Source Code Walkthrough 1

🚩🚩🚩 Transformer in Practice - Series Table of Contents

Feel free to leave any questions in the comments below.
All the code in this article was run in PyCharm.
The companion code and resources for this article have been uploaded.

Vision Transformer Source Code Walkthrough 1
Vision Transformer Source Code Walkthrough 2
Vision Transformer Source Code Walkthrough 3
Vision Transformer Source Code Walkthrough 4

1. Overview


The Transformer architecture is used heavily for text tasks because text data is a sequence, which fits the Transformer naturally. But how do you unroll an image into a sequence?

To extract features from text with a Transformer, the text first has to be embedded into vectors.

An image can be embedded into a vector in exactly the same way. A vector is, after all, just a set of features: once the image is turned into features, the Transformer architecture applies directly. We can simply use a ConvNet to extract features from the image and reshape them into vectors.

A word is most commonly embedded as a 768-dimensional vector. For an image, convolving the whole image once with a single kernel yields one feature; with 512 kernels we get a 512-dimensional vector, which lines up with the NLP setting.

So to do CV tasks with a Transformer, all we need to add is an embedding layer: a single convolution pass yields features over the whole image, and positional encodings are added on top to supply location information.

Once we have a sequence of vectors, self-attention layers can be stacked on top just as in NLP.
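To make this concrete, here is a minimal sketch of the idea (my own illustration, not the project's code): a single Conv2d whose kernel size equals its stride chops the image into patches and embeds each patch as a 768-dimensional vector, producing a token sequence just like in NLP.

import torch
import torch.nn as nn

# One convolution with kernel_size == stride == 16 embeds each 16x16
# patch of the image as a 768-dimensional vector (ViT-B/16 style).
patch_embed = nn.Conv2d(in_channels=3, out_channels=768,
                        kernel_size=16, stride=16)

x = torch.randn(1, 3, 224, 224)           # a dummy 224x224 RGB image
feat = patch_embed(x)                      # (1, 768, 14, 14)
tokens = feat.flatten(2).transpose(1, 2)   # (1, 196, 768): 196 patch tokens

# Learnable position embeddings supply the location information
pos_embed = nn.Parameter(torch.zeros(1, 196, 768))
tokens = tokens + pos_embed
print(tokens.shape)                        # torch.Size([1, 196, 768])

From here, the sequence of 196 tokens can be fed into a standard stack of self-attention blocks.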

2. The ViT Project

Run arguments:

--name cifar10-100_500 
--dataset cifar10 
--model_type ViT-B_16 
--pretrained_dir checkpoint/ViT-B_16.npz

Python packages that are prone to version errors:

pip install protobuf==3.20.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install numpy==1.19.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install ml-collections -i https://pypi.tuna.tsinghua.edu.cn/simple

Installing these three packages was enough for the project to run normally on my machine.

3. Debugging the ViT Project

There is no need to study the configuration arguments in detail up front; just look each one up when you encounter it.

Find the main() function in train.py, start debug mode at the first executable statement, and step through the code line by line.

# Setup CUDA, GPU & distributed training
if args.local_rank == -1:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    args.n_gpu = torch.cuda.device_count()
else:  # Initializes the distributed backend which will take care of synchronizing nodes/GPUs
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)
    torch.distributed.init_process_group(backend='nccl',
                                         timeout=timedelta(minutes=60))
    args.n_gpu = 1
args.device = device

This selects the training device: a single machine with one GPU, a single machine with multiple GPUs (the distributed branch), or the CPU if no GPU is available.
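For the multi-GPU branch, the script is meant to be started through PyTorch's distributed launcher, which spawns one process per GPU and injects a --local_rank argument into each of them. A usage sketch, assuming train.py parses the --local_rank argument the launcher provides (the training arguments are the ones listed above):

python -m torch.distributed.launch --nproc_per_node=4 train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz

Each process then binds to its own GPU via torch.cuda.set_device and joins the NCCL process group.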

# Setup logging
    logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
                        datefmt='%m/%d/%Y %H:%M:%S',
                        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN)
    logger.warning("Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s" %
                   (args.local_rank, args.device, args.n_gpu, bool(args.local_rank != -1), args.fp16))

Configure the logging format.

# Set seed
set_seed(args)

def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)

Set every source of randomness to the same seed. Why args.seed = 42? The ancients all used 42 🤣

# Model & Tokenizer Setup
    args, model = setup(args)
def setup(args):
    # Prepare model
    config = CONFIGS[args.model_type]

    num_classes = 10 if args.dataset == "cifar10" else 100

    model = VisionTransformer(config, args.img_size, zero_head=True, num_classes=num_classes)
    model.load_from(np.load(args.pretrained_dir))
    model.to(args.device)
    num_params = count_parameters(model)

    logger.info("{}".format(config))
    logger.info("Training parameters %s", args)
    logger.info("Total Parameter: \t%2.1fM" % num_params)
    print(num_params)
    return args, model

In the debugger, check which configuration parameters are read in; the model is initialized from them.
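For reference, CONFIGS is a dictionary mapping model names to ml_collections ConfigDicts. The ViT-B_16 entry looks roughly like the sketch below (paraphrased from memory; check configs.py in the project for the exact values):

import ml_collections

def get_b16_config():
    # ViT-B/16: 16x16 patches, 768-dim hidden, 12 layers, 12 heads
    config = ml_collections.ConfigDict()
    config.patches = ml_collections.ConfigDict({'size': (16, 16)})
    config.hidden_size = 768
    config.transformer = ml_collections.ConfigDict()
    config.transformer.mlp_dim = 3072
    config.transformer.num_heads = 12
    config.transformer.num_layers = 12
    config.transformer.attention_dropout_rate = 0.0
    config.transformer.dropout_rate = 0.1
    config.classifier = 'token'
    config.representation_size = None
    return config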

A walkthrough of setup() in debug mode:

  1. Read in all the configuration parameters
  2. The dataset has 10 classes (cifar10)
  3. Build the model with the VisionTransformer class
  4. Load the pretrained weights
  5. Move the model onto the GPU
  6. Count the model's parameters (see the sketch below)
  7. Log the configuration information
  8. Return the arguments and the model
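The count_parameters helper used in step 6 is not shown above; judging by the "%2.1fM" log format it returns the trainable parameter count in millions, so it is most likely a one-liner along these lines (a sketch, not the project's exact code):

def count_parameters(model):
    # Sum the element counts of all trainable tensors, in millions
    params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return params / 1000000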

Vision Transformer Source Code Walkthrough 1
Vision Transformer Source Code Walkthrough 2
Vision Transformer Source Code Walkthrough 3
Vision Transformer Source Code Walkthrough 4
