GLoRIA Close Reading 20240315 (Part 2)

大路诗人路小果

Last modified: 2024-03-21 17:37:23

First published: 2024-03-16 19:02:13

Article link: https://blog.csdn.net/a14285700/article/details/136736505

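While working with the R2GENCMN network today, I found it hard to generate attention maps, because word segmentation for Chinese medical vocabulary is poorly supported: there are few datasets or libraries for it. GLoRIA offers a different idea: match local words with local image regions. This is one of the advantages of contrastive learning, and it exploits a property specific to medical images: a few representative local features are enough to characterize the whole image. That is the essential difference between medical images and natural images.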

The GLoRIA paper puts its main emphasis on attention.

Continuing from Part 1, let's move on to code experiments.

1. The overall model

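    def forward(self, x):
        # image branch: local map [B, 768, 19, 19] and global vector [B, 768]
        img_emb_l, img_emb_g = self.image_encoder_forward(x["imgs"])
        # text branch: word-level (local) [B, 768, num_words] and sentence-level (global) [B, 768],
        # plus the decoded tokens of each report
        text_emb_l, text_emb_g, sents = self.text_encoder_forward(
            x["caption_ids"], x["attention_mask"], x["token_type_ids"])

        return img_emb_l, img_emb_g, text_emb_l, text_emb_g, sents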

2. Visual encoder (img_encoder) + (generate_embeddings)

2.1 img_encoder

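def image_encoder_forward(self, imgs):
    # img_feat_g: pooled global vector [B, 2048]; img_emb_l: local feature map [B, 1024, 19, 19]
    img_feat_g, img_emb_l = self.img_encoder(imgs, get_local=True)
    # project both into the 768-d space shared with the text side: [B, 768] and [B, 768, 19, 19]
    img_emb_g, img_emb_l = self.img_encoder.generate_embeddings(
        img_feat_g, img_emb_l
    )
    return img_emb_l, img_emb_g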

output=self.img_encoder(torch.zeros([1,3,224,224]),True)

In [27]:  output[0].size()
Out[27]: torch.Size([1, 2048]) # global feature vector

In [26]:  output[1].size()
Out[26]: torch.Size([1, 1024, 19, 19]) # local feature map
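Why is the local map 19×19 when the input was 224×224? A minimal sketch of the arithmetic, assuming (from my reading of GLoRIA's image encoder) that every image is first upsampled to 299×299 and that the local features are taken after ResNet-50's `layer3`, the 1024-channel stage. `conv_out` and `resnet50_spatial_sizes` are hypothetical helper names.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_spatial_sizes(size: int) -> dict:
    """Trace the feature-map side length through a torchvision-style ResNet-50."""
    sizes = {"input": size}
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1
    sizes["conv1"] = size
    size = conv_out(size, kernel=3, stride=2, padding=1)  # maxpool
    sizes["layer1"] = size  # layer1 keeps the resolution of the maxpool output
    for stage in ("layer2", "layer3", "layer4"):
        # the first bottleneck of each stage downsamples with a 3x3, stride-2 conv
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes[stage] = size
    return sizes


print(resnet50_spatial_sizes(299))
print(resnet50_spatial_sizes(224))
```

With 299 the trace is 150, 75, 38, 19, 10; feeding 224 straight into the same network would give a 14×14 map at `layer3`, so the 19×19 above is consistent with the upsampling step.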

2.2 generate_embeddings

(global_embedder): Linear(in_features=2048, out_features=768, bias=True)
(local_embedder): Conv2d(1024, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)

In [13]:  output[1].size()
Out[13]: torch.Size([1, 768])

In [12]: output[0].size()
Out[12]: torch.Size([1, 768, 19, 19])
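The `local_embedder` is a 1×1 `Conv2d`, which amounts to a linear projection (1024 to 768, like the 2048-to-768 `global_embedder`) applied separately at each of the 19×19 positions; that is why the spatial grid survives while the channel count becomes 768. A tiny pure-Python illustration with made-up sizes (3 to 2 channels on a 2×2 grid, channels-last for readability, whereas PyTorch is channels-first); `linear` and `conv1x1` are hypothetical helpers, not GLoRIA code.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W x C (channels-last here only for readability)


def linear(weight: List[Vector], x: Vector) -> Vector:
    """y = W x for W of shape [C_out][C_in], no bias (the local_embedder uses bias=False)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def conv1x1(weight: List[Vector], fmap: FeatureMap) -> FeatureMap:
    """A 1x1 convolution: the same linear map applied independently at every (h, w)."""
    return [[linear(weight, pixel) for pixel in row] for row in fmap]


# toy sizes: C_in = 3 -> C_out = 2 on a 2 x 2 grid (GLoRIA: 1024 -> 768 on 19 x 19)
weight = [[1, 0, 0],
          [0, 1, 1]]
fmap = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [0, 0, 1]]]
out = conv1x1(weight, fmap)
print(out)  # [[[1, 5], [4, 11]], [[7, 17], [0, 1]]]
```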

3. Text encoder

https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT/tree/main

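It uses a tokenizer specialized for clinical text, which is not suitable for Chinese or Chinese medical vocabulary.
The tokenizer marks the start of a sequence with [101] ([CLS]) and the end with [102] ([SEP]):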

In [5]:  self.text_encoder.tokenizer("")
Out[5]: {'input_ids': [101, 102], 'token_type_ids': [0, 0], 'attention_mask': [1, 1]}
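Roughly what the tokenizer produces once padding to a fixed length is switched on (in Hugging Face this is the `padding="max_length"`, `truncation=True`, `max_length=...` call; the [5, 512] batches further down suggest GLoRIA pads to 512). A minimal sketch; `build_bert_inputs` and the word-piece ids in the usage lines are hypothetical, and real ids come from the Bio_ClinicalBERT vocabulary. The pad id 0 matches `padding_idx=0` in the model printout below.

```python
from typing import Dict, List

CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD] in the BERT cased vocabulary


def build_bert_inputs(token_ids: List[int], max_length: int) -> Dict[str, List[int]]:
    """Wrap word-piece ids as [CLS] ... [SEP], truncate, then pad to max_length."""
    ids = [CLS_ID] + token_ids[: max_length - 2] + [SEP_ID]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * n_pad,
        "token_type_ids": [0] * max_length,               # a single segment
        "attention_mask": [1] * len(ids) + [0] * n_pad,   # 1 = real token, 0 = padding
    }


print(build_bert_inputs([], 2))      # same as Out[5] above
print(build_bert_inputs([7, 8], 6))  # hypothetical ids, padded to length 6
```

The attention mask is what lets the model ignore the long [PAD] tails that show up in the decoded report later.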

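The text encoder consists of one embedding module followed by 12 Transformer layers, with a pooler on top:

(embeddings) + 12 × (BertLayer)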

In [35]:  self.model
Out[35]: 
BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(28996, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): BertPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)
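The printout can also be read as a parameter budget. A small sketch that adds up the weights and biases of every module listed above (Dropout, GELU and Tanh have none); the function names are hypothetical. If the printed sizes are the whole story, the total should match `sum(p.numel() for p in self.model.parameters())`.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    return n_in * n_out + (n_out if bias else 0)


def layernorm_params(dim: int) -> int:
    return 2 * dim  # weight + bias (elementwise_affine=True)


def bert_params(vocab=28996, max_pos=512, type_vocab=2, hidden=768, inter=3072, layers=12) -> int:
    # word + position + token-type tables, then one LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + layernorm_params(hidden)
    attention = 3 * linear_params(hidden, hidden)                          # query, key, value
    attention += linear_params(hidden, hidden) + layernorm_params(hidden)  # BertSelfOutput
    ffn = linear_params(hidden, inter)                                     # BertIntermediate
    ffn += linear_params(inter, hidden) + layernorm_params(hidden)         # BertOutput
    pooler = linear_params(hidden, hidden)                                 # BertPooler dense
    return embeddings + layers * (attention + ffn) + pooler


print(f"{bert_params():,}")  # 108,310,272
```

About 108 M parameters in total, of which roughly 22.3 M sit in the 28996 × 768 word-embedding table alone.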

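The model returns the hidden states of every layer, which invites an analogy with ResNet: shallow layers would capture local features.
But that is a forced interpretation. In practice, the encoder just tidies up the batch, pads every sequence to the same length, and then extracts the local and global features. The function def aggregate_tokens(self, embeddings, caption_ids): looks complicated, but it does nothing more than this simple job.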
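What does the tidying amount to? A minimal sketch under my own reading of GLoRIA's source, which may differ in details: WordPiece fragments that start with `##` are glued back onto the preceding word and their vectors summed, everything after `[SEP]` is dropped, and the sequence is padded back to its original length so the batch stays rectangular. One sentence, one layer, plain lists; `merge_wordpieces` and the split of "pneumothorax" below are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def merge_wordpieces(tokens: List[str], embs: List[Vector]) -> Tuple[List[str], List[Vector]]:
    """Glue '##' pieces onto the previous word (summing vectors), stop after [SEP],
    then pad back to the original length with [PAD] tokens and zero vectors."""
    dim = len(embs[0])
    words: List[str] = []
    vecs: List[Vector] = []
    for tok, vec in zip(tokens, embs):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]                               # "pneu" + "##mo" -> "pneumo"
            vecs[-1] = [a + b for a, b in zip(vecs[-1], vec)]  # sum the piece vectors
        else:
            words.append(tok)
            vecs.append(list(vec))
        if tok == "[SEP]":
            break
    n_pad = len(tokens) - len(words)
    return words + ["[PAD]"] * n_pad, vecs + [[0.0] * dim for _ in range(n_pad)]


tokens = ["[CLS]", "pneu", "##mo", "##thorax", "[SEP]", "[PAD]"]
embs = [[1.0], [2.0], [3.0], [4.0], [5.0], [0.0]]
print(merge_wordpieces(tokens, embs))
```

Here the three pieces collapse into one word whose vector is 2 + 3 + 4 = 9, and the freed slots become [PAD], which is why `sents` below is a list of words padded out to 512 entries.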

In [2]:  input=self.tokenizer("this is a cancner")
In [3]:  input.keys()
Out[3]: dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])

In [13]:  for name in input.keys():
    ...:      input[name]=torch.tensor(input[name]).unsqueeze(0)

In [29]: input['input_ids'].size()
Out[29]: torch.Size([1, 8])

In [18]:  output=self.model(**input)
In [19]:  output.keys()
Out[19]: odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states'])

In [27]:  hidden_states = output.hidden_states
    ...:  print(hidden_states[0].size())
torch.Size([1, 8, 768])

In [32]:  len(hidden_states)
Out[32]: 13

The 13 hidden states are the embedding output plus one per BertLayer; hidden_states is only returned when the model is loaded with output_hidden_states=True.

For the global and local features, the aggregation is nothing more than a sum or a mean:

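# embeddings: [batch, layers, words, dim], the hidden states of the last few layers stacked together
# sent_embeddings is computed just before this branch: mean over the word axis -> [batch, layers, dim]
sent_embeddings = embeddings.mean(axis=2)
if self.aggregate_method == "sum":
    word_embeddings = embeddings.sum(axis=1)        # combine layers -> [batch, words, dim]
    sent_embeddings = sent_embeddings.sum(axis=1)   # combine layers -> [batch, dim]
elif self.aggregate_method == "mean":
    word_embeddings = embeddings.mean(axis=1)
    sent_embeddings = sent_embeddings.mean(axis=1)
else:
    print(self.aggregate_method)
    raise Exception("Aggregation method not implemented")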
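A pure-Python sketch of what the two branches compute, assuming `embeddings` holds the hidden states of the last few layers of one sentence as [layers][words][dim] and that the per-layer sentence vector is the mean over words. `aggregate_layers` and `last_n` are hypothetical names; the real code does the same on batched tensors in a single call per axis.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # one layer's hidden states for one sentence: words x dim


def aggregate_layers(
    hidden_states: Sequence[Matrix], last_n: int, method: str = "sum"
) -> Tuple[Matrix, List[float]]:
    """Combine the last `last_n` layers into word (local) and sentence (global) vectors."""
    layers = list(hidden_states[-last_n:])  # hidden_states[0] is the embedding output
    num_words, dim = len(layers[0]), len(layers[0][0])

    if method == "sum":
        combine = sum
    elif method == "mean":
        combine = fmean
    else:
        raise ValueError(f"Aggregation method not implemented: {method}")

    # per-layer sentence vector: average over the word axis
    sent_per_layer = [
        [fmean(layer[w][d] for w in range(num_words)) for d in range(dim)]
        for layer in layers
    ]
    # local features: combine each word's vectors across layers
    word_emb = [[combine([layer[w][d] for layer in layers]) for d in range(dim)]
                for w in range(num_words)]
    # global feature: combine the per-layer sentence vectors across layers
    sent_emb = [combine([s[d] for s in sent_per_layer]) for d in range(dim)]
    return word_emb, sent_emb


zeros = [[0.0, 0.0], [0.0, 0.0]]
hs = [zeros] * 11 + [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # 13 "layers"
print(aggregate_layers(hs, last_n=2, method="sum"))
```

Either way the word axis survives in the local features while the global feature is a single vector per sentence, which matches the [5, 768, 512] and [5, 768] shapes below.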

Shapes of the text encoder's inputs and outputs (a batch of 5 reports, padded to 512 tokens):

In [8]:  x["caption_ids"].size()
Out[8]: torch.Size([5, 512])

In [9]:  x["attention_mask"].size()
Out[9]: torch.Size([5, 512])

In [10]:  x["token_type_ids"].size()
Out[10]: torch.Size([5, 512])

text_emb_l, text_emb_g, sents = self.text_encoder_forward(
    x["caption_ids"], x["attention_mask"], x["token_type_ids"]
)
In [1]:  text_emb_l.size()
Out[1]: torch.Size([5, 768, 512])

In [2]:  text_emb_g.size()
Out[2]: torch.Size([5, 768])

In [3]:  type(sents)
Out[3]: list

In [4]:  len(sents)
Out[4]: 5

In [5]:  type(sents[0])
Out[5]: list

In [6]:  len(sents[0])
Out[6]: 512

In [7]:  "".join(sents[0])
Out[7]: '[CLS]双乳腺呈不均匀致密型，前缘凹凸不平，密度不均匀，见片状密度增高影。右乳外上象限见一不规则肿块，大小约1.5cm×1.0cm×1.4cm，边缘毛糙见长短不一毛刺影，毛刺最长约4.9cm，局部延伸至胸大肌前方，其内及周围见点状及模糊不定形钙化，密度增高且不均匀，周围腺体结构紊乱、纠集，血管影稍增多、增粗，邻近皮下脂肪层密度增高见条索影，皮肤略增厚。左乳内未见确切块影及恶性钙化，皮下脂肪层清晰，皮肤不厚。双乳头正常。双腋区淋巴结显示，密度稍高。\n1、右乳外上象限占位性病变，性质恶性，考虑乳腺癌，建议病检及MRI检查。BI-RADS5\n2、双乳腺增生症，建议定期复查。BI-RADS1[SEP][PAD][PAD][PAD][PAD][PAD]。。。。。[PAD][PAD][PAD]'

English translation of the decoded report (special tokens omitted): Both breasts are of the heterogeneously dense type; the anterior margin is uneven and the density is heterogeneous, with patchy areas of increased density. An irregular mass of about 1.5 cm × 1.0 cm × 1.4 cm is seen in the upper outer quadrant of the right breast; its margin is rough, with spiculations of varying length (the longest about 4.9 cm) locally extending to the front of the pectoralis major. Punctate and indistinct amorphous calcifications are seen within and around it, with increased, heterogeneous density; the surrounding glandular architecture is disordered and retracted; vascular shadows are slightly increased and thickened; the adjacent subcutaneous fat layer shows increased density with strand-like shadows; the skin is slightly thickened. No definite mass or malignant calcification in the left breast; the subcutaneous fat layer is clear and the skin is not thickened. Both nipples are normal. Lymph nodes are visible in both axillae, with slightly high density. 1. Space-occupying lesion in the upper outer quadrant of the right breast, malignant, suspected breast cancer; biopsy and MRI are recommended. BI-RADS 5. 2. Bilateral mammary hyperplasia; regular follow-up is recommended. BI-RADS 1.

Training the model: measuring the loss

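We now have encoders for both modalities, and each one extracts both local and global features.
The most important question is how to measure the loss that constrains the model.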
