GLoRIA Close Reading 20240315 (Part 2)

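While working with the R2GENCMN network today, I found it hard to generate attention maps, because word-segmentation resources for Chinese medical vocabulary are not widely available. GLoRIA offers a different approach: pairing local words with local image regions. This is also a strength of contrastive learning, and it fits a property specific to medical images: a few representative local features are enough to characterize the whole image. That is the essential difference between medical images and natural images.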

The GLoRIA paper puts particular emphasis on attention.
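As a rough picture of what attention between local words and local image regions can look like, the sketch below computes, for every word, a softmax over image regions from dot products between word embeddings and region embeddings. It is generic dot-product attention under assumed shapes (768-dimensional embeddings, a 19 x 19 region grid, random inputs) with a hypothetical function name `word_region_attention`; it is not claimed to be GLoRIA's exact formulation.

```python
import torch
import torch.nn.functional as F


def word_region_attention(word_emb, region_emb):
    """Attention weights of each word over image regions (illustrative sketch).

    word_emb   : [batch, dim, n_words]
    region_emb : [batch, dim, height, width]
    Returns attention weights [batch, n_words, height, width] and
    attention-weighted image context vectors [batch, n_words, dim].
    """
    batch, dim, height, width = region_emb.shape
    regions = region_emb.flatten(2)                         # [batch, dim, H*W]
    scores = torch.bmm(word_emb.transpose(1, 2), regions)   # [batch, n_words, H*W]
    weights = F.softmax(scores, dim=-1)                     # sums to 1 over regions
    context = torch.bmm(weights, regions.transpose(1, 2))   # [batch, n_words, dim]
    return weights.view(batch, -1, height, width), context


words = torch.randn(2, 768, 6)          # random stand-in for word embeddings
regions = torch.randn(2, 768, 19, 19)   # random stand-in for region embeddings
attn, context = word_region_attention(words, regions)
print(attn.shape)     # torch.Size([2, 6, 19, 19])
print(context.shape)  # torch.Size([2, 6, 768])
```

Each [19, 19] slice of `attn` could then be upsampled to the input resolution and overlaid on the image as a per-word attention map.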

Continuing from Part 1, this post runs experiments on the code.

1. Overall model

    def forward(self, x):
        img_emb_l, img_emb_g = self.image_encoder_forward(x["imgs"])
        text_emb_l, text_emb_g, sents = self.text_encoder_forward(
            x["caption_ids"], x["attention_mask"], x["token_type_ids"])
        
        return img_emb_l, img_emb_g, text_emb_l, text_emb_g, sents

2. Visual encoder (img_encoder) + (generate_embeddings)

2.1 img_encoder

def image_encoder_forward(self, imgs):
    img_feat_g, img_emb_l = self.img_encoder(imgs, get_local=True)
    img_emb_g, img_emb_l = self.img_encoder.generate_embeddings(
        img_feat_g, img_emb_l
    )
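    return img_emb_l, img_emb_g

Calling the encoder on a dummy input returns a feature vector and a feature map:

output=self.img_encoder(torch.zeros([1,3,224,224]),True)

In [27]:  output[0].size()
Out[27]: torch.Size([1, 2048]) # global feature vector

In [26]:  output[1].size()
Out[26]: torch.Size([1, 1024, 19, 19]) # local feature map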

2.2 generate_embeddings

(global_embedder): Linear(in_features=2048, out_features=768, bias=True)
(local_embedder): Conv2d(1024, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
In [13]:  output[1].size()
Out[13]: torch.Size([1, 768])

In [12]: output[0].size()
Out[12]: torch.Size([1, 768, 19, 19])
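The two projection layers printed above can be exercised on dummy tensors to confirm the shapes. This is a minimal sketch rather than the repository's code: the layer definitions are copied from the printout, while the variable names and random inputs are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Layer definitions copied from the printed module summary.
global_embedder = nn.Linear(in_features=2048, out_features=768, bias=True)
local_embedder = nn.Conv2d(1024, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)

# Dummy inputs with the shapes returned by img_encoder (random values, illustration only).
img_feat_g = torch.randn(1, 2048)          # global feature vector
img_feat_l = torch.randn(1, 1024, 19, 19)  # local feature map

with torch.no_grad():
    img_emb_g = global_embedder(img_feat_g)
    img_emb_l = local_embedder(img_feat_l)

print(img_emb_g.shape)  # torch.Size([1, 768])
print(img_emb_l.shape)  # torch.Size([1, 768, 19, 19])
```

A 1x1 convolution acts as a per-location linear layer, so each of the 19 x 19 positions is projected independently into a 768-dimensional space, the same width as the text encoder's hidden size.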

3. Text encoder

https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT/tree/main

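The text encoder uses a medical-domain tokenizer, which does not suit Chinese text or Chinese medical vocabulary.
The tokenizer uses [101] ([CLS]) as the start token and [102] ([SEP]) as the end token.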

In [5]:  self.text_encoder.tokenizer("")
Out[5]: {'input_ids': [101, 102], 'token_type_ids': [0, 0], 'attention_mask': [1, 1]}
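To make the role of the special tokens concrete, here is a toy sketch in plain Python that wraps a list of token ids with 101 and 102 and pads it to a fixed length. The helper name `wrap_and_pad` is hypothetical, the pad id 0 is an assumption based on `padding_idx=0` in the model printout, and the real tokenizer does considerably more (vocabulary lookup, WordPiece splitting).

```python
CLS_ID = 101  # start token id, as in the tokenizer output above
SEP_ID = 102  # end token id, as in the tokenizer output above
PAD_ID = 0    # assumed pad id (the embedding table reports padding_idx=0)


def wrap_and_pad(token_ids, max_length):
    """Hypothetical helper: add start/end ids, then pad to max_length."""
    input_ids = [CLS_ID] + list(token_ids) + [SEP_ID]
    if len(input_ids) > max_length:
        raise ValueError("sequence longer than max_length")
    attention_mask = [1] * len(input_ids)
    n_pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        "token_type_ids": [0] * max_length,
        "attention_mask": attention_mask + [0] * n_pad,
    }


empty = wrap_and_pad([], max_length=2)
print(empty)
# {'input_ids': [101, 102], 'token_type_ids': [0, 0], 'attention_mask': [1, 1]}

padded = wrap_and_pad([1234, 5678], max_length=6)  # 1234 and 5678 are made-up ids
print(padded["input_ids"])       # [101, 1234, 5678, 102, 0, 0]
print(padded["attention_mask"])  # [1, 1, 1, 1, 0, 0]
```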

The text encoder model has the following structure:

(embeddings) + 12 x (layer)

In [35]:  self.model
Out[35]: 
BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(28996, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): BertOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): BertPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)

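The model output exposes the hidden states of every layer. One could read this like a ResNet, where the shallow layers capture local features.
That reading is forced, though. In practice the encoding step just cleans up the batch, pads every sequence to the same length, and then extracts the local and global embeddings. The function def aggregate_tokens(self, embeddings, caption_ids) looks complicated, but this simple job is all it does.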
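As a rough illustration of that kind of clean-up, the sketch below merges WordPiece continuation pieces (those starting with `##`) back into whole words by summing their vectors, stops at `[SEP]`, and pads the result back to the original length. It is a simplified plain-Python sketch under assumptions, not the repository's `aggregate_tokens`: the function name `merge_wordpieces`, the use of summation, the example token split, and the toy 2-dimensional vectors are all illustrative.

```python
def merge_wordpieces(tokens, vectors, pad_token="[PAD]"):
    """Merge '##' continuation pieces into words by summing their vectors.

    tokens  : list of WordPiece strings
    vectors : list of equal-length float lists, one per token
    Returns (words, word_vectors), both padded to len(tokens).
    """
    dim = len(vectors[0])
    words, word_vectors = [], []
    for token, vec in zip(tokens, vectors):
        if token.startswith("##") and words:
            # Continuation piece: extend the previous word and add its vector.
            words[-1] += token[2:]
            word_vectors[-1] = [a + b for a, b in zip(word_vectors[-1], vec)]
        else:
            words.append(token)
            word_vectors.append(list(vec))
        if token == "[SEP]":
            break  # everything after [SEP] is padding
    # Pad back to the original sequence length so a batch keeps one shape.
    n_pad = len(tokens) - len(words)
    words += [pad_token] * n_pad
    word_vectors += [[0.0] * dim for _ in range(n_pad)]
    return words, word_vectors


# An assumed WordPiece split, for illustration only.
tokens = ["[CLS]", "this", "is", "a", "can", "##c", "##ner", "[SEP]"]
vectors = [[float(i), 1.0] for i in range(len(tokens))]
words, word_vectors = merge_wordpieces(tokens, vectors)
print(words)
# ['[CLS]', 'this', 'is', 'a', 'cancner', '[SEP]', '[PAD]', '[PAD]']
print(word_vectors[4])  # [15.0, 3.0]
```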

In [2]:  input=self.tokenizer("this is a cancner")
In [3]:  input.keys()
Out[3]: dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])

In [13]:  for name in input.keys():
    ...:      input[name]=torch.tensor(input[name]).unsqueeze(0)

In [29]: input['input_ids'].size()
Out[29]: torch.Size([1, 8])

In [18]:  output=self.model(**input)
In [19]:  output.keys()
Out[19]: odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states'])

In [27]:  print(hidden_states[0].size())
torch.Size([1, 8, 768])

In [32]:  len(hidden_states)
Out[32]: 13

Global versus local
The global embedding is just an arithmetic sum or mean:

 if self.aggregate_method == "sum":
     word_embeddings = embeddings.sum(axis=1)
     sent_embeddings = sent_embeddings.sum(axis=1)
 elif self.aggregate_method == "mean":
     word_embeddings = embeddings.mean(axis=1)
     sent_embeddings = sent_embeddings.mean(axis=1)
 else:
     print(self.aggregate_method)
     raise Exception("Aggregation method not implemented")
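To see what these two branches do to tensor shapes, the sketch below applies them to a random tensor laid out as [batch, layers, tokens, dim]. The layout, the choice of 4 layers, and taking the sentence embedding as a mean over the token axis are assumptions made for illustration; only the sum or mean over `axis=1` comes from the snippet above, and the input sizes are chosen so the results match the output shapes reported further down.

```python
import torch


def aggregate(embeddings, method="sum"):
    """Collapse the layer axis of a [batch, layers, tokens, dim] tensor."""
    # Assumed: the sentence embedding starts as a mean over the token axis.
    sent_embeddings = embeddings.mean(dim=2)           # [batch, layers, dim]
    if method == "sum":
        word_embeddings = embeddings.sum(dim=1)         # [batch, tokens, dim]
        sent_embeddings = sent_embeddings.sum(dim=1)    # [batch, dim]
    elif method == "mean":
        word_embeddings = embeddings.mean(dim=1)
        sent_embeddings = sent_embeddings.mean(dim=1)
    else:
        raise ValueError(f"Aggregation method not implemented: {method}")
    # Move the feature axis forward so local text features are [batch, dim, tokens].
    return word_embeddings.permute(0, 2, 1), sent_embeddings


hidden = torch.randn(5, 4, 512, 768)  # random stand-in for stacked hidden states
text_emb_l, text_emb_g = aggregate(hidden, method="sum")
print(text_emb_l.shape)  # torch.Size([5, 768, 512])
print(text_emb_g.shape)  # torch.Size([5, 768])
```

With a fixed number of layers the two methods differ only by a constant factor, so the choice mainly affects the scale of the embeddings.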

Shapes of the encoder outputs

In [8]:  x["caption_ids"].size()
Out[8]: torch.Size([5, 512])

In [9]:  x["attention_mask"].size()
Out[9]: torch.Size([5, 512])

In [10]:  x["token_type_ids"].size()
Out[10]: torch.Size([5, 512])

 text_emb_l, text_emb_g, sents = self.text_encoder_forward(
            x["caption_ids"], x["attention_mask"], x["token_type_ids"]
        )
In [1]:  text_emb_l.size()
Out[1]: torch.Size([5, 768, 512])

In [2]:  text_emb_g.size()
Out[2]: torch.Size([5, 768])

In [3]:  type(sents)
Out[3]: list

In [4]:  len(sents)
Out[4]: 5

In [5]:  type(sents[0])
Out[5]: list

In [6]:  len(sents[0])
Out[6]: 512

In [7]:  "".join(sents[0])
Out[7]: '[CLS]双乳腺呈不均匀致密型,前缘凹凸不平,密度不均匀,见片状密度增高影。右乳外上象限见一不规则肿块,大小约1.5cm×1.0cm×1.4cm,边缘毛糙见长短不一毛刺影,毛刺最长约4.9cm,局部延伸至胸大肌前方,其内及周围见点状及模糊不定形钙化,密度增高且不均匀,周围腺体结构紊乱、纠集,血管影稍增多、增粗,邻近皮下脂肪层密度增高见条索影,皮肤略增厚。左乳内未见确切块影及恶性钙化,皮下脂肪层清晰,皮肤不厚。双乳头正常。双腋区淋巴结显示,密度稍高。\n1、右乳外上象限占位性病变,性质恶性,考虑乳腺癌,建议病检及MRI检查。BI-RADS5\n2、双乳腺增生症,建议定期复查。BI-RADS1[SEP][PAD][PAD][PAD][PAD][PAD]。。。。。[PAD][PAD][PAD]'

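Training the model and measuring the loss

We now have encoders for both modalities, and each can accurately extract local and global features.
How to measure the loss, and so constrain the model, is the most important question.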
