OPT Large Language Model (LLM) Architecture

黎明沐白

Last modified: 2024-07-28 17:02:51

First published: 2024-07-28 16:52:44

Article link: https://blog.csdn.net/qq_42047140/article/details/140752439

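Large language models follow the GPT design: the basic building block is a decoder-only Transformer block, and the model is a stack of many such blocks.

Varying the number of blocks, the number of attention heads, and the hidden dimension produces models with different parameter counts (this is the suffix attached to the model name, for example 6.7B).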
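To make the link between these hyperparameters and the size suffix concrete, here is a rough sketch based on the common rule of thumb that the Transformer blocks hold about 12 * num_layers * hidden_size^2 weights. It assumes the MLP hidden size is 4 x hidden_size and ignores embeddings, biases, and LayerNorm weights. The (num_layers, hidden_size) pairs for sizes other than 6.7b are assumptions taken from the published OPT family, and `estimate_block_params` is a hypothetical helper name. The head count only splits the hidden dimension across heads, so it does not appear in the estimate.

```python
# Rule-of-thumb parameter estimate for a decoder-only Transformer.
# Per block: about 4*d^2 attention weights (q, k, v, out projections)
# plus about 8*d^2 MLP weights (two linear layers with a 4*d hidden size).
# Embeddings, biases and LayerNorm weights are ignored.

def estimate_block_params(num_layers: int, hidden_size: int) -> int:
    """Approximate number of weights held by the stacked Transformer blocks."""
    return 12 * num_layers * hidden_size ** 2

# (num_layers, hidden_size) per model size.
# Assumption: only the opt-6.7b row is confirmed by the config in this post.
opt_family = {
    "opt-125m": (12, 768),
    "opt-1.3b": (24, 2048),
    "opt-6.7b": (32, 4096),
    "opt-13b": (40, 5120),
}

estimates = {
    name: estimate_block_params(num_layers, hidden_size)
    for name, (num_layers, hidden_size) in opt_family.items()
}

for name, count in estimates.items():
    print(f"{name}: ~{count / 1e9:.2f}B weights in the Transformer blocks")
```

For opt-6.7b the estimate comes to about 6.44B; most of the gap to 6.7B is the embedding tables, which this rule of thumb leaves out.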

OPT is a large language model open-sourced by Facebook (now Meta).

Taking OPT-6.7b as an example, this post walks through the network structure of OPT:

OPTConfig {
  "_name_or_path": "facebook/opt-6.7b",
  "_remove_final_layer_norm": false,
  "activation_dropout": 0.0,
  "activation_function": "relu",
  "architectures": [
    "OPTForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 2,
  "do_layer_norm_before": true,
  "dropout": 0.1,
  "enable_bias": true,
  "eos_token_id": 2,
  "ffn_dim": 16384,
  "hidden_size": 4096,
  "init_std": 0.02,
  "layer_norm_elementwise_affine": true,
  "layerdrop": 0.0,
  "max_position_embeddings": 2048,
  "model_type": "opt",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 1,
  "prefix": "</s>",
  "torch_dtype": "float16",
  "transformers_version": "4.42.2",
  "use_cache": true,
  "vocab_size": 50272,
  "word_embed_proj_dim": 4096
}

The listing above is the configuration of the OPT-6.7b model. It lists the model's hyperparameters; the ones most worth noting are:

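activation_function: relu (the activation function is ReLU)
vocab_size: 50272 (vocabulary size)
word_embed_proj_dim: 4096 (dimension of each token vector after the embedding layer)
ffn_dim: 16384 (hidden dimension of the FC layers in the MLP of each Transformer block)
hidden_size: 4096 (hidden dimension, usually equal to word_embed_proj_dim)
num_attention_heads: 32 (number of attention heads)
num_hidden_layers: 32 (number of Transformer blocks)
torch_dtype: float16 (data type of the pretrained weights; LLM weights are usually stored in half precision, fp16, rather than single precision, fp32)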
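The config values above are enough to count the parameters exactly. The sketch below is a back-of-the-envelope calculation, not output from the library. It relies on two assumptions that the config does not state directly: the learned position table has max_position_embeddings + 2 rows (2050, which matches the embed_positions shape in the printed model later in this post), and lm_head shares its weight with embed_tokens, so it is not counted twice. `linear_params` and `layer_norm_params` are hypothetical helper names.

```python
# Values copied from the OPT-6.7b config shown in this post.
vocab_size = 50272
hidden_size = 4096
ffn_dim = 16384
num_hidden_layers = 32
max_position_embeddings = 2048
position_offset = 2  # assumption: extra rows in the learned position table


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights (and optional biases) of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def layer_norm_params(dim: int) -> int:
    """Scale and shift vectors of one LayerNorm (elementwise affine)."""
    return 2 * dim


# One decoder layer: q/k/v/out projections, two MLP layers, two LayerNorms.
attention = 4 * linear_params(hidden_size, hidden_size)
mlp = linear_params(hidden_size, ffn_dim) + linear_params(ffn_dim, hidden_size)
per_layer = attention + mlp + 2 * layer_norm_params(hidden_size)

# Token embeddings plus learned position embeddings.
embeddings = (vocab_size + max_position_embeddings + position_offset) * hidden_size

# Assumption: lm_head is tied to embed_tokens and adds no new parameters.
total = embeddings + num_hidden_layers * per_layer + layer_norm_params(hidden_size)

bytes_fp16 = total * 2  # 2 bytes per parameter
bytes_fp32 = total * 4  # 4 bytes per parameter

print(f"parameters per layer: {per_layer:,}")
print(f"total parameters:     {total:,}")
print(f"fp16 weights: {bytes_fp16 / 1e9:.1f} GB, fp32 weights: {bytes_fp32 / 1e9:.1f} GB")
```

Under these assumptions the total is about 6.66B parameters, which takes roughly 13.3 GB in fp16 versus 26.6 GB in fp32. That halving is one practical reason checkpoints of this size are distributed in half precision.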

Once the model is defined in PyTorch, print(model) shows the network structure. The output is:

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 4096, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 4096)
      (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-31): 32 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): Linear(in_features=4096, out_features=4096, bias=True)
            (v_proj): Linear(in_features=4096, out_features=4096, bias=True)
            (q_proj): Linear(in_features=4096, out_features=4096, bias=True)
            (out_proj): Linear(in_features=4096, out_features=4096, bias=True)
          )
          (activation_fn): ReLU()
          (self_attn_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=4096, out_features=16384, bias=True)
          (fc2): Linear(in_features=16384, out_features=4096, bias=True)
          (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        )
      )
    )
  )
  (lm_head): Linear(in_features=4096, out_features=50272, bias=False)
)
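print(model) lists submodules in registration order, which is not the order in which they run. As a rough guide, the sketch below traces tensor shapes through the model in execution order without allocating any tensors. The order assumes the pre-LayerNorm layout implied by do_layer_norm_before: true (LayerNorm first, then attention or MLP, then the residual add). This ordering is an assumption based on the Hugging Face implementation rather than something the printout shows, and `trace_shapes` is a hypothetical helper name.

```python
# Shape trace for OPT-6.7b; values come from the config and printout in this post.
hidden_size = 4096
num_heads = 32
ffn_dim = 16384
vocab_size = 50272
head_dim = hidden_size // num_heads  # hidden dimension split evenly across heads


def trace_shapes(batch: int, seq_len: int) -> list:
    """Return (step name, tensor shape) pairs in assumed execution order."""
    hidden = (batch, seq_len, hidden_size)
    return [
        ("input_ids", (batch, seq_len)),
        ("embed_tokens + embed_positions", hidden),
        # One OPTDecoderLayer; the model repeats this 32 times.
        ("self_attn_layer_norm", hidden),
        ("q_proj / k_proj / v_proj", hidden),
        ("split into heads", (batch, num_heads, seq_len, head_dim)),
        ("attention scores with causal mask", (batch, num_heads, seq_len, seq_len)),
        ("weighted values, heads merged", hidden),
        ("out_proj + residual", hidden),
        ("final_layer_norm (inside the layer)", hidden),
        ("fc1 + ReLU", (batch, seq_len, ffn_dim)),
        ("fc2 + residual", hidden),
        # After the last decoder layer.
        ("decoder final_layer_norm", hidden),
        ("lm_head logits", (batch, seq_len, vocab_size)),
    ]


for name, shape in trace_shapes(batch=1, seq_len=8):
    print(f"{name:38s} {shape}")
```

Each position ends with one logit per vocabulary entry, and each of the 32 heads works on a 128-dimensional slice of the 4096-dimensional hidden state.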