Notes from walking a classmate through the Qwen2 LLM source code

Latest recommended article published on 2025-04-06 19:39:40

木尧大兄弟

Tags: large models

Article link: https://blog.csdn.net/muyao987/article/details/137834838

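An old classmate who had been away from large models for a while needed to prepare a slide deck on recent progress in the field, so we went through some of the technical details together, using the Qwen2 source code as the running example.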

1. What does the Qwen2 model structure look like?

Just load the model and print it.

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 5120)
    (layers): ModuleList(
      (0-63): 64 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): Linear(in_features=5120, out_features=5120, bias=True)
          (k_proj): Linear(in_features=5120, out_features=1024, bias=True)
          (v_proj): Linear(in_features=5120, out_features=1024, bias=True)
          (o_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=5120, out_features