Learning notes on building your own large language model (LLM) from scratch, step by step: Chapter 4, the GPT model architecture, exercise solutions

Chapter 4 Exercise solutions

In [1]:

from importlib.metadata import version

import torch
print("torch version:", version("torch"))
torch version: 2.4.0

Exercise 4.1: Parameters in the feed forward versus attention module

In [2]:

from gpt import TransformerBlock

GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False
}
block = TransformerBlock(GPT_CONFIG_124M)
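
The parameter-counting code did not make it into these notes; below is a minimal sketch, assuming the TransformerBlock from the chapter's gpt.py exposes its feed-forward and attention submodules as ff and att (as in the book's reference implementation):

total_params_ff = sum(p.numel() for p in block.ff.parameters())
print(f"Total number of parameters in feed forward module: {total_params_ff:,}")

total_params_att = sum(p.numel() for p in block.att.parameters())
print(f"Total number of parameters in attention module: {total_params_att:,}")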

  • The results above are for a single transformer block
  • You can optionally multiply the result by 12 to cover all transformer blocks in the 124M-parameter GPT model

Exercise 4.2: Initialize larger GPT models

  • GPT2-small (the 124M configuration we already implemented):

    • "emb_dim" = 768
    • "n_layers" = 12
    • "n_heads" = 12
  • GPT2-medium:

    • "emb_dim" = 1024
    • "n_layers" = 24
    • "n_heads" = 16
  • GPT2-large:

    • "emb_dim" = 1280
    • "n_layers" = 36
    • "n_heads" = 20
  • GPT2-XL:

    • "emb_dim" = 1600
    • "n_layers" = 48
    • "n_heads" = 25

In [5]:

GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False
}


def get_config(base_config, model_name="gpt2-small"):
    GPT_CONFIG = base_config.copy()

    if model_name == "gpt2-small":
        GPT_CONFIG["emb_dim"] = 768
        GPT_CONFIG["n_layers"] = 12
        GPT_CONFIG["n_heads"] = 12

    elif model_name == "gpt2-medium":
        GPT_CONFIG["emb_dim"] = 1024
        GPT_CONFIG["n_layers"] = 24
        GPT_CONFIG["n_heads"] = 16

    elif model_name == "gpt2-large":
        GPT_CONFIG["emb_dim"] = 1280
        GPT_CONFIG["n_layers"] = 36
        GPT_CONFIG["n_heads"] = 20

    elif model_name == "gpt2-xl":
        GPT_CONFIG["emb_dim"] = 1600
        GPT_CONFIG["n_layers"] = 48
        GPT_CONFIG["n_heads"] = 25

    else:
        raise ValueError(f"Incorrect model name {model_name}")

    return GPT_CONFIG
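
The notes break off at this point. As a hedged sketch of how these configurations might be used, assuming the GPTModel class from the chapter's gpt.py, the following instantiates each variant and reports its parameter count and approximate float32 memory footprint:

from gpt import GPTModel  # assumes the chapter's model class is available

def calculate_size(model):
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Total number of parameters: {total_params:,}")
    # Approximate memory footprint, assuming float32 (4 bytes per parameter)
    total_size_mb = total_params * 4 / (1024 ** 2)
    print(f"Total size of the model: {total_size_mb:.2f} MB")

for model_abbrev in ("small", "medium", "large", "xl"):
    model_name = f"gpt2-{model_abbrev}"
    CONFIG = get_config(GPT_CONFIG_124M, model_name=model_name)
    model = GPTModel(CONFIG)
    print(f"\n{model_name}:")
    calculate_size(model)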