How ChatGPT Works

笑口常开的小丸子

Last modified on 2023-12-05 17:45:06

Column: Computer Networks    Tags: gpt

First published on 2023-12-02 15:09:17

Article link: https://blog.csdn.net/weixin_46190208/article/details/134752623

The core of ChatGPT: the Transformer architecture

The following is an excerpt from the reference article (originally included as a screenshot):

Transformers are based on the “attention mechanism,” which allows the model to pay more attention to some inputs than others, regardless of where they show up in the input sequence. For example, let’s consider the following sentence:

[Image: the example sentence from the reference article]

In this scenario, when the model is predicting the verb “bought,” it needs to match the past tense of the verb “went.” In order to do that, it has to pay a lot of attention to the token “went.” In fact, it may pay more attention to the token “went” than to the token “and,” despite the fact that “went” appears much earlier in the input sequence.

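In short: the Transformer architecture lets the model look at information from every position of the input sequence at the same time, which helps it capture long-range dependencies.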

The Transformer architecture:

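Its defining features are self-attention and positional encoding, which together let the model capture long-range dependencies in the input sequence. Its key components are: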

Self-attention:

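Self-attention lets the information at different positions of a sequence interact. For each word (token), the model computes attention weights over all positions of the input and uses them to update that token's representation with information from the other positions. This is what allows the model to capture dependencies between distant tokens and understand the sequence as a whole.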
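To make the weighting concrete, here is a minimal sketch of scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, the form used in the original Transformer paper. It is written in plain Python rather than a tensor library; the toy embeddings are made up, and the identity matrices are only stand-ins for the trained projections W_q, W_k, W_v.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V.

    x holds one row per token; w_q, w_k, w_v are the projection matrices.
    Returns (output, weights), where weights[i][j] is how strongly token i attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three toy tokens with 2-dimensional embeddings; the numbers are made up.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for trained W_q, W_k, W_v
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(p, 3) for p in row])
```

Each row of `weights` sums to 1, and every output row is a weighted mix of all value vectors, so a token can draw on information from any position regardless of distance.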

Positional encoding:

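Self-attention by itself ignores word order: it treats the input as an unordered set of tokens. Positional encodings are therefore added to the token embeddings to tell the model where each token sits in the sequence. A common choice is a combination of sine and cosine functions of different frequencies, which gives every position its own distinct encoding vector.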
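The sine/cosine scheme can be sketched as follows, using the formula and the base 10000 from the original Transformer paper ("Attention Is All You Need"). The sizes `seq_len=4` and `d_model=6` are illustrative only.

```python
import math
from typing import List


def positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Sinusoidal encoding:
    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even dimension index 2i
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


pe = positional_encoding(seq_len=4, d_model=6)
# Position 0 is always [0, 1, 0, 1, ...] because sin(0) = 0 and cos(0) = 1.
print(pe[0])  # → [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

In a model, `pe[pos]` is added element-wise to the embedding of the token at position `pos` before the first layer. Low dimensions oscillate quickly and high dimensions slowly, so each position gets a different pattern.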

Encoder-decoder architecture:

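The original Transformer consists of an encoder and a decoder and targets sequence-to-sequence tasks such as machine translation: the encoder processes the input sequence, and the decoder generates the output sequence. GPT models keep only the decoder stack.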
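A decoder generates its output left to right, so its self-attention must not let a position look at positions that come after it. The standard way to enforce this is a causal (look-ahead) mask added to the attention scores before the softmax. The sketch below shows only that masking step; the raw scores are made-up numbers.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_mask(n: int) -> Matrix:
    """mask[i][j] = 0.0 if position i may attend to position j (j <= i), else -inf."""
    return [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]


def masked_softmax(scores: Matrix, mask: Matrix) -> Matrix:
    """Add the mask to the raw scores, then apply a row-wise softmax."""
    result = []
    for s_row, m_row in zip(scores, mask):
        row = [s + m for s, m in zip(s_row, m_row)]
        top = max(row)  # finite, because j == i is never masked
        exps = [math.exp(v - top) for v in row]  # math.exp(-inf) == 0.0
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# Made-up raw attention scores for a 3-token sequence.
scores = [[0.5, 1.0, 2.0], [0.3, 0.2, 0.1], [1.0, 1.0, 1.0]]
weights = masked_softmax(scores, causal_mask(3))
for row in weights:
    print([round(p, 3) for p in row])
```

The first token can only attend to itself, and every weight above the diagonal is exactly zero, so no position can use information from tokens it has not generated yet.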

Multi-head attention:

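So that the model can attend to information from different representation subspaces, the Transformer runs several attention heads in parallel, then concatenates their outputs and applies a linear projection.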
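The split, attend, concatenate, project steps can be sketched as below. This version slices one d_model×d_model projection into `num_heads` column blocks, which is equivalent to giving each head its own smaller projection. The toy input and the identity matrices standing in for trained W_q, W_k, W_v, W_o are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
              for q_row in q]
    return matmul([softmax(row) for row in scores], v)


def columns(m: Matrix, start: int, stop: int) -> Matrix:
    """Take feature columns [start, stop) of every row."""
    return [row[start:stop] for row in m]


def multi_head_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                         w_o: Matrix, num_heads: int) -> Matrix:
    d_model = len(w_q[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_k = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = []
    for h in range(num_heads):
        # Each head works on its own d_k-wide slice of Q, K and V.
        s, e = h * d_k, (h + 1) * d_k
        heads.append(attention(columns(q, s, e), columns(k, s, e), columns(v, s, e)))
    # Concatenate the heads feature-wise, then mix them with the output projection W_o.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, w_o)


# Toy input: 3 tokens, d_model = 4, 2 heads; identity matrices stand in for trained weights.
x = [[1.0, 0.0, 0.5, -1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, -0.5, 1.0]]
eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = multi_head_attention(x, eye, eye, eye, eye, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Because each head computes its softmax over a different slice of the features, different heads can end up focusing on different positions; W_o then combines what they found.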

Feed-forward network:

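Every encoder and decoder layer contains a feed-forward network that maps each position's representation to a new representation, applied to every position independently. Stacking many such layers increases the model's representational capacity.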
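Here is a sketch of the position-wise feed-forward network, FFN(x) = max(0, x W1 + b1) W2 + b2, as defined in the original Transformer paper. The weights and the sizes (d_model = 2, d_ff = 4) are made up for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """x @ w + b for the vector of a single position."""
    return [sum(x[k] * w[k][j] for k in range(len(x))) + b[j] for j in range(len(b))]


def feed_forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """FFN(x) = max(0, x W1 + b1) W2 + b2: expand to d_ff, apply ReLU, project back."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def position_wise_ffn(seq: List[Vector], w1, b1, w2, b2) -> List[Vector]:
    # The same weights are applied to every position, each one independently.
    return [feed_forward(x, w1, b1, w2, b2) for x in seq]


# Made-up weights: d_model = 2, hidden size d_ff = 4.
w1 = [[1.0, -1.0, 0.5, 0.0], [0.0, 1.0, -0.5, 2.0]]
b1 = [0.0, 0.0, 0.0, -1.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
b2 = [0.1, -0.1]
seq = [[1.0, 2.0], [3.0, -1.0], [1.0, 2.0]]
out = position_wise_ffn(seq, w1, b1, w2, b2)
print(out[0] == out[2])  # True: identical inputs give identical outputs at any position
```

The original paper uses ReLU with d_ff = 4 × d_model (512 → 2048); GPT models use the GELU activation instead of ReLU.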

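In addition, the Transformer uses residual connections and layer normalization, which make deep stacks easier to train and improve model performance. And because self-attention processes all positions of a sequence at once, instead of step by step as a recurrent network does, training parallelizes well, which lets the model be trained efficiently on large-scale data.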
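Residual connections and layer normalization wrap every sub-layer. The sketch below shows the post-norm arrangement of the original Transformer paper, LayerNorm(x + Sublayer(x)), for a single position. `toy_sublayer` is a hypothetical stand-in for attention or the feed-forward network, and `gamma`/`beta` would be learned parameters in a real model.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, gamma: Vector, beta: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one position's features to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def residual_sublayer(x: Vector, sublayer: Callable[[Vector], Vector],
                      gamma: Vector, beta: Vector) -> Vector:
    """Post-norm wiring of the original Transformer: LayerNorm(x + Sublayer(x))."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)], gamma, beta)


def toy_sublayer(v: Vector) -> Vector:
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return [2.0 * t for t in v]


x = [1.0, 2.0, 3.0, 4.0]
out = residual_sublayer(x, toy_sublayer, gamma=[1.0] * 4, beta=[0.0] * 4)
print([round(v, 3) for v in out])
```

The residual path `x + Sublayer(x)` gives gradients a direct route through the stack, and normalization keeps activations in a stable range. GPT-2 and later models move the layer norm to the input of each sub-block instead (pre-norm: x + Sublayer(LayerNorm(x))).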

Original article (well worth reading): How GPT Models Work. Learn the core concepts behind OpenAI’s… | by Beatriz Stollnitz | Towards Data Science
