Natural Language Processing Study Notes


550A


Tags: natural language processing, artificial intelligence

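Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.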

Article link: https://blog.csdn.net/m0_65079225/article/details/141439601


1. Self-attention and cross-attention

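❖ Self-Attention: assigns different attention weights to different positions within the same sequence; the weights model how each position relates to the other positions.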

❖ Role: Self-attention allows the model to assign different attention weights to different positions within the input sequence, enabling it to capture dependencies between any two positions in the sequence, not just between neighboring ones.

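❖ Operation: For each input position, the self-attention mechanism computes a weighted sum of the (value) vectors of all positions in the sequence. The weights come from modeling the relationship between the current position and every position in the sequence; in the standard Transformer, they are a softmax over the scaled dot products between the current position's query vector and each position's key vector.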

Significance: This allows the model to focus on the information in the input sequence that is most relevant to the current position.
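The operation above can be sketched in plain Python as scaled dot-product self-attention. This is a minimal illustration under stated assumptions, not the author's or any library's implementation: the projection matrices `W_q`, `W_k`, `W_v` and the toy token vectors are hypothetical, and real models use learned weights, batched tensor operations, and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention(queries, keys, values):
    """Scaled dot-product attention.

    For each query: score it against every key, scale by sqrt(d_k),
    normalize with softmax, and return the weighted sum of the value vectors.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def self_attention(X, W_q, W_k, W_v):
    """Self-attention: queries, keys and values are all projected from the same sequence X."""
    Q = [project(W_q, x) for x in X]
    K = [project(W_k, x) for x in X]
    V = [project(W_v, x) for x in X]
    return attention(Q, K, V)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy token vectors (hypothetical)
identity = [[1.0, 0.0], [0.0, 1.0]]        # identity projections (hypothetical)
outputs, weights = self_attention(X, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])      # one row of weights per position, each summing to 1
```

With identity projections, the first token [1, 0] has the same dot product with itself and with the third token [1, 1], so those two positions receive equal weight, while the orthogonal second token receives less.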

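❖ Cross-Attention: operates on two different sequences; the weights model the relationship between the current output position and every position of the input sequence.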

❖ Role: Cross-attention is used to handle relationships between two different sequences, typically between an input sequence and an output sequence. For example, in machine translation, where the input is a source language sequence and the output is a target language sequence, cross-attention enables the model to attend to different parts of the source language sequence while generating the target language sequence.

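❖ Operation: For each output position, the cross-attention mechanism computes a weighted sum over the input sequence, where the weights are obtained by modeling the relationship between the current output position and all positions in the input sequence. In the encoder-decoder Transformer, the queries come from the decoder (output side), while the keys and values come from the encoder's output (input side).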

Significance: This allows the model to integrate information from different positions of the input sequence when generating each element of the output sequence.
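A minimal cross-attention sketch follows, assuming toy two-dimensional states and omitting the learned query/key/value projections for brevity. The names `cross_attention`, `source`, and `target` are hypothetical. The point it shows is the shape of the result: each output position gets its own distribution over all input positions.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector; returns (context, weights)."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return context, weights


def cross_attention(decoder_states, encoder_states):
    """Cross-attention: each query comes from the output (decoder) sequence,
    while keys and values both come from the input (encoder) sequence.
    Learned projections are omitted for brevity (an assumption of this sketch)."""
    contexts, weights = [], []
    for q in decoder_states:
        c, w = attend(q, encoder_states, encoder_states)
        contexts.append(c)
        weights.append(w)
    return contexts, weights


source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # 4 input positions (toy)
target = [[1.0, 0.0], [0.0, 2.0]]                           # 2 output positions (toy)
contexts, weights = cross_attention(target, source)
# weights is a 2 x 4 table: one distribution over the 4 input positions per output position
for row in weights:
    print([round(w, 3) for w in row])
```

In a translation model, `source` would be the encoder states of the source sentence and `target` the decoder states of the partially generated translation.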

2. Transformer: key advantages

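❖ Parallel computation: Because self-attention relates all positions at once, a Transformer can process every position of a sequence in parallel during training (and when encoding an input), instead of stepping through the tokens one at a time as a recurrent neural network (RNN) must. This makes training much faster than with RNNs and other sequential models, especially on long sequences and parallel hardware. Autoregressive generation, however, still produces output one token at a time.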

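❖ Long-range dependencies: With self-attention, any two positions in a sequence are connected directly in a single step, so a Transformer can capture the dependency between them no matter how far apart they are, whereas in an RNN the information must pass through every intermediate step. This helps the model capture long-range context in text, up to its maximum context length.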

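❖ Flexibility: The Transformer architecture is general enough to be applied to many different tasks, such as machine translation, text generation, and text classification, and it achieves good results in many fields.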

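❖ Attention mechanism: Self-attention lets the model assign different weights to different parts of the input, so it can focus on the information most relevant to the current task. This helps Transformers handle complex linguistic structures and semantic relationships well.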

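❖ Interpretability: Attention weights can be inspected to see which parts of the input the model attended to, which gives users some insight into its decision process. This interpretability is limited, however: attention weights do not always faithfully reflect why the model made a particular prediction.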

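❖ Strong performance: Given large-scale datasets and sufficient compute, Transformer models typically achieve excellent performance, largely because they can learn more complex representations and richer semantic information.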
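The parallel-computation and long-range points above can be contrasted with a toy scalar example. The RNN update rule, its weights `w_h` and `w_x`, and the simplified attention without learned projections are all hypothetical choices for illustration: the RNN has to walk the sequence in order, while each attention output depends only on the inputs, so positions can be computed in any order or concurrently.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t).

    Each step needs the previous hidden state, so the loop must run in order,
    and x_0 can influence h_t only by passing through every intermediate state.
    """
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


def attention_output(xs, t):
    """Toy scalar self-attention for position t (no learned weights).

    It reads every input directly, with weights softmax(x_t * x_j),
    so it never needs the output of any other position.
    """
    scores = [xs[t] * xj for xj in xs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * xj for e, xj in zip(exps, xs))


xs = [0.2, -1.0, 0.7, 1.5]
sequential = rnn_states(xs)  # must be computed step by step
in_order = [attention_output(xs, t) for t in range(len(xs))]

# Positions are independent, so they can be dispatched concurrently.
# (Python's GIL means this demonstrates independence, not a real speed-up.)
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(lambda t: attention_output(xs, t), range(len(xs))))

print(sequential)
print(in_order == parallel)
```

On a GPU, the same independence is what lets a Transformer compute all positions of a training sequence as one batched matrix operation.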

3. Stages of NLP model development

Statistical language models → neural network language models → pre-trained language models → large language models
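To make the first stage concrete, here is a minimal statistical language model: a bigram model that estimates P(word | previous word) from counts (maximum likelihood, no smoothing) and scores a sentence with the chain rule. The toy corpus and the `<s>`/`</s>` sentence-boundary markers are illustrative assumptions.

```python
from collections import Counter


def train_bigram(corpus):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent (previous, next) pairs
    return contexts, bigrams


def bigram_prob(contexts, bigrams, prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


def sentence_prob(contexts, bigrams, sentence):
    """P(sentence) via the chain rule, assuming each word depends only on the previous one."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= bigram_prob(contexts, bigrams, prev, word)
    return p


corpus = ["i like nlp", "i like cats", "you like nlp"]  # toy corpus (hypothetical)
contexts, bigrams = train_bigram(corpus)
print(sentence_prob(contexts, bigrams, "i like nlp"))   # (2/3) * 1 * (2/3) * 1, about 0.444
print(sentence_prob(contexts, bigrams, "cats like i"))  # unseen bigram (<s>, cats) -> 0.0
```

The zero probability for an unseen word pair is the data-sparsity problem that smoothing techniques, and later neural language models, were designed to address.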

Milestones:

Transformer: builds the model entirely on attention (self-attention), dropping recurrence

BERT: encoder-only (uses only the Transformer encoder)

Differences between small models and large models:

Trained on much larger datasets, with performance improving iteratively

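Additional training stages: continual pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF)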

4. Key technologies behind ChatGPT

Attention mechanism, pre-trained language models, training on code, instruction fine-tuning, and RLHF

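Four training stages:

1. Word chain (next-token prediction): pre-train the model to continue text by predicting the next word.

2. Human teachers guide the direction of the word chain: fine-tune the model on responses written by humans (supervised fine-tuning).

3. Imitate the human teachers' preferences: train a reward model on human rankings of the model's responses.

4. Learn from the simulated teacher with reinforcement learning: optimize the model to earn high scores from the reward model.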
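Stage 3 is commonly implemented, as in InstructGPT, by training a reward model on pairs of responses ranked by humans with the pairwise loss -log σ(r(chosen) - r(rejected)). The sketch below trains a hypothetical linear reward model over made-up two-dimensional response features; real reward models are large neural networks, and the reinforcement-learning step of stage 4 (PPO in InstructGPT) is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, features):
    """Hypothetical linear reward model: r(response) = w . features(response)."""
    return sum(wi * fi for wi, fi in zip(w, features))


def pairwise_loss(w, chosen, rejected):
    """-log sigmoid(r(chosen) - r(rejected)); small when the human-preferred response scores higher."""
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))


def train_reward_model(pairs, dim, lr=0.5, epochs=100):
    """Plain gradient descent on the pairwise loss.

    With margin m = w . (chosen - rejected), dLoss/dw = -(1 - sigmoid(m)) * (chosen - rejected),
    so each step moves w toward the features of the preferred response.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            g = 1.0 - sigmoid(margin)
            w = [wi + lr * g * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w


# Each pair: (features of the response humans preferred, features of the one they rejected).
# The two made-up features could stand for "answers the question" and "is rude".
pairs = [([1.0, 0.0], [0.2, 0.8]), ([0.8, 0.1], [0.5, 0.9])]
w = train_reward_model(pairs, dim=2)
print(w)
print([round(pairwise_loss(w, c, r), 4) for c, r in pairs])
```

Before training (w = 0) every pair has loss log 2; after training, the reward model scores each preferred response higher, and that learned score is what stage 4 would optimize against.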

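Challenges:

1. Training compute is growing faster than chip performance improves under Moore's law

2. Growth in parameter counts is reaching a plateau

3. Training corpora are close to being exhausted

4. Knowledge forgetting: a model can lose previously learned knowledge when it is trained further

5. Safety and ethics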
