Llama 2 Powered By ONNX

Yongqiang Cheng

Published 2025-01-23 01:14:31

Category: Large Language Model (LLM). Tags: Llama 2, ONNX

Article link: https://blog.csdn.net/chengyq116/article/details/145313484

1. Llama 2
- 1.1. The structure of Llama 2
References

https://github.com/microsoft/Llama-2-Onnx

1. Llama 2

Llama 2 is a collection of pretrained and fine-tuned generative text models.

1.1. The structure of Llama 2

The Llama 2 model consists of a stack of decoder layers. Each decoder layer (or transformer block) is built from one self-attention layer and one feed-forward multi-layer perceptron.

In the feed-forward layer, Llama models use a different projection size from classic transformers: both Llama 1 and Llama 2 project to roughly 2.7x the hidden size rather than the standard 4x.
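The roughly 2.7x figure can be reproduced with a little arithmetic. The sketch below assumes the sizing rule used in the public Llama reference code: start from the classic 4x expansion, scale it by 2/3 (the gated feed-forward layer has three weight matrices instead of two, so this keeps the parameter count comparable), then round up to a multiple of 256. The function name `ffn_hidden_size` and the parameter `multiple_of` are illustrative, not taken from this article.

```python
def ffn_hidden_size(hidden_size: int, multiple_of: int = 256) -> int:
    """Intermediate size of a Llama-style gated feed-forward layer (assumed rule)."""
    # Classic transformers expand to 4x the hidden size; scale that by 2/3.
    inner = int(2 * (4 * hidden_size) / 3)
    # Round up to the next multiple of `multiple_of`.
    return multiple_of * ((inner + multiple_of - 1) // multiple_of)


hidden = 4096  # hidden size of the 7B model
inner = ffn_hidden_size(hidden)
print(inner, inner / hidden)  # 11008 2.6875
```

For a hidden size of 4096 this gives 11008, a ratio of 2.6875. Larger variants may apply an extra multiplier, so the ratio is not identical across all model sizes.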

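A key architectural difference between Llama 1 and Llama 2 lies in the attention layer: Llama 2 uses the Grouped Query Attention (GQA) mechanism to improve efficiency. According to the Llama 2 paper, GQA is applied in the larger (34B and 70B) variants, while the 7B and 13B models keep standard multi-head attention.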
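To make the efficiency gain concrete: in GQA, several query heads share one key/value head, so the key/value cache shrinks by the sharing factor. The sketch below is illustrative only. The helper names are hypothetical, and the configuration (80 layers, 64 query heads, 8 key/value heads, head dimension 128, 16-bit values) is an assumption matching the published 70B settings rather than anything stated in this article.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the key/value head shared by the given query head."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Key/value cache size for one sequence."""
    # Factor 2: each layer stores one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed 70B-style configuration with a 4096-token sequence.
mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=4096)

print(kv_head_for_query_head(9, 64, 8))  # 1: query heads 8..15 share KV head 1
print(mha // gqa)                        # 8: the cache is 8x smaller with GQA
```

With 64 query heads and 8 key/value heads, each group of 8 query heads reads the same keys and values, which cuts cache memory and memory traffic during decoding.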

Figure: Llama 2 model