LLM (2) | What is an LLM

This article is a translation of “What is a large language model (LLM)?”

What is a large language model (LLM)?

A large language model (LLM) is a type of artificial intelligence (AI) program that can recognize and generate text, among other tasks. LLMs are trained on huge sets of data — hence the name “large.” LLMs are built on machine learning: specifically, a type of neural network called a transformer model.

In simpler terms, an LLM is a computer program that has been fed enough examples to be able to recognize and interpret human language or other types of complex data. Many LLMs are trained on data that has been gathered from the Internet — thousands or millions of gigabytes’ worth of text. But the quality of the samples impacts how well LLMs will learn natural language, so an LLM’s programmers may use a more curated data set.

LLMs use a type of machine learning called deep learning in order to understand how characters, words, and sentences function together. Deep learning involves the probabilistic analysis of unstructured data, which eventually enables the deep learning model to recognize distinctions between pieces of content without human intervention.

LLMs are then further trained via tuning: they are fine-tuned or prompt-tuned to the particular task that the programmer wants them to do, such as interpreting questions and generating responses, or translating text from one language to another.

What are LLMs used for?

LLMs can be trained to do a number of tasks. One of the most well-known uses is their application as generative AI: when given a prompt or asked a question, they can produce text in reply. The publicly available LLM ChatGPT, for instance, can generate essays, poems, and other textual forms in response to user inputs.

Any large, complex data set can be used to train LLMs, including programming languages. Some LLMs can help programmers write code. They can write functions upon request — or, given some code as a starting point, they can finish writing a program. LLMs may also be used in:

  • Sentiment analysis
  • DNA research
  • Customer service
  • Chatbots
  • Online search

Examples of real-world LLMs include ChatGPT (from OpenAI), Bard (Google), Llama (Meta), and Bing Chat (Microsoft). GitHub’s Copilot is another example, but for coding instead of natural human language.

What are some advantages and limitations of LLMs?

A key characteristic of LLMs is their ability to respond to unpredictable queries. A traditional computer program receives commands in its accepted syntax, or from a certain set of inputs from the user. A video game has a finite set of buttons, an application has a finite set of things a user can click or type, and a programming language is composed of precise if/then statements.

By contrast, an LLM can respond to natural human language and use data analysis to answer an unstructured question or prompt in a way that makes sense. Whereas a typical computer program would not recognize a prompt like “What are the four greatest funk bands in history?”, an LLM might reply with a list of four such bands, and a reasonably cogent defense of why they are the best.

In terms of the information they provide, however, LLMs can only be as reliable as the data they ingest. If fed false information, they will give false information in response to user queries. LLMs also sometimes “hallucinate”: they create fake information when they are unable to produce an accurate answer. For example, in 2022 news outlet Fast Company asked ChatGPT about the company Tesla’s previous financial quarter; while ChatGPT provided a coherent news article in response, much of the information within was invented.

In terms of security, user-facing applications based on LLMs are as prone to bugs as any other application. LLMs can also be manipulated via malicious inputs to provide certain types of responses over others — including responses that are dangerous or unethical. Finally, one of the security problems with LLMs is that users may upload secure, confidential data into them in order to increase their own productivity. But LLMs use the inputs they receive to further train their models, and they are not designed to be secure vaults; they may expose confidential data in response to queries from other users.

How do LLMs work?

Machine learning and deep learning

At a basic level, LLMs are built on machine learning. Machine learning is a subset of AI, and it refers to the practice of feeding a program large amounts of data in order to train the program how to identify features of that data without human intervention.

LLMs use a type of machine learning called deep learning. Deep learning models can essentially train themselves to recognize distinctions without human intervention, although some human fine-tuning is typically necessary.

Deep learning uses probability in order to “learn.” For instance, in the sentence “The quick brown fox jumped over the lazy dog,” the letters “e” and “o” are the most common, appearing four times each. From this, a deep learning model could conclude (correctly) that these characters are among the most likely to appear in English-language text.

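The letter-frequency observation above can be reproduced in a few lines of Python (a toy illustration of counting statistics from text, not how an LLM is actually trained):

```python
from collections import Counter

sentence = "The quick brown fox jumped over the lazy dog"

# Count letters only, case-insensitively, ignoring spaces.
counts = Counter(c for c in sentence.lower() if c.isalpha())

# "e" and "o" each appear four times, more than any other letter here.
print(counts["e"], counts["o"])  # 4 4
print(counts.most_common(2))
```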
Realistically, a deep learning model cannot actually conclude anything from a single sentence. But after analyzing trillions of sentences, it could learn enough to predict how to logically finish an incomplete sentence, or even generate its own sentences.

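A crude sketch of sentence completion is a bigram model: count which word follows which, then predict the most frequent follower. This is an illustrative toy, orders of magnitude simpler than a real LLM, and the corpus and helper below are invented for the example:

```python
from collections import Counter, defaultdict

# A toy "corpus"; a real model would see trillions of tokens.
corpus = [
    "the quick brown fox jumped over the lazy dog",
    "the quick brown fox ran over the hill",
    "the lazy dog slept all day",
]

# Count which word follows which (bigram statistics).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        following[a][b] += 1

def predict_next(word):
    """Return the most frequent word seen after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("quick"))  # brown
print(predict_next("lazy"))   # dog
```

Real LLMs replace these raw counts with learned neural representations, but the underlying idea of predicting a likely continuation is the same.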
Neural networks

In order to enable this type of deep learning, LLMs are built on neural networks. Just as the human brain is constructed of neurons that connect and send signals to each other, an artificial neural network (typically shortened to “neural network”) is constructed of network nodes that connect with each other. They are composed of several "layers”: an input layer, an output layer, and one or more layers in between. The layers only pass information to each other if their own outputs cross a certain threshold.

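The layered structure described above can be sketched as a tiny feedforward network in NumPy. Random weights stand in for learned ones, and a ReLU models the idea that a node passes a signal on only above a threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: input layer (3 values), one hidden layer (4 nodes),
# output layer (2 nodes). Training would adjust these random weights.
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))

def relu(x):
    # A node "fires" only above the threshold 0: negative sums are silenced.
    return np.maximum(0, x)

def forward(x):
    hidden = relu(x @ W1)  # hidden layer passes on only above-threshold signals
    return hidden @ W2     # output layer combines the surviving signals

x = np.array([1.0, 0.5, -0.2])
print(forward(x))          # two output values
```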
Transformer models

The specific kind of neural networks used for LLMs are called transformer models. Transformer models are able to learn context — especially important for human language, which is highly context-dependent. Transformer models use a mathematical technique called self-attention to detect subtle ways that elements in a sequence relate to each other. This makes them better at understanding context than other types of machine learning. It enables them to understand, for instance, how the end of a sentence connects to the beginning, and how the sentences in a paragraph relate to each other.

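The self-attention computation can be sketched in NumPy: every element of a sequence scores its relation to every other element, and those scores mix the sequence back together. Random matrices stand in for learned projections, and this omits multi-head attention and the rest of a real transformer:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A sequence of 4 "tokens", each an 8-dimensional vector
# (random stand-ins for learned embeddings).
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))

# Query/key/value projections; learned in a real model, random here.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Each token scores its relation to every other token...
scores = softmax(Q @ K.T / np.sqrt(d))
# ...and its new representation is a weighted mix of all tokens.
out = scores @ V

print(scores.shape)  # (4, 4): one attention weight per token pair
print(out.shape)     # (4, 8)
```

The (4, 4) score matrix is what lets the model relate, say, the last word of a sentence back to the first.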
This enables LLMs to interpret human language, even when that language is vague or poorly defined, arranged in combinations they have not encountered before, or contextualized in new ways. On some level they “understand” semantics in that they can associate words and concepts by their meaning, having seen them grouped together in that way millions or billions of times.

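A very rough illustration of "association by grouping": count which words share sentences in a toy corpus. The corpus, the stopword list, and the `associates` helper are invented for illustration; real models learn far richer, learned representations rather than raw co-occurrence counts:

```python
from collections import Counter
from itertools import combinations

STOP = {"the", "as", "and"}  # tiny stopword list for this toy

sentences = [
    "the cat chased the mouse",
    "the cat ate the mouse",
    "the dog chased the cat",
    "stocks fell as the market dropped",
    "the market rallied and stocks rose",
]

# Count how often each pair of distinct (non-stopword) words shares a sentence.
cooccur = Counter()
for s in sentences:
    words = sorted(set(s.split()) - STOP)
    for a, b in combinations(words, 2):
        cooccur[(a, b)] += 1

def associates(word, n=2):
    """Words most often seen in the same sentence as `word`."""
    scores = Counter()
    for (a, b), c in cooccur.items():
        if a == word:
            scores[b] += c
        elif b == word:
            scores[a] += c
    return [w for w, _ in scores.most_common(n)]

print(associates("cat"))        # 'chased' and 'mouse' top the list
print(associates("stocks", 1))  # ['market']
```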
How can developers quickly start building their own LLMs?

To build LLM applications, developers need easy access to multiple data sets, and they need places for those data sets to live. Both cloud storage and on-premises storage for these purposes may involve infrastructure investments outside the reach of developers’ budgets. Additionally, training data sets are typically stored in multiple places, but moving that data to a central location may result in massive egress fees.

Fortunately, Cloudflare offers several services to allow developers to quickly start spinning up LLM applications, and other types of AI. Vectorize is a globally distributed vector database for querying data stored in no-egress-fee object storage (R2) or documents stored in Workers Key Value. Combined with the development platform Cloudflare Workers AI, developers can use Cloudflare to quickly start experimenting with their own LLMs.

Personal summary

Translation done. Here are the points I consider most important:

  • An LLM can both predict text and generate it; generating answers is where it looks intelligent
  • An LLM can accept inputs it has never seen before and handle vague, loosely organized questions
  • An LLM is a kind of transformer, and self-attention is the transformer's foundation: it is what makes context association possible
  • Vector databases such as Vectorize will become an important part of the LLM stack