Introduction to BERT


BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model by Google that can be used for cutting-edge natural language processing (NLP) tasks.

After reading this article, you will have a basic understanding of BERT and will be able to utilize it for your own business applications. It would be helpful if you are familiar with Python and have a general idea of machine learning.

The BERT models I will cover in this article are:

  • Binary or multi-class classification
  • Regression model
  • Question-answering applications

Introduction to BERT

BERT is trained on the entirety of Wikipedia (~2.5 billion words), along with a book corpus (~800 million words). In order to utilize BERT, you won’t have to repeat this compute-intensive process.

BERT brings the transfer learning approach into the natural language processing area in a way that no language model has done before.

Transfer Learning

Transfer learning is a process where a machine learning model developed for a general task can be reused as a starting point for a specific business problem.

Imagine you want to teach someone named Amanda, who doesn’t speak English, how to take the SAT. The first step would be to teach Amanda the English language as thoroughly as possible. Then, you can teach her more specifically for the SAT.

In the context of a machine learning model, this idea is known as transfer learning. The first part of transfer learning is pre-training (similar to teaching Amanda English for the first time). After the pre-training is complete you can focus on a specific task (like teaching Amanda how to take the SAT). This is a process known as fine-tuning — changing the model so it can fit your specific business problem.

BERT Pre-training

This is a quick introduction to the BERT pre-training process. For practical purposes, you can use a pre-trained BERT model and do not need to perform this step.

[Figure: BERT pre-training input, showing Sentence 1 and Sentence 2 with the [CLS] and [SEP] special tokens and masked words]

BERT takes two chunks of text as input. In the simplified example above, I referred to these two inputs as Sentence 1 and Sentence 2. In the pre-training for BERT, Sentence 2 intentionally does not follow Sentence 1 in about half of the training examples.

Sentence 1 starts with a special token [CLS] and both sentences end with another special token [SEP]. There will be a single token for each word that is in the BERT vocabulary. If a word is not in the vocabulary, BERT will split that word into multiple tokens. Before feeding sentences to BERT, 15% of the tokens are masked.

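To make the input format concrete, here is a minimal sketch of the tokenization step. It assumes the Hugging Face transformers library (not part of the original article) and shows the [CLS] and [SEP] special tokens as well as a word being split into multiple sub-word tokens.

```python
# A minimal sketch of BERT's input tokenization, assuming the Hugging Face
# `transformers` library is installed (pip install transformers).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence_1 = "My favorite fruit is jackfruit."
sentence_2 = "I like to eat it in summer."

# Encode the sentence pair; [CLS] is prepended and [SEP] closes each sentence.
encoding = tokenizer(sentence_1, sentence_2)
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# A rarer word such as "jackfruit" is split into sub-word pieces because it is
# not in the BERT vocabulary as a single token (the exact split depends on the
# vocabulary). Masking 15% of the tokens happens later, during pre-training.
```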

The pre-training process, the first step of transfer learning, is like teaching English to the BERT model so that it can be used for various tasks which require English knowledge. This is accomplished by the two practice tasks given to BERT:

  1. Predict masked (hidden) tokens. To illustrate, the words “favorite” and “to” are masked in the diagram above. BERT will try to predict these masked tokens as part of the pre-training. This is similar to a “fill in the blanks” task we may give to a student who is learning English. While trying to fill in the missing words, the student will learn the language. This is referred to as the Masked Language Model (MLM); a short code sketch of this task follows the list.
  2. BERT also tries to predict whether Sentence 2 logically follows Sentence 1, in order to provide a deeper understanding of sentence dependencies. In the example above, Sentence 2 is a logical continuation of Sentence 1, so the prediction will be True. The special token [CLS] on the output side is used for this task. This task is known as Next Sentence Prediction (NSP).

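As promised in the first list item, the snippet below runs the masked-token prediction task with an already pre-trained BERT. It assumes the Hugging Face transformers fill-mask pipeline and is my own illustration, not code from the linked article.

```python
# Masked Language Model in action: BERT predicts the hidden word.
# A sketch, assuming the Hugging Face `transformers` library.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT ranks candidate tokens for the [MASK] position.
for prediction in fill_mask("My [MASK] fruit is the mango."):
    print(prediction["token_str"], round(prediction["score"], 3))
```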

The BERT pre-trained model comes in many variants. The most common ones are BERT Large and BERT Base:

[Figure: comparison of the BERT Base and BERT Large models]
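
As a quick check of how the two variants differ, the snippet below reads their published configurations. This assumes the Hugging Face transformers library; only the small config files are downloaded, not the model weights.

```python
# Compare the two most common BERT variants by their configuration.
# A sketch, assuming the Hugging Face `transformers` library.
from transformers import BertConfig

for name in ("bert-base-uncased", "bert-large-uncased"):
    cfg = BertConfig.from_pretrained(name)
    print(name, "-", cfg.num_hidden_layers, "layers,",
          cfg.hidden_size, "hidden units,", cfg.num_attention_heads, "attention heads")
```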

BERT Fine-Tuning

Fine-tuning is the next part of transfer learning. For specific tasks, such as text classification or question-answering, you would perform incremental training on a much smaller dataset. This adjusts the parameters of the pre-trained model.

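In code, fine-tuning usually means loading the pre-trained encoder and attaching a small, freshly initialized head for your task. The sketch below shows one way to set this up with the Hugging Face transformers API; it is an illustration of the idea, not the only way to do it.

```python
# Fine-tuning setup: reuse the pre-trained encoder, add a new task head.
# A sketch, assuming the Hugging Face `transformers` library.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # pre-trained weights (the "already taught English" part)
    num_labels=2,         # a new, randomly initialized classification head
)
# Training this model on your labeled data then adjusts all of its parameters,
# which is the incremental training described above.
```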

Use Cases

To demonstrate practical uses of BERT, I am providing two examples below. The code and documentation are provided in both GitHub and Google Colab. You can use either of the options to follow along and try it out for yourself!

1. Text Classification or Regression

This is sample code for the binary classification of tweets. Here we have two types of tweets, disaster-related tweets (target = 1) and normal tweets (target = 0). We fine-tune the BERT Base model to classify tweets into these two groups.

GitHub: https://github.com/sanigam/BERT_Medium

Google Colab: https://colab.research.google.com/drive/1ARH9dnugVuKjRTNorKIVrgRKitjg051c?usp=sharing

This code can be used for multi-class classification or regression by passing appropriate parameter values to the function bert_model_creation(). The code provides details on the parameter values. If you want, you can add additional dense layers inside this function.

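To show what such fine-tuning looks like end to end, here is a compact sketch of one training step for the disaster-tweet task using the Hugging Face transformers API. The linked repository uses its own bert_model_creation() helper instead, so treat this only as an illustration; the two tweets below are made up.

```python
# One fine-tuning step for binary tweet classification (disaster vs. normal).
# A sketch, assuming PyTorch and the Hugging Face `transformers` library.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

tweets = ["Wildfire spreading fast, residents told to evacuate",  # hypothetical examples
          "Just had the best coffee ever"]
labels = torch.tensor([1, 0])  # 1 = disaster-related, 0 = normal

batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # forward pass returns loss and logits
outputs.loss.backward()                  # backpropagate on the small labeled batch
optimizer.step()                         # adjust the pre-trained parameters
```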

2. BERT for Question-Answering

This is another interesting use case for BERT, where you input a passage and a question into the BERT model. It can find the answer to the question based on information given in the passage. In this code, I am using the BERT Large model, which is already fine-tuned on the Stanford Question Answer Dataset (SQuAD). You will see how to use this fine-tuned model to get answers from a given passage.

GitHub: https://github.com/sanigam/BERT_QA_Medium

Google Colab: https://colab.research.google.com/drive/1ZpeVygQJW3O2Olg1kZuLnybxZMV1GpKK?usp=sharing

An example of this use case:

Passage — “John is a 10 year old boy. He is the son of Robert Smith. Elizabeth Davis is Robert’s wife. She teaches at UC Berkeley. Sophia Smith is Elizabeth’s daughter. She studies at UC Davis”

Question — “Which college does John’s sister attend?”

When these two inputs are passed in, the model returns the correct answer, “uc davis”.

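For reference, the same passage and question can be run in just a few lines. The sketch below assumes the Hugging Face transformers pipeline and a publicly available BERT Large checkpoint fine-tuned on SQuAD; the linked notebook may load its model differently.

```python
# Question answering with a BERT Large model fine-tuned on SQuAD.
# A sketch, assuming the Hugging Face `transformers` library.
from transformers import pipeline

qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

passage = ("John is a 10 year old boy. He is the son of Robert Smith. "
           "Elizabeth Davis is Robert's wife. She teaches at UC Berkeley. "
           "Sophia Smith is Elizabeth's daughter. She studies at UC Davis")
question = "Which college does John's sister attend?"

result = qa(question=question, context=passage)
print(result["answer"], result["score"])  # prints the predicted answer span
```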

This example shows that BERT can understand language structure and handle dependencies across sentences. It can apply simple logic to answer the question (e.g. to find out who John’s sister is). Note that your passage can be much longer than the example shown above, but the combined length of the question and passage cannot exceed 512 tokens. If your passage is longer than that, the code will automatically truncate the extra part.

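If you want to enforce the length limit yourself, the tokenizer can truncate explicitly. The sketch below assumes the same Hugging Face tokenizer; the strategy shown drops only passage tokens, never the question, which is one reasonable way to match the behavior described above.

```python
# Keep the combined question + passage within BERT's 512-token limit.
# A sketch, assuming the Hugging Face `transformers` library.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(
    "bert-large-uncased-whole-word-masking-finetuned-squad")

question = "Which college does John's sister attend?"
long_passage = "John is a 10 year old boy. " * 300  # deliberately far too long

encoding = tokenizer(question, long_passage,
                     truncation="only_second",  # trim the passage, never the question
                     max_length=512)
print(len(encoding["input_ids"]))  # capped at 512; extra passage tokens are dropped
```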

The code provides examples in addition to the one shown above: a total of 3 passages and 22 questions. One of these passages is a version of my BERT article. You will see that BERT QA is able to answer any question whose answer can be found in the passage. You can customize the code for your own question-answering applications.

Hopefully this provides you with a good jump start to use BERT for your own practical applications. If you have any questions or feedback, feel free to let me know!

Translated from: https://medium.com/analytics-vidhya/introduction-to-bert-f9aa4075cf4f
