Question Answering - Multiple Choice: Building a Multiple-Choice Task by Fine-Tuning a Model

The Jupyter notebook for this article can be found in the Chapter 4 code repository.

If you are opening this Jupyter notebook in Colab, you will need to install 🤗 Transformers and 🤗 Datasets. The commands are below (uncomment them and run; if the download is slow, switch to a domestic mirror by adding the argument on the second line).

Before running the cells, we recommend following the instructions in this project's README to set up a dedicated Python environment for this material.

#! pip install datasets transformers 
# -i https://pypi.tuna.tsinghua.edu.cn/simple

If you are opening this Jupyter notebook on your local machine, make sure your environment has the latest versions of the above libraries installed.

You can find the Python script version of this Jupyter notebook here; it also shows how to fine-tune your model in a distributed setting, using multiple GPUs or TPUs.

Building a multiple-choice task by fine-tuning a model

In this notebook, we will show how to fine-tune any 🤗 Transformers model for a multiple-choice task, where the goal is to select the most plausible answer among several given options. The dataset we use is SWAG, but you can apply the same preprocessing to any other multiple-choice dataset, or to your own data. SWAG is a dataset about commonsense inference: each sample describes a situation and offers four possible endings.

This notebook can run with any model on the Model Hub, as long as that model has a version with a multiple-choice head. Depending on your model and the GPU you are using, you may need to adjust the batch size to avoid out-of-memory errors. Once these two parameters are set, the rest of the notebook should run smoothly:

model_checkpoint = "bert-base-uncased"
batch_size = 16

Loading the dataset

We will use the 🤗 Datasets library to download the data. This can easily be done with the load_dataset function.

from datasets import load_dataset, load_metric

load_dataset caches the dataset, so it will not be downloaded again the next time you run this.

datasets = load_dataset("swag", "regular")
Reusing dataset swag (/home/sgugger/.cache/huggingface/datasets/swag/regular/0.0.0/f9784740e0964a3c799d68cec0d992cc267d3fe94f3e048175eca69d739b980d)

Alternatively, you can download the data from the link we provide and unpack it, copy the three resulting CSV files into the docs/篇章4-使用Transformers解决NLP任务/datasets/swag directory, and load them with the code below.

import os

data_path = './datasets/swag/'
cache_dir = os.path.join(data_path, 'cache')
data_files = {'train': os.path.join(data_path, 'train.csv'), 'val': os.path.join(data_path, 'val.csv'), 'test': os.path.join(data_path, 'test.csv')}
datasets = load_dataset(data_path, 'regular', data_files=data_files, cache_dir=cache_dir)
Using custom data configuration regular-2ab2d66f12115abf


Downloading and preparing dataset swag/regular (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to ./datasets/swag/cache/swag/regular-2ab2d66f12115abf/0.0.0/a16ae67faa24f4cdd6d1fc6bfc09bdb6dc15771716221ff8bacbc6cc75533614...

Dataset swag downloaded and prepared to ./datasets/swag/cache/swag/regular-2ab2d66f12115abf/0.0.0/a16ae67faa24f4cdd6d1fc6bfc09bdb6dc15771716221ff8bacbc6cc75533614. Subsequent calls will reuse this data.

The datasets object itself is a DatasetDict, which contains one key for each of the training, validation, and test splits (MNLI is a special case that additionally contains mismatched validation and test splits).

datasets
DatasetDict({
    train: Dataset({
        features: ['video-id', 'fold-ind', 'startphrase', 'sent1', 'sent2', 'gold-source', 'ending0', 'ending1', 'ending2', 'ending3', 'label'],
        num_rows: 73546
    })
    validation: Dataset({
        features: ['video-id', 'fold-ind', 'startphrase', 'sent1', 'sent2', 'gold-source', 'ending0', 'ending1', 'ending2', 'ending3', 'label'],
        num_rows: 20006
    })
    test: Dataset({
        features: ['video-id', 'fold-ind', 'startphrase', 'sent1', 'sent2', 'gold-source', 'ending0', 'ending1', 'ending2', 'ending3', 'label'],
        num_rows: 20005
    })
})

To access an actual element, you need to select a split first, then give an index:

datasets["train"][0]
{'ending0': 'passes by walking down the street playing their instruments.',
 'ending1': 'has heard approaching them.',
 'ending2': "arrives and they're outside dancing and asleep.",
 'ending3': 'turns the lead singer watches the performance.',
 'fold-ind': '3416',
 'gold-source': 'gold',
 'label': 0,
 'sent1': 'Members of the procession walk down the street holding small horn brass instruments.',
 'sent2': 'A drum line',
 'startphrase': 'Members of the procession walk down the street holding small horn brass instruments. A drum line',
 'video-id': 'anetv_jkn6uvmqwh4'}

To get a feel for what the data looks like, the following function shows some examples picked at random from the dataset.

from datasets import ClassLabel
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    # Sample num_examples distinct random indices.
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    # Convert ClassLabel ids to their human-readable names.
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

show_random_elements(datasets["train"])
| | ending0 | ending1 | ending2 | ending3 | fold-ind | gold-source | label | sent1 | sent2 | startphrase | video-id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | are seated on a field. | are skiing down the slope. | are in a lift. | are pouring out in a man. | 16668 | gold | 1 | A man is wiping the skiboard. | Group of people | A man is wiping the skiboard. Group of people | anetv_JmL6BiuXr_g |
| 1 | performs stunts inside a gym. | shows several shopping in the water. | continues his skateboard while talking. | is putting a black bike close. | 11424 | gold | 0 | The credits of the video are shown. | A lady | The credits of the video are shown. A lady | anetv_dWyE0o2NetQ |
| 2 | is emerging into the hospital. | are strewn under water at some wreckage. | tosses the wand together and saunters into the marketplace. | swats him upside down. | 15023 | gen | 1 | Through his binoculars, someone watches a handful of surfers being rolled up into the wave. | Someone | Through his binoculars, someone watches a handful of surfers being rolled up into the wave. Someone | lsmdc3016_CHASING_MAVERICKS-6791 |
| 3 | spies someone sitting below. | opens the fridge and checks out the photo. | puts a little sheepishly. | staggers up to him. | 5475 | gold | 3 | He tips it upside down, and its little umbrella falls to the floor. | Back inside, someone | He tips it upside down, and its little umbrella falls to the floor. Back inside, someone | lsmdc1008_Spider-Man2-75503 |
| 4 | carries her to the grave. | laughs as someone styles her hair. | sets down his glass. | stares after her then trudges back up into the street. | 6904 | gen | 1 | Someone kisses her smiling daughter on the cheek and beams back at the camera. | Someone | Someone kisses her smiling daughter on the cheek and beams back at the camera. Someone | lsmdc1028_No_Reservations-83242 |
| 5 | stops someone and sweeps all the way back from the lower deck to join them. | is being dragged towards the monstrous animation. | beats out many events at the touch of the sword, crawling it. | reaches into a pocket and yanks open the door. | 14089 | gen | 1 | But before he can use his wand, he accidentally rams it up the troll's nostril. | The angry troll | But before he can use his wand, he accidentally rams it up the troll's nostril. The angry troll | lsmdc1053_Harry_Potter_and_the_philosophers_stone-95867 |
| 6 | sees someone's name in the photo. | gives a surprised look. | kneels down and touches his ripped specs. | spies on someone's clock. | 8407 | gen | 1 | Someone keeps his tired eyes on the road. | Glancing over, he | Someone keeps his tired eyes on the road. Glancing over, he | lsmdc1024_Identity_Thief-82693 |
| 7 | stops as someone speaks into the camera. | notices how blue his eyes are. | is flung out of the door and knocks the boy over. | flies through the air, its a fireball. | 4523 | gold | 1 | Both people are knocked back a few steps from the force of the collision. | She | Both people are knocked back a few steps from the force of the collision. She | lsmdc0043_Thelma_and_Luise-68271 |
| 8 | sits close to the river. | have pet's supplies and pets. | pops parked outside the dirt facility, sending up a car highway to catch control. | displays all kinds of power tools and website. | 8112 | gold | 1 | A guy waits in the waiting room with his pet. | A pet store and its van | A guy waits in the waiting room with his pet. A pet store and its van | anetv_9VWoQpg9wqE |
| 9 | the slender someone, someone turns on the light. | , someone gives them to her boss then dumps some alcohol into dough. | liquids from a bowl, she slams them drunk. | wags his tail as someone returns to the hotel room. | 10867 | gold | 3 | Inside a convenience store, she opens a freezer case. | Dolce | Inside a convenience store, she opens a freezer case. Dolce | lsmdc3090_YOUNG_ADULT-43871 |

Each example in the dataset has a context, composed of a first sentence (field sent1) and the beginning of a second sentence (field sent2). It then provides four possible endings (fields ending0, ending1, ending2, and ending3), and the model has to pick the correct one (indicated by the field label). The following function gives us a more intuitive view of an example:

def show_one(example):
    print(f"Context: {example['sent1']}")
    print(f"  A - {example['sent2']} {example['ending0']}")
    print(f"  B - {example['sent2']} {example['ending1']}")
    print(f"  C - {example['sent2']} {example['ending2']}")
    print(f"  D - {example['sent2']} {example['ending3']}")
    print(f"\nGround truth: option {['A', 'B', 'C', 'D'][example['label']]}")
show_one(datasets["train"][0])
Context: Members of the procession walk down the street holding small horn brass instruments.
  A - A drum line passes by walking down the street playing their instruments.
  B - A drum line has heard approaching them.
  C - A drum line arrives and they're outside dancing and asleep.
  D - A drum line turns the lead singer watches the performance.

Ground truth: option A
show_one(datasets["train"][15])
Context: Now it's someone's turn to rain blades on his opponent.
  A - Someone pats his shoulder and spins wildly.
  B - Someone lunges forward through the window.
  C - Someone falls to the ground.
  D - Someone rolls up his fast run from the water and tosses in the sky.

Ground truth: option C

Preprocessing the data

Before we can feed these texts to our model, we need to preprocess them. This is done by a 🤗 Transformers Tokenizer which, as the name suggests, turns the inputs into sequences of tokens, converts them into the corresponding IDs by looking them up in the pretrained vocabulary, and finally puts everything in the format the model expects, generating the other inputs the model requires along the way.

To do all of this, we instantiate our tokenizer with the AutoTokenizer.from_pretrained method, which ensures that:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary that was used when pretraining this specific checkpoint.

That vocabulary will be cached, so it is not downloaded again the next time we run this.

from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

We pass use_fast=True to use one of the fast tokenizers (backed by Rust) from the 🤗 Tokenizers library. These fast tokenizers are available for almost all models, but if you got an error with the previous call, remove that argument.

You can call this tokenizer directly on one sentence or on a pair of sentences:

tokenizer("Hello, this one sentence!", "And this sentence goes with it.")
{'input_ids': [101, 7592, 1010, 2023, 2028, 6251, 999, 102, 1998, 2023, 6251, 3632, 2007, 2009, 1012, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we are doing here; just know that they are required by the model we will instantiate later. If you are interested, you can learn more about them in this tutorial.
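
If you are curious how these IDs line up with actual tokens, here is a quick peek using the standard tokenize and convert_tokens_to_ids methods (the printed values assume the bert-base-uncased checkpoint chosen above):

tokens = tokenizer.tokenize("Hello, this one sentence!")
print(tokens)
# ['hello', ',', 'this', 'one', 'sentence', '!']
print(tokenizer.convert_tokens_to_ids(tokens))
# [7592, 1010, 2023, 2028, 6251, 999] -- the same IDs as in input_ids above, minus the special tokens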

To preprocess our dataset, we thus need to know the names of the columns that contain the sentences.
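
As a quick check, we can list those columns directly (column_names is a standard attribute of a Dataset split):

datasets["train"].column_names
# ['video-id', 'fold-ind', 'startphrase', 'sent1', 'sent2', 'gold-source', 'ending0', 'ending1', 'ending2', 'ending3', 'label']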

We can now write the function that will preprocess our samples. The tricky part, before calling the tokenizer, is to put all the possible sentence pairs into two big lists, and then to un-flatten the result so that each example has four sets of input IDs, attention masks, and so on.

When calling the tokenizer, we pass the argument truncation=True. This ensures that any input longer than the selected model can handle will be truncated to the maximum length the model accepts.

ending_names = ["ending0", "ending1", "ending2", "ending3"]

def preprocess_function(examples):
    # Repeat each first sentence four times to go with the four possibilities of second sentences.
    first_sentences = [[context] * 4 for context in examples["sent1"]]
    # Grab all second sentences possible for each context.
    question_headers = examples["sent2"]
    second_sentences = [[f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)]
    
    # Flatten everything
    first_sentences = sum(first_sentences, [])
    second_sentences = sum(second_sentences, [])
    
    # Tokenize
    tokenized_examples = tokenizer(first_sentences, second_sentences, truncation=True)
    # Un-flatten
    return {k: [v[i:i+4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}

This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists of lists for each key: a list of all examples (here 5), then a list of all choices (4) and a list of input IDs (length varying here since we did not apply any padding):

examples = datasets["train"][:5]
features = preprocess_function(examples)
print(len(features["input_ids"]), len(features["input_ids"][0]), [len(x) for x in features["input_ids"][0]])
5 4 [30, 25, 30, 28]

Let's decode the inputs for a given example:

idx = 3
[tokenizer.decode(features["input_ids"][idx][i]) for i in range(4)]
['[CLS] a drum line passes by walking down the street playing their instruments. [SEP] members of the procession are playing ping pong and celebrating one left each in quick. [SEP]',
 '[CLS] a drum line passes by walking down the street playing their instruments. [SEP] members of the procession wait slowly towards the cadets. [SEP]',
 '[CLS] a drum line passes by walking down the street playing their instruments. [SEP] members of the procession makes a square call and ends by jumping down into snowy streets where fans begin to take their positions. [SEP]',
 '[CLS] a drum line passes by walking down the street playing their instruments. [SEP] members of the procession play and go back and forth hitting the drums while the audience claps for them. [SEP]']

We can compare this to the ground truth shown earlier:

show_one(datasets["train"][3])
Context: A drum line passes by walking down the street playing their instruments.
  A - Members of the procession are playing ping pong and celebrating one left each in quick.
  B - Members of the procession wait slowly towards the cadets.
  C - Members of the procession makes a square call and ends by jumping down into snowy streets where fans begin to take their positions.
  D - Members of the procession play and go back and forth hitting the drums while the audience claps for them.

Ground truth: option D

This all seems fine. We can apply this function to all the examples in our dataset using the map method of the datasets object we created earlier. It will be applied to every element of every split, so our training, validation, and test data will all be preprocessed in the same way.

encoded_datasets = datasets.map(preprocess_function, batched=True)
Loading cached processed dataset at /home/sgugger/.cache/huggingface/datasets/swag/regular/0.0.0/f9784740e0964a3c799d68cec0d992cc267d3fe94f3e048175eca69d739b980d/cache-975c81cf12e5b7ac.arrow
Loading cached processed dataset at /home/sgugger/.cache/huggingface/datasets/swag/regular/0.0.0/f9784740e0964a3c799d68cec0d992cc267d3fe94f3e048175eca69d739b980d/cache-d4806d63f1eaf5cd.arrow
Loading cached processed dataset at /home/sgugger/.cache/huggingface/datasets/swag/regular/0.0.0/f9784740e0964a3c799d68cec0d992cc267d3fe94f3e048175eca69d739b980d/cache-258c9cd71b0182db.arrow

Even better, the results are automatically cached by the 🤗 Datasets library to save you time on this step the next time you run the notebook. The 🤗 Datasets library is usually smart enough to detect when the function you pass to map has changed (and then to stop using the cached data). For instance, it will detect if you change the task in the first cell and rerun the notebook. When 🤗 Datasets uses cached files, it prints a corresponding warning; you can pass load_from_cache_file=False in the call to map to not use the cached files and force the preprocessing to run again.
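
For example, here is a minimal sketch of forcing the preprocessing to run again instead of reusing the cache:

# Ignore any cached results and re-run preprocess_function on every split:
encoded_datasets = datasets.map(preprocess_function, batched=True, load_from_cache_file=False)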

Note that we passed batched=True to encode the texts in batches. This takes full advantage of the fast tokenizer we loaded earlier, which will use multiple threads to process the texts in a batch concurrently.

Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is multiple choice, we use the AutoModelForMultipleChoice class. Like the tokenizer, the from_pretrained method downloads and caches the model for us.

from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer

model = AutoModelForMultipleChoice.from_pretrained(model_checkpoint)
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultipleChoice: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForMultipleChoice from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultipleChoice from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultipleChoice were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

This warning tells us that we are throwing away some weights (the cls.predictions and cls.seq_relationship heads of the pretrained checkpoint) and randomly initializing some others (the classifier layer). This is absolutely normal: we are discarding the head used for masked language modeling during pretraining and replacing it with a new multiple-choice head, for which we have no pretrained weights. The warning tells us we should fine-tune this model before using it for inference, and that is exactly what we are about to do.

To instantiate a Trainer, we need to define three more things. The most important is TrainingArguments, a class that contains all the attributes used to customize the training. It requires one folder name, which is used to save checkpoints of the model; all the other arguments are optional:

args = TrainingArguments(
    "test-glue",
    evaluation_strategy = "epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=3,
    weight_decay=0.01,
)

Here we set evaluation to happen at the end of each epoch, tweak the learning rate, use the batch_size defined at the top of the notebook, and customize the number of training epochs as well as the weight decay.

Next, we need to tell our Trainer how to form batches from the preprocessed inputs. We haven't done any padding yet, because we will pad each batch to the maximum length within that batch (rather than to the maximum length over the whole dataset). This is the job of the data collator: it takes a list of examples and converts them into a batch (in our case, by applying padding). Since there is no data collator in the library that handles our specific problem, we adapt one ourselves from DataCollatorWithPadding:

from dataclasses import dataclass
from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
from typing import Optional, Union
import torch

@dataclass
class DataCollatorForMultipleChoice:
    """
    Data collator that dynamically pads the inputs for a multiple-choice task.
    """

    tokenizer: PreTrainedTokenizerBase
    padding: Union[bool, str, PaddingStrategy] = True
    max_length: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None

    def __call__(self, features):
        label_name = "label" if "label" in features[0].keys() else "labels"
        labels = [feature.pop(label_name) for feature in features]
        batch_size = len(features)
        num_choices = len(features[0]["input_ids"])
        flattened_features = [
            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)]
            for feature in features
        ]
        flattened_features = sum(flattened_features, [])
        
        batch = self.tokenizer.pad(
            flattened_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        
        # Un-flatten
        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
        # Add back labels
        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
        return batch

When passed a list of examples, it flattens all the inputs, attention masks, and so on into one big list that it passes to the tokenizer.pad method. That returns a dictionary of big tensors (of shape (batch_size * 4) x seq_length), which we then un-flatten.

We can check that this data collator works as expected on a list of features. Here we just have to make sure to remove all the features that are not inputs accepted by our model (something the Trainer does automatically for us):

accepted_keys = ["input_ids", "attention_mask", "label"]
features = [{k: v for k, v in encoded_datasets["train"][i].items() if k in accepted_keys} for i in range(10)]
batch = DataCollatorForMultipleChoice(tokenizer)(features)
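
Before decoding anything, we can check the tensor shapes: after the un-flatten step the collator should return tensors of shape (batch_size, num_choices, seq_length). A quick sketch (the exact sequence length depends on the longest feature in this batch):

print(batch["input_ids"].shape)  # torch.Size([10, 4, L]), where L is the length of the longest sequence in the batch
print(batch["labels"])           # the 10 gold labels we popped off and added back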

Again, all this flattening and un-flattening is a potential source of errors, so let's run another sanity check on our inputs:

[tokenizer.decode(batch["input_ids"][8][i].tolist()) for i in range(4)]
['[CLS] someone walks over to the radio. [SEP] someone hands her another phone. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]',
 '[CLS] someone walks over to the radio. [SEP] someone takes the drink, then holds it. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]',
 '[CLS] someone walks over to the radio. [SEP] someone looks off then looks at someone. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]',
 '[CLS] someone walks over to the radio. [SEP] someone stares blearily down at the floor. [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]']
show_one(datasets["train"][8])
Context: Someone walks over to the radio.
  A - Someone hands her another phone.
  B - Someone takes the drink, then holds it.
  C - Someone looks off then looks at someone.
  D - Someone stares blearily down at the floor.

Ground truth: option D

Everything works as expected!

The last thing to define for our Trainer is how to compute the evaluation metrics from the predictions. Here we simply measure accuracy; the only preprocessing we need is to take the argmax of our predicted logits:

import numpy as np

def compute_metrics(eval_predictions):
    predictions, label_ids = eval_predictions
    preds = np.argmax(predictions, axis=1)
    return {"accuracy": (preds == label_ids).astype(np.float32).mean().item()}

Then we just need to pass all of this, along with our datasets, to the Trainer:

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_datasets["train"],
    eval_dataset=encoded_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer),
    compute_metrics=compute_metrics,
)

We can now fine-tune our model by calling the train method:

trainer.train()
<div>
    <style>
        /* Turns off some styling */
        progress {
            /* gets rid of default border in Firefox and Opera. */
            border: none;
            /* Needs to be in here for Safari polyfill so background images work as expected. */
            background-size: auto;
        }
    </style>

  <progress value='6897' max='6897' style='width:300px; height:20px; vertical-align: middle;'></progress>
  [6897/6897 23:49, Epoch 3/3]
</div>
<table border="1" class="dataframe">
Epoch Training Loss Validation Loss Accuracy 1 0.154598 0.828017 0.766520 2 0.296633 0.667454 0.786814 3 0.111786 0.994927 0.789363

TrainOutput(global_step=6897, training_loss=0.19714653808275168)
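
If you want to run one more evaluation pass on the validation set after training finishes, a minimal sketch:

# Returns a dict with eval_loss, eval_accuracy, etc., computed on encoded_datasets["validation"]:
trainer.evaluate()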

Finally, don't forget to upload your model to the 🤗 Model Hub.
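
One way to do that is the push_to_hub API; below is a sketch, not run here. It assumes you have a Hugging Face account and are logged in (for example via huggingface-cli login), and the repository name is made up for illustration:

# Hypothetical repo name -- replace with your own:
model.push_to_hub("bert-base-uncased-finetuned-swag")
tokenizer.push_to_hub("bert-base-uncased-finetuned-swag")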
