The pipeline() Function
The pipeline() function is the most basic tool in the Transformers library. Transformer models are used to solve a variety of NLP tasks, and the Transformers library provides the functionality to create and use these models. Let's first look at how pipeline() solves NLP problems.
sentiment-analysis
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier("I've been waiting for a HuggingFace course my whole life.")
# [{'label': 'POSITIVE', 'score': 0.9273151755332947}]
We can also pass several sentences at once for sentiment analysis:
classifier(["I've been waiting for a HuggingFace course my whole life.", 'I hate this so much!'])
# [{'label': 'POSITIVE', 'score': 0.9273151755332947},
# {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
Passing text to pipeline() involves three main steps:
- Preprocessing: the text is converted into a format the model can understand;
- Model inference: the preprocessed inputs are passed to the model;
- Postprocessing: the model's outputs are converted into the final result.
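The three steps above can be sketched schematically. The following is a toy stand-in for illustration only, not the real library internals: the vocabulary and per-token weights are entirely made up.

```python
# Toy sketch of the pipeline's three stages: preprocess -> model -> postprocess.
# The vocabulary and weights below are made up purely for illustration.

VOCAB = {'i': 0, 'love': 1, 'hate': 2, 'this': 3}

def preprocess(text):
    # Step 1: turn raw text into token ids the "model" understands.
    return [VOCAB[w] for w in text.lower().split() if w in VOCAB]

def model(token_ids):
    # Step 2: a fake model that scores sentiment from token ids.
    weights = {0: 0.0, 1: 2.0, 2: -2.0, 3: 0.0}  # made-up per-token weights
    return sum(weights[t] for t in token_ids)

def postprocess(logit):
    # Step 3: map the raw score to a labelled result.
    label = 'POSITIVE' if logit >= 0 else 'NEGATIVE'
    return {'label': label, 'score': abs(logit)}

def toy_pipeline(text):
    return postprocess(model(preprocess(text)))

print(toy_pipeline('I love this'))  # {'label': 'POSITIVE', 'score': 2.0}
print(toy_pipeline('I hate this'))  # {'label': 'NEGATIVE', 'score': 2.0}
```

The real pipeline replaces each stage with a tokenizer, a pretrained model, and task-specific postprocessing, but the flow is the same.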
Some of the pipelines currently available are:
- feature-extraction
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification
zero-shot-classification
Zero-shot classification is a machine-learning paradigm in which a model correctly classifies instances of categories it has never seen, without relying on any training examples for those categories. The core idea is to learn from data in existing categories and transfer that knowledge to new, unseen ones. Zero-shot classification is widely applicable to supervised learning tasks in computer vision, natural language processing, and beyond.
With pipeline(), you can specify the candidate labels directly and perform zero-shot classification:
from transformers import pipeline
classifier = pipeline('zero-shot-classification')
classifier(
    'This is a course about the Transformers library',
    candidate_labels=['education', 'politics', 'business'],
)
# {'sequence': 'This is a course about the Transformers library',
# 'labels': ['education', 'business', 'politics'],
# 'scores': [0.8445989489555359, 0.11197412759065628, 0.04342695698142052]}
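Note that the returned scores sum to 1 and the labels are sorted from most to least likely. Conceptually, the pipeline assigns each candidate label a raw score against the text and then normalizes. A minimal sketch of that normalization-and-ranking step, using made-up raw scores (the real per-label scores come from the model):

```python
import math

def normalize_and_rank(label_scores):
    # Softmax-normalize raw per-label scores so they sum to 1,
    # then sort labels from most to least likely.
    exp = {label: math.exp(s) for label, s in label_scores.items()}
    total = sum(exp.values())
    probs = {label: v / total for label, v in exp.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return {'labels': [l for l, _ in ranked], 'scores': [s for _, s in ranked]}

# Hypothetical raw scores for the example text above.
result = normalize_and_rank({'education': 2.5, 'politics': -0.5, 'business': 0.5})
print(result['labels'])       # ['education', 'business', 'politics']
print(sum(result['scores']))  # 1.0 (up to floating-point error)
```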
text-generation
Let's use a pipeline to generate some text. You first provide a prompt, and the model automatically completes it. Text generation involves some randomness, so it is normal for the generated content to vary.
from transformers import pipeline
generator = pipeline('text-generation')
generator('In the course, we will teach you how to')
# [{'generated_text': "In the course, we will teach you how to use your own ideas,
# thoughts, and ideas in practice through building a toolset.\n\nDon't put yourself
# in anyone's shoes, however, if you put a whole bunch of assumptions and beliefs"}]
The num_return_sequences argument controls how many different sequences are generated, and max_length controls the total length of the output text:
from transformers import pipeline
generator = pipeline('text-generation', num_return_sequences=3, max_length=10)
generator('I')
# [{'generated_text': 'I have already talked to the other parties who want'},
# {'generated_text': 'I don\'t know what to say."\n\n'},
# {'generated_text': 'I a very happy woman who still loves it,"'}]
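The variation across the three sequences comes from sampling: at each step the model draws the next token from a probability distribution rather than always taking the most likely one. A toy sketch of that sampling loop, with a made-up next-token table standing in for the model (a real model recomputes the distribution at every step from the text so far):

```python
import random

# Made-up next-token distributions, for illustration only.
NEXT = {
    'I': [('have', 0.4), ('am', 0.3), ("don't", 0.3)],
    'have': [('already', 0.5), ('never', 0.5)],
}

def generate(prompt, max_length=4, seed=None):
    # Sample tokens one at a time until max_length is reached
    # or no continuation is known for the last token.
    rng = random.Random(seed)
    tokens = [prompt]
    while len(tokens) < max_length and tokens[-1] in NEXT:
        words, probs = zip(*NEXT[tokens[-1]])
        tokens.append(rng.choices(words, weights=probs)[0])
    return ' '.join(tokens)

# Different seeds give different continuations, much like the
# three sequences returned by num_return_sequences=3.
for seed in range(3):
    print(generate('I', seed=seed))
```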
fill-mask
The fill-mask task fills in the blanks in a given text.
from transformers import pipeline
unmasker = pipeline('fill-mask')
unmasker('This course will teach you all about <mask> models', top_k=2)
# [{'score': 0.19631513953208923,
# 'token': 30412,
# 'token_str': ' mathematical',
# 'sequence': 'This course will teach you all about mathematical models'},
# {'score': 0.04449228197336197,
# 'token': 745,
# 'token_str': ' building',
# 'sequence': 'This course will teach you all about building models'}]
The top_k argument controls how many candidates are displayed. Here the model fills in the special **&lt;mask&gt;** word, which is usually called the **mask token**. Other mask-filling models may use different mask tokens, so verify what the correct mask word is when exploring other models.
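Mechanically, top_k just keeps the k highest-scoring fill candidates, best first. A toy sketch of that selection step, with made-up candidate scores shaped like the pipeline's output:

```python
def top_k(candidates, k):
    # Keep the k highest-scoring fill candidates, best first,
    # mirroring the effect of the pipeline's top_k parameter.
    return sorted(candidates, key=lambda c: c['score'], reverse=True)[:k]

# Made-up scores for candidates at the <mask> position above.
candidates = [
    {'token_str': 'building', 'score': 0.044},
    {'token_str': 'mathematical', 'score': 0.196},
    {'token_str': 'computer', 'score': 0.032},
]
print(top_k(candidates, k=2))
# [{'token_str': 'mathematical', ...}, {'token_str': 'building', ...}]
```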
Note: all of the code in this section was run in Colab.