Learning Hugging Face Transformers Together (16): pipeline in the transformers Library

Author: 做个天秤座的程序猿

Published: 2024-07-14 00:45:00

Category: Hugging Face Transformers. Tags: transformers, pipeline

Article link: https://blog.csdn.net/kljyrx/article/details/140374879

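Preface

The pipeline function in the transformers library is a convenient high-level API. It wraps many common natural language processing tasks so that you can call pretrained models without knowing the details of the underlying implementation. This article covers pipeline in detail.

I. What is `pipeline`?

pipeline is a high-level API in the Hugging Face Transformers library that offers a simple way to apply pretrained models to NLP tasks such as sentiment analysis, text generation, translation, and question answering. With pipeline, a complex NLP task takes only a few lines of code.

1. Install the required libraries

First make sure you have installed transformers and a deep learning framework (PyTorch or TensorFlow):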

pip install transformers
pip install torch  # if you choose PyTorch
# or
pip install tensorflow  # if you choose TensorFlow

2. Sentiment analysis with `pipeline`

A sentiment analysis example shows how pipeline is used.

from transformers import pipeline

# Load the sentiment analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis")

# Run sentiment analysis
results = sentiment_analysis(["I love this!", "I hate this!"])

# Print the results
for result in results:
    print(f"Label: {result['label']}, Score: {result['score']:.4f}")

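3. Parameters of `pipeline`

pipeline accepts several parameters that configure its behavior:

task: the task type, for example "sentiment-analysis", "text-generation", or "translation_en_to_fr".
model: the name or path of the model. If omitted, pipeline falls back to a default pretrained model for the task; naming the model explicitly keeps results reproducible.
tokenizer: the name or path of the tokenizer. By default, the tokenizer that matches the model is used.
framework: the deep learning framework to use, "pt" for PyTorch or "tf" for TensorFlow. If both are installed and no framework is specified, pipeline follows the framework of the given model and otherwise defaults to PyTorch.
device: the device to run on. -1 means CPU, and 0 means the first GPU.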
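The parameters above are passed to pipeline as keyword arguments. The following is a minimal sketch that only assembles those arguments. `build_pipeline_kwargs` is a hypothetical helper written for this illustration, not part of transformers, and the checkpoint name is the one used later in this article.

```python
def build_pipeline_kwargs(task, model=None, gpu_index=None):
    """Assemble keyword arguments for transformers.pipeline (hypothetical helper)."""
    # device=-1 selects the CPU; a non-negative integer selects that GPU
    kwargs = {"task": task, "device": -1 if gpu_index is None else gpu_index}
    if model is not None:
        # Leaving "model" out lets pipeline fall back to its default for the task
        kwargs["model"] = model
    return kwargs


kwargs = build_pipeline_kwargs(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(kwargs)

# With transformers installed, the pipeline could then be created with:
#   from transformers import pipeline
#   classifier = pipeline(**kwargs)
```

Keeping the arguments in one dictionary makes it easy to switch between CPU and GPU, or between checkpoints, without touching the code that calls the pipeline.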

4. More `pipeline` tasks

Besides sentiment analysis, pipeline supports many other tasks. Here are some common examples:

1) Text generation

text_generator = pipeline("text-generation")
results = text_generator("Once upon a time,")
for result in results:
    print(result["generated_text"])

2) Translation

translator = pipeline("translation_en_to_fr")
results = translator("Hello, how are you?")
for result in results:
    print(result["translation_text"])

3) Question answering

question_answerer = pipeline("question-answering")
context = "Hugging Face is creating a tool that democratizes AI."
results = question_answerer(question="What is Hugging Face creating?", context=context)
print(f"Answer: {results['answer']}")

4) Zero-shot text classification

classifier = pipeline("zero-shot-classification")
sequence = "I love using Hugging Face's Transformers library!"
candidate_labels = ["technology", "education", "politics"]
results = classifier(sequence, candidate_labels)
for result in results["labels"]:
    print(result)
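The zero-shot result is a dictionary in which the `labels` list is paired with a `scores` list, so printing the labels alone drops the confidence values. The sketch below shows one way to read both. `rank_labels` is a hypothetical helper, and `example_result` is a hand-written stand-in whose scores are placeholders rather than real model output.

```python
def rank_labels(result):
    """Pair each candidate label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written stand-in for a pipeline result; the scores are placeholders
example_result = {
    "sequence": "I love using Hugging Face's Transformers library!",
    "labels": ["technology", "education", "politics"],
    "scores": [0.9, 0.07, 0.03],
}

for label, score in rank_labels(example_result):
    print(f"{label}: {score:.2f}")
```

With a real pipeline, the same helper could be applied to the `results` dictionary returned by the classifier call.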

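II. Differences between pipeline and AutoModel.from_pretrained

pipeline and AutoModel.from_pretrained are both important parts of the Hugging Face Transformers library, but they serve different purposes and sit at different levels of abstraction. Their main differences, strengths, and weaknesses are listed below.

1. Characteristics of `pipeline`

pipeline is a high-level API designed to simplify common natural language processing tasks. It bundles the model, the tokenizer, and the task-specific processing, so a complex NLP task takes only a few lines of code.

Advantages:

Ease of use: it greatly simplifies working with pretrained models, which suits rapid prototyping and beginners.
High integration: the model, tokenizer, and task logic are packaged together, so there is no manual preprocessing or postprocessing.
Task variety: it supports many NLP tasks, such as sentiment analysis, text generation, translation, and question answering.

Disadvantages:

Limited flexibility: because of the heavy encapsulation, custom requirements such as special preprocessing or postprocessing can be hard to implement.
Less control: you have little control over the low-level details of the model and the task processing.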
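The three stages that pipeline bundles (preprocessing, the model forward pass, and postprocessing) can be sketched with a toy example. Everything below is hypothetical and written only for illustration: the word weights stand in for a real model and the whitespace split stands in for a real tokenizer. It is not how the library is implemented.

```python
import math

# Hypothetical toy "model": word -> (negative evidence, positive evidence)
TOY_WEIGHTS = {"love": (0.0, 2.0), "hate": (2.0, 0.0)}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Stand-in for tokenization: lowercase, drop '!' and split into words."""
    return text.lower().replace("!", "").split()


def forward(tokens):
    """Stand-in for the model: sum per-word evidence into two logits."""
    logits = [0.0, 0.0]
    for token in tokens:
        neg, pos = TOY_WEIGHTS.get(token, (0.0, 0.0))
        logits[0] += neg
        logits[1] += pos
    return logits


def postprocess(logits):
    """Apply softmax to the logits and report the most likely label."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, as a pipeline object does behind one call."""
    return postprocess(forward(preprocess(text)))


for text in ["I love this!", "I hate this!"]:
    print(text, toy_pipeline(text))
```

A single call hides all three stages, which is the source of both the convenience and the reduced control described above.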

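2. Characteristics of `AutoModel.from_pretrained`

AutoModel.from_pretrained is a lower-level API whose main job is to load a pretrained model. It offers more flexibility and control, which suits applications that need heavy customization.

Advantages:

Flexibility: you can freely combine the model, tokenizer, and task logic to meet complex custom requirements.
Control: you can manage the details of the model directly, such as input and output formats or the training process.
Broad applicability: it fits custom training, fine-tuning, and research work.

Disadvantages:

Higher complexity: you handle preprocessing and the model inputs and outputs yourself, so there is more code to write.
Higher barrier to entry: it requires a deeper understanding of the Transformers library and of deep learning.

3. Side-by-side examples

pipeline example

Here is sentiment analysis with pipeline: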

from transformers import pipeline

# Create the sentiment analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis")

# Run sentiment analysis on the texts
results = sentiment_analysis(["I love this!", "I hate this!"])
for result in results:
    print(f"Label: {result['label']}, Score: {result['score']:.4f}")

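AutoModel.from_pretrained example

Here is the same sentiment analysis done with from_pretrained. For classification, the task-specific class AutoModelForSequenceClassification is used so that the model includes a classification head: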

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the input data
texts = ["I love this!", "I hate this!"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Parse the results; the label order must match this model's class indices
labels = ['NEGATIVE', 'POSITIVE']
for text, prediction in zip(texts, predictions):
    label = labels[prediction.argmax()]
    score = prediction.max().item()
    print(f"Text: {text}, Label: {label}, Score: {score:.4f}")

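4. Choosing the right approach

Use pipeline if you need to get an NLP task working quickly and do not need much customization of the underlying implementation.
Use AutoModel.from_pretrained if you need fine-grained control over the model, or need to fine-tune or customize it.

Understanding the differences between the two approaches makes it easier to pick the one that fits your project.

III. What torch.no_grad() is for

Consider the code below. How does it differ from calling print(model(**inputs)) directly?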

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

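1. `with torch.no_grad()`

What it does: torch.no_grad() is a context manager provided by PyTorch that turns off autograd (automatic differentiation) during inference, that is, prediction.

Why use it:

Lower memory use: inference does not need gradients, so PyTorch does not have to keep the intermediate results required for a backward pass. This matters for large batches and for devices with limited memory, such as GPUs.
Higher speed: skipping the bookkeeping that autograd performs during the forward pass reduces overhead and can make inference faster.

Example: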

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the input data
texts = ["I love this!", "I hate this!"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference with autograd turned off by torch.no_grad()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Parse the results; the label order must match this model's class indices
labels = ['NEGATIVE', 'POSITIVE']
for text, prediction in zip(texts, predictions):
    label = labels[prediction.argmax()]
    score = prediction.max().item()
    print(f"Text: {text}, Label: {label}, Score: {score:.4f}")

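2. Calling `print(model(**inputs))` directly

What it does: calling model(**inputs) directly and printing the result leaves autograd enabled, which is the PyTorch default.

What that means:

Autograd: during the forward pass, PyTorch records the operations so that it can later run the backward pass and compute gradients. This is necessary for training but not for inference.
Resource use: recording operations for gradient computation takes extra memory and compute, so this approach is not recommended for inference.

Example: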

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the input data
texts = ["I love this!", "I hate this!"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference by calling the model directly (autograd stays enabled)
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Parse the results; the label order must match this model's class indices
labels = ['NEGATIVE', 'POSITIVE']
for text, prediction in zip(texts, predictions):
    label = labels[prediction.argmax()]
    score = prediction.max().item()
    print(f"Text: {text}, Label: {label}, Score: {score:.4f}")

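3. Summary

with torch.no_grad():
- Turns off autograd.
- Saves memory and compute.
- Can speed up inference.
- Suited to inference and evaluation.
Calling model(**inputs) directly:
- Leaves autograd enabled.
- Uses more memory and compute.
- Suited to training, where gradients are needed.

For inference, wrapping the forward pass in with torch.no_grad() is recommended to reduce resource use.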
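The difference can be observed directly on the output tensors. The sketch below assumes PyTorch is installed and uses a small `torch.nn.Linear` layer as a stand-in for a full model, so it runs without downloading anything.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 2)  # small stand-in for a full model
x = torch.randn(3, 4)

# Default behavior: autograd records the operation that produced the output
tracked = layer(x)

# Same computation, but no graph is recorded inside the context manager
with torch.no_grad():
    untracked = layer(x)

print("tracked:  ", tracked.requires_grad, tracked.grad_fn)
print("untracked:", untracked.requires_grad, untracked.grad_fn)
print("same values:", torch.allclose(tracked, untracked))
```

The values are the same in both cases. Only the tracked output carries a `grad_fn`, which is the reference to the recorded graph that costs extra memory.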

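IV. Common optimizations

To use models efficiently, especially in production, it helps to know and apply the following practices:

Use torch.no_grad() during inference:

Make sure no gradients are computed during prediction, which saves memory and compute.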

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

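Use appropriate batching:

Splitting the input into small batches makes good use of the GPU's parallelism while keeping memory use bounded.

batch_size = 8  # adjust to fit your GPU memory
# Number of samples; len(inputs) would not give this for a tokenizer output
num_samples = inputs["input_ids"].shape[0]
all_predictions = []
for i in range(0, num_samples, batch_size):
    batch_inputs = {k: v[i:i+batch_size] for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**batch_inputs)
    all_predictions.append(torch.nn.functional.softmax(outputs.logits, dim=-1))
predictions = torch.cat(all_predictions)  # one row per input text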
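The same batching idea can also be applied to the raw texts before tokenization, so that each batch is padded only to the length of its own longest text. A minimal sketch follows. `iter_batches` is a hypothetical helper, and the sample texts are placeholders.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"sample text {i}" for i in range(5)]
batches = list(iter_batches(texts, batch_size=2))
print([len(batch) for batch in batches])

# Each batch could then be tokenized and run on its own, for example:
#   inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
```

When using a pipeline object rather than a raw model, the call also accepts a `batch_size` argument that handles the splitting internally.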

Optimize model deployment:

Tools such as ONNX and TensorRT convert the model into a more efficient format, which improves inference speed.
For example, Hugging Face's optimum library can be used to run Transformers models with ONNX Runtime:

pip install "optimum[onnxruntime]"

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to ONNX while loading
model = ORTModelForSequenceClassification.from_pretrained(model_name, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer(["I love this!", "I hate this!"], return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs)

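Use mixed precision:

Running in half precision (FP16) can speed up computation and reduce memory use, especially on GPUs with hardware FP16 support.

import torch

# Assumes the model and inputs have already been moved to a CUDA device
with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)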
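The memory effect of half precision comes down to bytes per element: 4 for FP32 and 2 for FP16. The arithmetic below is a rough sketch. The element count is an assumed round figure chosen for illustration, not a measured model size, and autocast itself keeps the stored weights in FP32, so the saving applies only to tensors actually held in FP16.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}


def tensor_memory_mib(num_elements, dtype):
    """Storage needed for num_elements values of the given dtype, in MiB."""
    return num_elements * BYTES_PER_ELEMENT[dtype] / (1024 ** 2)


# Assumed round figure for illustration only
num_elements = 66_000_000
fp32_mib = tensor_memory_mib(num_elements, "fp32")
fp16_mib = tensor_memory_mib(num_elements, "fp16")
print(f"fp32: {fp32_mib:.0f} MiB, fp16: {fp16_mib:.0f} MiB")
```

Actual savings depend on which tensors end up in FP16, so measured memory use is the only reliable guide.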

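Model compression:
- Use techniques such as distillation, pruning, and quantization to shrink the model and make inference more efficient.
- The Hugging Face ecosystem, including transformers and optimum, offers tooling that can help with these techniques.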
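Of the compression techniques above, quantization is the easiest to show in a few lines. The following is a toy sketch of symmetric 8-bit quantization with a single shared scale. It illustrates the idea only and is not how any particular library implements it; the weight values are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # made-up values for illustration
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", quantized)
print("scale:", scale)
print("max error:", max_error)
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per value.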

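Together, these optimizations and practices can substantially improve inference efficiency, reduce wasted resources, and help a model meet the demands of large-scale production.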
