Learning Hugging Face Transformers Together (15): Sentiment Analysis with Transformers

做个天秤座的程序猿

Published 2024-07-13 00:45:00

Category: Hugging Face Transformers. Tags: transformers, sentiment analysis

Article link: https://blog.csdn.net/kljyrx/article/details/140374305

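Contents

Introduction
1. Environment Setup
2. Loading a Pretrained Model
3. Example: Sentiment Analysis
4. Processing a Dataset
5. Customizing the Model
Summary
Questions to Consider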

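Introduction

Sentiment analysis is an important task in natural language processing (NLP) that aims to determine the emotional polarity of a text, such as positive, negative, or neutral. Hugging Face's Transformers library provides tools that make sentiment analysis straightforward to implement. This article shows how to use the Transformers library for sentiment analysis.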

1. Environment Setup

Before starting, install the required libraries:

pip install transformers
pip install torch  # or tensorflow, depending on the deep learning framework you choose
pip install pandas  # for data handling

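2. Loading a Pretrained Model

Hugging Face provides pretrained sentiment analysis models. Here we use distilbert-base-uncased-finetuned-sst-2-english, a DistilBERT-based model that has been fine-tuned on the SST-2 sentiment dataset. Passing the model name explicitly keeps the example independent of the pipeline's default model, which can change between library versions.

from transformers import pipeline

# Load the pretrained sentiment analysis model
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)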
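
The pipeline bundles three steps: tokenizing the text, running the model, and turning the model's raw logits into a label and a score. The last step can be sketched in plain Python as below. This is a minimal illustration rather than the library's actual code: the logits are made-up values, and the label mapping and the use of softmax are assumptions about how this SST-2 checkpoint is configured.

```python
import math

# Assumed label mapping for the SST-2 checkpoint (index -> label name)
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits):
    """Pick the most probable class and report its probability as the score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": ID2LABEL[best], "score": probs[best]}

# Hypothetical logits for one clearly positive sentence
example_logits = [-4.2, 4.5]
prediction = to_prediction(example_logits)
print(prediction["label"], round(prediction["score"], 4))
```

Seen this way, the score is the probability of the predicted class only, so a score near 0.5 means the model could barely decide between the two labels.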

3. Example: Sentiment Analysis

With the model loaded, we can run sentiment analysis on a few sample texts.

texts = [
    "I love this product! It's amazing.",
    "I am very disappointed with the service.",
    "The movie was just okay, nothing special.",
]

results = sentiment_analysis(texts)

for text, result in zip(texts, results):
    print(f"Text: {text}\nSentiment: {result['label']}, Score: {result['score']:.4f}\n")

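For the first two texts, the output looks like this (exact scores can differ slightly between library versions):

Text: I love this product! It's amazing.
Sentiment: POSITIVE, Score: 0.9998

Text: I am very disappointed with the service.
Sentiment: NEGATIVE, Score: 0.9991

SST-2 is a binary dataset, so this model only returns POSITIVE or NEGATIVE and never NEUTRAL. A lukewarm sentence such as "The movie was just okay, nothing special." is therefore still assigned one of those two labels.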
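
Because the SST-2 checkpoint only distinguishes POSITIVE from NEGATIVE, a neutral bucket has to come from somewhere else. One simple heuristic, sketched below, is to treat low-confidence predictions as NEUTRAL. The function name with_neutral, the 0.75 threshold, and the sample results are all hypothetical and chosen only for illustration; a model trained with an explicit neutral class is usually the more reliable option.

```python
def with_neutral(result, threshold=0.75):
    """Relabel a binary prediction as NEUTRAL when its score is below the threshold."""
    if result["score"] < threshold:
        return {"label": "NEUTRAL", "score": result["score"]}
    return dict(result)  # return a copy so the input is left untouched

# Hypothetical pipeline-style results, not real model output
sample_results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.62},
]

relabelled = [with_neutral(r) for r in sample_results]
for item in relabelled:
    print(item["label"], item["score"])
```

The threshold would need tuning on labeled examples from your own domain, since a confident score does not guarantee a strongly worded text.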

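4. Processing a Dataset

In practice we usually need to analyze a large amount of text. The following example uses Pandas to process a dataset.

Suppose we have a dataset of reviews (a CSV file) in which each row holds one review text in a column named review: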

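import pandas as pd

# Load the dataset
df = pd.read_csv("reviews.csv")

# Show the first few rows
print(df.head())

# Run sentiment analysis on all reviews in a single pipeline call.
# truncation=True cuts reviews that exceed the model's maximum input length
# instead of letting them raise an error.
results = sentiment_analysis(df["review"].tolist(), truncation=True)
df["sentiment"] = [result["label"] for result in results]

# Save the results
df.to_csv("reviews_with_sentiment.csv", index=False)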
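
Once every review has a label, a quick summary of the label distribution is often the first thing worth checking. The sketch below uses only the standard library; summarize_labels is a hypothetical helper and the sample labels are invented for illustration.

```python
from collections import Counter

def summarize_labels(labels):
    """Return the count and share of each label in an iterable of label strings."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: {"count": count, "share": count / total}
        for label, count in counts.items()
    }

# Hypothetical labels, standing in for the values of df["sentiment"]
sample_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
summary = summarize_labels(sample_labels)
for label, stats in summary.items():
    print(f"{label}: {stats['count']} ({stats['share']:.0%})")
```

With the DataFrame from this section, df["sentiment"].value_counts(normalize=True) gives the same shares directly in Pandas.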

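5. Customizing the Model

If the pretrained model does not fully meet your needs, you can fine-tune a model on your own dataset. Here is a simple fine-tuning example: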

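import torch
from transformers import Trainer, TrainingArguments, AutoModelForSequenceClassification, AutoTokenizer


class SentimentDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output and labels so that Trainer can index them."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Load the pretrained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Prepare the dataset (toy data for illustration; use your own labeled examples in practice)
train_texts = ["I love this!", "I hate this!"]
train_labels = [1, 0]
encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=128)
train_dataset = SentimentDataset(encodings, train_labels)

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

# Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

# Train the model
trainer.train()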
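
The example above only trains; to see whether fine-tuning helped, the model also needs to be scored on held-out data. A minimal sketch of an accuracy metric follows, written in plain Python so it runs without extra dependencies. It assumes the metric function receives a (logits, labels) pair, and the demo logits and labels are hypothetical values, not real model output.

```python
def argmax(row):
    """Index of the largest value in a sequence of logits."""
    return max(range(len(row)), key=lambda i: row[i])

def compute_metrics(eval_pred):
    """Compute accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(predictions)}

# Hypothetical logits for three validation examples (columns: label 0, label 1)
demo_logits = [[2.1, -1.3], [-0.5, 0.7], [0.2, 0.9]]
demo_labels = [0, 1, 0]
metrics = compute_metrics((demo_logits, demo_labels))
print(metrics)
```

A function like this can be passed to Trainer through its compute_metrics argument, together with an eval_dataset built the same way as the training set; calling trainer.evaluate() then reports the metric.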

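Summary

The Hugging Face Transformers library makes sentiment analysis simple, whether you use a pretrained model or fine-tune one on your own dataset. After reading this article you should know how to load a pretrained model, process text data, and run sentiment analysis. I hope this helps with your NLP projects!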

Questions to Consider

Study the Transformers pipeline.
How does loading a model with pipeline differ from loading it with AutoModel.from_pretrained?
