Rasa Custom NLU Components

XerCis

Article link: https://blog.csdn.net/lly1122334/article/details/106119731

Contents

Problem Description
Introduction to the Rasa NLU Pipeline
Steps
Notes
References

Problem Description

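I want to strengthen Rasa's existing NLU model with custom components, such as sentiment analysis, spell checking, or a character-level tokenizer.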

Introduction to the Rasa NLU Pipeline

The pipeline defines which processing steps an input message passes through on its way to the output, for example:

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"

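The components are called one after another. Each one produces output that either becomes part of the final result or serves as input to the components that follow it.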
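To make that data flow concrete, here is a framework-free sketch of the idea: every component reads from and writes to one shared message, in order. The classes `SimpleTokenizer` and `ToyKeywordSentiment`, the dict-based message, and `run_pipeline` are all hypothetical stand-ins; they are not Rasa classes, and real Rasa components work on Rasa's own `Message` object.

```python
from typing import Any, Dict, List


class SimpleTokenizer:
    """Hypothetical tokenizer: splits the text and stores the tokens."""

    def process(self, message: Dict[str, Any]) -> None:
        message["tokens"] = message["text"].lower().split()


class ToyKeywordSentiment:
    """Hypothetical extractor: reads the "tokens" written by an earlier component."""

    NEGATIVE = {"stupid", "worst"}

    def process(self, message: Dict[str, Any]) -> None:
        tokens = message["tokens"]  # KeyError if no tokenizer ran before this
        label = "neg" if any(t in self.NEGATIVE for t in tokens) else "neu"
        # append to (rather than replace) entities found by earlier components
        message.setdefault("entities", []).append(
            {"entity": "sentiment", "value": label}
        )


def run_pipeline(components: List[Any], text: str) -> Dict[str, Any]:
    """Call each component in order on one shared message dict."""
    message: Dict[str, Any] = {"text": text}
    for component in components:
        component.process(message)
    return message


result = run_pipeline([SimpleTokenizer(), ToyKeywordSentiment()], "Hello stupid bot")
print(result["entities"])  # [{'entity': 'sentiment', 'value': 'neg'}]
```

Swapping the two components raises a `KeyError`, because `"tokens"` does not exist yet when the sentiment step runs. The same ordering constraint applies to any real component that requires tokens.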

Steps

The following walks through adding a sentiment analysis component as an example:

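Install the NLP library spaCy and its small English model, plus NLTK, which the sentiment component below uses for its classifier

pip install spacy nltk
python -m spacy download en_core_web_sm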

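Create a project: rasa init --no-prompt
Add the following training data at the very top of data/nlu.md. It has to come first because the component pairs each line of labels.txt with a training example by position.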

## intent: feedback
- It’s very helpful
- I had the best experience speaking with you
- no feedback
- ok
- You are the most stupid bot I have ever seen
- the worst

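Create labels.txt in the directory you run rasa train from (the component opens it by relative path), with one label per line in the same order as the examples above (pos = positive, neu = neutral, neg = negative):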

pos
pos
neu
neu
neg
neg
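The component pairs labels with training examples using `zip()`, which matches items purely by position and stops at the shorter list. This stdlib-only sketch shows why the feedback block must sit at the top of nlu.md, assuming Rasa keeps the training examples in file order. The strings in `OTHER` and the helper `pair_by_position` are hypothetical.

```python
from typing import List, Tuple

FEEDBACK = [
    "It's very helpful",
    "I had the best experience speaking with you",
    "no feedback",
    "ok",
    "You are the most stupid bot I have ever seen",
    "the worst",
]
LABELS = "pos\npos\nneu\nneu\nneg\nneg\n".splitlines()  # contents of labels.txt

# Hypothetical examples of other intents that also live in nlu.md
OTHER = ["hey", "good morning", "bye"]


def pair_by_position(examples: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Same pairing as the component: zip() stops at the shorter list."""
    return list(zip(examples, labels))


# Feedback block at the top: every label lands on the intended sentence,
# and examples beyond the sixth are silently left out of sentiment training
good = pair_by_position(FEEDBACK + OTHER, LABELS)

# Feedback block after other intents: labels attach to the wrong sentences
bad = pair_by_position(OTHER + FEEDBACK, LABELS)

print(good[-1])  # ('the worst', 'neg')
print(bad[0])    # ('hey', 'pos')
```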

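Build the sentiment analysis component in sentiment.py. It subclasses Rasa's Component class and implements train() for training, process() for parsing messages, persist() for saving, and load() for loading: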
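These four methods follow a fixed lifecycle: while training, Rasa calls `train()` and then `persist()` to save the model; when serving, it calls `load()` once and then `process()` for every incoming message. The stdlib-only toy below mimics that order with a pickle round trip into a temporary directory. `ToyComponent`, its word-count "model", and the file name `toy_model.pkl` are hypothetical; this is not Rasa's `Component` base class.

```python
import os
import pickle
import tempfile
from collections import Counter
from typing import Any, Dict, List, Optional

MODEL_FILE = "toy_model.pkl"  # hypothetical file name


class ToyComponent:
    """Hypothetical stand-in that mimics the train/process/persist/load lifecycle."""

    def __init__(self) -> None:
        self.counts: Optional[Counter] = None  # the "model"; None until trained

    def train(self, examples: List[str]) -> None:
        # training: learn word frequencies from the examples
        self.counts = Counter(w for text in examples for w in text.lower().split())

    def process(self, message: Dict[str, Any]) -> None:
        # inference: enrich the message with something computed by the model
        if self.counts is None:
            raise RuntimeError("component has not been trained")
        words = message["text"].lower().split()
        message["known_words"] = [w for w in words if w in self.counts]

    def persist(self, model_dir: str) -> Dict[str, str]:
        # save the whole object inside the model directory; return what load() needs
        with open(os.path.join(model_dir, MODEL_FILE), "wb") as f:
            pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)
        return {"model_file": MODEL_FILE}

    @classmethod
    def load(cls, meta: Dict[str, str], model_dir: str) -> "ToyComponent":
        with open(os.path.join(model_dir, meta["model_file"]), "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as model_dir:
    # training phase: train, then persist
    trained = ToyComponent()
    trained.train(["the worst", "very helpful"])
    meta = trained.persist(model_dir)

    # serving phase: load, then process each new message
    loaded = ToyComponent.load(meta, model_dir)
    msg = {"text": "Hello stupid bot, the worst"}
    loaded.process(msg)
    print(msg["known_words"])  # ['the', 'worst']
```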

import os
import pickle
from typing import Any, Dict, Optional, Text

from nltk.classify import NaiveBayesClassifier
from rasa.nlu.components import Component

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"


class SentimentAnalyzer(Component):
    """Custom sentiment analysis component"""
    name = "sentiment"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]

    def __init__(self, component_config=None):
        super(SentimentAnalyzer, self).__init__(component_config)
        self.clf = None  # set by train(); checked in process()

    def train(self, training_data, cfg, **kwargs):
        """Load the sentiment labels from a text file, collect and format the
        tokens of the training examples, and train a sentiment classifier"""
        with open("labels.txt", "r") as f:
            labels = f.read().splitlines()
        training_data = training_data.training_examples  # list of Message objects
        tokens = [[t.text for t in example.get("tokens")] for example in training_data]
        processed_tokens = [self.preprocessing(t) for t in tokens]
        # zip() pairs by position and stops at the shorter list, so only the
        # first len(labels) examples are used: the labeled data must come first
        labeled_data = list(zip(processed_tokens, labels))
        self.clf = NaiveBayesClassifier.train(labeled_data)

    def convert_to_rasa(self, value, confidence):
        """Convert the model output into Rasa NLU's entity format"""
        entity = {"value": value,
                  "confidence": confidence,
                  "entity": "sentiment",
                  "extractor": "sentiment_extractor"}
        return entity

    def preprocessing(self, tokens):
        """Build a bag-of-words representation of an example: {word: True}"""
        return {word: True for word in tokens}

    def process(self, message, **kwargs):
        """Get the tokens of a new message, pass them to the classifier,
        and append the prediction to the message's entities"""
        if self.clf is None:
            print("No training!")
        else:
            tokens = [t.text for t in message.get("tokens")]
            tb = self.preprocessing(tokens)
            pred = self.clf.prob_classify(tb)
            sentiment = pred.max()
            confidence = pred.prob(sentiment)
            entity = self.convert_to_rasa(sentiment, confidence)
            # append instead of overwriting, so entities from other extractors survive
            message.set("entities", message.get("entities", []) + [entity], add_to_output=True)

    def persist(self, file_name, model_dir):
        """Persist the whole component by pickling it into the model directory"""
        classifier_file = os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME)
        with open(classifier_file, "wb") as f:
            pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)
        return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}

    @classmethod
    def load(cls, meta: Dict[Text, Any], model_dir: Optional[Text] = None, model_metadata=None, cached_component=None, **kwargs):
        """Unpickle the component saved by persist() from the model directory"""
        file_name = meta.get("classifier_file")
        classifier_file = os.path.join(model_dir, file_name)
        with open(classifier_file, "rb") as f:
            return pickle.load(f)
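The classification itself is delegated to NLTK's `NaiveBayesClassifier`, fed with the `{word: True}` dicts that `preprocessing()` builds. To show roughly what `train()` and `prob_classify()` compute, here is a from-scratch, Bernoulli-style Naive Bayes sketch in plain Python. It is not NLTK's implementation: it uses add-one (Laplace) smoothing, while NLTK uses its own default estimator (`ELEProbDist`), so its probabilities will not match NLTK's numbers such as the confidence in the example output. Whitespace splitting stands in for spaCy's tokenizer.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

FeatureSet = Dict[str, bool]
Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def bag_of_words(tokens: List[str]) -> FeatureSet:
    """Same representation as preprocessing() in the component: {word: True}."""
    return {word: True for word in tokens}


def train_nb(data: List[Tuple[FeatureSet, str]]) -> Model:
    """Count examples per label and, per label, how many examples contain each word."""
    label_counts: Counter = Counter(label for _, label in data)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for features, label in data:
        for word in features:
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def prob_classify(model: Model, features: FeatureSet) -> Dict[str, float]:
    """Return P(label | features) for every label, normalized to sum to 1."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior P(label)
        for word in features:
            if word not in vocab:  # words never seen in training are skipped
                continue
            # add-one smoothed P(word present | label); two outcomes: present/absent
            score += math.log((word_counts[label][word] + 1) / (n + 2))
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max before exp() to avoid underflow
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


EXAMPLES = [
    ("It's very helpful", "pos"),
    ("I had the best experience speaking with you", "pos"),
    ("no feedback", "neu"),
    ("ok", "neu"),
    ("You are the most stupid bot I have ever seen", "neg"),
    ("the worst", "neg"),
]

model = train_nb([(bag_of_words(text.split()), label) for text, label in EXAMPLES])
probs = prob_classify(model, bag_of_words("Hello stupid bot".split()))
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))  # neg 0.667
```

"Hello" never appears in the training data and is ignored, so the decision rests on "stupid" and "bot", which occur only in a negative example.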

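Edit config.yml to add the custom component, named in the form module_name.ClassName. Place it after SpacyTokenizer, because it requires the tokens that the tokenizer produces: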

language: en
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "sentiment.SentimentAnalyzer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
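The `sentiment.SentimentAnalyzer` entry works as a dotted path: the part before the last dot is a module to import (here sentiment.py, which therefore has to be importable, for example by being on `PYTHONPATH`), and the last part is a class in that module. The sketch below shows that idea with `importlib`. `resolve_component` is a hypothetical helper, not Rasa's internal loader, and the usage line resolves a standard-library class in place of the custom component.

```python
import importlib
from typing import Any


def resolve_component(dotted_name: str) -> Any:
    """Split "module.ClassName" at the last dot, import the module, fetch the class."""
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_name!r}")
    module = importlib.import_module(module_name)  # ModuleNotFoundError if not importable
    return getattr(module, class_name)  # AttributeError if the class name is wrong


# Stand-in for "sentiment.SentimentAnalyzer": a class that is always importable
cls = resolve_component("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```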

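After retraining with rasa train and testing with rasa shell nlu, a foul-mouthed user's "friendly" greeting Hello stupid bot yields this sentiment in the parsed entities:

"entities": [
  {
    "value": "neg",
    "confidence": 0.8181818181818182,
    "entity": "sentiment",
    "extractor": "sentiment_extractor"
  }
]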
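Because the sentiment arrives as an ordinary entity, downstream code has to pick it out of the entity list by its `entity` name. A small sketch, assuming the parse result is a dict shaped like the output above; `get_sentiment` and its 0.5 confidence threshold are hypothetical choices, not part of Rasa.

```python
from typing import Any, Dict, Optional, Tuple

parse_result: Dict[str, Any] = {
    "text": "Hello stupid bot",
    "entities": [
        {
            "value": "neg",
            "confidence": 0.8181818181818182,
            "entity": "sentiment",
            "extractor": "sentiment_extractor",
        }
    ],
}


def get_sentiment(result: Dict[str, Any], min_confidence: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (label, confidence) of the sentiment entity, or None if absent or too uncertain."""
    for entity in result.get("entities", []):
        if entity.get("entity") == "sentiment":
            confidence = entity.get("confidence", 0.0)
            if confidence >= min_confidence:
                return entity["value"], confidence
            return None
    return None


print(get_sentiment(parse_result))  # ('neg', 0.8181818181818182)
```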