1 Preliminary Preparation
1.1 Data Collection
Collect data about the Nanjing Museum and perform Chinese word segmentation on it (a user-defined dictionary is also required). Main techniques: Python web crawler, jieba segmentation.
The content is as follows:
1.2 Building the Corpus
Build a corpus from the collected data. Related technique: the MITIE toolkit.
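MITIE's word feature extractor is built offline from a directory of plain-text files. The sketch below only prepares such a corpus directory from segmented sentences; the sentences and paths are illustrative placeholders:

```python
import os

# Hypothetical segmented sentences; in the project these come from the
# jieba-segmented crawl of the Nanjing Museum pages.
segmented = [
    "南京 博物院 位于 南京市",
    "镇院 之 宝 有 哪些",
]

corpus_dir = "corpus"
os.makedirs(corpus_dir, exist_ok=True)
with open(os.path.join(corpus_dir, "museum.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(segmented))

# The feature extractor is then trained outside Python with MITIE's
# wordrep tool, roughly:
#   ./wordrep -e corpus
# which writes total_word_feature_extractor.dat for the nlp_mitie component.
```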
2 RASA_NLU
The tasks of the NLU module are:
- Intent recognition (Intent): classify at the sentence level to determine the user's intent;
- Entity recognition (Entity): find the key entities in the user's question at the word level and fill the entity slots (Slot Filling).
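For a question such as "南京博物院几点开门" ("What time does the Nanjing Museum open?"), the NLU module should produce one intent plus entity slots. The structure below mirrors the shape of rasa_nlu's parse result; the intent label and entity type are illustrative, not from the project:

```python
# Hypothetical NLU result for "南京博物院几点开门"; intent and entity
# names here are placeholders for whatever the project's data defines.
parsed = {
    "text": "南京博物院几点开门",
    "intent": {"name": "ask_opening_hours", "confidence": 0.92},
    "entities": [
        {"entity": "museum", "value": "南京博物院", "start": 0, "end": 5}
    ],
}

# Slot filling copies each recognized entity into a slot dictionary
# keyed by entity type.
slots = {e["entity"]: e["value"] for e in parsed["entities"]}
print(slots)
```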
2.1 Configuring rasa_nlu
The configuration file is as follows:
language: "zh"
pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_jieba"
  user_dicts: "./user.dict"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
2.2 Preparing Training Data
The data format is as follows:
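rasa_nlu's JSON training format groups annotated examples under rasa_nlu_data/common_examples, each with a text, an intent label, and entity annotations given as character offsets. The example below is hypothetical, shaped like what data/museum.json would contain; the intent and entity names are placeholders:

```python
import json

# One hypothetical training example in rasa_nlu's JSON format.
training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "南京博物院几点开门",
                "intent": "ask_opening_hours",
                "entities": [
                    # start/end are character offsets into "text"
                    {"start": 0, "end": 5, "value": "南京博物院", "entity": "museum"}
                ],
            }
        ]
    }
}

print(json.dumps(training_data, ensure_ascii=False, indent=2))
```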
2.3 Training
The code is as follows:
from rasa_nlu.training_data import load_data
from rasa_nlu.config import RasaNLUModelConfig
from rasa_nlu.model import Trainer
from rasa_nlu import config
from rasa_core.agent import Agent
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.interpreter import RasaNLUInterpreter
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.channels.console import ConsoleInputChannel

# Train the NLU model
def train():
    # Load the annotated training examples
    training_data = load_data('data/museum.json')
    # Build a trainer from the pipeline configuration
    trainer = Trainer(config.load("sample_configs/museum_config.json"))
    trainer.train(training_data)
    # Persist the trained model and report where it was saved
    model_directory = trainer.persist('./models/demo/')
    print(model_directory)
    predict(model_directory)
#
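The train() function above ends by calling a predict() helper that is not shown here. A minimal sketch of one, assuming rasa_nlu's Interpreter API and a placeholder test sentence:

```python
def predict(model_directory):
    # Imported inside the function so the sketch stays self-contained;
    # requires rasa_nlu to be installed.
    from rasa_nlu.model import Interpreter

    # Load the persisted pipeline and run it on a sample question.
    interpreter = Interpreter.load(model_directory)
    result = interpreter.parse(u"南京博物院几点开门")  # placeholder question
    print(result["intent"], result["entities"])
```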