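The LanguageModelTokenizer component has been deprecated, partly because it cannot handle languages that are not tokenized on whitespace, such as Chinese. Use JiebaTokenizer instead.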
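To make the whitespace problem concrete, here is a minimal Python sketch. It does not use Rasa or jieba at all. The helper `whitespace_tokenize` and both sample sentences are made up for illustration, and the helper only mimics what any whitespace-based tokenizer does.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split on runs of whitespace, the way a whitespace-based tokenizer would."""
    return text.split()


# Hypothetical sample sentences, not taken from any training data.
english = "book a flight to Beijing"
chinese = "我想订一张去北京的机票"

english_tokens = whitespace_tokenize(english)
chinese_tokens = whitespace_tokenize(chinese)

print(len(english_tokens))  # 5: one token per word
print(len(chinese_tokens))  # 1: the whole sentence comes back as a single token
```

Because the Chinese sentence contains no spaces, every word-level featurizer downstream would see one giant token. A dictionary-based segmenter such as the one behind JiebaTokenizer avoids that.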
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: zh
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
# - name: HFTransformersNLP
- name: JiebaTokenizer
- name: RegexFeaturizer
  # Chinese does not separate words with whitespace, so word-boundary matching should be off.
  use_word_boundaries: false
- name: LexicalSyntacticFeaturizer
# - name: LanguageModelTokenizer
- name: LanguageModelFeaturizer
  model_weights: "bert-base-chinese"
  model_name: "bert"
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 150
  constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.1
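For the two FallbackClassifier values at the end of the pipeline, the following Python sketch shows how I understand the documented decision rule. It is a simplified illustration, not Rasa's actual implementation. The function `resolve_intent`, the intent names, and the exact comparison operators are assumptions.

```python
from typing import Dict

NLU_FALLBACK = "nlu_fallback"


def resolve_intent(
    scores: Dict[str, float],
    threshold: float = 0.3,
    ambiguity_threshold: float = 0.1,
) -> str:
    """Return the top intent, or the fallback intent if the prediction is weak or ambiguous."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        # Assumption: no predictions at all is treated as a fallback.
        return NLU_FALLBACK

    top_name, top_score = ranked[0]
    # Rule 1: the best intent is not confident enough.
    if top_score < threshold:
        return NLU_FALLBACK
    # Rule 2: the two best intents are too close to call.
    if len(ranked) > 1 and top_score - ranked[1][1] < ambiguity_threshold:
        return NLU_FALLBACK
    return top_name


# Hypothetical intent names and confidences.
print(resolve_intent({"greet": 0.90, "goodbye": 0.05}))
print(resolve_intent({"greet": 0.25, "goodbye": 0.20}))
print(resolve_intent({"greet": 0.50, "goodbye": 0.45}))
```

With `threshold: 0.3` and `ambiguity_threshold: 0.1`, only the first call keeps its top intent. The second falls below the confidence threshold, and in the third the top two scores are within 0.1 of each other.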