Rasa Chinese: UnsupportedLanguageError: component 'LanguageModelTokenizer' does not support language 'zh'.

Latest recommended article published on 2025-04-14 16:53:47

我对算法一无所知

Category: 各种error总结 (assorted error write-ups) | Tags: chatbot, rasa

Article link: https://blog.csdn.net/qq_31267769/article/details/117124890

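The LanguageModelTokenizer component has been deprecated. Part of the reason is that it cannot handle languages such as Chinese, which do not separate words with whitespace. Use JiebaTokenizer instead. It requires the jieba package to be installed. LanguageModelFeaturizer can still follow it in the pipeline, as in the config below.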
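The following stdlib-only Python sketch makes the whitespace problem concrete. It compares plain whitespace splitting with a toy forward-maximum-matching segmenter built on a hypothetical mini-dictionary, MINI_DICT. This is not how jieba works internally: jieba uses a large dictionary with probability-based path selection and an HMM for unknown words. The sketch only shows that Chinese tokenization needs word knowledge, not spaces.

```python
from __future__ import annotations

# Hypothetical toy vocabulary for illustration only (not jieba's dictionary).
MINI_DICT = {"我", "想", "订", "机票", "明天", "北京"}


def whitespace_tokenize(text: str) -> list[str]:
    """Split on whitespace, as whitespace-based tokenizers do."""
    return text.split()


def fmm_tokenize(text: str, vocab: set[str] = MINI_DICT) -> list[str]:
    """Forward maximum matching: at each position take the longest word in
    vocab that matches; fall back to a single character otherwise."""
    max_len = max((len(w) for w in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # skip any spaces that do appear
            i += 1
            continue
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocab:
                tokens.append(candidate)
                i += length
                break
    return tokens


if __name__ == "__main__":
    sentence = "我想订明天去北京的机票"
    print(whitespace_tokenize(sentence))  # ['我想订明天去北京的机票']
    print(fmm_tokenize(sentence))  # ['我', '想', '订', '明天', '去', '北京', '的', '机票']
```

Whitespace splitting returns the whole sentence as a single token. That is why a whitespace-based tokenizer is of no use for language: zh.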

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: zh

pipeline:
  # HFTransformersNLP and LanguageModelTokenizer are deprecated: LanguageModelFeaturizer
  # now loads the Hugging Face model itself, so both stay commented out.
  # - name: HFTransformersNLP

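  # Segments Chinese text into words; requires the jieba package (pip install jieba)
  - name: JiebaTokenizer

  - name: RegexFeaturizer
    # Chinese has no whitespace between words, so disable word-boundary matching
    "use_word_boundaries": False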
  
  - name: LexicalSyntacticFeaturizer
  
  # - name: LanguageModelTokenizer  # raises UnsupportedLanguageError when language is zh

  - name: LanguageModelFeaturizer
    model_weights: "bert-base-chinese"
    model_name: "bert"

  - name: CountVectorsFeaturizer

  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4

  - name: DIETClassifier
    epochs: 150
    constrain_similarities: true

  - name: EntitySynonymMapper

  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true

  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
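Rasa raises the error in the title when it checks the pipeline components against the configured language. The Python sketch below is a hypothetical mimic of that kind of check, not Rasa's source. UnsupportedLanguageError here is a stand-in class named after the error. NOT_SUPPORTED_LANGUAGES and validate_pipeline are illustrative names, and the table holds only the one incompatibility reported in this article.

```python
from __future__ import annotations


class UnsupportedLanguageError(Exception):
    """Hypothetical stand-in named after the error in the title."""

    def __init__(self, component: str, language: str) -> None:
        self.component = component
        self.language = language
        super().__init__(
            f"component '{component}' does not support language '{language}'."
        )


# Hypothetical table: only the incompatibility reported in this article.
NOT_SUPPORTED_LANGUAGES = {
    "LanguageModelTokenizer": {"zh"},
}


def validate_pipeline(language: str, pipeline: list[dict]) -> None:
    """Raise if any component in the pipeline rejects the configured language."""
    for component in pipeline:
        name = component["name"]
        if language in NOT_SUPPORTED_LANGUAGES.get(name, set()):
            raise UnsupportedLanguageError(name, language)


if __name__ == "__main__":
    # Passes: JiebaTokenizer + LanguageModelFeaturizer, as in the config above.
    validate_pipeline("zh", [{"name": "JiebaTokenizer"},
                             {"name": "LanguageModelFeaturizer"}])
    try:
        validate_pipeline("zh", [{"name": "LanguageModelTokenizer"}])
    except UnsupportedLanguageError as err:
        print(err)  # component 'LanguageModelTokenizer' does not support language 'zh'.
```

The check is keyed on the component name, so commenting out the LanguageModelTokenizer line is enough to clear it.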