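Background
The original project depended on RASA 1.10.7. As of July 23, 2021, RASA had reached 2.8.x, and 2.8 is the last release line before 3.0. The changes from 2.8 to 3.0 are concentrated in the removal of Tokenizer and the addition of Graph-related functionality (see the related issue), while the basic data formats, Actions, Policies, and other core components should not change as much as they did between 1.0 and 2.0 (perhaps). To also reduce the cost of migrating across major versions later, the plan is to upgrade the project to 2.8 first.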
Environment
Python 3.7, RASA 2.8.1
Config
Pipeline
Rasa 1.0
- name: HFTransformersNLP
  model_name: bert
  model_weights: hfl/chinese_roberta_wwm_ext
  cache_dir: hfl/chinese_roberta_wwm_ext
- name: LanguageModelTokenizer
- name: LanguageModelFeaturizer
Rasa 2.0
- name: JiebaTokenizer
- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: hfl/chinese_roberta_wwm_ext
  cache_dir: hfl/chinese_roberta_wwm_ext
Warning raised during training
UserWarning: Misaligned entity annotation in message '跟踪目标流程' with intent 'okr_follow'. Make sure the start and end values of entities ([(2, 4, '目标')]) in the training data match the token boundaries ([(0, 4, '跟踪目标'), (4, 6, '流程')]). Common causes:
1) entities include trailing whitespaces or punctuation
2) the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
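The warning above says that the entity span (2, 4) does not start on a token boundary, because the tokenizer kept '跟踪目标' as a single token. A minimal sketch of that kind of boundary check is shown below. It is a simplified illustration, not Rasa's own code, and the function name `misaligned_entities` is hypothetical; the spans are copied from the warning message.

```python
from typing import List, Tuple

# (start, end, text) with character offsets, as printed in the warning
Span = Tuple[int, int, str]


def misaligned_entities(tokens: List[Span], entities: List[Span]) -> List[Span]:
    """Return entities whose start or end does not fall on a token boundary."""
    token_starts = {start for start, _, _ in tokens}
    token_ends = {end for _, end, _ in tokens}
    return [
        entity
        for entity in entities
        if entity[0] not in token_starts or entity[1] not in token_ends
    ]


# Values taken from the warning for the message '跟踪目标流程'
tokens = [(0, 4, "跟踪目标"), (4, 6, "流程")]
entities = [(2, 4, "目标")]

# The entity starts at offset 2, but no token starts there
print(misaligned_entities(tokens, entities))

# Hypothetical finer-grained tokenization in which the entity would align
finer_tokens = [(0, 2, "跟踪"), (2, 4, "目标"), (4, 6, "流程")]
print(misaligned_entities(finer_tokens, entities))
```

Running a check like this over the training data before training can help locate the examples where the tokenizer output and the entity annotations disagree.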
Rules
Rasa 1.0 (single-turn story)
## chitchat_greet
* chitchat_greet
  - utter_chitchat_greet
Rasa 2.0
- rule: chitchat_greet
  steps:
  - intent: chitchat_greet
  - action: utter_chitchat_greet
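When there are many single-turn stories, the mapping above can be applied mechanically. The following is a rough sketch under narrow assumptions: it only handles the three line types shown here (`## name`, `* intent`, `- action`) and ignores entities, slots, forms, and multi-turn stories. The helper name `story_md_to_rule_yaml` is hypothetical and is not part of Rasa.

```python
from typing import List


def story_md_to_rule_yaml(story_md: str) -> str:
    """Convert a single-turn Rasa 1.x Markdown story into Rasa 2.x rule YAML text."""
    out: List[str] = []
    for raw in story_md.strip().splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Story title becomes the rule name
            out.append(f"- rule: {line[3:].strip()}")
            out.append("  steps:")
        elif line.startswith("* "):
            # User intent line
            out.append(f"  - intent: {line[2:].strip()}")
        elif line.startswith("- "):
            # Bot action line
            out.append(f"  - action: {line[2:].strip()}")
    return "\n".join(out)


story = """
## chitchat_greet
* chitchat_greet
  - utter_chitchat_greet
"""

rule_yaml = story_md_to_rule_yaml(story)
print(rule_yaml)
```

The generated entries still need to be placed under a top-level `rules:` key in the rules file.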
Domain
responses
Rasa 1.0
responses:
  utter_chitchat_goodbye:
  - custom: {"data": [{"type": "text", "text": "再见"}]}
  - custom: {"data": [{"type": "text", "text": "拜拜"}]}
Rasa 2.0
responses:
  utter_chitchat_goodbye:
  - custom:
      data:
      - type: text
        text: 再见
  - custom:
      data:
      - type: text
        text: 拜拜
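The inline `custom` payloads in the 1.0 form are JSON, so they can be parsed and re-emitted in the nested block form. The sketch below does this for simple payloads like the ones above. The emitter `to_block_yaml` is a hypothetical helper written for illustration; it assumes non-empty dicts and lists and plain scalar values that need no YAML quoting, so it is not a general YAML serializer.

```python
import json
from typing import Any, List


def to_block_yaml(value: Any, indent: int = 0) -> List[str]:
    """Render a parsed JSON value as block-style YAML lines (simple cases only)."""
    pad = " " * indent
    lines: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_block_yaml(item, indent + 2))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            sub = to_block_yaml(item, indent + 2)
            # The first line of each item carries the "- " marker
            lines.append(f"{pad}- {sub[0].lstrip()}")
            lines.extend(sub[1:])
    else:
        lines.append(f"{pad}{value}")
    return lines


inline = '{"data": [{"type": "text", "text": "再见"}]}'
block_lines = to_block_yaml(json.loads(inline))
print("\n".join(block_lines))
```

The printed lines can then be indented under `- custom:` in the domain file.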
Error
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [85,13] vs. [85,12]
[[{{node cond/else/_13/cond/add_1}}]]
[[crf/cond/else/_1/crf/cond/Cast/_272]]
(1) Invalid argument: Incompatible shapes: [85,13] vs. [85,12]
[[{{node cond/else/_13/cond/add_1}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_40826]