Components
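Components make up your NLU pipeline and work sequentially to process user input into structured output. There are components for entity extraction, intent classification, response selection, pre-processing, and more.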
Language Models
The following components load pre-trained models that are needed if you want to use pre-trained word vectors in your pipeline.
MitieNLP#
-
Short
MITIE initializer
-
Outputs
Nothing
-
Requires
Nothing
-
Description
Initializes MITIE structures. Every MITIE component relies on this, hence this should be put at the beginning of every pipeline that uses any MITIE components.
-
Configuration
The MITIE library needs a language model file, which must be specified in the configuration:
pipeline:
- name: "MitieNLP"
# language model to load
model: "data/total_word_feature_extractor.dat"
You can also pre-train your own word vectors from a language corpus using MITIE. To do so:
-
Get a clean language corpus (a Wikipedia dump works) as a set of text files.
-
Build and run MITIE Wordrep Tool on your corpus. This can take several hours/days depending on your dataset and your workstation. You'll need something like 128GB of RAM for wordrep to run – yes, that's a lot: try to extend your swap.
-
Set the path of your new total_word_feature_extractor.dat as the model parameter of the MitieNLP component in your configuration file.
For a full example of how to train MITIE word vectors, check out 用Rasa NLU构建自己的中文NLU系统 ("Building your own Chinese NLU system with Rasa NLU"), a blog post that walks through creating a MITIE model from a Chinese Wikipedia dump.
SpacyNLP#
-
Short
spaCy language initializer
-
Outputs
Nothing
-
Requires
Nothing
-
Description
Initializes spaCy structures. Every spaCy component relies on this, hence this should be put at the beginning of every pipeline that uses any spaCy components.
-
Configuration
You need to specify the language model to use. By default the language configured in the pipeline will be used as the language model name. If the spaCy model to be used has a name that is different from the language tag ("en", "de", etc.), the model name can be specified using the configuration variable model. The name will be passed to spacy.load(name).
pipeline:
- name: "SpacyNLP"
# language model to load
model: "en_core_web_md"
# when retrieving word vectors, this will decide if the casing
# of the word is relevant. E.g. `hello` and `Hello` will
# retrieve the same vector, if set to `False`. For some
# applications and models it makes sense to differentiate
# between these two words, therefore setting this to `True`.
case_sensitive: False
For more information on how to download the spaCy models, head over to installing spaCy.
In addition to spaCy's pretrained language models, you can also use this component to load fastText vectors, which are available for hundreds of languages. If you want to incorporate a custom model you've found into spaCy, check out their page on adding languages. As described in the documentation, you need to register your language model and link it to the language identifier, which will allow Rasa to load and use your new language by passing in your language identifier as the language option.
HFTransformersNLP#
-
Short
HuggingFace's Transformers based pre-trained language model initializer
-
Outputs
Nothing
-
Requires
Nothing
-
Description
Initializes specified pre-trained language model from HuggingFace's Transformers library. The component applies language model specific tokenization and featurization to compute sequence and sentence level representations for each example in the training data. Include LanguageModelTokenizer and LanguageModelFeaturizer to utilize the output of this component for downstream NLU models.
- Configuration
You should specify which language model to load via the parameter model_name. See the table below for the available language models. You can additionally select the architecture variation of the chosen language model via the parameter model_weights. The full list of supported architectures can be found in the HuggingFace documentation. If model_weights is left empty, the default model architecture that the original Transformers library loads is used (see the table below).
+----------------+--------------+-------------------------+
| Language Model | Parameter | Default value for |
| | "model_name" | "model_weights" |
+----------------+--------------+-------------------------+
| BERT | bert | rasa/LaBSE |
+----------------+--------------+-------------------------+
| GPT | gpt | openai-gpt |
+----------------+--------------+-------------------------+
| GPT-2 | gpt2 | gpt2 |
+----------------+--------------+-------------------------+
| XLNet | xlnet | xlnet-base-cased |
+----------------+--------------+-------------------------+
| DistilBERT | distilbert | distilbert-base-uncased |
+----------------+--------------+-------------------------+
| RoBERTa | roberta | roberta-base |
+----------------+--------------+-------------------------+
The following configuration loads the language model BERT:
pipeline:
- name: HFTransformersNLP
# Name of the language model to use
model_name: "bert"
# Pre-Trained weights to be loaded
model_weights: "rasa/LaBSE"
# An optional path to a specific directory to download and cache the pre-trained model weights.
# The `default` cache_dir is the same as https://huggingface.co/transformers/serialization.html#cache-directory .
cache_dir: null
Tokenizers#
Tokenizers split text into tokens. If you want to split intents into multiple labels, e.g. for predicting multiple intents or for modeling hierarchical intent structure, use the following flags with any tokenizer:
- intent_tokenization_flag indicates whether to tokenize intent labels or not. Set it to True so that intent labels are tokenized.
- intent_split_symbol sets the delimiter string used to split the intent labels. The default is underscore (_).
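To make the effect of these two flags concrete, here is a minimal sketch of splitting an intent label on a delimiter. The helper split_intent is hypothetical and not part of Rasa; it only mirrors the behavior described above.

```python
from typing import List


def split_intent(label: str, tokenize: bool = False, split_symbol: str = "_") -> List[str]:
    """Hypothetical helper mirroring intent_tokenization_flag / intent_split_symbol."""
    if not tokenize:
        # Flag off: the whole intent label stays a single token.
        return [label]
    # Flag on: split the label on the configured delimiter.
    return label.split(split_symbol)


print(split_intent("check_balances+transfer_money", tokenize=True, split_symbol="+"))
print(split_intent("check_balances+transfer_money"))
```

With the flag enabled and "+" as the split symbol, a combined label is treated as two separate labels, which is what multi-intent prediction relies on.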
WhitespaceTokenizer#
-
Short
Tokenizer using whitespaces as a separator
-
Outputs
tokens
for user messages, responses (if present), and intents (if specified)
-
Requires
Nothing
-
Description
Creates a token for every whitespace separated character sequence.
-
Configuration
pipeline:
- name: "WhitespaceTokenizer"
# Flag to check whether to split intents
"intent_tokenization_flag": False
# Symbol on which intent should be split
"intent_split_symbol": "_"
# Regular expression to detect tokens
"token_pattern": None
JiebaTokenizer#
-
Short
Tokenizer using Jieba for Chinese language
-
Outputs
tokens
for user messages, responses (if present), and intents (if specified)
-
Requires
Nothing
-
Description
Creates tokens using the Jieba tokenizer specifically for Chinese language. It will only work for the Chinese language.
- Configuration
Custom dictionary files can be loaded automatically by specifying the directory that contains them via dictionary_path. If dictionary_path is None (the default), no custom dictionary is used.
pipeline:
- name: "JiebaTokenizer"
dictionary_path: "path/to/custom/dictionary/dir"
# Flag to check whether to split intents
"intent_tokenization_flag": False
# Symbol on which intent should be split
"intent_split_symbol": "_"
# Regular expression to detect tokens
"token_pattern": None
MitieTokenizer#
-
Short
Tokenizer using MITIE
-
Outputs
tokens
for user messages, responses (if present), and intents (if specified)
-
Requires
MitieNLP
-
Description
Creates tokens using the MITIE tokenizer.
-
Configuration
pipeline:
- name: "MitieTokenizer"
# Flag to check whether to split intents
"intent_tokenization_flag": False
# Symbol on which intent should be split
"intent_split_symbol": "_"
# Regular expression to detect tokens
"token_pattern": None
SpacyTokenizer#
-
Short
Tokenizer using spaCy
-
Outputs
tokens
for user messages, responses (if present), and intents (if specified)
-
Requires
SpacyNLP
-
Description
Creates tokens using the spaCy tokenizer.
-
Configuration
pipeline:
- name: "SpacyTokenizer"
# Flag to check whether to split intents
"intent_tokenization_flag": False
# Symbol on which intent should be split
"intent_split_symbol": "_"
# Regular expression to detect tokens
"token_pattern": None
ConveRTTokenizer
-
Short
Tokenizer using ConveRT model.
-
Outputs
tokens
for user messages, responses (if present), and intents (if specified)
-
Requires
Nothing
-
Description
Creates tokens using the ConveRT tokenizer. Must be used whenever the ConveRTFeaturizer is used.
- Configuration
-
pipeline:
- name: "ConveRTTokenizer"
# Flag to check whether to split intents
"intent_tokenization_flag": False
# Symbol on which intent should be split
"intent_split_symbol": "_"
# Regular expression to detect tokens
"token_pattern": None
# Remote URL/Local directory of model files (Required)
"model_url": None
LanguageModelTokenizer#
- Short
Tokenizer from pre-trained language models
- Outputs
tokens
for user messages, responses (if present), and intents (if specified)
- Requires
HFTransformersNLP
- Description
Creates tokens using the pre-trained language model specified in upstream HFTransformersNLP component. Must be used whenever the LanguageModelFeaturizer is used.
- Configuration
pipeline:
- name: "LanguageModelTokenizer"
# Flag to check whether to split intents
"intent_tokenization_flag": False
# Symbol on which intent should be split
"intent_split_symbol": "_"
Featurizers#
Text featurizers are divided into two different categories: sparse featurizers and dense featurizers. Sparse featurizers return feature vectors in which most values are missing, e.g. zeros. As those feature vectors would normally take up a lot of memory, we store them as sparse features. Sparse features only store the values that are non-zero and their positions in the vector. Thus, we save a lot of memory and are able to train on larger datasets.
All featurizers can return two different kinds of features: sequence features and sentence features. The sequence features are a matrix of size (number-of-tokens x feature-dimension). The matrix contains a feature vector for every token in the sequence. This allows us to train sequence models. The sentence features are represented by a matrix of size (1 x feature-dimension). It contains the feature vector for the complete utterance. The sentence features can be used in any bag-of-words model. The corresponding classifier can therefore decide what kind of features to use. Note: The feature-dimension for sequence and sentence features does not have to be the same.
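The memory saving from sparse storage can be illustrated with a small sketch. This is not how Rasa stores features internally; the helpers to_sparse and to_dense are hypothetical and only show the idea of keeping non-zero values together with their positions.

```python
from typing import Dict, List, Tuple


def to_sparse(dense: List[float]) -> Tuple[int, Dict[int, float]]:
    """Keep only the non-zero values, keyed by their position in the vector."""
    return len(dense), {i: v for i, v in enumerate(dense) if v != 0}


def to_dense(size: int, values: Dict[int, float]) -> List[float]:
    """Rebuild the full vector from its sparse form."""
    return [values.get(i, 0.0) for i in range(size)]


dense = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0]
size, values = to_sparse(dense)
print(size, values)              # vector length plus two stored entries
print(to_dense(size, values) == dense)
```

For a bag-of-words vector over a vocabulary of many thousands of words, a single message typically touches only a handful of positions, so the sparse form is far smaller than the dense one.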
MitieFeaturizer#
-
Short
Creates a vector representation of user message and response (if specified) using the MITIE featurizer.
-
Outputs
dense_features
for user messages and responses
-
Requires
MitieNLP
-
Type
Dense featurizer
-
Description
Creates features for entity extraction, intent classification, and response classification using the MITIE featurizer.
- Configuration
The sentence vector, i.e. the vector of the complete utterance, can be calculated in two different ways, either via mean or via max pooling. You can specify the pooling method in your configuration file with the option pooling. The default pooling method is set to mean.
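The difference between the two pooling methods can be sketched in a few lines. The function pool below is a hypothetical illustration, not the featurizer's actual code; it collapses a (number-of-tokens x feature-dimension) matrix into a single sentence vector.

```python
from typing import List


def pool(sequence_features: List[List[float]], pooling: str = "mean") -> List[float]:
    """Collapse per-token vectors into one sentence vector."""
    if not sequence_features:
        raise ValueError("need at least one token vector")
    # Transpose so that each tuple holds one feature dimension across all tokens.
    columns = list(zip(*sequence_features))
    if pooling == "mean":
        return [sum(col) / len(col) for col in columns]
    if pooling == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling method: {pooling}")


tokens = [[1.0, 4.0], [3.0, 2.0]]  # two tokens, feature dimension 2
print(pool(tokens, "mean"))
print(pool(tokens, "max"))
```

Mean pooling averages every dimension over the tokens, while max pooling keeps the largest value per dimension, so a single strongly activated token dominates the result.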
pipeline:
- name: "MitieFeaturizer"
# Specify what pooling operation should be used to calculate the vector of
# the complete utterance. Available options: 'mean' and 'max'.
"pooling": "mean"
SpacyFeaturizer#
-
Short
Creates a vector representation of user message and response (if specified) using the spaCy featurizer.
-
Outputs
dense_features
for user messages and responses
-
Requires
SpacyNLP
-
Type
Dense featurizer
-
Description
Creates features for entity extraction, intent classification, and response classification using the spaCy featurizer.
-
Configuration
The sentence vector, i.e. the vector of the complete utterance, can be calculated in two different ways, either via mean or via max pooling. You can specify the pooling method in your configuration file with the option pooling. The default pooling method is set to mean.
pipeline:
- name: "SpacyFeaturizer"
# Specify what pooling operation should be used to calculate the vector of
# the complete utterance. Available options: 'mean' and 'max'.
"pooling": "mean"
ConveRTFeaturizer#
-
Short
Creates a vector representation of user message and response (if specified) using ConveRT model.
-
Outputs
dense_features
for user messages and responses
-
Requires
tokens
-
Type
Dense featurizer
-
Description
Creates features for entity extraction, intent classification, and response selection. It uses the default signature to compute vector representations of input text.
Configuration
pipeline:
- name: "ConveRTFeaturizer"
LanguageModelFeaturizer#
-
Short
Creates a vector representation of user message and response (if specified) using a pre-trained language model.
-
Outputs
dense_features
for user messages and responses
-
Requires
tokens
-
Type
Dense featurizer
-
Description
Creates features for entity extraction, intent classification, and response selection. Uses a pre-trained language model to compute vector representations of input text.
- Configuration
Include a Tokenizer component before this component.
You should specify which language model to load via the parameter model_name. See the table below for the available language models. You can additionally select the architecture variation of the chosen language model via the parameter model_weights. The full list of supported architectures can be found in the HuggingFace documentation. If model_weights is left empty, the default model architecture that the original Transformers library loads is used (see the table below).
+----------------+--------------+-------------------------+
| Language Model | Parameter | Default value for |
| | "model_name" | "model_weights" |
+----------------+--------------+-------------------------+
| BERT | bert | rasa/LaBSE |
+----------------+--------------+-------------------------+
| GPT | gpt | openai-gpt |
+----------------+--------------+-------------------------+
| GPT-2 | gpt2 | gpt2 |
+----------------+--------------+-------------------------+
| XLNet | xlnet | xlnet-base-cased |
+----------------+--------------+-------------------------+
| DistilBERT | distilbert | distilbert-base-uncased |
+----------------+--------------+-------------------------+
| RoBERTa | roberta | roberta-base |
+----------------+--------------+-------------------------+
The following configuration loads the language model BERT:
pipeline:
- name: LanguageModelFeaturizer
# Name of the language model to use
model_name: "bert"
# Pre-Trained weights to be loaded
model_weights: "rasa/LaBSE"
# An optional path to a specific directory to download and cache the pre-trained model weights.
# The `default` cache_dir is the same as https://huggingface.co/transformers/serialization.html#cache-directory .
cache_dir: null
RegexFeaturizer#
-
Short
Creates a vector representation of user message using regular expressions.
-
Outputs
sparse_features for user messages and tokens.pattern
-
Requires
tokens
-
Type
Sparse featurizer
-
Description
Creates features for entity extraction and intent classification. During training the RegexFeaturizer creates a list of regular expressions defined in the training data format. For each regex, a feature is set that marks whether this expression was found in the user message or not. All features are later fed into an intent classifier / entity extractor to simplify classification (assuming the classifier has learned during the training phase that this feature indicates a certain intent / entity). Regex features for entity extraction are currently only supported by the CRFEntityExtractor and the DIETClassifier components!
-
Configuration
Make the featurizer case insensitive by adding the case_sensitive: False option, the default being case_sensitive: True.
To correctly process languages such as Chinese that don't use whitespace for word separation, the user needs to add the use_word_boundaries: False option, the default being use_word_boundaries: True.
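The two options can be illustrated with Python's re module. This is a minimal sketch under stated assumptions: the helper names regex_features and lookup_pattern are hypothetical, and the way a lookup table is turned into one pattern is assumed for illustration rather than taken from the component's source.

```python
import re
from typing import Dict, List


def lookup_pattern(words: List[str], use_word_boundaries: bool = True) -> str:
    """Build one regex from lookup table entries (assumed construction)."""
    escaped = [re.escape(w) for w in words]
    if use_word_boundaries:
        return r"(\b" + r"\b|\b".join(escaped) + r"\b)"
    return "(" + "|".join(escaped) + ")"


def regex_features(message: str, patterns: Dict[str, str], case_sensitive: bool = True) -> List[int]:
    """Return one 0/1 feature per pattern: 1 if the pattern occurs in the message."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [1 if re.search(p, message, flags=flags) else 0 for p in patterns.values()]


patterns = {"zipcode": r"[0-9]{5}", "greet": r"hey"}
print(regex_features("Hey, my zip is 10115", patterns))                        # "hey" misses "Hey"
print(regex_features("Hey, my zip is 10115", patterns, case_sensitive=False))

# Without whitespace between words, \b never matches inside the sentence.
print(re.search(lookup_pattern(["北京"], True), "我想去北京玩") is None)
print(re.search(lookup_pattern(["北京"], False), "我想去北京玩") is not None)
```

The last two lines show why use_word_boundaries: False matters for Chinese: the lookup entry is surrounded by other word characters, so a word-boundary pattern finds nothing.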
pipeline:
- name: "RegexFeaturizer"
# Text will be processed with case sensitive as default
"case_sensitive": True
# use match word boundaries for lookup table
"use_word_boundaries": True
Configuring for incremental training
To ensure that sparse_features are of fixed size during incremental training, the component should be configured to account for additional patterns that may be added to the training data in the future. To do so, configure the number_additional_patterns parameter while training the base model from scratch:
pipeline:
- name: RegexFeaturizer
number_additional_patterns: 10
If not configured by the user, the component will use twice the number of patterns currently present in the training data (including lookup tables and regex patterns) as the default value for number_additional_patterns. This number is kept at a minimum of 10 in order to avoid running out of additional slots for new patterns too frequently during incremental training. Once the component runs out of additional pattern slots, the new patterns are dropped and not considered during featurization. At this point, it is advisable to retrain a new model from scratch.
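The default and the dropping behavior described above can be written down as a short sketch. Both function names are hypothetical and the code is derived only from the rule stated in this section, not from the component's implementation.

```python
from typing import List


def default_additional_patterns(current_patterns: int) -> int:
    """Twice the current number of patterns, but never fewer than 10 slots."""
    return max(10, 2 * current_patterns)


def usable_new_patterns(new_patterns: List[str], free_slots: int) -> List[str]:
    """Patterns beyond the remaining free slots are dropped."""
    return new_patterns[:free_slots]


print(default_additional_patterns(3))    # small data set: the minimum applies
print(default_additional_patterns(20))   # larger data set: twice the pattern count
print(usable_new_patterns(["a+", "b+", "c+"], free_slots=2))
```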
CountVectorsFeaturizer#
-
Short
Creates bag-of-words representation of user messages, intents, and responses.
-
Outputs
sparse_features
for user messages, intents, and responses
-
Requires
tokens
-
Type
Sparse featurizer
-
Description
Creates features for intent classification and response selection. Creates bag-of-words representation of user message, intent, and response using sklearn's CountVectorizer. All tokens which consist only of digits (e.g. 123 and 99 but not a123d) will be assigned to the same feature.
-
Configuration
See sklearn's CountVectorizer docs for detailed description of the configuration parameters.
This featurizer can be configured to use word or character n-grams, using the analyzer configuration parameter. By default analyzer is set to word so word token counts are used as features. If you want to use character n-grams, set analyzer to char or char_wb. The lower and upper boundaries of the n-grams can be configured via the parameters min_ngram and max_ngram. By default both of them are set to 1. By default the featurizer takes the lemma of a word instead of the word directly if it is available. The lemma of a word is currently only set by the SpacyTokenizer. You can disable this behavior by setting use_lemma to False.
Since training is performed on data with a limited vocabulary, it cannot be guaranteed that the algorithm will not encounter an unknown word (a word that was not seen during training) during prediction. In order to teach an algorithm how to treat unknown words, some words in the training data can be substituted by the generic word OOV_token. In this case, all unknown words will be treated as this generic word OOV_token during prediction.
For example, one might create a separate intent outofscope in the training data containing messages with different numbers of OOV_tokens and maybe some additional general words. Then an algorithm will likely classify a message with unknown words as this intent outofscope.
You can either set the OOV_token or a list of words OOV_words:
- OOV_token sets a keyword for unseen words. If the training data contains OOV_token as a word in some messages, words that were not seen during training will be substituted with the provided OOV_token during prediction. If OOV_token=None (default behavior), words that were not seen during training will be ignored at prediction time.
- OOV_words sets a list of words to be treated as OOV_token during training. If a list of words that should be treated as out-of-vocabulary is known, it can be set in OOV_words instead of manually changing the training data or using a custom preprocessor.
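The handling of unknown words and digit-only tokens can be sketched with a plain bag-of-words counter. This is an illustration, not the featurizer's implementation: count_vector is a hypothetical helper, and the placeholder name __NUMBER__ for digit-only tokens is an assumption.

```python
from typing import List, Optional

NUMBER_PLACEHOLDER = "__NUMBER__"  # assumed name for the shared digit feature


def count_vector(tokens: List[str], vocabulary: List[str], oov_token: Optional[str] = None) -> List[int]:
    """Count tokens against a fixed vocabulary learned during training."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in tokens:
        if token.isdigit():
            # All digit-only tokens share one feature.
            token = NUMBER_PLACEHOLDER
        if token not in index:
            if oov_token is None or oov_token not in index:
                continue  # unseen words are ignored
            token = oov_token  # unseen words count towards the OOV feature
        vector[index[token]] += 1
    return vector


vocab = ["book", "a", "table", NUMBER_PLACEHOLDER, "_oov_"]
message = ["book", "a", "table", "for", "4"]
print(count_vector(message, vocab, oov_token="_oov_"))
print(count_vector(message, vocab))
```

In the first call the unseen word "for" is counted under _oov_; in the second call, with no OOV_token set, it is simply skipped.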
If you want to share the vocabulary between user messages and intents, you need to set the option use_shared_vocab to True. In that case a common vocabulary set between tokens in intents and user messages is built.
pipeline:
- name: "CountVectorsFeaturizer"
# Analyzer to use, either 'word', 'char', or 'char_wb'
"analyzer": "word"
# Set the lower and upper boundaries for the n-grams
"min_ngram": 1
"max_ngram": 1
# Set the out-of-vocabulary token
"OOV_token": "_oov_"
# Whether to use a shared vocab
"use_shared_vocab": False
Configuring for incremental training
To ensure that sparse_features are of fixed size during incremental training, the component should be configured to account for additional vocabulary tokens that may be added as part of new training examples in the future. To do so, configure the additional_vocabulary_size parameter while training the base model from scratch:
pipeline:
- name: CountVectorsFeaturizer
additional_vocabulary_size:
text: 1000
response: 1000
action_text: 1000
As in the above example, you can define the additional vocabulary size for each of text (user messages), response (bot responses used by ResponseSelector) and action_text (bot responses not used by ResponseSelector). If you are building a shared vocabulary (use_shared_vocab=True), you only need to define a value for the text attribute. If any of the attributes is not configured by the user, the component takes half of the current vocabulary size as the default value for that attribute's additional_vocabulary_size. This number is kept at a minimum of 1000 in order to avoid running out of additional vocabulary slots too frequently during incremental training. Once the component runs out of additional vocabulary slots, the new vocabulary tokens are dropped and not considered during featurization. At this point, it is advisable to retrain a new model from scratch.
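As a rough sketch of how the per-attribute defaults described above combine with user-configured values, consider the following. The function resolve_additional_vocabulary_size is hypothetical and follows only the rule stated in this section.

```python
from typing import Dict, Optional

MIN_ADDITIONAL_SLOTS = 1000  # minimum stated above


def resolve_additional_vocabulary_size(
    current_sizes: Dict[str, int],
    configured: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Use configured values where present, otherwise half the current size (min 1000)."""
    configured = configured or {}
    resolved = {}
    for attribute, size in current_sizes.items():
        if attribute in configured:
            resolved[attribute] = configured[attribute]
        else:
            resolved[attribute] = max(MIN_ADDITIONAL_SLOTS, size // 2)
    return resolved


print(resolve_additional_vocabulary_size({"text": 5000, "response": 300}, {"response": 200}))
print(resolve_additional_vocabulary_size({"text": 800}))
```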
The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.
LexicalSyntacticFeaturizer#
-
Short
Creates lexical and syntactic features for a user message to support entity extraction.
-
Outputs
sparse_features
for user messages
-
Requires
tokens
-
Type
Sparse featurizer
-
Description
Creates features for entity extraction. Moves with a sliding window over every token in the user message and creates features according to the configuration (see below). As a default configuration is present, you don't need to specify a configuration.
-
Configuration
You can configure what kind of lexical and syntactic features the featurizer should extract. The following features are available:
============== ==========================================================================================
Feature Name Description
============== ==========================================================================================
BOS Checks if the token is at the beginning of the sentence.
EOS Checks if the token is at the end of the sentence.
low Checks if the token is lower case.
upper Checks if the token is upper case.
title Checks if the token starts with an uppercase character and all remaining characters are
lowercased.
digit Checks if the token contains just digits.
prefix5 Take the first five characters of the token.
prefix2 Take the first two characters of the token.
suffix5 Take the last five characters of the token.
suffix3 Take the last three characters of the token.
suffix2 Take the last two characters of the token.
suffix1 Take the last character of the token.
pos Take the Part-of-Speech tag of the token (``SpacyTokenizer`` required).
pos2 Take the first two characters of the Part-of-Speech tag of the token
(``SpacyTokenizer`` required).
============== ==========================================================================================
As the featurizer is moving over the tokens in a user message with a sliding window, you can define features for previous tokens, the current token, and the next tokens in the sliding window. You define the features as a [before, token, after] array. If you want to define features for the token before, the current token, and the token after, your features configuration would look like this:
pipeline:
- name: LexicalSyntacticFeaturizer
"features": [
["low", "title", "upper"],
["BOS", "EOS", "low", "upper", "title", "digit"],
["low", "title", "upper"],
]
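To show what the [before, token, after] configuration produces, here is a small sketch of the sliding window. It is an illustration only: token_features and the feature key format (offset:name) are hypothetical, only a subset of the features from the table is implemented, and str.istitle() is used as an approximation of the title check.

```python
from typing import Callable, Dict, List

# Token-level checks for a subset of the features listed in the table above.
FEATURE_FUNCTIONS: Dict[str, Callable[[str], object]] = {
    "low": lambda t: t.islower(),
    "upper": lambda t: t.isupper(),
    "title": lambda t: t.istitle(),
    "digit": lambda t: t.isdigit(),
}

WINDOW = [
    ["low", "title", "upper"],                         # token before
    ["BOS", "EOS", "low", "upper", "title", "digit"],  # current token
    ["low", "title", "upper"],                         # token after
]


def token_features(tokens: List[str], position: int, window: List[List[str]] = WINDOW) -> Dict[str, object]:
    """Collect features for the token at `position` and its neighbours."""
    features: Dict[str, object] = {}
    half = len(window) // 2
    for slot, names in enumerate(window):
        offset = slot - half
        index = position + offset
        if index < 0 or index >= len(tokens):
            continue  # no neighbour on this side of the sentence
        for name in names:
            if name == "BOS":
                value: object = index == 0
            elif name == "EOS":
                value = index == len(tokens) - 1
            else:
                value = FEATURE_FUNCTIONS[name](tokens[index])
            features[f"{offset}:{name}"] = value
    return features


print(token_features(["Book", "a", "TABLE"], 1))
```

Each token therefore gets its own features plus a view of its direct neighbours, which is the context an entity extractor uses to decide where an entity starts and ends.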
Intent Classifiers#
Intent classifiers assign one of the intents defined in the domain file to incoming user messages.
MitieIntentClassifier#
-
Short
MITIE intent classifier (using a text categorizer)
-
Outputs
intent
-
Requires
tokens for user message and MitieNLP
-
Output-Example
{
"intent": {"name": "greet", "confidence": 0.98343}
}
Description
This classifier uses MITIE to perform intent classification. The underlying classifier is a multi-class linear SVM with a sparse linear kernel (see the train_text_categorizer_classifier function in the MITIE trainer code).
- Configuration
pipeline:
- name: "MitieIntentClassifier"
SklearnIntentClassifier#
-
Short
Sklearn intent classifier
-
Outputs
intent and intent_ranking
-
Requires
dense_features
for user messages
-
Output-Example
{
"intent": {"name": "greet", "confidence": 0.78343},
"intent_ranking": [
{
"confidence": 0.1485910906220309,
"name": "goodbye"
},
{
"confidence": 0.08161531595656784,
"name": "restaurant_search"
}
]
}
-
Description
The sklearn intent classifier trains a linear SVM which gets optimized using a grid search. It also provides rankings of the labels that did not "win". The SklearnIntentClassifier needs to be preceded by a dense featurizer in the pipeline. This dense featurizer creates the features used for the classification. For more information about the algorithm itself, take a look at the GridSearchCV documentation.
-
Configuration
During the training of the SVM a hyperparameter search is run to find the best parameter set. In the configuration you can specify the parameters that will get tried.
pipeline:
- name: "SklearnIntentClassifier"
# Specifies the list of regularization values to
# cross-validate over for C-SVM.
# This is used with the ``kernel`` hyperparameter in GridSearchCV.
C: [1, 2, 5, 10, 20, 100]
# Specifies the kernel to use with C-SVM.
# This is used with the ``C`` hyperparameter in GridSearchCV.
kernels: ["linear"]
# Gamma parameter of the C-SVM.
"gamma": [0.1]
# We try to find a good number of cross folds to use during
# intent training, this specifies the max number of folds.
"max_cross_validation_folds": 5
# Scoring function used for evaluating the hyper parameters.
# This can be a name or a function.
"scoring_function": "f1_weighted"
KeywordIntentClassifier#
-
Short
Simple keyword matching intent classifier, intended for small, short-term projects.
-
Outputs
intent
-
Requires
Nothing
-
Output-Example
{
"intent": {"name": "greet", "confidence": 1.0}
}
- Description
This classifier works by searching a message for keywords. The matching is case sensitive by default and searches only for exact matches of the keyword-string in the user message. The keywords for an intent are the examples of that intent in the NLU training data. This means the entire example is the keyword, not the individual words in the example.
- Configuration
pipeline:
- name: "KeywordIntentClassifier"
case_sensitive: True
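The matching behavior described above can be approximated in a few lines. This is a sketch, not the component's code: keyword_intent is a hypothetical helper, and matching whole training examples with a word-boundary regex is an assumption about how "exact matches" are found.

```python
import re
from typing import Dict, List, Optional


def keyword_intent(message: str, examples: Dict[str, List[str]], case_sensitive: bool = True) -> Optional[str]:
    """Return the first intent whose training example occurs in the message."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for intent, keywords in examples.items():
        for keyword in keywords:
            # The entire example is the keyword, not its individual words.
            if re.search(r"\b" + re.escape(keyword) + r"\b", message, flags=flags):
                return intent
    return None


examples = {"greet": ["hello", "good morning"], "goodbye": ["bye"]}
print(keyword_intent("hello there", examples))
print(keyword_intent("Hello there", examples))                        # case sensitive: no match
print(keyword_intent("Hello there", examples, case_sensitive=False))
print(keyword_intent("good day", examples))                           # "good" alone is not a keyword
```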
DIETClassifier#
-
Short
Dual Intent Entity Transformer (DIET) used for intent classification and entity extraction
-
Outputs
entities, intent and intent_ranking
-
Requires
dense_features and/or sparse_features for user message and optionally the intent
-
Output-Example
{
"intent": {"name": "greet", "confidence": 0.8343},
"intent_ranking": [
{
"confidence": 0.385910906220309,
"name": "goodbye"
},
{
"confidence": 0.28161531595656784,
"name": "restaurant_search"
}
],
"entities": [{
"end": 53,
"entity": "time",
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "DIETClassifier"
}]
}
- Description
DIET (Dual Intent and Entity Transformer) is a multi-task architecture for intent classification and entity recognition. The architecture is based on a transformer which is shared for both tasks. A sequence of entity labels is predicted through a Conditional Random Field (CRF) tagging layer on top of the transformer output sequence corresponding to the input sequence of tokens. For the intent labels the transformer output for the complete utterance and intent labels are embedded into a single semantic vector space. We use the dot-product loss to maximize the similarity with the target label and minimize similarities with negative samples.
If you want to learn more about the model, check out the Algorithm Whiteboard series on YouTube, where we explain the model architecture in detail.
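The idea of comparing an utterance with intent labels in one shared vector space can be sketched with plain dot products. The embeddings below are made-up numbers and rank_intents is a hypothetical helper; the real model learns these vectors during training and turns the similarities into confidences.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot-product similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def rank_intents(utterance: List[float], labels: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Order intent labels by their similarity to the utterance embedding."""
    scores = {name: dot(utterance, embedding) for name, embedding in labels.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


labels = {"greet": [1.0, 0.0], "goodbye": [0.0, 1.0]}  # illustrative label embeddings
print(rank_intents([0.9, 0.1], labels))
```

Training pushes the utterance embedding towards the embedding of its target label (a high dot product) and away from the embeddings of negative samples (low dot products).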
-
Configuration
If you want to use the DIETClassifier just for intent classification, set entity_recognition to False. If you want to do only entity recognition, set intent_classification to False. By default DIETClassifier does both, i.e. entity_recognition and intent_classification are set to True.
You can define a number of hyperparameters to adapt the model. If you want to adapt your model, start by modifying the following parameters:
- epochs: This parameter sets the number of times the algorithm will see the training data (default: 300). One epoch equals one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to learn properly. Sometimes more epochs don't influence the performance. The lower the number of epochs, the faster the model is trained.
- hidden_layers_sizes: This parameter allows you to define the number of feed forward layers and their output dimensions for user messages and intents (default: text: [], label: []). Every entry in the list corresponds to a feed forward layer. For example, if you set text: [256, 128], we will add two feed forward layers in front of the transformer. The vectors of the input tokens (coming from the user message) will be passed on to those layers. The first layer will have an output dimension of 256 and the second layer will have an output dimension of 128. If an empty list is used (default behavior), no feed forward layer will be added. Make sure to use only positive integer values. Usually, powers of two are used. It is also common practice to use decreasing values in the list: each value is smaller than or equal to the one before.
- embedding_dimension: This parameter defines the output dimension of the embedding layers used inside the model (default: 20). We are using multiple embedding layers inside the model architecture. For example, the vector of the complete utterance and the intent is passed on to an embedding layer before they are compared and the loss is calculated.
- number_of_transformer_layers: This parameter sets the number of transformer layers to use (default: 2). The number of transformer layers corresponds to the transformer blocks to use for the model.
- transformer_size: This parameter sets the number of units in the transformer (default: 256). The vectors coming out of the transformers will have the given transformer_size.
- weight_sparsity: This parameter defines the fraction of kernel weights that are set to 0 for all feed forward layers in the model (default: 0.8). The value should be between 0 and 1. If you set weight_sparsity to 0, no kernel weights will be set to 0 and the layer acts as a standard feed forward layer. You should not set weight_sparsity to 1, as this would result in all kernel weights being 0, i.e. the model is not able to learn.
-
The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.
FallbackClassifier#
-
Short
Classifies a message with the intent nlu_fallback if the NLU intent classification scores are ambiguous. The confidence is set to 1 - top intent confidence.
-
Outputs
entities, intent and intent_ranking
-
Requires
intent and intent_ranking output from a previous intent classifier
-
Output-Example
{
"intent": {"name": "nlu_fallback", "confidence": 0.7183846840434321},
"intent_ranking": [
{
"confidence": 0.7183846840434321,
"name": "nlu_fallback"
},
{
"confidence": 0.28161531595656784,
"name": "restaurant_search"
}
],
"entities": [{
"end": 53,
"entity": "time",
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "DIETClassifier"
}]
}
- Description
The FallbackClassifier classifies a user message with the intent nlu_fallback if the previous intent classifier could not classify an intent with a confidence greater than or equal to the threshold of the FallbackClassifier. It can also predict the fallback intent when the confidence scores of the two top-ranked intents are closer than the ambiguity_threshold.
You can use the FallbackClassifier to implement a Fallback Action which handles messages with uncertain NLU predictions.
rules:
- rule: Ask the user to rephrase in case of low NLU confidence
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase
- Configuration
The FallbackClassifier will only add its prediction for the nlu_fallback intent if no other intent was predicted with a confidence greater than or equal to threshold.
-
threshold: This parameter sets the threshold for predicting the nlu_fallback intent. If no intent predicted by a previous intent classifier has a confidence greater than or equal to threshold, the FallbackClassifier will add a prediction of the nlu_fallback intent with a confidence of 1.0.
-
ambiguity_threshold: If you configure an ambiguity_threshold, the FallbackClassifier will also predict the nlu_fallback intent if the difference between the confidence scores of the two highest-ranked intents is smaller than the ambiguity_threshold.
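The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the FallbackClassifier's actual implementation: the function name should_fallback is hypothetical, and treating an empty ranking as a fallback case is an assumption.

```python
from typing import Any, Dict, List, Optional


def should_fallback(
    intent_ranking: List[Dict[str, Any]],
    threshold: float,
    ambiguity_threshold: Optional[float] = None,
) -> bool:
    """Return True if the nlu_fallback intent should be predicted."""
    if not intent_ranking:
        # Assumption: with no prediction at all, fall back.
        return True

    ranked = sorted(intent_ranking, key=lambda i: i["confidence"], reverse=True)
    top_confidence = ranked[0]["confidence"]

    # Rule 1: no intent reaches the confidence threshold.
    if top_confidence < threshold:
        return True

    # Rule 2 (optional): the two best intents are too close to each other.
    if ambiguity_threshold is not None and len(ranked) > 1:
        if top_confidence - ranked[1]["confidence"] < ambiguity_threshold:
            return True

    return False
```

For a ranking with confidences 0.55 and 0.45, a threshold of 0.5 alone does not trigger the fallback, but adding an ambiguity_threshold of 0.2 does, because the two scores differ by only about 0.1.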
Entity Extractors#
Entity extractors extract entities, such as person names or locations, from the user message.
MitieEntityExtractor#
-
Short
MITIE entity extraction (using a MITIE NER trainer)
-
Outputs
entities
-
Requires
MitieNLP and
tokens
-
Output-Example
{
"entities": [{
"value": "New York City",
"start": 20,
"end": 33,
"confidence": null,
"entity": "city",
"extractor": "MitieEntityExtractor"
}]
}
- Description
MitieEntityExtractor uses the MITIE entity extraction to find entities in a message. The underlying classifier uses a multi-class linear SVM with a sparse linear kernel and custom features. The MITIE component does not provide entity confidence values.
- Configuration
pipeline:
- name: "MitieEntityExtractor"
SpacyEntityExtractor#
-
Short
spaCy entity extraction
-
Outputs
entities
-
Requires
SpacyNLP
-
Output-Example
{
"entities": [{
"value": "New York City",
"start": 20,
"end": 33,
"confidence": null,
"entity": "city",
"extractor": "SpacyEntityExtractor"
}]
}
- Description
Using spaCy, this component predicts the entities of a message. spaCy uses a statistical BILOU transition model. As of now, this component can only use the spaCy builtin entity extraction models and cannot be retrained. This extractor does not provide any confidence scores.
You can test out spaCy's entity extraction models in this interactive demo. Note that some spaCy models are highly case-sensitive.
- Configuration
Configure which dimensions, i.e. entity types, the spaCy component should extract. A full list of available dimensions can be found in the spaCy documentation. Leaving the dimensions option unspecified will extract all available dimensions.
pipeline:
- name: "SpacyEntityExtractor"
  # dimensions to extract
  dimensions: ["PERSON", "LOC", "ORG", "PRODUCT"]
CRFEntityExtractor#
-
Short
Conditional random field (CRF) entity extraction
-
Outputs
entities
-
Requires
tokens and dense_features (optional)
-
Output-Example
{
"entities": [{
"value": "New York City",
"start": 20,
"end": 33,
"entity": "city",
"confidence": 0.874,
"extractor": "CRFEntityExtractor"
}]
}
- Description
This component implements a conditional random field (CRF) to do named entity recognition. CRFs can be thought of as an undirected Markov chain where the time steps are words and the states are entity classes. Features of the words (capitalization, POS tagging, etc.) give probabilities to certain entity classes, as do transitions between neighbouring entity tags: the most likely set of tags is then calculated and returned.
If you want to pass custom features, such as pre-trained word embeddings, to CRFEntityExtractor, you can add any dense featurizer to the pipeline before the CRFEntityExtractor. CRFEntityExtractor automatically finds the additional dense features and checks whether they are an iterable of len(tokens), where each entry is a vector. A warning will be shown if the check fails. However, CRFEntityExtractor will continue to train, just without the additional custom features. If dense features are present, CRFEntityExtractor will pass them to sklearn_crfsuite and use them for training.
-
Configuration
CRFEntityExtractor
has a list of default features to use. However, you can overwrite the default configuration. The following features are available:
============== ==========================================================================================
Feature Name Description
============== ==========================================================================================
low Checks if the token is lower case.
upper Checks if the token is upper case.
title Checks if the token starts with an uppercase character and all remaining characters are
lowercased.
digit Checks if the token contains just digits.
prefix5 Take the first five characters of the token.
prefix2 Take the first two characters of the token.
suffix5 Take the last five characters of the token.
suffix3 Take the last three characters of the token.
suffix2 Take the last two characters of the token.
suffix1 Take the last character of the token.
pos Take the Part-of-Speech tag of the token (``SpacyTokenizer`` required).
pos2 Take the first two characters of the Part-of-Speech tag of the token
(``SpacyTokenizer`` required).
pattern Take the patterns defined by ``RegexFeaturizer``.
bias Add an additional "bias" feature to the list of features.
============== ==========================================================================================
As the featurizer is moving over the tokens in a user message with a sliding window, you can define features for previous tokens, the current token, and the next tokens in the sliding window. You define the features as a [before, token, after] array.
Additionally, you can set a flag to determine whether to use the BILOU tagging schema or not.
BILOU_flag determines whether to use BILOU tagging or not. Default True.
pipeline:
- name: "CRFEntityExtractor"
  # BILOU_flag determines whether to use BILOU tagging or not.
  "BILOU_flag": True
  # features to extract in the sliding window
  "features": [
    ["low", "title", "upper"],
    [
      "bias",
      "low",
      "prefix5",
      "prefix2",
      "suffix5",
      "suffix3",
      "suffix2",
      "upper",
      "title",
      "digit",
      "pattern",
    ],
    ["low", "title", "upper"],
  ]
  # The maximum number of iterations for optimization algorithms.
  "max_iterations": 50
  # weight of the L1 regularization
  "L1_c": 0.1
  # weight of the L2 regularization
  "L2_c": 0.1
  # Name of dense featurizers to use.
  # If list is empty all available dense features are used.
  "featurizers": []
  # Indicates whether a list of extracted entities should be split into individual entities for a given entity type
  "split_entities_by_comma":
      address: False
      email: True
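To make the [before, token, after] sliding window more concrete, here is a minimal sketch of how per-token features could be assembled from such a specification. It is not the CRFEntityExtractor's code: FEATURE_FUNCTIONS and token_features are hypothetical names, only a handful of the feature names from the table are re-implemented, and the key format and the BOS/EOS markers are assumptions.

```python
from typing import Any, Callable, Dict, List

# Hypothetical re-implementations of a few feature names from the table above.
FEATURE_FUNCTIONS: Dict[str, Callable[[str], Any]] = {
    "low": lambda token: token.islower(),
    "upper": lambda token: token.isupper(),
    "title": lambda token: token.istitle(),
    "digit": lambda token: token.isdigit(),
    "prefix2": lambda token: token[:2],
    "suffix3": lambda token: token[-3:],
    "bias": lambda token: "bias",
}


def token_features(
    tokens: List[str], index: int, features: List[List[str]]
) -> Dict[str, Any]:
    """Build the feature dict for tokens[index] from a [before, token, after] spec."""
    result: Dict[str, Any] = {}
    # The three feature lists apply to the previous, current and next token.
    for offset, names in zip((-1, 0, 1), features):
        position = index + offset
        if position < 0:
            result["BOS"] = True  # assumption: mark the beginning of the sentence
            continue
        if position >= len(tokens):
            result["EOS"] = True  # assumption: mark the end of the sentence
            continue
        for name in names:
            result[f"{offset}:{name}"] = FEATURE_FUNCTIONS[name](tokens[position])
    return result
```

Calling token_features(["flights", "to", "Berlin"], 2, [["low", "title"], ["bias", "low", "suffix3", "digit"], ["low", "title"]]) describes the token Berlin using features of "to" (offset -1) and of Berlin itself (offset 0), plus an end-of-sentence marker because there is no next token.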
DucklingHTTPExtractor#
-
Short
Duckling lets you extract common entities like dates, amounts of money, distances, and others in a number of languages.
-
Outputs
entities
-
Requires
Nothing
-
Output-Example
{
"entities": [{
"end": 53,
"entity": "time",
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "DucklingHTTPExtractor"
}]
}
- Description
To use this component you need to run a duckling server. The easiest option is to spin up a docker container using docker run -p 8000:8000 rasa/duckling.
Alternatively, you can install duckling directly on your machine and start the server.
Duckling lets you recognize dates, numbers, distances and other structured entities and normalizes them. Please be aware that duckling tries to extract as many entity types as possible without providing a ranking. For example, if you specify both number and time as dimensions for the duckling component, the component will extract two entities from the text I will be there in 10 minutes: 10 as a number and in 10 minutes as a time. In such a situation, your application has to decide which entity type is the correct one. The extractor will always return 1.0 as a confidence, as it is a rule-based system.
The list of supported languages can be found in the Duckling GitHub repository.
- Configuration
Configure which dimensions, i.e. entity types, the duckling component should extract. A full list of available dimensions can be found in the duckling documentation. Leaving the dimensions option unspecified will extract all available dimensions.
pipeline:
- name: "DucklingHTTPExtractor"
  # url of the running duckling server
  url: "http://localhost:8000"
  # dimensions to extract
  dimensions: ["time", "number", "amount-of-money", "distance"]
  # allows you to configure the locale, by default the language is
  # used
  locale: "de_DE"
  # if not set the default timezone of Duckling is going to be used
  # needed to calculate dates from relative expressions like "tomorrow"
  timezone: "Europe/Berlin"
  # Timeout for receiving a response from the http url of the running duckling server.
  # If not set, the default timeout of 3 seconds is used.
  timeout: 3
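Because duckling returns overlapping entities without a ranking, the application has to pick one. One possible policy, shown here as a sketch, is to drop any entity whose span is fully contained in a longer one. The function name drop_contained_entities and the illustrative text key are assumptions, and "prefer the longest span" is only one of several reasonable choices.

```python
from typing import Any, Dict, List


def drop_contained_entities(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep an entity only if no strictly longer entity fully covers its span."""
    kept = []
    for candidate in entities:
        candidate_length = candidate["end"] - candidate["start"]
        covered = any(
            other is not candidate
            and other["start"] <= candidate["start"]
            and candidate["end"] <= other["end"]
            and (other["end"] - other["start"]) > candidate_length
            for other in entities
        )
        if not covered:
            kept.append(candidate)
    return kept


# Spans for the message "I will be there in 10 minutes".
entities = [
    {"entity": "number", "start": 19, "end": 21, "text": "10"},
    {"entity": "time", "start": 16, "end": 29, "text": "in 10 minutes"},
]
remaining = drop_contained_entities(entities)
```

With this policy only the time entity remains, since the number 10 lies entirely inside the span in 10 minutes.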
DIETClassifier#
-
Short
Dual Intent Entity Transformer (DIET) used for intent classification and entity extraction
-
Description
You can find the detailed description of the DIETClassifier under the section Intent Classifiers.
RegexEntityExtractor#
-
Short
Extracts entities using the lookup tables and/or regexes defined in the training data
-
Outputs
entities
-
Requires
Nothing
-
Description
This component extracts entities using the lookup tables and regexes defined in the training data. The component checks if the user message contains an entry of one of the lookup tables or matches one of the regexes. If a match is found, the value is extracted as an entity.
This component only uses those regex features that have a name equal to one of the entities defined in the training data. Make sure to annotate at least one example per entity.
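The lookup-table part of this matching can be sketched with Python's re module. This is an illustration under assumptions, not the component's source: match_lookup_entries is a hypothetical helper, and its case_sensitive and use_word_boundaries parameters merely mirror the component options of the same name.

```python
import re
from typing import Any, Dict, List


def match_lookup_entries(
    text: str,
    entity: str,
    entries: List[str],
    case_sensitive: bool = False,
    use_word_boundaries: bool = True,
) -> List[Dict[str, Any]]:
    """Find lookup table entries in text and return them as entity dicts."""
    if not entries:
        return []
    # Try longer entries first so that the longest alternative wins.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(entry) for entry in ordered) + ")"
    if use_word_boundaries:
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [
        {
            "entity": entity,
            "start": match.start(),
            "end": match.end(),
            "value": match.group(0),
        }
        for match in re.finditer(pattern, text, flags=flags)
    ]
```

The sketch also shows why word boundaries matter for languages written without whitespace: in "我想去北京旅游" the entry "北京" is surrounded by other word characters, so the \b anchors never match and the entry is only found with use_word_boundaries=False.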
-
Configuration
Make the entity extractor case sensitive by adding the case_sensitive: True option, the default being case_sensitive: False.
To correctly process languages such as Chinese that don't use whitespace for word separation, add the use_word_boundaries: False option, the default being use_word_boundaries: True.
pipeline:
- name: RegexEntityExtractor
  # text is processed case-insensitively by default
  case_sensitive: False
  # use lookup tables to extract entities
  use_lookup_tables: True
  # use regexes to extract entities
  use_regexes: True
  # match word boundaries for lookup table entries
  use_word_boundaries: True
EntitySynonymMapper#
-
Short
Maps synonymous entity values to the same value.
-
Outputs
Modifies existing entities that previous entity extraction components found.
-
Requires
An extractor from Entity Extractors
-
Description
If the training data contains defined synonyms, this component will make sure that detected entity values will be mapped to the same value. For example, if your training data contains the following examples:
[
{
"text": "I moved to New York City",
"intent": "inform_relocation",
"entities": [{
"value": "nyc",
"start": 11,
"end": 24,
"entity": "city"
}]
},
{
"text": "I got a new flat in NYC.",
"intent": "inform_relocation",
"entities": [{
"value": "nyc",
"start": 20,
"end": 23,
"entity": "city"
}]
}
]
This component will allow you to map the entities New York City and NYC to nyc. The entity extraction will return nyc even though the message contains NYC. When this component changes an existing entity, it appends itself to the processor list of this entity.
- Configuration
pipeline:
- name: "EntitySynonymMapper"
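The mapping step can be pictured as a dictionary lookup over already extracted entities. The sketch below is an assumption-laden illustration rather than the component's implementation: apply_synonyms is a hypothetical function, the lookup is assumed to be case-insensitive, and the processor list is assumed to live under a processors key.

```python
from typing import Any, Dict, List


def apply_synonyms(
    entities: List[Dict[str, Any]], synonyms: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Replace entity values by their canonical synonym value, in place."""
    for entity in entities:
        original = entity["value"]
        # Assumption: synonyms are keyed by the lowercased surface form.
        replacement = synonyms.get(str(original).lower())
        if replacement is not None and replacement != original:
            entity["value"] = replacement
            # Record that this entity was modified by the mapper.
            entity.setdefault("processors", []).append("EntitySynonymMapper")
    return entities


# Synonyms as they could be collected from the training examples above.
synonyms = {"new york city": "nyc", "nyc": "nyc"}
extracted = [{"entity": "city", "value": "NYC", "start": 20, "end": 23}]
mapped = apply_synonyms(extracted, synonyms)
```

After the call, the entity value is nyc and its processors list names the mapper, while an entity whose value already equals nyc is left untouched.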
Combined Intent Classifiers and Entity Extractors#
DIETClassifier#
-
Short
Dual Intent Entity Transformer (DIET) used for intent classification and entity extraction
-
Outputs
entities, intent and intent_ranking
-
Requires
dense_features and/or sparse_features for user message and optionally the intent
-
Output-Example
{
"intent": {"name": "greet", "confidence": 0.8343},
"intent_ranking": [
{
"confidence": 0.385910906220309,
"name": "goodbye"
},
{
"confidence": 0.28161531595656784,
"name": "restaurant_search"
}
],
"entities": [{
"end": 53,
"entity": "time",
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "DIETClassifier"
}]
}
- Description
DIET (Dual Intent and Entity Transformer) is a multi-task architecture for intent classification and entity recognition. The architecture is based on a transformer which is shared for both tasks. A sequence of entity labels is predicted through a Conditional Random Field (CRF) tagging layer on top of the transformer output sequence corresponding to the input sequence of tokens. For the intent labels the transformer output for the complete utterance and intent labels are embedded into a single semantic vector space. We use the dot-product loss to maximize the similarity with the target label and minimize similarities with negative samples.
If you want to learn more about the model, check out the Algorithm Whiteboard series on YouTube, where we explain the model architecture in detail.
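The idea of comparing an utterance and the intent labels in one shared vector space can be illustrated with plain dot products. This is a toy sketch with made-up two-dimensional vectors, not the DIET model: rank_intents is a hypothetical function, and the real embeddings are learned during training.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def rank_intents(
    utterance_embedding: List[float],
    intent_embeddings: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Rank intent labels by dot-product similarity to the utterance embedding."""
    scores = {
        name: dot(utterance_embedding, embedding)
        for name, embedding in intent_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up embeddings, for illustration only.
ranking = rank_intents(
    [1.0, 0.0],
    {"greet": [0.9, 0.1], "goodbye": [-0.5, 0.8]},
)
```

Training pushes the similarity with the target label up and the similarities with negative samples down, so at prediction time the label with the highest similarity ranks first.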
Configuration
If you want to use the DIETClassifier just for intent classification, set entity_recognition to False. If you want to do only entity recognition, set intent_classification to False. By default DIETClassifier does both, i.e. entity_recognition and intent_classification are set to True.
You can define a number of hyperparameters to adapt the model. If you want to adapt your model, start by modifying the following parameters:
-
epochs: This parameter sets the number of times the algorithm will see the training data (default: 300). One epoch is equal to one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to learn properly. Sometimes more epochs don't influence the performance. The lower the number of epochs, the faster the model is trained.
-
hidden_layers_sizes: This parameter allows you to define the number of feed forward layers and their output dimensions for user messages and intents (default: text: [], label: []). Every entry in the list corresponds to a feed forward layer. For example, if you set text: [256, 128], two feed forward layers are added in front of the transformer. The vectors of the input tokens (coming from the user message) will be passed on to those layers. The first layer will have an output dimension of 256 and the second layer will have an output dimension of 128. If an empty list is used (default behavior), no feed forward layer will be added. Make sure to use only positive integer values. Usually, powers of two are used. It is also common practice to use decreasing values in the list: each value is smaller than or equal to the one before.
-
embedding_dimension: This parameter defines the output dimension of the embedding layers used inside the model (default: 20). The model architecture uses multiple embedding layers. For example, the vectors of the complete utterance and of the intent are passed through an embedding layer before they are compared and the loss is calculated.
-
number_of_transformer_layers: This parameter sets the number of transformer layers to use (default: 2). The number of transformer layers corresponds to the number of transformer blocks in the model.
-
transformer_size: This parameter sets the number of units in the transformer (default: 256). The vectors coming out of the transformer will have the given transformer_size.
-
weight_sparsity: This parameter defines the fraction of kernel weights that are set to 0 for all feed forward layers in the model (default: 0.8). The value should be between 0 and 1. If you set weight_sparsity to 0, no kernel weights will be set to 0 and the layer acts as a standard feed forward layer. You should not set weight_sparsity to 1, as this would result in all kernel weights being 0, i.e. the model is not able to learn.
The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.
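To get a feel for what weight_sparsity means, the sketch below builds a 0/1 mask for a kernel in which the given fraction of entries is zero. It only illustrates the fraction; how the model actually selects the zeroed weights is not described here, so the random choice, the seed parameter and the function name sparse_kernel_mask are assumptions.

```python
import random
from typing import List


def sparse_kernel_mask(
    rows: int, cols: int, weight_sparsity: float, seed: int = 0
) -> List[List[int]]:
    """Return a rows x cols mask with a weight_sparsity fraction of zeros."""
    if not 0.0 <= weight_sparsity < 1.0:
        # A sparsity of 1 would zero every weight, so nothing could be learned.
        raise ValueError("weight_sparsity must be in the range [0, 1)")
    total = rows * cols
    num_zero = int(round(total * weight_sparsity))
    flat = [0] * num_zero + [1] * (total - num_zero)
    # Assumption: the zeroed positions are picked at random.
    random.Random(seed).shuffle(flat)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


mask = sparse_kernel_mask(10, 10, weight_sparsity=0.8)
active_weights = sum(sum(row) for row in mask)
```

With the default of 0.8, only 20 of the 100 weights in this toy kernel stay active; with 0 the mask is all ones and the layer behaves like a standard feed forward layer.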
Selectors#
Selectors predict a bot response from a set of candidate responses.
ResponseSelector#
-
Short
Response Selector
-
Outputs
A dictionary with the key as the retrieval intent of the response selector and value containing predicted response templates, confidence and the response key under the retrieval intent
-
Requires
dense_features and/or sparse_features for user messages and response
-
Output-Example
The parsed output from NLU will have a property named response_selector containing the output for each response selector component. Each response selector is identified by the retrieval_intent parameter of that response selector and stores two properties:
-
response: The predicted response key under the corresponding retrieval intent, the prediction's confidence and the associated response templates.
-
ranking: Ranking with confidences of the top 10 candidate response keys.
Example result:
{
"response_selector": {
"faq": {
"response": {
"id": 1388783286124361986,
"confidence": 0.7,
"intent_response_key": "chitchat/ask_weather",
"response_templates": [
{
"text": "It's sunny in Berlin today",
"image": "https://i.imgur.com/nGF1K8f.jpg"
},
{
"text": "I think it's about to rain."
}
],
"template_name": "utter_chitchat/ask_weather"
},
"ranking": [
{
"id": 1388783286124361986,
"confidence": 0.7,
"intent_response_key": "chitchat/ask_weather"
},
{
"id": 1388783286124361986,
"confidence": 0.3,
"intent_response_key": "chitchat/ask_name"
}
]
}
}
}
If the retrieval_intent parameter of a particular response selector was left to its default value, the corresponding response selector will be identified as default in the returned output.
{
"response_selector": {
"default": {
"response": {
"id": 1388783286124361986,
"confidence": 0.7,
"intent_response_key": "chitchat/ask_weather",
"response_templates": [
{
"text": "It's sunny in Berlin today",
"image": "https://i.imgur.com/nGF1K8f.jpg"
},
{
"text": "I think it's about to rain."
}
],
"template_name": "utter_chitchat/ask_weather"
},
"ranking": [
{
"id": 1388783286124361986,
"confidence": 0.7,
"intent_response_key": "chitchat/ask_weather"
},
{
"id": 1388783286124361986,
"confidence": 0.3,
"intent_response_key": "chitchat/ask_name"
}
]
}
}
}
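Reading the prediction out of such a parsed result is a matter of a few dictionary lookups. The helper below is a hypothetical sketch that relies only on the keys shown in the examples above; top_response is not part of any library.

```python
from typing import Any, Dict, Optional


def top_response(
    parsed: Dict[str, Any], selector_key: str = "default"
) -> Optional[Dict[str, Any]]:
    """Return the predicted response key, confidence and texts of one selector."""
    selector_output = parsed.get("response_selector", {}).get(selector_key)
    if not selector_output:
        return None
    response = selector_output["response"]
    # Not every template carries a text (some may only hold an image, for example).
    texts = [
        template["text"]
        for template in response.get("response_templates", [])
        if "text" in template
    ]
    return {
        "intent_response_key": response["intent_response_key"],
        "confidence": response["confidence"],
        "texts": texts,
    }
```

Pass the retrieval intent name (faq in the first example) as selector_key, or keep the default value for a selector whose retrieval_intent was not set.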
- Description
The ResponseSelector component can be used to build a response retrieval model to directly predict a bot response from a set of candidate responses. The prediction of this model is used by the dialogue manager to utter the predicted responses. It embeds user inputs and response labels into the same space and follows the exact same neural network architecture and optimization as the DIETClassifier.
To use this component, your training data should contain retrieval intents. To define these, check out the documentation on NLU training examples and the documentation on defining response utterances for retrieval intents.
- Configuration
The algorithm includes almost all the hyperparameters that DIETClassifier uses. If you want to adapt your model, start by modifying the following parameters:
-
epochs: This parameter sets the number of times the algorithm will see the training data (default: 300). One epoch is equal to one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to learn properly. Sometimes more epochs don't influence the performance. The lower the number of epochs, the faster the model is trained.
-
hidden_layers_sizes: This parameter allows you to define the number of feed forward layers and their output dimensions for user messages and intents (default: text: [256, 128], label: [256, 128]). Every entry in the list corresponds to a feed forward layer. For example, if you set text: [256, 128], two feed forward layers are added in front of the transformer. The vectors of the input tokens (coming from the user message) will be passed on to those layers. The first layer will have an output dimension of 256 and the second layer will have an output dimension of 128. If an empty list is used, no feed forward layer will be added. Make sure to use only positive integer values. Usually, powers of two are used. It is also common practice to use decreasing values in the list: each value is smaller than or equal to the one before.
-
embedding_dimension: This parameter defines the output dimension of the embedding layers used inside the model (default: 20). The model architecture uses multiple embedding layers. For example, the vectors of the complete utterance and of the intent are passed through an embedding layer before they are compared and the loss is calculated.
-
number_of_transformer_layers: This parameter sets the number of transformer layers to use (default: 0). The number of transformer layers corresponds to the number of transformer blocks in the model.
-
transformer_size: This parameter sets the number of units in the transformer (default: None). The vectors coming out of the transformer will have the given transformer_size.
-
weight_sparsity: This parameter defines the fraction of kernel weights that are set to 0 for all feed forward layers in the model (default: 0.8). The value should be between 0 and 1. If you set weight_sparsity to 0, no kernel weights will be set to 0 and the layer acts as a standard feed forward layer. You should not set weight_sparsity to 1, as this would result in all kernel weights being 0, i.e. the model is not able to learn.
The component can also be configured to train a response selector for a particular retrieval intent. The parameter retrieval_intent sets the name of the retrieval intent for which this response selector model is trained. Default is None, i.e. the model is trained for all retrieval intents.
In its default configuration, the component uses the retrieval intent with the response key (e.g. faq/ask_name) as the label for training. Alternatively, it can also be configured to use the text of the response templates as the training label by switching use_text_as_label to True. In this mode, the component will use the first available response template which has a text attribute for training. If none are found, it falls back to using the retrieval intent combined with the response key as the label.
The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.
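The label selection controlled by use_text_as_label can be summarised in a short sketch. The function training_label is hypothetical and only restates the rule described above; it is not taken from the ResponseSelector source.

```python
from typing import Any, Dict, List


def training_label(
    intent_response_key: str,
    response_templates: List[Dict[str, Any]],
    use_text_as_label: bool = False,
) -> str:
    """Pick the training label for one retrieval intent example."""
    if use_text_as_label:
        # Use the first response template that has a text attribute.
        for template in response_templates:
            if "text" in template:
                return template["text"]
    # Default, and fallback when no template has a text: intent plus response key.
    return intent_response_key
```

For example, with templates [{"image": "..."}, {"text": "My name is Bot"}] and use_text_as_label=True the label is the text of the second template, while a list without any text falls back to the key faq/ask_name.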
Custom Components#
You can create a custom component to perform a specific task which NLU doesn't currently offer (for example, sentiment analysis). Below is the specification of the rasa.nlu.components.Component class with the methods you'll need to implement.
You can add a custom component to your pipeline by adding the module path. So if you have a module called sentiment containing a SentimentAnalyzer class:
pipeline:
- name: "sentiment.SentimentAnalyzer"
Also be sure to read the section on the Component Lifecycle.
To get started, you can use this skeleton that contains the most important methods that you should implement:
import typing
from typing import Any, Optional, Text, Dict, List, Type

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.shared.nlu.training_data.message import Message

if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata


class MyComponent(Component):
    """A new component"""

    # Which components are required by this component.
    # Listed components should appear before the component itself in the pipeline.
    @classmethod
    def required_components(cls) -> List[Type[Component]]:
        """Specify which components need to be present in the pipeline."""
        return []

    # Defines the default configuration parameters of a component
    # these values can be overwritten in the pipeline configuration
    # of the model. The component should choose sensible defaults
    # and should be able to create reasonable results with the defaults.
    defaults = {}

    # Defines what language(s) this component can handle.
    # This attribute is designed for instance method: `can_handle_language`.
    # Default value is None which means it can handle all languages.
    # This is an important feature for backwards compatibility of components.
    supported_language_list = None

    # Defines what language(s) this component can NOT handle.
    # This attribute is designed for instance method: `can_handle_language`.
    # Default value is None which means it can handle all languages.
    # This is an important feature for backwards compatibility of components.
    not_supported_language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)

    def train(
        self,
        training_data: TrainingData,
        config: Optional[RasaNLUModelConfig] = None,
        **kwargs: Any,
    ) -> None:
        """Train this component.

        This is the components chance to train itself provided
        with the training data. The component can rely on
        any context attribute to be present, that gets created
        by a call to :meth:`components.Component.pipeline_init`
        of ANY component and
        on any context attributes created by a call to
        :meth:`components.Component.train`
        of components previous to this one."""
        pass

    def process(self, message: Message, **kwargs: Any) -> None:
        """Process an incoming message.

        This is the components chance to process an incoming
        message. The component can rely on
        any context attribute to be present, that gets created
        by a call to :meth:`components.Component.pipeline_init`
        of ANY component and
        on any context attributes created by a call to
        :meth:`components.Component.process`
        of components previous to this one."""
        pass

    def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Optional[Text] = None,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""
        if cached_component:
            return cached_component
        else:
            return cls(meta)
When you define metadata for your intent examples in your training data, your component can access both the intent metadata and the intent example metadata during processing:
# in your component class
    def process(self, message: Message, **kwargs: Any) -> None:
        metadata = message.get("metadata")
        print(metadata.get("intent"))
        print(metadata.get("example"))
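Messages without any metadata are worth guarding against, since get("metadata") may then return None. The sketch below shows one defensive way to read both values; read_metadata is a hypothetical helper, and a plain dict stands in for the Message object here so the snippet runs without Rasa installed.

```python
from typing import Any, Optional, Tuple


def read_metadata(message: Any) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (intent metadata, example metadata), tolerating missing metadata."""
    # `message` only needs a dict-like get() method, as used in the snippet above.
    metadata = message.get("metadata") or {}
    return metadata.get("intent"), metadata.get("example")


# A plain dict as a stand-in for a Message, for illustration only.
with_metadata = {"metadata": {"intent": {"priority": 1}, "example": {"source": "chat"}}}
intent_meta, example_meta = read_metadata(with_metadata)
```

When no metadata was defined in the training data, both returned values are None instead of raising an AttributeError.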