About the author

Original article: https://zhuanlan.zhihu.com/p/333309670

Reposted by: 楊夕

Interview notes: https://github.com/km1994/NLP-Interview-Notes

Personal notes: https://github.com/km1994/nlp_paper_study

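RASA first classifies the intent of the user's current utterance, then combines that intent with the conversation history to choose an action. Intent classification is therefore the basis for the policy selection that follows.

RASA supports the following intent classifiers: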

MitieIntentClassifier

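This classifier is built on MitieNLP, so the tokenizer in the pipeline must be the MitieNLP-based one as well. MitieIntentClassifier has featurization built in, so a separate featurizer does not have to be configured. In short, it is a multi-class linear SVM with a sparse linear kernel. For details of the algorithm, see:

MITIE: https://github.com/mit-nlp/MITIE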

SklearnIntentClassifier

Uses scikit-learn for intent classification. It is also based on an SVM, but here the SVM's hyperparameters are tuned with grid search. For grid search, see

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

When using SklearnIntentClassifier, the SVM hyperparameters need to be configured. The configuration looks like this:

pipeline:
- name: "SklearnIntentClassifier"
  # Specifies the list of regularization values to
  # cross-validate over for C-SVM.
  # This is used with the ``kernel`` hyperparameter in GridSearchCV.
  C: [1, 2, 5, 10, 20, 100]
  # Specifies the kernel to use with C-SVM.
  # This is used with the ``C`` hyperparameter in GridSearchCV.
  kernels: ["linear"]
  # Gamma parameter of the C-SVM.
  "gamma": [0.1]
  # We try to find a good number of cross folds to use during
  # intent training, this specifies the max number of folds.
  "max_cross_validation_folds": 5
  # Scoring function used for evaluating the hyper parameters.
  # This can be a name or a function.
  "scoring_function": "f1_weighted"
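To make the search space concrete, here is a minimal sketch that enumerates the hyperparameter combinations a grid search would try for the values in this configuration. It uses only the Python standard library and does not train anything. The helper `expand_grid` is a hypothetical name introduced for illustration, and the grid key `kernel` follows scikit-learn's SVC parameter name rather than the `kernels` key of the RASA config.

```python
from itertools import product

# Values copied from the SklearnIntentClassifier configuration above.
# "kernel" is the scikit-learn SVC parameter name (the config key is "kernels").
param_grid = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "gamma": [0.1],
}


def expand_grid(grid):
    """Return every hyperparameter combination in the grid as a dict."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = expand_grid(param_grid)
max_folds = 5  # max_cross_validation_folds from the configuration

# Number of candidate settings, and an upper bound on the number of
# cross-validation fits (candidates x folds, not counting a final refit).
print(len(candidates))
print(len(candidates) * max_folds)
```

With a single kernel and a single gamma value, the search varies only C. Note also that scikit-learn's SVC ignores gamma when the kernel is linear, so the gamma entry only matters once a non-linear kernel is added to the list.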

KeywordIntentClassifier

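Intent classification by simple keyword matching. It suits small projects with only a few intents. When there are many intents that are closely related, a keyword classifier cannot tell them apart.

Matching works by treating each full training example as a keyword and searching for it in the user's message. The training examples therefore need careful design: a keyword that is too long will rarely match any message, while one that is too short does not discriminate between intents.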
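The matching behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not RASA's implementation: the function names, the sample training data, and the choice of case-insensitive whole-word matching are all assumptions made for the example.

```python
import re
from typing import Dict, List, Optional


def build_keyword_map(training_data: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each full training example (used as a keyword) to its intent."""
    keyword_to_intent = {}
    for intent, examples in training_data.items():
        for example in examples:
            keyword_to_intent[example] = intent
    return keyword_to_intent


def classify(text: str, keyword_to_intent: Dict[str, str]) -> Optional[str]:
    """Return the intent of the first keyword found in the text, else None."""
    for keyword, intent in keyword_to_intent.items():
        # Whole-word, case-insensitive search for the keyword in the message.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return intent
    return None


# Hypothetical training data: every example sentence acts as one keyword.
training_data = {
    "greet": ["hello", "good morning"],
    "goodbye": ["bye", "see you later"],
}
keywords = build_keyword_map(training_data)

print(classify("Hello there!", keywords))
print(classify("ok, see you later then", keywords))
print(classify("see you", keywords))
```

The last call returns no intent because the message "see you" does not contain the full keyword "see you later", which is the "keyword too long" failure mode mentioned above.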

DIETClassifier

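DIET is short for Dual Intent and Entity Transformer. It handles two dialogue-understanding tasks at once: intent classification and entity recognition. DIET is trained in a purely supervised way. The key point is that it needs no large-scale pre-training, yet it outperforms fine-tuned BERT while training about six times faster. Its input is dense and/or sparse features of the user message and, optionally, of the intent. Its output is the entities, the intent, and the confidence scores.

The DIET architecture is based on a Transformer shared by both tasks. For entity recognition, the token sequence passes through the Transformer, and the output sequence is fed to a conditional random field (CRF) tagging layer on top, which predicts the probability of each BIOE tag for every token. For intent classification, the Transformer output for the full utterance and the intent label are embedded into a single semantic vector space. A dot-product loss maximizes the similarity with the target label and minimizes the similarity with negative samples. For details of the DIET algorithm, see: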
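As an illustration of the dot-product loss idea, the sketch below computes a softmax cross-entropy over dot-product similarities between one utterance embedding, its target label embedding, and a few negative label embeddings. It is a toy example under stated assumptions: the two-dimensional vectors are made up, the function names are hypothetical, and real training operates on batches of learned embeddings rather than hand-written lists.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def dot_product_loss(
    message: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
) -> float:
    """Softmax cross-entropy over similarities, with the target label at index 0.

    Minimizing this value raises the similarity to the positive label and
    lowers the similarity to the negative labels.
    """
    sims = [dot(message, positive)] + [dot(message, n) for n in negatives]
    max_sim = max(sims)  # subtract the max for numerical stability
    log_sum_exp = max_sim + math.log(sum(math.exp(s - max_sim) for s in sims))
    return log_sum_exp - sims[0]


# Made-up embeddings: one utterance and two negative labels.
message = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

aligned = dot_product_loss(message, [1.0, 0.0], negatives)
misaligned = dot_product_loss(message, [0.0, 1.0], negatives)
print(aligned < misaligned)
```

The loss is lower when the target label embedding points in the same direction as the utterance embedding, which is the behaviour the training objective rewards.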

DIET: Dual Intent and Entity Transformer (translation of the RASA paper): https://zhuanlan.zhihu.com/p/337181983

To use DIETClassifier for intent classification only, set entity_recognition to False. To use it for entity recognition only, set intent_classification to False. By default both are set to True, so DIETClassifier performs both tasks.

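Several hyperparameters can be tuned. When tuning the model, start with the following:

epochs: the number of times the algorithm sees the training data (default: 300). One epoch is one forward pass and one backward pass over all training examples. Some models need more epochs to learn properly. The fewer the epochs, the faster training finishes.

hidden_layers_sizes: defines the number of feed-forward layers and their output dimensions for user messages and intents (default: text: [], label: []). Each entry in a list corresponds to one feed-forward layer. For example, with text: [256, 128], two feed-forward layers are added in front of the transformer, and the input token vectors (from the user message) are passed through them. The first layer has an output dimension of 256 and the second 128. With an empty list (the default), no feed-forward layer is added. Use positive integers only. Values are usually powers of two, and each value is usually less than or equal to the one before it.

embedding_dimension: the output dimension of the embedding layers used inside the model (default: 20). The architecture uses several embedding layers. For example, the vectors of the full utterance and of the intent are passed through an embedding layer before they are compared and the loss is computed.

number_of_transformer_layers: the number of transformer layers to use (default: 2), that is, the number of transformer blocks in the model.

transformer_size: the number of units in the transformer (default: 256). The vectors coming out of the transformer have this size.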

weight_sparsity: the fraction of kernel weights that are set to 0 in all feed-forward layers of the model (default: 0.8). The value should be between 0 and 1. With weight_sparsity set to 0, no kernel weights are set to 0 and the layer acts as a standard feed-forward layer. Do not set weight_sparsity to 1, because all kernel weights would then be 0 and the model could not learn.
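To show what a weight sparsity of 0.8 means in practice, the following sketch builds a random 0/1 mask in which 80% of the entries are 0 and applies it to a small weight matrix. It is only an illustration of the idea: `sparse_mask` and `apply_mask` are hypothetical helpers, the 10 x 10 matrix is made up, and RASA's own layer implementation may choose and apply the mask differently.

```python
import random
from typing import List


def sparse_mask(rows: int, cols: int, sparsity: float, seed: int = 0) -> List[List[int]]:
    """Build a 0/1 mask in which a `sparsity` fraction of the entries is 0."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    total = rows * cols
    n_zero = round(total * sparsity)
    flat = [0] * n_zero + [1] * (total - n_zero)
    random.Random(seed).shuffle(flat)  # scatter the zeros at random positions
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def apply_mask(weights: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Zero out the kernel weights wherever the mask is 0."""
    return [
        [w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]


# A made-up 10 x 10 kernel with random weights.
rng = random.Random(1)
weights = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(10)]

mask = sparse_mask(10, 10, sparsity=0.8)
masked = apply_mask(weights, mask)
kept = sum(m for row in mask for m in row)
print(kept, "of 100 weights remain trainable")
```

With sparsity 0 the mask is all ones, so the layer behaves like a standard dense layer. With sparsity 1 the mask is all zeros and nothing is left to learn, which matches the warning above.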

In general, tuning these parameters is enough to get a good model. Other tunable parameters are listed in the table below.

+---------------------------------+------------------+--------------------------------------------------------------+
| Parameter                       | Default Value    | Description                                                  |
+=================================+==================+==============================================================+
| hidden_layers_sizes             | text: []         | Hidden layer sizes for layers before the embedding layers    |
|                                 | label: []        | for user messages and labels. The number of hidden layers is |
|                                 |                  | equal to the length of the corresponding list.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| share_hidden_layers             | False            | Whether to share the hidden layer weights between user       |
|                                 |                  | messages and labels.                                         |
+---------------------------------+------------------+--------------------------------------------------------------+
| transformer_size                | 256              | Number of units in transformer.                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_transformer_layers    | 2                | Number of transformer layers.                                |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_attention_heads       | 4                | Number of attention heads in transformer.                    |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_key_relative_attention      | False            | If 'True' use key relative embeddings in attention.          |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_value_relative_attention    | False            | If 'True' use value relative embeddings in attention.        |
+---------------------------------+------------------+--------------------------------------------------------------+
| max_relative_position           | None             | Maximum position for relative embeddings.                    |
+---------------------------------+------------------+--------------------------------------------------------------+
| unidirectional_encoder          | False            | Use a unidirectional or bidirectional encoder.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_size                      | [64, 256]        | Initial and final value for batch sizes.                     |
|                                 |                  | Batch size will be linearly increased for each epoch.        |
|                                 |                  | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_strategy                  | "balanced"       | Strategy used when creating batches.                         |
|                                 |                  | Can be either 'sequence' or 'balanced'.                      |
+---------------------------------+------------------+--------------------------------------------------------------+
| epochs                          | 300              | Number of epochs to train.                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| random_seed                     | None             | Set random seed to any 'int' to get reproducible results.    |
+---------------------------------+------------------+--------------------------------------------------------------+
| learning_rate                   | 0.001            | Initial learning rate for the optimizer.                     |
+---------------------------------+------------------+--------------------------------------------------------------+
| embedding_dimension             | 20               | Dimension size of embedding vectors.                         |
+---------------------------------+------------------+--------------------------------------------------------------+
| dense_dimension                 | text: 128        | Dense dimension for sparse features to use.                  |
|                                 | label: 20        |                                                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| concat_dimension                | text: 128        | Concat dimension for sequence and sentence features.         |
|                                 | label: 20        |                                                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_negative_examples     | 20               | The number of incorrect labels. The algorithm will minimize  |
|                                 |                  | their similarity to the user input during training.          |
+---------------------------------+------------------+--------------------------------------------------------------+
| similarity_type                 | "auto"           | Type of similarity measure to use, either 'auto' or 'cosine' |
|                                 |                  | or 'inner'.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+
| loss_type                       | "softmax"        | The type of the loss function, either 'softmax' or 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| ranking_length                  | 10               | Number of top actions to normalize scores for loss type      |
|                                 |                  | 'softmax'. Set to 0 to turn off normalization.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity     | 0.8              | Indicates how similar the algorithm should try to make       |
|                                 |                  | embedding vectors for correct labels.                        |
|                                 |                  | Should be 0.0 < ... < 1.0 for 'cosine' similarity type.      |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_negative_similarity     | -0.4             | Maximum negative similarity for incorrect labels.            |
|                                 |                  | Should be -1.0 < ... < 1.0 for 'cosine' similarity type.     |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_maximum_negative_similarity | True             | If 'True' the algorithm only minimizes maximum similarity    |
|                                 |                  | over incorrect intent labels, used only if 'loss_type' is    |
|                                 |                  | set to 'margin'.                                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| scale_loss                      | False            | Scale loss inverse proportionally to confidence of correct   |
|                                 |                  | prediction.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+
| regularization_constant         | 0.002            | The scale of regularization.                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| negative_margin_scale           | 0.8              | The scale of how important it is to minimize the maximum     |
|                                 |                  | similarity between embeddings of different labels.           |
+---------------------------------+------------------+--------------------------------------------------------------+
| weight_sparsity                 | 0.8              | Sparsity of the weights in dense layers.                     |
|                                 |                  | Value should be between 0 and 1.                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate                       | 0.2              | Dropout rate for encoder. Value should be between 0 and 1.   |
|                                 |                  | The higher the value the higher the regularization effect.   |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate_attention             | 0.0              | Dropout rate for attention. Value should be between 0 and 1. |
|                                 |                  | The higher the value the higher the regularization effect.   |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_sparse_input_dropout        | True             | If 'True' apply dropout to sparse input tensors.             |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_dense_input_dropout         | True             | If 'True' apply dropout to dense input tensors.              |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs | 20               | How often to calculate validation accuracy.                  |
|                                 |                  | Set to '-1' to evaluate just once at the end of training.    |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples  | 0                | How many examples to use for hold out validation set.        |
|                                 |                  | Large values may hurt performance, e.g. model accuracy.      |
+---------------------------------+------------------+--------------------------------------------------------------+
| intent_classification           | True             | If 'True' intent classification is trained and intents are   |
|                                 |                  | predicted.                                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| entity_recognition              | True             | If 'True' entity recognition is trained and entities are     |
|                                 |                  | extracted.                                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_masked_language_model       | False            | If 'True' random tokens of the input message will be masked  |
|                                 |                  | and the model has to predict those tokens. It acts like a    |
|                                 |                  | regularizer and should help to learn a better contextual     |
|                                 |                  | representation of the input.                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_directory       | None             | If you want to use tensorboard to visualize training         |
|                                 |                  | metrics, set this option to a valid output directory. You    |
|                                 |                  | can view the training metrics after training in tensorboard  |
|                                 |                  | via 'tensorboard --logdir <path-to-given-directory>'.        |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_level           | "epoch"          | Define when training metrics for tensorboard should be       |
|                                 |                  | logged. Either after every epoch ('epoch') or for every      |
|                                 |                  | training step ('minibatch').                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| featurizers                     | []               | List of featurizer names (alias names). Only features        |
|                                 |                  | coming from the listed names are used. If list is empty      |
|                                 |                  | all available features are used.                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| checkpoint_model                | False            | Save the best performing model during training. Models are   |
|                                 |                  | stored to the location specified by `--out`. Only the one    |
|                                 |                  | best model will be saved.                                    |
|                                 |                  | Requires `evaluate_on_number_of_examples > 0` and            |
|                                 |                  | `evaluate_every_number_of_epochs > 0`                        |
+---------------------------------+------------------+--------------------------------------------------------------+
| split_entities_by_comma         | True             | Splits a list of extracted entities by comma to treat each   |
|                                 |                  | one of them as a single entity. Can either be `True`/`False` |
|                                 |                  | globally, or set per entity type, such as:                   |
|                                 |                  | ```                                                          |
|                                 |                  | ...                                                          |
|                                 |                  | - name: DIETClassifier                                       |
|                                 |                  |   split_entities_by_comma:                                   |
|                                 |                  |     address: True                                            |
|                                 |                  |     ...                                                      |
|                                 |                  | ...                                                          |
|                                 |                  | ```                                                          |
+---------------------------------+------------------+--------------------------------------------------------------+
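The batch_size entry in the table says the batch size grows linearly from an initial to a final value over the epochs. A minimal sketch of such a schedule is shown below. The interpolation formula and the function name `batch_size_for_epoch` are assumptions made for illustration and are not taken from the RASA source code.

```python
from typing import List, Union


def batch_size_for_epoch(batch_size: Union[int, List[int]], epoch: int, epochs: int) -> int:
    """Linearly interpolate the batch size between its initial and final value.

    `epoch` is zero-based. A plain int means a constant batch size.
    """
    if isinstance(batch_size, int):
        return batch_size
    start, end = batch_size
    if epochs <= 1:
        return start
    return int(start + epoch * (end - start) / (epochs - 1))


# Defaults from the table: batch_size [64, 256], epochs 300.
schedule = [batch_size_for_epoch([64, 256], e, epochs=300) for e in range(300)]
print(schedule[0], schedule[-1])
```

Starting with small batches gives more frequent updates early in training, while larger batches later make each epoch cheaper.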

FallbackClassifier

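When the intent prediction scores are low, this classifier decides whether to emit the nlu_fallback intent. Note that FallbackClassifier always comes after the other intent classifiers and evaluates the intent and confidence produced by the preceding classifier. If the preceding classifier's confidence for the predicted intent is below threshold, or the confidence scores of the two top-ranked intents are close to each other, FallbackClassifier falls back.

The response to the fallback intent can be implemented with a rule: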

rules:
- rule: Ask the user to rephrase in case of low NLU confidence
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase

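FallbackClassifier has the following configuration parameters:

threshold: the threshold for predicting the nlu_fallback intent. If the confidence of the intent predicted by the preceding classifier is below threshold, FallbackClassifier returns an nlu_fallback intent with a confidence of 1.0.

ambiguity_threshold: if the difference between the confidence scores of the two top-ranked intents is less than ambiguity_threshold, FallbackClassifier returns an nlu_fallback intent with a confidence of 1.0.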
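The two checks can be summarized in a short Python sketch. It mirrors the description above and is not RASA's implementation: the function `apply_fallback`, the shape of the `ranking` list, and the example threshold values of 0.3 and 0.1 are assumptions chosen for illustration.

```python
from typing import Dict, List

FALLBACK_INTENT = "nlu_fallback"


def apply_fallback(
    ranking: List[Dict],
    threshold: float = 0.3,
    ambiguity_threshold: float = 0.1,
) -> Dict:
    """Return the top intent, or nlu_fallback if the prediction is weak or ambiguous.

    `ranking` is a list of {"name": ..., "confidence": ...} dicts sorted by
    descending confidence, as produced by the preceding intent classifier.
    """
    top = ranking[0]
    below_threshold = top["confidence"] < threshold
    ambiguous = (
        len(ranking) >= 2
        and top["confidence"] - ranking[1]["confidence"] < ambiguity_threshold
    )
    if below_threshold or ambiguous:
        return {"name": FALLBACK_INTENT, "confidence": 1.0}
    return top


confident = [{"name": "greet", "confidence": 0.90}, {"name": "goodbye", "confidence": 0.05}]
weak = [{"name": "greet", "confidence": 0.20}, {"name": "goodbye", "confidence": 0.05}]
close = [{"name": "greet", "confidence": 0.50}, {"name": "goodbye", "confidence": 0.45}]

for ranking in (confident, weak, close):
    print(apply_fallback(ranking)["name"])
```

Only the first ranking keeps its original intent. The second fails the threshold check and the third fails the ambiguity check, so both are replaced by nlu_fallback, which the rule shown earlier can then answer with utter_please_rephrase.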

(6) RASA NLU Intent Classifiers
