
          (6) RASA NLU Intent Classifiers



          About the author




          Original: https://zhuanlan.zhihu.com/p/333309670

          Reposted by: 楊夕

          Interview notes: https://github.com/km1994/NLP-Interview-Notes

          Personal notes: https://github.com/km1994/nlp_paper_study


                             


          RASA's logic is to classify the intent of the user's current utterance and then, combined with the conversation history, produce an action. Intent classification is therefore the foundation for the subsequent policy selection.

          The intent classifiers supported by RASA are:

          MitieIntentClassifier

          A classifier that uses MitieNLP, so the tokenizer must also be MITIE-based. However, MitieIntentClassifier has a featurizer built in, so a separate featurizer is not required. In short, it is a multi-class linear SVM with a sparse linear kernel. For the algorithm details, see:

          MITIE: https://github.com/mit-nlp/MITIE
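          A minimal pipeline sketch for this setup. The model path below is an assumption; point it at the MITIE total_word_feature_extractor file you actually downloaded:

          pipeline:
          - name: "MitieNLP"
            # Assumed location of the pre-trained MITIE feature extractor file.
            model: "data/total_word_feature_extractor.dat"
          - name: "MitieTokenizer"
          - name: "MitieIntentClassifier"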

          SklearnIntentClassifier

          Uses scikit-learn for intent classification. sklearn also classifies intents with an SVM, but here the SVM's hyperparameters are optimized via grid search. On grid search, see:

          https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

          When using SklearnIntentClassifier, the SVM hyperparameters need to be configured. A concrete configuration:

          pipeline:
          - name: "SklearnIntentClassifier"
            # Specifies the list of regularization values to
            # cross-validate over for C-SVM.
            # This is used with the ``kernel`` hyperparameter in GridSearchCV.
            C: [1, 2, 5, 10, 20, 100]
            # Specifies the kernel to use with C-SVM.
            # This is used with the ``C`` hyperparameter in GridSearchCV.
            kernels: ["linear"]
            # Gamma parameter of the C-SVM.
            "gamma": [0.1]
            # We try to find a good number of cross folds to use during
            # intent training, this specifies the max number of folds.
            "max_cross_validation_folds": 5
            # Scoring function used for evaluating the hyper parameters.
            # This can be a name or a function.
            "scoring_function": "f1_weighted"

          KeywordIntentClassifier

          A simple keyword-matching intent classifier, suitable for small projects with only a few intents. When there are many intents with substantial overlap, a keyword classifier cannot tell them apart.

          Matching works by taking each full training sentence as a keyword and searching for it inside the user's message. So the training data needs to be designed carefully (see the sketch below): a keyword that is too long will often fail to match, while one that is too short lacks the discriminative power to separate intents.
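          A minimal sketch with a hypothetical greet intent; case_sensitive is an option of KeywordIntentClassifier:

          pipeline:
          - name: "KeywordIntentClassifier"
            # Match keywords case-sensitively; set to False to ignore case.
            case_sensitive: True

          with training data whose whole example sentences act as the keywords:

          nlu:
          - intent: greet
            examples: |
              - hey there
              - good morning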

          DIETClassifier

          DIET is short for Dual Intent and Entity Transformer. It addresses two language-understanding problems jointly: intent classification and entity recognition. DIET is trained in a purely supervised fashion; the key point is that it needs no large-scale pre-training, yet it outperforms fine-tuning BERT while training about six times faster. Its inputs are dense and/or sparse feature vectors for the user message and, optionally, the intents; its outputs are entities, an intent, and confidence scores.

          The DIET architecture is built on a Transformer shared by the two tasks. For entities, the Transformer's output sequence feeds a Conditional Random Field (CRF) tagging layer on top, which predicts the probability of each token's entity tag (BIOE scheme). For intents, the complete utterance and the intent labels are embedded into a single semantic vector space; a dot-product loss maximizes similarity with the target label and minimizes similarity with negative samples. For the details of the DIET algorithm, see:

          DIET: Dual Intent and Entity Transformer (translation of the RASA paper): https://zhuanlan.zhihu.com/p/337181983

          If you only want to use DIETClassifier for intent classification, set entity_recognition to False. If you only want entity recognition, set intent_classification to False. By default, DIETClassifier does both, i.e. entity_recognition and intent_classification are both True.
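          For example, a minimal sketch of an intent-classification-only configuration (the epochs value is illustrative):

          pipeline:
          - name: DIETClassifier
            # Train only the intent head; skip entity extraction.
            intent_classification: True
            entity_recognition: False
            epochs: 300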

          Several hyperparameters can be tuned. If you want to adjust the model, start with the following parameters (a combined sketch follows the list):

          epochs: sets how many times the algorithm sees the training data (default: 300). One epoch equals one forward pass and one backward pass over all training examples. Sometimes the model needs more epochs to learn properly; fewer epochs make training faster.

          hidden_layers_sizes: lets you define the number of feed-forward layers and their output dimensions for user messages and intents (default: text: [], label: []). Each entry in the list corresponds to one feed-forward layer. For example, with text: [256, 128], two feed-forward layers are inserted in front of the transformer; the token vectors of the user message are passed through them, the first layer with output dimension 256 and the second with 128. With an empty list (the default behavior), no feed-forward layers are added. Use only positive integer values, typically powers of two, with each value less than or equal to the previous one.

          embedding_dimension: defines the output dimension of the embedding layers used inside the model (default: 20). The model architecture uses several embedding layers; for example, the vectors for the complete utterance and for the intent are passed through an embedding layer before similarities are compared and the loss is computed.

          number_of_transformer_layers: sets the number of transformer layers to use (default: 2), i.e. how many transformer blocks the model has.

          transformer_size: sets the number of units in the transformer (default: 256). The vectors coming out of the transformer have this size.

          weight_sparsity: defines the fraction of kernel weights set to 0 across all feed-forward layers in the model (default: 0.8). The value should be between 0 and 1. With weight_sparsity at 0, no kernel weights are zeroed and the layer acts as a standard feed-forward layer. Do not set weight_sparsity to 1: all kernel weights would be 0 and the model could not learn.
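          A sketch pulling these knobs together; the specific values are illustrative, not recommendations:

          pipeline:
          - name: DIETClassifier
            epochs: 200
            # Two feed-forward layers in front of the transformer for text,
            # none for labels.
            hidden_layers_sizes:
              text: [256, 128]
              label: []
            embedding_dimension: 20
            number_of_transformer_layers: 2
            transformer_size: 256
            weight_sparsity: 0.8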

          In general, tuning these parameters is enough to get a good model. The remaining tunable parameters are listed in the table below.

          | Parameter | Default Value | Description |
          |---|---|---|
          | hidden_layers_sizes | text: [], label: [] | Hidden layer sizes for layers before the embedding layers for user messages and labels. The number of hidden layers is equal to the length of the corresponding list. |
          | share_hidden_layers | False | Whether to share the hidden layer weights between user messages and labels. |
          | transformer_size | 256 | Number of units in transformer. |
          | number_of_transformer_layers | 2 | Number of transformer layers. |
          | number_of_attention_heads | 4 | Number of attention heads in transformer. |
          | use_key_relative_attention | False | If 'True' use key relative embeddings in attention. |
          | use_value_relative_attention | False | If 'True' use value relative embeddings in attention. |
          | max_relative_position | None | Maximum position for relative embeddings. |
          | unidirectional_encoder | False | Use a unidirectional or bidirectional encoder. |
          | batch_size | [64, 256] | Initial and final value for batch sizes. Batch size will be linearly increased for each epoch. If a constant `batch_size` is required, pass an int, e.g. `8`. |
          | batch_strategy | "balanced" | Strategy used when creating batches. Can be either 'sequence' or 'balanced'. |
          | epochs | 300 | Number of epochs to train. |
          | random_seed | None | Set random seed to any 'int' to get reproducible results. |
          | learning_rate | 0.001 | Initial learning rate for the optimizer. |
          | embedding_dimension | 20 | Dimension size of embedding vectors. |
          | dense_dimension | text: 128, label: 20 | Dense dimension for sparse features to use. |
          | concat_dimension | text: 128, label: 20 | Concat dimension for sequence and sentence features. |
          | number_of_negative_examples | 20 | The number of incorrect labels. The algorithm will minimize their similarity to the user input during training. |
          | similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' or 'inner'. |
          | loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
          | ranking_length | 10 | Number of top actions to normalize scores for loss type 'softmax'. Set to 0 to turn off normalization. |
          | maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make embedding vectors for correct labels. Should be 0.0 < ... < 1.0 for 'cosine' similarity type. |
          | maximum_negative_similarity | -0.4 | Maximum negative similarity for incorrect labels. Should be -1.0 < ... < 1.0 for 'cosine' similarity type. |
          | use_maximum_negative_similarity | True | If 'True' the algorithm only minimizes maximum similarity over incorrect intent labels; used only if 'loss_type' is set to 'margin'. |
          | scale_loss | False | Scale loss inverse proportionally to the confidence of the correct prediction. |
          | regularization_constant | 0.002 | The scale of regularization. |
          | negative_margin_scale | 0.8 | The scale of how important it is to minimize the maximum similarity between embeddings of different labels. |
          | weight_sparsity | 0.8 | Sparsity of the weights in dense layers. Value should be between 0 and 1. |
          | drop_rate | 0.2 | Dropout rate for encoder. Value should be between 0 and 1. The higher the value, the higher the regularization effect. |
          | drop_rate_attention | 0.0 | Dropout rate for attention. Value should be between 0 and 1. The higher the value, the higher the regularization effect. |
          | use_sparse_input_dropout | True | If 'True' apply dropout to sparse input tensors. |
          | use_dense_input_dropout | True | If 'True' apply dropout to dense input tensors. |
          | evaluate_every_number_of_epochs | 20 | How often to calculate validation accuracy. Set to '-1' to evaluate just once at the end of training. |
          | evaluate_on_number_of_examples | 0 | How many examples to use for the hold-out validation set. Large values may hurt performance, e.g. model accuracy. |
          | intent_classification | True | If 'True' intent classification is trained and intents are predicted. |
          | entity_recognition | True | If 'True' entity recognition is trained and entities are extracted. |
          | use_masked_language_model | False | If 'True' random tokens of the input message will be masked and the model has to predict those tokens. It acts like a regularizer and should help to learn a better contextual representation of the input. |
          | tensorboard_log_directory | None | If you want to use tensorboard to visualize training metrics, set this option to a valid output directory. You can view the training metrics after training in tensorboard via 'tensorboard --logdir <path-to-given-directory>'. |
          | tensorboard_log_level | "epoch" | Define when training metrics for tensorboard should be logged: either after every epoch ('epoch') or for every training step ('minibatch'). |
          | featurizers | [] | List of featurizer names (alias names). Only features coming from the listed names are used. If the list is empty, all available features are used. |
          | checkpoint_model | False | Save the best performing model during training. Models are stored to the location specified by `--out`; only the one best model will be saved. Requires `evaluate_on_number_of_examples > 0` and `evaluate_every_number_of_epochs > 0`. |
          | split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each one of them as a single entity. Can either be `True`/`False` globally, or set per entity type (see the sketch below the table). |
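          The per-entity form of split_entities_by_comma from the last row, written out as a sketch (the address entity name is taken from the original table cell):

          pipeline:
          - name: DIETClassifier
            split_entities_by_comma:
              # Split extracted 'address' entities on commas.
              address: True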

          FallbackClassifier

          When the intent recognition score is low, this classifier decides whether to emit the nlu_fallback intent instead. Note that FallbackClassifier always comes after another intent classifier and judges the intent and confidence that classifier produced. If the preceding classifier's predicted confidence is below threshold, or if the confidence scores of the two top-ranked intents are close together, FallbackClassifier triggers the fallback.

          The response to the fallback intent can be implemented with a rule:

          rules:
          - rule: Ask the user to rephrase in case of low NLU confidence
            steps:
            - intent: nlu_fallback
            - action: utter_please_rephrase

          FallbackClassifier's configuration parameters are:

          threshold: sets the threshold for predicting the nlu_fallback intent. If the intent predicted by the preceding classifier has a confidence below threshold, FallbackClassifier returns nlu_fallback with confidence 1.0.

          ambiguity_threshold: if the difference between the confidence scores of the two top-ranked intents is smaller than ambiguity_threshold, FallbackClassifier also returns nlu_fallback with confidence 1.0.
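          Both parameters in a pipeline sketch; the values here are illustrative, not the defaults:

          pipeline:
          # ... a real intent classifier, e.g. DIETClassifier, must come first ...
          - name: FallbackClassifier
            # Fall back if the top intent's confidence is below 0.7 ...
            threshold: 0.7
            # ... or if the top two intents' confidences differ by less than 0.1.
            ambiguity_threshold: 0.1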

