Using seqeval, an Evaluation Module for Sequence Labeling
Sequence labeling is a common deep learning task in NLP, but how familiar are we, really, with how sequence labeling models are evaluated?
In this article, I will walk through the evaluation method for sequence labeling models and how to use seqeval.
Evaluating sequence labeling models
In sequence labeling, a model typically produces a list of tags such as:
['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
Common tagging schemes include BIO, IOBES, and BMES. An entity is a contiguous run of non-O tags of the same type (e.g. PER/LOC/ORG) that begins with a B- tag.
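To make the definition of an entity concrete, here is a minimal sketch (an illustration, not seqeval's actual implementation) that extracts (type, start, end) spans from a BIO tag sequence, treating each B- tag as the start of a new entity:

```python
def get_entities(tags):
    """Extract (type, start, end) entity spans from a BIO tag sequence.

    An entity starts at a B- tag and continues through consecutive
    I- tags of the same type; end indices are exclusive.
    """
    entities = []
    etype, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith('B-'):
            if etype is not None:                 # close the previous entity
                entities.append((etype, start, i))
            etype, start = tag[2:], i             # open a new entity
        elif tag.startswith('I-') and etype == tag[2:]:
            continue                              # still inside the entity
        else:
            if etype is not None:                 # O tag (or mismatched I-) closes it
                entities.append((etype, start, i))
            etype, start = None, None
    if etype is not None:                         # entity running to end of sequence
        entities.append((etype, start, len(tags)))
    return entities

tags = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
print(get_entities(tags))  # [('MISC', 2, 4), ('MISC', 4, 6), ('PER', 7, 9)]
```

Note that the sequence above yields three entities, even though two of them are adjacent: the second B-MISC starts a new entity.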
Common evaluation metrics for sequence labeling models are accuracy, precision, recall, and the F1 score, computed as follows:
Accuracy: accuracy = number of correctly predicted tags / total number of tags
Precision: precision = number of correctly predicted entities / total number of predicted entities
Recall: recall = number of correctly predicted entities / total number of gold entities
F1 score: F1 = 2 * precision * recall / (precision + recall)
For example, take the following true sequence y_true and predicted sequence y_pred:
y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
Each list has 9 elements, of which 6 are predicted correctly, so accuracy = 2/3. There are 2 gold entities and 3 predicted entities, of which 1 is predicted correctly, so precision = 1/3, recall = 1/2, and F1 = 0.4.
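These numbers can be checked with a short pure-Python sketch (a simplified stand-in for what seqeval computes, assuming strict BIO input):

```python
def bio_spans(tags):
    """Return the set of (type, start, end) entity spans in a BIO sequence."""
    spans, etype, start = set(), None, None
    for i, tag in enumerate(tags + ['O']):  # 'O' sentinel closes a trailing entity
        if etype is not None and not (tag.startswith('I-') and tag[2:] == etype):
            spans.add((etype, start, i))    # current entity ends here
            etype = None
        if tag.startswith('B-'):
            etype, start = tag[2:], i       # a new entity begins
    return spans

y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

true_spans, pred_spans = bio_spans(y_true), bio_spans(y_pred)
correct = len(true_spans & pred_spans)   # 1: only the PER entity matches exactly
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 6/9
precision = correct / len(pred_spans)    # 1/3
recall = correct / len(true_spans)       # 1/2
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)   # accuracy, precision, recall, F1
```

The key point is that precision and recall count whole entities, which must match in type and in both boundaries, while accuracy counts individual tags.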
Using seqeval
Traditionally, sequence labeling models are evaluated with the conlleval.pl script, which is written in Perl. Python has a third-party module for the same purpose: seqeval, available at https://pypi.org/project/seqeval/0.0.3/ .
seqeval supports the BIO and IOBES tagging schemes and can be used to evaluate tasks such as named entity recognition, part-of-speech tagging, and semantic role labeling.
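To illustrate how the two schemes differ, here is a small sketch (a hypothetical helper, not part of seqeval's API) that converts BIO tags to IOBES, where S- marks a single-token entity and E- marks the last token of a multi-token one:

```python
def bio_to_iobes(tags):
    """Convert a BIO tag sequence to IOBES.

    B- stays B- only if the entity continues; a lone B- becomes S-.
    I- stays I- only if the entity continues; the final I- becomes E-.
    """
    iobes = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else 'O'
        if tag.startswith('B-'):
            iobes.append(('B-' if nxt == 'I-' + tag[2:] else 'S-') + tag[2:])
        elif tag.startswith('I-'):
            iobes.append(('I-' if nxt == 'I-' + tag[2:] else 'E-') + tag[2:])
        else:
            iobes.append(tag)
    return iobes

print(bio_to_iobes(['O', 'B-PER', 'I-PER', 'B-LOC', 'O']))
# ['O', 'B-PER', 'E-PER', 'S-LOC', 'O']
```

IOBES encodes entity boundaries more explicitly than BIO, but both schemes describe the same entity spans.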
The documentation gives two examples, which I have adapted as follows.
Example 1:
# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report

y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

print("accuracy: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))
The output is:
accuracy:  0.6666666666666666
p:  0.3333333333333333
r:  0.5
f1:  0.4
classification report: 
           precision    recall  f1-score   support

     MISC       0.00      0.00      0.00         1
      PER       1.00      1.00      1.00         1

micro avg       0.33      0.50      0.40         2
macro avg       0.50      0.50      0.50         2
Example 2:
# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report

y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]
y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))
The output is the same as in Example 1; here each inner list holds the tags of one sentence.
Using seqeval in Keras
Over a year ago, I wrote the article 使用CRF++實現(xiàn)命名實體識別(NER) (Implementing Named Entity Recognition with CRF++). Here we adapt its model-training code so that it reports the F1 score during training.
Download the DL_4_NER project from GitHub at https://github.com/percent4/DL_4_NER , then adjust the folder paths in utils.py and modify the model-training code (DL_4_NER/Bi_LSTM_Model_training.py) as follows:
# -*- coding: utf-8 -*-
import pickle
import numpy as np
import pandas as pd
from utils import BASE_DIR, CONSTANTS, load_data
from data_processing import data_processing
from keras.utils import np_utils, plot_model
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Bidirectional, LSTM, Dense, Embedding, TimeDistributed

# prepare the model's input data
def input_data_for_model(input_shape):
    # load the data
    input_data = load_data()
    # preprocess the data
    data_processing()
    # load the dictionaries
    with open(CONSTANTS[1], 'rb') as f:
        word_dictionary = pickle.load(f)
    with open(CONSTANTS[2], 'rb') as f:
        inverse_word_dictionary = pickle.load(f)
    with open(CONSTANTS[3], 'rb') as f:
        label_dictionary = pickle.load(f)
    with open(CONSTANTS[4], 'rb') as f:
        output_dictionary = pickle.load(f)
    vocab_size = len(word_dictionary.keys())
    label_size = len(label_dictionary.keys())
    # build the input tensors
    aggregate_function = lambda input: [(word, pos, label) for word, pos, label in
                                            zip(input['word'].values.tolist(),
                                                input['pos'].values.tolist(),
                                                input['tag'].values.tolist())]
    grouped_input_data = input_data.groupby('sent_no').apply(aggregate_function)
    sentences = [sentence for sentence in grouped_input_data]
    x = [[word_dictionary[word[0]] for word in sent] for sent in sentences]
    x = pad_sequences(maxlen=input_shape, sequences=x, padding='post', value=0)
    y = [[label_dictionary[word[2]] for word in sent] for sent in sentences]
    y = pad_sequences(maxlen=input_shape, sequences=y, padding='post', value=0)
    y = [np_utils.to_categorical(label, num_classes=label_size + 1) for label in y]
    return x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary

# define the deep learning model: Bi-LSTM
def create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation):
    model = Sequential()
    model.add(Embedding(input_dim=vocab_size + 1, output_dim=output_dim,
                        input_length=input_shape, mask_zero=True))
    model.add(Bidirectional(LSTM(units=n_units, activation=activation,
                                 return_sequences=True)))
    model.add(TimeDistributed(Dense(label_size + 1, activation=out_act)))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# train the model
def model_train():
    # split the dataset into training and test sets with a 9:1 ratio
    input_shape = 60
    x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary = input_data_for_model(input_shape)
    train_end = int(len(x) * 0.9)
    train_x, train_y = x[0:train_end], np.array(y[0:train_end])
    test_x, test_y = x[train_end:], np.array(y[train_end:])
    # model hyperparameters
    activation = 'selu'
    out_act = 'softmax'
    n_units = 100
    batch_size = 32
    epochs = 10
    output_dim = 20
    # train the model
    lstm_model = create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation)
    lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs, batch_size=batch_size, verbose=1)

model_train()
The tail of the training output looks like this (intermediate epochs omitted):
......
12598/12598 [==============================] - 26s 2ms/step - loss: 0.0075 - acc: 0.9981 - val_loss: 0.2131 - val_acc: 0.9592
We then modify the code around the lstm_model.fit call as follows, adding seqeval's F1Metrics Keras callback (imported from seqeval.callbacks):

    from seqeval.callbacks import F1Metrics

    labels = ['O', 'B-MISC', 'I-MISC', 'B-ORG', 'I-ORG', 'B-PER', 'B-LOC', 'I-PER', 'I-LOC', 'sO']
    id2label = dict(zip(range(len(labels)), labels))
    callbacks = [F1Metrics(id2label)]
    lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs,
                   batch_size=batch_size, verbose=1, callbacks=callbacks)
The output now becomes:
12598/12598 [==============================] - 26s 2ms/step - loss: 0.0089 - acc: 0.9978 - val_loss: 0.2145 - val_acc: 0.9560
 - f1: 95.40
           precision    recall  f1-score   support

     MISC     0.9707    0.9833    0.9769     15844
      PER     0.9080    0.8194    0.8614      1157
      LOC     0.7517    0.8095    0.7795       677
      ORG     0.8290    0.7289    0.7757       745
       sO     0.7757    0.8300    0.8019       100

micro avg     0.9524    0.9556    0.9540     18523
macro avg     0.9520    0.9556    0.9535     18523
This is where seqeval shines: with a single callback, we get an entity-level evaluation report every epoch.
For anything unclear about using seqeval with Keras, see the project's GitHub page: https://github.com/chakki-works/seqeval .