
[NLP] bertorch: a PyTorch-based BERT implementation and downstream-task fine-tuning toolkit


          2022-05-31 21:06

bertorch (https://github.com/zejunwang1/bertorch) is a PyTorch-based toolkit for implementing BERT and fine-tuning it on downstream tasks. It supports common NLP tasks, including text classification, text matching, semantic understanding (sentence embeddings), and sequence labeling.

• 1. Dependencies

• 2. Text Classification

• 3. Text Matching

• 4. Semantic Understanding

  • 4.1 SimCSE

  • 4.2 In-Batch Negatives

• 5. Sequence Labeling


1. Dependencies

• Python >= 3.6

• torch >= 1.1
• argparse
• json
• loguru
• numpy
• packaging
• re

2. Text Classification

This project shows how a pretrained model such as BERT can be fine-tuned for text classification. Taking the public Chinese sentiment classification dataset ChnSentiCorp as an example, the following command runs single-machine multi-GPU distributed training with DistributedDataParallel, training on train.tsv and evaluating on dev.tsv:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_classifier.py --train_data_file ./data/ChnSentiCorp/train.tsv --dev_data_file ./data/ChnSentiCorp/dev.tsv --label_file ./data/ChnSentiCorp/labels.txt --save_best_model --epochs 3 --batch_size 32
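For context, a minimal sketch of the per-process setup that torch.distributed.launch implies (this is the standard PyTorch DDP pattern, not the repo's actual run_classifier.py; the model, dataset and collate_fn are placeholders):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def setup_ddp(local_rank, model, train_dataset, collate_fn):
    # torch.distributed.launch starts one process per GPU and passes --local_rank to each.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])  # gradients all-reduced across GPUs
    # DistributedSampler shards the data so each process sees a different slice per epoch.
    sampler = DistributedSampler(train_dataset)
    loader = DataLoader(train_dataset, batch_size=32, sampler=sampler, collate_fn=collate_fn)
    return model, loader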

Supported configuration options:

usage: run_classifier.py [-h] [--local_rank LOCAL_RANK]
                         [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
                         [--init_from_ckpt INIT_FROM_CKPT] --train_data_file
                         TRAIN_DATA_FILE [--dev_data_file DEV_DATA_FILE]
                         --label_file LABEL_FILE [--batch_size BATCH_SIZE]
                         [--scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
                         [--learning_rate LEARNING_RATE]
                         [--warmup_proportion WARMUP_PROPORTION] [--seed SEED]
                         [--save_steps SAVE_STEPS]
                         [--logging_steps LOGGING_STEPS]
                         [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS]
                         [--max_seq_length MAX_SEQ_LENGTH]
                         [--saved_dir SAVED_DIR]
                         [--max_grad_norm MAX_GRAD_NORM] [--save_best_model]
                         [--is_text_pair]
• local_rank: Optional. Process rank for distributed training. Default: -1.

• pretrained_model_name_or_path: Optional. Name or path of a Hugging Face pretrained model. Default: bert-base-chinese.

• train_data_file: Required. Path to the training set file.

• dev_data_file: Optional. Path to the validation set file. Default: None.

• label_file: Required. Path to the class label file.

• batch_size: Optional. Batch size; adjust it to your GPU memory and lower it if you run out of memory. Default: 32.

• init_from_ckpt: Optional. Path to model parameters to load for warm-starting training. Default: None.

• scheduler: Optional. Learning rate schedule. Default: linear.

• learning_rate: Optional. Maximum learning rate of the optimizer. Default: 5e-5.

• warmup_proportion: Optional. Proportion of training steps used for learning rate warmup; with 0.1, the learning rate grows from 0 to learning_rate over the first 10% of training steps and then slowly decays. Default: 0. (See the scheduler sketch after this list.)

• weight_decay: Optional. Weight decay (regularization strength) used to reduce overfitting. Default: 0.0.

• seed: Optional. Random seed. Default: 1000.

• logging_steps: Optional. Interval, in steps, between log messages. Default: 20.

• save_steps: Optional. Interval, in steps, between checkpoint saves. Default: 100.

• epochs: Optional. Number of training epochs. Default: 3.

• max_seq_length: Optional. Maximum input sequence length for the pretrained model, at most 512. Default: 128.

• saved_dir: Optional. Directory where trained models are saved. Default: the checkpoint folder under the current directory.

• max_grad_norm: Optional. max_norm used for gradient clipping during training. Default: 1.0.

• save_best_model: Optional. Whether to save the model at the best validation metric; save_best_model is True when --save_best_model is passed, otherwise False.

• is_text_pair: Optional. Whether to classify text pairs; when --is_text_pair is passed, text-pair classification is performed, otherwise single-text classification.
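To make the scheduler and warmup_proportion options concrete, here is a small sketch using the Hugging Face transformers helpers (the step counts are made up for illustration, and `model` stands for the classifier being fine-tuned):

from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

total_steps = 500 * 3          # e.g. 500 batches per epoch, 3 epochs
warmup_proportion = 0.1        # LR ramps up over the first 10% of steps, then decays linearly

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(total_steps * warmup_proportion),
    num_training_steps=total_steps,
)
# In the training loop: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()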

Sample training logs:

2022-05-25 07:22:29.403 | INFO     | __main__:train:301 - global step: 20, epoch: 1, batch: 20, loss: 0.23227, accuracy: 0.87500, speed: 2.12 step/s
2022-05-25 07:22:39.131 | INFO     | __main__:train:301 - global step: 40, epoch: 1, batch: 40, loss: 0.30054, accuracy: 0.87500, speed: 2.06 step/s
2022-05-25 07:22:49.010 | INFO     | __main__:train:301 - global step: 60, epoch: 1, batch: 60, loss: 0.23514, accuracy: 0.93750, speed: 2.02 step/s
2022-05-25 07:22:58.909 | INFO     | __main__:train:301 - global step: 80, epoch: 1, batch: 80, loss: 0.12026, accuracy: 0.96875, speed: 2.02 step/s
2022-05-25 07:23:08.804 | INFO     | __main__:train:301 - global step: 100, epoch: 1, batch: 100, loss: 0.21955, accuracy: 0.90625, speed: 2.02 step/s
2022-05-25 07:23:13.534 | INFO     | __main__:train:307 - eval loss: 0.22564, accuracy: 0.91750
2022-05-25 07:23:25.222 | INFO     | __main__:train:301 - global step: 120, epoch: 1, batch: 120, loss: 0.32157, accuracy: 0.90625, speed: 2.03 step/s
2022-05-25 07:23:35.104 | INFO     | __main__:train:301 - global step: 140, epoch: 1, batch: 140, loss: 0.20107, accuracy: 0.87500, speed: 2.02 step/s
2022-05-25 07:23:44.978 | INFO     | __main__:train:301 - global step: 160, epoch: 2, batch: 10, loss: 0.08750, accuracy: 0.96875, speed: 2.03 step/s
2022-05-25 07:23:54.869 | INFO     | __main__:train:301 - global step: 180, epoch: 2, batch: 30, loss: 0.08308, accuracy: 1.00000, speed: 2.02 step/s
2022-05-25 07:24:04.754 | INFO     | __main__:train:301 - global step: 200, epoch: 2, batch: 50, loss: 0.10256, accuracy: 0.93750, speed: 2.02 step/s
2022-05-25 07:24:09.480 | INFO     | __main__:train:307 - eval loss: 0.22497, accuracy: 0.93083
2022-05-25 07:24:21.020 | INFO     | __main__:train:301 - global step: 220, epoch: 2, batch: 70, loss: 0.23989, accuracy: 0.93750, speed: 2.03 step/s
2022-05-25 07:24:30.919 | INFO     | __main__:train:301 - global step: 240, epoch: 2, batch: 90, loss: 0.00897, accuracy: 1.00000, speed: 2.02 step/s
2022-05-25 07:24:40.777 | INFO     | __main__:train:301 - global step: 260, epoch: 2, batch: 110, loss: 0.13605, accuracy: 0.93750, speed: 2.03 step/s
2022-05-25 07:24:50.640 | INFO     | __main__:train:301 - global step: 280, epoch: 2, batch: 130, loss: 0.14508, accuracy: 0.93750, speed: 2.03 step/s
2022-05-25 07:25:00.529 | INFO     | __main__:train:301 - global step: 300, epoch: 2, batch: 150, loss: 0.04770, accuracy: 0.96875, speed: 2.02 step/s
2022-05-25 07:25:05.256 | INFO     | __main__:train:307 - eval loss: 0.23039, accuracy: 0.93500
2022-05-25 07:25:16.818 | INFO     | __main__:train:301 - global step: 320, epoch: 3, batch: 20, loss: 0.04312, accuracy: 0.96875, speed: 2.04 step/s
2022-05-25 07:25:26.700 | INFO     | __main__:train:301 - global step: 340, epoch: 3, batch: 40, loss: 0.05103, accuracy: 0.96875, speed: 2.02 step/s
2022-05-25 07:25:36.588 | INFO     | __main__:train:301 - global step: 360, epoch: 3, batch: 60, loss: 0.12114, accuracy: 0.87500, speed: 2.02 step/s
2022-05-25 07:25:46.443 | INFO     | __main__:train:301 - global step: 380, epoch: 3, batch: 80, loss: 0.01080, accuracy: 1.00000, speed: 2.03 step/s
2022-05-25 07:25:56.228 | INFO     | __main__:train:301 - global step: 400, epoch: 3, batch: 100, loss: 0.14839, accuracy: 0.96875, speed: 2.04 step/s
2022-05-25 07:26:00.953 | INFO     | __main__:train:307 - eval loss: 0.22589, accuracy: 0.94083
2022-05-25 07:26:12.483 | INFO     | __main__:train:301 - global step: 420, epoch: 3, batch: 120, loss: 0.14986, accuracy: 0.96875, speed: 2.05 step/s
2022-05-25 07:26:22.289 | INFO     | __main__:train:301 - global step: 440, epoch: 3, batch: 140, loss: 0.00687, accuracy: 1.00000, speed: 2.04 step/s

For text-pair classification, simply set is_text_pair to True. Taking the AFQMC (Ant Financial semantic similarity) dataset from CLUEbenchmark as an example, training can be run with:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_classifier.py --train_data_file ./data/AFQMC/train.txt --dev_data_file ./data/AFQMC/dev.txt --label_file ./data/AFQMC/labels.txt --is_text_pair --save_best_model --epochs 3 --batch_size 32

Training on different datasets gives the following dev-set results:

Task      ChnSentiCorp   AFQMC     TNEWS
dev-acc   0.94083        0.74305   0.56990

TNEWS is the Toutiao news classification dataset from CLUEbenchmark.
CLUEbenchmark datasets: https://github.com/CLUEbenchmark/CLUE

3. Text Matching

This project shows how to fine-tune a Sentence-BERT model for Chinese text matching. Sentence-BERT uses a twin-tower (Siamese) architecture: the query and the title are each fed into a BERT encoder with shared parameters to obtain their token embeddings. The token embeddings are then pooled (the paper uses mean pooling), giving outputs u and v. Finally, the three vectors (u, v, |u-v|) are concatenated and passed to a linear classifier.

For more details on Sentence-BERT, see the paper: https://arxiv.org/abs/1908.10084
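A minimal sketch of that forward pass, using the Hugging Face BertModel as the shared encoder (illustrative only; run_sentencebert.py's own implementation and its default linear pooler may differ in detail):

import torch
import torch.nn as nn
from transformers import BertModel

class SentenceBertClassifier(nn.Module):
    def __init__(self, num_labels, model_name="bert-base-chinese"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)   # shared by both towers
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden * 3, num_labels)    # input is (u, v, |u-v|)

    def mean_pool(self, inputs):
        out = self.encoder(**inputs).last_hidden_state          # [batch, seq_len, hidden]
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        return (out * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, query_inputs, title_inputs):
        u = self.mean_pool(query_inputs)   # query sentence vector
        v = self.mean_pool(title_inputs)   # title sentence vector
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)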

Taking the Chinese text matching dataset LCQMC as an example, the following command runs single-machine multi-GPU distributed training with DistributedDataParallel, training on the training set and evaluating on the validation set:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_sentencebert.py --train_data_file ./data/LCQMC/train.txt --dev_data_file ./data/LCQMC/dev.txt --save_best_model --epochs 3 --batch_size 32

Supported configuration options:

usage: run_sentencebert.py [-h] [--local_rank LOCAL_RANK]
                           [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
                           [--init_from_ckpt INIT_FROM_CKPT] --train_data_file
                           TRAIN_DATA_FILE [--dev_data_file DEV_DATA_FILE]
                           [--label_file LABEL_FILE] [--batch_size BATCH_SIZE]
                           [--scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
                           [--learning_rate LEARNING_RATE]
                           [--warmup_proportion WARMUP_PROPORTION]
                           [--seed SEED] [--save_steps SAVE_STEPS]
                           [--logging_steps LOGGING_STEPS]
                           [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS]
                           [--max_seq_length MAX_SEQ_LENGTH]
                           [--saved_dir SAVED_DIR]
                           [--max_grad_norm MAX_GRAD_NORM] [--save_best_model]
                           [--is_nli] [--pooling_mode {linear,cls,mean}]
                           [--concat_multiply]
                           [--output_emb_size OUTPUT_EMB_SIZE]

Most options are the same as in text classification; the task-specific ones are:

• is_nli: Optional. When --is_nli is passed, the model is trained on an NLI (natural language inference) dataset.

• pooling_mode: Optional. With linear, the sentence embedding is the [CLS] vector passed through a linear pooler; with cls, the raw [CLS] vector is used; with mean, the average of all token vectors is used. Default: linear. (See the pooling sketch after this list.)

• concat_multiply: Optional. When --concat_multiply is passed, (u, v, |u-v|, u*v) is used as the classifier's input features; otherwise (u, v, |u-v|) is used.

• output_emb_size: Optional. Dimension of the sentence embedding produced by the encoder; when None, the embedding dimension equals the encoder's hidden_size. Default: None.
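As an illustration of the pooling_mode and output_emb_size options, a hedged sketch of a pooling module (names and structure are illustrative, not the repo's exact code):

import torch.nn as nn

class Pooler(nn.Module):
    """Turns token embeddings into a single sentence vector."""
    def __init__(self, hidden_size, mode="linear", output_emb_size=None):
        super().__init__()
        self.mode = mode
        # With output_emb_size set, the linear pooler also projects the embedding
        # down to that dimension; otherwise it keeps the encoder's hidden_size.
        self.linear = nn.Linear(hidden_size, output_emb_size or hidden_size)

    def forward(self, last_hidden_state, attention_mask):
        if self.mode == "cls":
            return last_hidden_state[:, 0]                      # raw [CLS] vector
        if self.mode == "mean":
            mask = attention_mask.unsqueeze(-1).float()
            return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.linear(last_hidden_state[:, 0])             # "linear" (default)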

Part of the training logs:

          ......
2022-05-24 17:07:26.672 | INFO     | __main__:train:308 - global step: 9620, epoch: 3, batch: 2158, loss: 0.16183, accuracy: 0.90625, speed: 3.38 step/s
2022-05-24 17:07:32.407 | INFO     | __main__:train:308 - global step: 9640, epoch: 3, batch: 2178, loss: 0.09866, accuracy: 0.96875, speed: 3.49 step/s
2022-05-24 17:07:38.177 | INFO     | __main__:train:308 - global step: 9660, epoch: 3, batch: 2198, loss: 0.38715, accuracy: 0.90625, speed: 3.47 step/s
2022-05-24 17:07:43.796 | INFO     | __main__:train:308 - global step: 9680, epoch: 3, batch: 2218, loss: 0.12515, accuracy: 0.93750, speed: 3.56 step/s
2022-05-24 17:07:49.740 | INFO     | __main__:train:308 - global step: 9700, epoch: 3, batch: 2238, loss: 0.03231, accuracy: 1.00000, speed: 3.37 step/s
2022-05-24 17:08:04.752 | INFO     | __main__:train:314 - eval loss: 0.38621, accuracy: 0.86549
2022-05-24 17:08:12.245 | INFO     | __main__:train:308 - global step: 9720, epoch: 3, batch: 2258, loss: 0.08337, accuracy: 0.96875, speed: 3.45 step/s
2022-05-24 17:08:18.112 | INFO     | __main__:train:308 - global step: 9740, epoch: 3, batch: 2278, loss: 0.15085, accuracy: 0.93750, speed: 3.41 step/s
2022-05-24 17:08:23.895 | INFO     | __main__:train:308 - global step: 9760, epoch: 3, batch: 2298, loss: 0.11466, accuracy: 0.93750, speed: 3.46 step/s
2022-05-24 17:08:29.703 | INFO     | __main__:train:308 - global step: 9780, epoch: 3, batch: 2318, loss: 0.04269, accuracy: 1.00000, speed: 3.44 step/s
2022-05-24 17:08:35.658 | INFO     | __main__:train:308 - global step: 9800, epoch: 3, batch: 2338, loss: 0.28312, accuracy: 0.90625, speed: 3.36 step/s
2022-05-24 17:08:50.674 | INFO     | __main__:train:314 - eval loss: 0.39262, accuracy: 0.86424
2022-05-24 17:08:56.609 | INFO     | __main__:train:308 - global step: 9820, epoch: 3, batch: 2358, loss: 0.13456, accuracy: 0.96875, speed: 3.37 step/s
2022-05-24 17:09:02.259 | INFO     | __main__:train:308 - global step: 9840, epoch: 3, batch: 2378, loss: 0.06361, accuracy: 1.00000, speed: 3.54 step/s
2022-05-24 17:09:08.120 | INFO     | __main__:train:308 - global step: 9860, epoch: 3, batch: 2398, loss: 0.09087, accuracy: 0.96875, speed: 3.41 step/s
2022-05-24 17:09:13.834 | INFO     | __main__:train:308 - global step: 9880, epoch: 3, batch: 2418, loss: 0.19537, accuracy: 0.90625, speed: 3.50 step/s
2022-05-24 17:09:19.531 | INFO     | __main__:train:308 - global step: 9900, epoch: 3, batch: 2438, loss: 0.05254, accuracy: 1.00000, speed: 3.51 step/s
2022-05-24 17:09:34.531 | INFO     | __main__:train:314 - eval loss: 0.39561, accuracy: 0.86560
2022-05-24 17:09:42.084 | INFO     | __main__:train:308 - global step: 9920, epoch: 3, batch: 2458, loss: 0.05342, accuracy: 1.00000, speed: 3.41 step/s
2022-05-24 17:09:47.781 | INFO     | __main__:train:308 - global step: 9940, epoch: 3, batch: 2478, loss: 0.22660, accuracy: 0.87500, speed: 3.51 step/s
2022-05-24 17:09:53.496 | INFO     | __main__:train:308 - global step: 9960, epoch: 3, batch: 2498, loss: 0.14745, accuracy: 0.93750, speed: 3.50 step/s
2022-05-24 17:09:59.350 | INFO     | __main__:train:308 - global step: 9980, epoch: 3, batch: 2518, loss: 0.06218, accuracy: 0.96875, speed: 3.42 step/s
2022-05-24 17:10:05.157 | INFO     | __main__:train:308 - global step: 10000, epoch: 3, batch: 2538, loss: 0.15225, accuracy: 0.96875, speed: 3.44 step/s
2022-05-24 17:10:20.159 | INFO     | __main__:train:314 - eval loss: 0.39152, accuracy: 0.86730
          ......

When training on NLI data, add the --is_nli flag and --label_file LABEL_FILE:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_sentencebert.py --train_data_file ./data/CMNLI/train.txt --dev_data_file ./data/CMNLI/dev.txt --label_file ./data/CMNLI/labels.txt --is_nli --save_best_model --epochs 3 --batch_size 32

Training on different datasets gives the following dev-set results:

Task      LCQMC     Chinese-MNLI   Chinese-SNLI
dev-acc   0.86730   0.71105        0.80567

Chinese-MNLI and Chinese-SNLI: https://github.com/zejunwang1/CSTS

4. Semantic Understanding

          4.1 SimCSE

SimCSE is well suited to matching and retrieval scenarios that lack supervised data but have plenty of unlabeled data. This project implements the unsupervised SimCSE method and trains a sentence embedding model on Chinese Wikipedia sentences.

For more details on SimCSE, see the paper: https://arxiv.org/abs/2104.08821
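The core idea is that each sentence is encoded twice; because dropout is active during training, the two passes yield slightly different embeddings that form a positive pair, while the other sentences in the batch serve as negatives. A rough sketch of that loss (the repo's margin handling is omitted, and `encode` stands for the sentence encoder with dropout enabled):

import torch
import torch.nn.functional as F

def simcse_loss(encode, batch_inputs, scale=20.0):
    # Two forward passes over the same batch; dropout makes z1 != z2.
    z1 = F.normalize(encode(batch_inputs), dim=-1)   # [batch, dim]
    z2 = F.normalize(encode(batch_inputs), dim=-1)   # [batch, dim]

    # Cosine similarity of every sentence in view 1 against every sentence in view 2.
    sim = z1 @ z2.t() * scale                        # scale sharpens the softmax (default 20)

    # The correct "match" for sentence i is its own second view, i.e. the diagonal.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)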

150,000 sentences extracted from Chinese Wikipedia are stored in wiki_sents.txt under the data/zhwiki/ folder. The following command trains with the unsupervised SimCSE method, starting from Tencent UER's pretrained model uer/chinese_roberta_L-6_H-128 (https://huggingface.co/uer/chinese_roberta_L-6_H-128), and evaluates on the Chinese-STS-B validation set (https://github.com/zejunwang1/CSTS):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_simcse.py --pretrained_model_name_or_path uer/chinese_roberta_L-6_H-128 --train_data_file ./data/zhwiki/wiki_sents.txt --dev_data_file ./data/STS-B/sts-b-dev.txt --learning_rate 5e-5 --epochs 1 --dropout 0.1 --margin 0.2 --scale 20 --batch_size 32

Supported configuration options:

usage: run_simcse.py [-h] [--local_rank LOCAL_RANK]
                     [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
                     [--init_from_ckpt INIT_FROM_CKPT] --train_data_file
                     TRAIN_DATA_FILE [--dev_data_file DEV_DATA_FILE]
                     [--batch_size BATCH_SIZE]
                     [--scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
                     [--learning_rate LEARNING_RATE]
                     [--warmup_proportion WARMUP_PROPORTION] [--seed SEED]
                     [--save_steps SAVE_STEPS] [--logging_steps LOGGING_STEPS]
                     [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS]
                     [--max_seq_length MAX_SEQ_LENGTH] [--saved_dir SAVED_DIR]
                     [--max_grad_norm MAX_GRAD_NORM] [--save_best_model]
                     [--margin MARGIN] [--scale SCALE] [--dropout DROPOUT]
                     [--pooling_mode {linear,cls,mean}]
                     [--output_emb_size OUTPUT_EMB_SIZE]

Most options are the same as in text classification; the task-specific ones are:

• margin: Optional. Target gap between the similarity of a positive pair and that of the negative pairs. Default: 0.2.

• dropout: Optional. Dropout rate used in the encoder part of the SimCSE network. Default: 0.1.

• scale: Optional. Factor by which the cosine similarities are scaled before computing the cross-entropy loss. Default: 20.

• pooling_mode: Optional. With linear, the sentence embedding is the [CLS] vector passed through a linear pooler; with cls, the raw [CLS] vector is used; with mean, the average of all token vectors is used. Default: linear.

• output_emb_size: Optional. Dimension of the sentence embedding produced by the encoder; when None, the embedding dimension equals the encoder's hidden_size. Default: None.

Part of the training logs:

2022-05-27 09:14:58.471 | INFO     | __main__:train:315 - global step: 20, epoch: 1, batch: 20, loss: 1.04241, speed: 8.45 step/s
2022-05-27 09:15:01.063 | INFO     | __main__:train:315 - global step: 40, epoch: 1, batch: 40, loss: 0.15792, speed: 7.72 step/s
2022-05-27 09:15:03.700 | INFO     | __main__:train:315 - global step: 60, epoch: 1, batch: 60, loss: 0.18357, speed: 7.58 step/s
2022-05-27 09:15:06.365 | INFO     | __main__:train:315 - global step: 80, epoch: 1, batch: 80, loss: 0.13284, speed: 7.51 step/s
2022-05-27 09:15:09.000 | INFO     | __main__:train:315 - global step: 100, epoch: 1, batch: 100, loss: 0.14146, speed: 7.59 step/s
2022-05-27 09:15:09.847 | INFO     | __main__:train:321 - spearman corr: 0.6048, pearson corr: 0.5870
2022-05-27 09:15:12.507 | INFO     | __main__:train:315 - global step: 120, epoch: 1, batch: 120, loss: 0.03073, speed: 7.74 step/s
2022-05-27 09:15:15.110 | INFO     | __main__:train:315 - global step: 140, epoch: 1, batch: 140, loss: 0.09425, speed: 7.69 step/s
2022-05-27 09:15:17.749 | INFO     | __main__:train:315 - global step: 160, epoch: 1, batch: 160, loss: 0.08629, speed: 7.58 step/s
2022-05-27 09:15:20.386 | INFO     | __main__:train:315 - global step: 180, epoch: 1, batch: 180, loss: 0.03206, speed: 7.59 step/s
2022-05-27 09:15:23.052 | INFO     | __main__:train:315 - global step: 200, epoch: 1, batch: 200, loss: 0.11463, speed: 7.50 step/s
2022-05-27 09:15:24.023 | INFO     | __main__:train:321 - spearman corr: 0.5954, pearson corr: 0.5807
          ......

The pretrained SimCSE sentence embedding model simcse_tiny_chinese_wiki, with num_hidden_layers=6 and hidden_size=128, is available at:

model_name                           link
WangZeJun/simcse-tiny-chinese-wiki   https://huggingface.co/WangZeJun/simcse-tiny-chinese-wiki
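Assuming the released checkpoint follows the standard transformers layout, sentence embeddings can presumably be computed as follows (mean pooling is shown for illustration; check the repo for the pooling the checkpoint was actually trained with):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("WangZeJun/simcse-tiny-chinese-wiki")
model = AutoModel.from_pretrained("WangZeJun/simcse-tiny-chinese-wiki")
model.eval()

sentences = ["今天天气不错", "今天天气很好"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state             # [batch, seq_len, 128]
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    embeddings = (hidden * mask).sum(1) / mask.sum(1)       # mean-pooled sentence vectors

# Cosine similarity between the two sentences
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)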

          4.2 In-Batch Negatives

All semantically similar text pairs extracted from the HIT LCQMC dataset, the Google PAWS-X dataset, and the Peking University PKU-Paraphrase-Bank paraphrase dataset (https://github.com/zejunwang1/CSTS) are used as the training set, stored in data/batchneg/paraphrase_lcqmc_semantic_pairs.txt.

The following command trains a sentence embedding model with the In-batch negatives strategy on GPUs 0,1,2,3, starting from Tencent UER's pretrained model uer/chinese_roberta_L-6_H-128, and evaluates on the Chinese-STS-B validation set:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 run_batchneg.py --pretrained_model_name_or_path uer/chinese_roberta_L-6_H-128 --train_data_file ./data/batchneg/paraphrase_lcqmc_semantic_pairs.txt --dev_data_file ./data/STS-B/sts-b-dev.txt --learning_rate 5e-5 --epochs 3 --margin 0.2 --scale 20 --batch_size 64 --mean_loss

Supported configuration options:

usage: run_batchneg.py [-h] [--local_rank LOCAL_RANK]
                       [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
                       [--init_from_ckpt INIT_FROM_CKPT] --train_data_file
                       TRAIN_DATA_FILE [--dev_data_file DEV_DATA_FILE]
                       [--batch_size BATCH_SIZE]
                       [--scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
                       [--learning_rate LEARNING_RATE]
                       [--warmup_proportion WARMUP_PROPORTION] [--seed SEED]
                       [--save_steps SAVE_STEPS]
                       [--logging_steps LOGGING_STEPS]
                       [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS]
                       [--max_seq_length MAX_SEQ_LENGTH]
                       [--saved_dir SAVED_DIR] [--max_grad_norm MAX_GRAD_NORM]
                       [--save_best_model] [--margin MARGIN] [--scale SCALE]
                       [--pooling_mode {linear,cls,mean}]
                       [--output_emb_size OUTPUT_EMB_SIZE] [--mean_loss]

The options are the same as for SimCSE. Part of the training logs:

          ......
2022-05-27 13:20:48.428 | INFO     | __main__:train:318 - global step: 7220, epoch: 3, batch: 1888, loss: 0.73655, speed: 6.70 step/s
2022-05-27 13:20:51.454 | INFO     | __main__:train:318 - global step: 7240, epoch: 3, batch: 1908, loss: 0.70207, speed: 6.61 step/s
2022-05-27 13:20:54.308 | INFO     | __main__:train:318 - global step: 7260, epoch: 3, batch: 1928, loss: 1.10231, speed: 7.01 step/s
2022-05-27 13:20:57.107 | INFO     | __main__:train:318 - global step: 7280, epoch: 3, batch: 1948, loss: 0.94975, speed: 7.15 step/s
2022-05-27 13:20:59.898 | INFO     | __main__:train:318 - global step: 7300, epoch: 3, batch: 1968, loss: 0.34252, speed: 7.17 step/s
2022-05-27 13:21:00.322 | INFO     | __main__:train:324 - spearman corr: 0.6950, pearson corr: 0.6801
2022-05-27 13:21:03.168 | INFO     | __main__:train:318 - global step: 7320, epoch: 3, batch: 1988, loss: 1.10022, speed: 7.20 step/s
2022-05-27 13:21:05.929 | INFO     | __main__:train:318 - global step: 7340, epoch: 3, batch: 2008, loss: 1.00207, speed: 7.25 step/s
2022-05-27 13:21:08.687 | INFO     | __main__:train:318 - global step: 7360, epoch: 3, batch: 2028, loss: 0.72985, speed: 7.25 step/s
2022-05-27 13:21:11.372 | INFO     | __main__:train:318 - global step: 7380, epoch: 3, batch: 2048, loss: 0.88964, speed: 7.45 step/s
2022-05-27 13:21:14.090 | INFO     | __main__:train:318 - global step: 7400, epoch: 3, batch: 2068, loss: 0.70836, speed: 7.36 step/s
2022-05-27 13:21:14.520 | INFO     | __main__:train:324 - spearman corr: 0.6922, pearson corr: 0.6764
          ......

Using the model obtained above as a warm start, In-batch negatives training continues on the sentence dataset data/batchneg/domain_finetune.txt:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_batchneg.py --pretrained_model_name_or_path uer/chinese_roberta_L-6_H-128 --init_from_ckpt ./checkpoint/pytorch_model.bin --train_data_file ./data/batchneg/domain_finetune.txt --dev_data_file ./data/STS-B/sts-b-dev.txt --learning_rate 1e-5 --epochs 1 --margin 0.2 --scale 20 --batch_size 32 --mean_loss

This yields a pretrained sentence embedding model with num_hidden_layers=6 and hidden_size=128:

model_name                        link
WangZeJun/batchneg-tiny-chinese   https://huggingface.co/WangZeJun/batchneg-tiny-chinese

5. Sequence Labeling

This project shows how a pretrained model such as BERT can be fine-tuned for sequence labeling. Taking Chinese named entity recognition as an example, models are trained and tested on four datasets: msra, ontonote4, resume, and weibo. The training and validation sets of each dataset are preprocessed into the following format, where each line is a JSON string containing the text and its labels.

          {"text":?["我",?"們",?"的",?"藏",?"品",?"中",?"有",?"幾",?"十",?"冊(cè)",?"為",?"北",?"京",?"圖",?"書(shū)",?"館",?"等",?"國(guó)",?"家",?"級(jí)",?"藏",?"館",?"所",?"未",?"藏",?"。"],?"label":?["O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"B-NS",?"I-NS",?"I-NS",?"I-NS",?"I-NS",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O"]}
          {"text":?["由",?"于",?"這",?"一",?"時(shí)",?"期",?"戰(zhàn)",?"爭(zhēng)",?"頻",?"繁",?",",?"條",?"件",?"艱",?"苦",?",",?"又",?"遭",?"國(guó)",?"民",?"黨",?"毀",?"禁",?",",?"傳",?"世",?"量",?"稀",?"少",?",",?"購(gòu)",?"藏",?"不",?"易",?"。"],?"label":?["O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"B-NT",?"I-NT",?"I-NT",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O",?"O"]}

The following command runs single-machine multi-GPU distributed training with the BERT+Linear structure on the msra dataset and evaluates on the validation set:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_ner.py --train_data_file ./data/ner/msra/train.json --dev_data_file ./data/ner/msra/dev.json --label_file ./data/ner/msra/labels.txt --tag bios --learning_rate 5e-5 --save_best_model --batch_size 32

Supported configuration options:

usage: run_ner.py [-h] [--local_rank LOCAL_RANK]
                  [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
                  [--init_from_ckpt INIT_FROM_CKPT] --train_data_file
                  TRAIN_DATA_FILE [--dev_data_file DEV_DATA_FILE] --label_file
                  LABEL_FILE [--tag {bios,bio}] [--batch_size BATCH_SIZE]
                  [--scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
                  [--learning_rate LEARNING_RATE]
                  [--crf_learning_rate CRF_LEARNING_RATE]
                  [--warmup_proportion WARMUP_PROPORTION] [--seed SEED]
                  [--save_steps SAVE_STEPS] [--logging_steps LOGGING_STEPS]
                  [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS]
                  [--max_seq_length MAX_SEQ_LENGTH] [--saved_dir SAVED_DIR]
                  [--max_grad_norm MAX_GRAD_NORM] [--save_best_model]
                  [--use_crf]

Most options are the same as in text classification; the task-specific ones are:

• tag: Optional. Entity tagging scheme; both bios and bio are supported. Default: bios.

• use_crf: Optional. Whether to add a CRF layer; when --use_crf is passed, the BERT+CRF model structure is used, otherwise BERT+Linear. (See the sketch after this list.)

• crf_learning_rate: Optional. Initial learning rate for the CRF parameters. Default: 5e-5.
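For reference, the BERT+Linear structure corresponds to standard token classification; a hedged sketch using the Hugging Face head (the repo implements its own model, and the BERT+CRF variant additionally decodes the emission scores with a CRF layer trained at crf_learning_rate):

import torch
from transformers import BertForTokenClassification, BertTokenizerFast

# num_labels comes from labels.txt; 7 here is just an illustrative count.
model = BertForTokenClassification.from_pretrained("bert-base-chinese", num_labels=7)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")

chars = ["我", "們", "在", "北", "京"]                     # already character-split, as in the data files
inputs = tokenizer(chars, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                        # [1, seq_len, num_labels]
pred_ids = logits.argmax(-1)[0, 1:-1]                      # strip [CLS]/[SEP]: one tag id per character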

Part of the training logs:

2022-05-27 15:56:59.043 | INFO     | __main__:train:355 - global step: 20, epoch: 1, batch: 20, loss: 0.20780, speed: 2.10 step/s
2022-05-27 15:57:08.723 | INFO     | __main__:train:355 - global step: 40, epoch: 1, batch: 40, loss: 0.09440, speed: 2.07 step/s
2022-05-27 15:57:18.001 | INFO     | __main__:train:355 - global step: 60, epoch: 1, batch: 60, loss: 0.05570, speed: 2.16 step/s
2022-05-27 15:57:27.357 | INFO     | __main__:train:355 - global step: 80, epoch: 1, batch: 80, loss: 0.02468, speed: 2.14 step/s
2022-05-27 15:57:36.994 | INFO     | __main__:train:355 - global step: 100, epoch: 1, batch: 100, loss: 0.05032, speed: 2.08 step/s
2022-05-27 15:57:53.299 | INFO     | __main__:train:362 - eval loss: 0.03203, F1: 0.86481
2022-05-27 15:58:03.264 | INFO     | __main__:train:355 - global step: 120, epoch: 1, batch: 120, loss: 0.04150, speed: 2.16 step/s
2022-05-27 15:58:12.712 | INFO     | __main__:train:355 - global step: 140, epoch: 1, batch: 140, loss: 0.04907, speed: 2.12 step/s
2022-05-27 15:58:21.959 | INFO     | __main__:train:355 - global step: 160, epoch: 1, batch: 160, loss: 0.01224, speed: 2.16 step/s
2022-05-27 15:58:31.039 | INFO     | __main__:train:355 - global step: 180, epoch: 1, batch: 180, loss: 0.01846, speed: 2.20 step/s
2022-05-27 15:58:40.542 | INFO     | __main__:train:355 - global step: 200, epoch: 1, batch: 200, loss: 0.06604, speed: 2.10 step/s
2022-05-27 15:58:56.831 | INFO     | __main__:train:362 - eval loss: 0.02589, F1: 0.89128
2022-05-27 15:59:07.813 | INFO     | __main__:train:355 - global step: 220, epoch: 1, batch: 220, loss: 0.07066, speed: 2.15 step/s
2022-05-27 15:59:16.857 | INFO     | __main__:train:355 - global step: 240, epoch: 1, batch: 240, loss: 0.03061, speed: 2.21 step/s
2022-05-27 15:59:26.240 | INFO     | __main__:train:355 - global step: 260, epoch: 1, batch: 260, loss: 0.01680, speed: 2.13 step/s
2022-05-27 15:59:35.568 | INFO     | __main__:train:355 - global step: 280, epoch: 1, batch: 280, loss: 0.01245, speed: 2.14 step/s
2022-05-27 15:59:44.684 | INFO     | __main__:train:355 - global step: 300, epoch: 1, batch: 300, loss: 0.02699, speed: 2.19 step/s
2022-05-27 16:00:00.977 | INFO     | __main__:train:362 - eval loss: 0.01928, F1: 0.92157

To train with the BERT+CRF structure, run:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run_ner.py --train_data_file ./data/ner/msra/train.json --dev_data_file ./data/ner/msra/dev.json --label_file ./data/ner/msra/labels.txt --tag bios --learning_rate 5e-5 --save_best_model --batch_size 32 --use_crf --crf_learning_rate 1e-4

F1 scores on the different validation sets:

Model         Msra      Resume    Ontonote   Weibo
BERT+Linear   0.94179   0.95643   0.80206    0.70588
BERT+CRF      0.94265   0.95818   0.80257    0.72215

Msra, Resume, and Ontonote were trained for 3 epochs and Weibo for 5 epochs; logging_steps and save_steps were set to 10 for Resume, Ontonote, and Weibo; for all datasets the initial learning rate was 5e-5 for the BERT parameters and 1e-4 for the CRF parameters, with batch_size 32.

