PatrickStar: A Distributed Deep Learning Training Tool
PatrickStar is a distributed deep learning training tool developed by Tencent. It is designed to support the training of extremely large pre-trained models such as GPT and BERT.
Usage
PatrickStar is built on PyTorch, which makes it easy to migrate an existing PyTorch project. Below is an example of using PatrickStar:
from patrickstar.runtime import initialize_engine

config = {
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 0.001,
            "betas": (0.9, 0.999),
            "eps": 1e-6,
            "weight_decay": 0,
            "use_hybrid_adam": True,
        },
    },
    # loss scaler params
    "fp16": {
        "enabled": True,
        "loss_scale": 0,
        "initial_scale_power": 2 ** 3,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
    "default_chunk_size": 64 * 1024 * 1024,
    "release_after_init": True,
    "use_cpu_embedding": False,
}


def model_func():
    # MyModel is a class derived from torch.nn.Module
    return MyModel(...)


model, optimizer = initialize_engine(
    model_func=model_func, local_rank=0, config=config
)

...

for data in dataloader:
    optimizer.zero_grad()
    loss = model(data)
    model.backward(loss)
    optimizer.step()
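Note that the training loop is nearly identical to a plain PyTorch loop. The one visible difference is that the backward pass is invoked as model.backward(loss) rather than loss.backward(), so that the engine returned by initialize_engine can drive the pass itself.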
The config uses the same format as a DeepSpeed configuration JSON, and mainly contains parameters for the optimizer, the loss scaler, and some PatrickStar-specific options.
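Since the config follows DeepSpeed's JSON format, it can also be kept in a standalone file and loaded at runtime. Below is a minimal sketch of that, assuming a hypothetical file name ps_config.json; note that JSON has no tuples or arithmetic expressions, so values such as betas and default_chunk_size must be written as a list and a literal number.

import json

from patrickstar.runtime import initialize_engine

# "ps_config.json" is a hypothetical file name, not something PatrickStar
# itself looks for. In JSON, the Python values from the example above
# would be written as, e.g.:
#   "betas": [0.9, 0.999]
#   "default_chunk_size": 67108864
with open("ps_config.json") as f:
    config = json.load(f)

# model_func is the same model-building function as in the example above.
model, optimizer = initialize_engine(
    model_func=model_func, local_rank=0, config=config
)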
Cite Us
@article{fang2021patrickstar,
  title={PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management},
  author={Fang, Jiarui and Yu, Yang and Zhu, Zilin and Li, Shenggui and You, Yang and Zhou, Jie},
  journal={arXiv preprint arXiv:2108.05818},
  year={2021}
}
