
Author: superjie13 (Zhihu)

https://www.zhihu.com/people/superjie13

This article tests mixed precision in PyTorch. It has two parts: the first is an overview of how to use mixed precision, and the second is a hands-on test. Reference: the "Automatic Mixed Precision" material on the official PyTorch site.

01
Overview of using mixed precision

In general, automatic mixed precision training uses torch.cuda.amp.autocast together with torch.cuda.amp.GradScaler.

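1.1 First, instantiate torch.cuda.amp.autocast(enabled=True) as a context manager or decorator, so that the wrapped region of the script runs in mixed precision. Note: autocast normally wraps only the forward pass (including the loss computation), not the backward pass; backward ops run in the same data type as their corresponding forward ops.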
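The training script later in this article uses autocast as a context manager. As a complement, here is a minimal sketch of the decorator form. TinyClassifier and its layer sizes are hypothetical, chosen only for illustration; on a machine without CUDA, autocast is disabled (with a warning) and the computation simply stays in float32.

```python
import torch


class TinyClassifier(torch.nn.Module):
    """Hypothetical toy model, used only to illustrate the decorator form."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    # Decorator form: the whole forward pass runs under autocast.
    @torch.cuda.amp.autocast(enabled=True)
    def forward(self, x):
        return self.fc(x)


# Fall back to the CPU so the sketch still runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyClassifier().to(device)
x = torch.randn(2, 16, device=device)

out = model(x)
# On a CUDA device the linear layer is autocast to float16;
# without CUDA, autocast is disabled and the dtype stays float32.
print(out.dtype)
```

Decorating forward keeps the precision policy with the model itself, which can be convenient when the same model is called from several training or evaluation loops.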

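1.2 Use gradient scaling to keep gradients from underflowing to 0 during the backward pass when they are too small (float16 cannot represent values of very small magnitude). torch.cuda.amp.GradScaler() performs gradient scaling automatically. Note: because GradScaler() scales the gradients, each parameter's gradient should be unscaled before the optimizer updates the parameters, so that the learning rate is not affected.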
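To make the underflow problem concrete, the following sketch uses only the Python standard library to round values to half precision. The gradient value 1e-8 is a made-up example, and the scale factor 2**16 is assumed here because it is GradScaler's default initial scale. This illustrates the idea only; it is not how GradScaler is implemented internally.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


tiny_grad = 1e-8       # hypothetical gradient, below float16's smallest subnormal (about 6e-8)
scale = 2.0 ** 16      # assumed scale factor (GradScaler's default initial scale)

unscaled = to_float16(tiny_grad)         # stored directly in float16: underflows
scaled = to_float16(tiny_grad * scale)   # scaled first: representable in float16
recovered = scaled / scale               # unscaled again in higher precision

print(unscaled)            # the gradient information is lost
print(scaled, recovered)   # the scaled value survives and can be unscaled
```

Because the scale factor is divided out again before the parameter update, the effective learning rate is unchanged, which is the point of the note in 1.2.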


import gc
import time

import torch
import torch.cuda.amp
import torchvision

# Timing utilities
start_time = None


def start_timer():
    global start_time
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_max_memory_allocated()
    torch.cuda.synchronize()  # synchronize first so the measured time is the real run time
    start_time = time.time()


def end_timer_and_print(local_msg):
    torch.cuda.synchronize()
    end_time = time.time()
    print("\n" + local_msg)
    print("Total execution time = {:.3f} sec".format(end_time - start_time))
    print("Max memory used by tensors = {} bytes".format(torch.cuda.max_memory_allocated()))


num_batches = 50
batch_size = 70
epochs = 3

# Create random training data
data = [torch.randn(batch_size, 3, 224, 224, device="cuda") for _ in range(num_batches)]
targets = [torch.randint(0, 1000, size=(batch_size,), device="cuda") for _ in range(num_batches)]
# Create a model
net = torchvision.models.resnext50_32x4d().cuda()
# Define the loss function
loss_fn = torch.nn.CrossEntropyLoss().cuda()
# Define the optimizer
opt = torch.optim.SGD(net.parameters(), lr=0.001)

# Whether to train with mixed precision
use_amp = True

# Constructs scaler once, at the beginning of the convergence run, using default args.
# If your network fails to converge with default GradScaler args, please file an issue.
# The same GradScaler instance should be used for the entire convergence run.
# If you perform multiple convergence runs in the same script, each run should use
# a dedicated fresh GradScaler instance.  GradScaler instances are lightweight.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

start_timer()
for epoch in range(epochs):
    for input, target in zip(data, targets):
        with torch.cuda.amp.autocast(enabled=use_amp):
            output = net(input)
            loss = loss_fn(output, target)
        # Scale the loss. Calls backward() on scaled loss to create scaled gradients.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(opt)

        # Updates the scale for next iteration.
        scaler.update()
        opt.zero_grad(set_to_none=True)  # set_to_none=True here can modestly improve performance
end_timer_and_print("Mixed precision:")

02
Mixed precision test

Test environment: Ubuntu 18.04, PyTorch 1.7.1, Python 3.7, RTX 2080 (8 GB)

2.1 use_amp = False

batch size = 40

2.2 use_amp = True

batch size = 40

Experiments 2.1 and 2.2 show that, with batch size = 40, training without mixed precision uses 7011 MB of GPU memory and takes 47.55 s, whereas training with mixed precision uses 4997 MB and takes 27.006 s. In this configuration, memory usage drops by about 28.73% and run time by about 43.21%. This also means we can use a larger batch size to improve throughput.
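The percentages above follow directly from the measured values. A small sketch of the arithmetic, where relative_saving is a hypothetical helper name introduced only for this check:

```python
def relative_saving(baseline: float, improved: float) -> float:
    """Return the reduction from baseline to improved, as a percentage of baseline."""
    return (baseline - improved) / baseline * 100.0


# Measurements reported for batch size = 40 (experiments 2.1 and 2.2).
mem_saving = relative_saving(7011, 4997)       # GPU memory in MB
time_saving = relative_saving(47.55, 27.006)   # run time in seconds

print("Memory saved: {:.2f}%".format(mem_saving))
print("Time saved:   {:.2f}%".format(time_saving))
```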

2.3 use_amp = True

batch size = 70


An overview of PyTorch mixed precision