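Quantized Deployment of the YOLOv5s 4.0 Model with TensorRT
[GiantPandaCV introduction] This article is a tutorial on deploying the yolov5s 4.0 model with TensorRT int8 quantization, and all of the code is open source. It covers how to set up the TensorRT environment, convert the PyTorch model to ONNX, quantize the ONNX model to int8 with TensorRT, and run inference with the quantized model. Measured on a 1070 graphics card, inference takes 3.3 ms per frame. The code is available at https://github.com/Wulingtian/yolov5_tensorrt_int8_tools and https://github.com/Wulingtian/yolov5_tensorrt_int8. Stars are welcome.
0x0. Introduction to YOLOv5
If any family of algorithms deserves to be called the most widely deployed in object detection, it is the YOLO series, from yolov1 to today's "yolov5". The name yolov5 is controversial, but that has not stopped deployment engineers from liking it, because it really is both fast and accurate. From topping the leaderboard of the Kaggle Global Wheat Detection competition to passing 8k stars in under a year, it has proven itself on merit. In short: use it, use it, use it! (On my 1070 graphics card, the yolov5s 4.0 model runs inference at 3.3 ms per frame after TensorRT int8 quantization!)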

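0x1. Environment Setup
ubuntu: 18.04
cuda: 11.0
cudnn: 8.0
tensorrt: 7.2.1.6
OpenCV: 3.4.2

The installation packages for cuda, cudnn, tensorrt and OpenCV (prebuilt; you can also download them from the official sites and build them yourself) are available at https://pan.baidu.com/s/1dpMRyzLivnBAca2c_DIgGw (password: 0rct).

cuda installation
If a driver is already installed on the system, uninstall it with:
sudo apt-get purge nvidia*
Disable nouveau. Open the blacklist file:
sudo vim /etc/modprobe.d/blacklist.conf
and append this line at the end:
blacklist nouveau
Then run:
sudo update-initramfs -u
chmod +x cuda_11.0.2_450.51.05_linux.run
sudo ./cuda_11.0.2_450.51.05_linux.run
When asked whether to accept the agreement, type accept, then select Install and press Enter.
Open ~/.bashrc with vim ~/.bashrc and add:
export PATH=/usr/local/cuda-11.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH
Activate the environment:
source ~/.bashrc

cudnn installation
tar -xzvf cudnn-11.0-linux-x64-v8.0.4.30.tgz
cd cuda/include
sudo cp *.h /usr/local/cuda-11.0/include
cd ../lib64
sudo cp libcudnn* /usr/local/cuda-11.0/lib64

tensorrt and OpenCV installation
Change to your home directory, then:
tar -xzvf TensorRT-7.2.1.6.Ubuntu-18.04.x86_64-gnu.cuda-11.0.cudnn8.0.tar.gz
cd TensorRT-7.2.1.6/python
This directory contains tensorrt wheels for 4 Python versions; install the one that matches your Python version, for example:
sudo pip3 install tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
Install the Python bindings for cuda:
pip install pycuda
Change back to your home directory and unpack OpenCV so that the inference code can use it later:
tar -xzvf opencv-3.4.2.zip
(If the archive turns out to be a regular zip file, use unzip opencv-3.4.2.zip instead.)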
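Before moving on, it can help to confirm that the pieces installed above are visible from Python. The following is a minimal sketch and not part of the original repositories: `find_missing_modules` and `cuda_bin_on_path` are hypothetical helper names, and the check assumes the Python packages are importable as `tensorrt` and `pycuda`.

```python
import importlib.util
import os


def find_missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def cuda_bin_on_path(cuda_bin="/usr/local/cuda-11.0/bin", path=None):
    """Check whether the CUDA bin directory is an entry of a PATH-style string."""
    if path is None:
        path = os.environ.get("PATH", "")
    return cuda_bin in path.split(os.pathsep)


# Example usage (module names are assumptions, see above)
missing = find_missing_modules(["tensorrt", "pycuda"])
print("missing Python packages:", missing or "none")
print("CUDA bin on PATH:", cuda_bin_on_path())
```

If a package is reported missing, the usual cause is that the wheel was installed for a different Python interpreter than the one running the check.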
0x2. Exporting yolov5s to ONNX
pip install onnx
pip install onnx-simplifier
git clone https://github.com/ultralytics/yolov5.git
cd yolov5/models
vim common.py
Replace the activation function in the BottleneckCSP class with ReLU, because TensorRT int8 quantization of LeakyReLU is unstable (this is a deep pitfall, so remember to avoid it). That is, change the line to:
self.act = nn.ReLU(inplace=True)
After training has produced a model:
cd yolov5
python models/export.py --weights <path to the trained weights> --img-size <input image size used for training>
python3 -m onnxsim <name of the exported onnx model> yolov5s-simple.onnx
This produces the final, simplified ONNX model.
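0x3. Converting the ONNX Model to an int8 TensorRT Engine
git clone https://github.com/Wulingtian/yolov5_tensorrt_int8_tools.git (stars appreciated)
cd yolov5_tensorrt_int8_tools
vim convert_trt_quant.py and modify the following parameters:
BATCH_SIZE: number of images fed to the model per calibration batch
BATCH: number of calibration batches
height width: input image height and width
CALIB_IMG_DIR: path to the training images, which are used for calibration
onnx_model_path: path to the ONNX model
python convert_trt_quant.py
The quantized model is saved in the models_save directory.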
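The relationship between BATCH_SIZE, BATCH and CALIB_IMG_DIR can be sketched as follows. This is an illustration only and not the code in convert_trt_quant.py: `plan_calibration` and `iter_batches` are hypothetical helpers, and the sketch assumes every calibration batch consumes BATCH_SIZE distinct images.

```python
def plan_calibration(num_images, batch_size, num_batches):
    """Return how many images calibration consumes, or raise if too few exist."""
    needed = batch_size * num_batches
    if needed > num_images:
        raise ValueError(
            f"need {needed} images ({num_batches} batches x {batch_size}), "
            f"but only {num_images} are available"
        )
    return needed


def iter_batches(image_paths, batch_size, num_batches):
    """Yield `num_batches` consecutive lists of `batch_size` paths."""
    plan_calibration(len(image_paths), batch_size, num_batches)
    for i in range(num_batches):
        yield image_paths[i * batch_size:(i + 1) * batch_size]
```

Under this assumption, BATCH_SIZE = 8 and BATCH = 10 would need at least 80 images in the calibration directory.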
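0x4. Inference with the TensorRT Engine
git clone https://github.com/Wulingtian/yolov5_tensorrt_int8.git (stars appreciated)
cd yolov5_tensorrt_int8
vim CMakeLists.txt
Set the USER_DIR parameter to your own home directory.
vim yolov5s_infer.cc and modify the following parameters:
output_name1 output_name2 output_name3: the names of the model outputs (the yolov5 model has 3 outputs)
The output names can be looked up with netron:
pip install netron
vim netron_yolov5s.py and paste in the following:
import netron
netron.start('path to the simplified onnx model', port=3344)
Running python netron_yolov5s.py then shows the model's output names.
trt_model_path: the quantized TensorRT inference engine (the file with the trt suffix in the models_save directory)
test_img: path to the test image
INPUT_W INPUT_H: input image width and height
NUM_CLASS: number of classes the model was trained on
NMS_THRESH: NMS threshold
CONF_THRESH: confidence threshold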
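To make the roles of CONF_THRESH and NMS_THRESH concrete, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression. It is not the implementation in yolov5s_infer.cc: the box layout `(xmin, ymin, xmax, ymax, score, cid)`, the function names, and the choice to suppress only within the same class are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, conf_thresh, nms_thresh):
    """Drop low-score boxes, then greedily keep the best box of each overlapping group."""
    candidates = sorted((b for b in boxes if b[4] >= conf_thresh),
                        key=lambda b: b[4], reverse=True)
    kept = []
    for box in candidates:
        # Only boxes of the same class (index 5) can suppress each other
        if all(box[5] != k[5] or iou(box, k) <= nms_thresh for k in kept):
            kept.append(box)
    return kept
```

A higher CONF_THRESH removes more low-score boxes before suppression, while a lower NMS_THRESH makes suppression more aggressive because smaller overlaps already count as duplicates.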
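With the parameters configured, build and run:
mkdir build
cd build
cmake ..
make
./YoloV5sEngine
The program prints the average inference time and saves the prediction image to the current directory. With that, deployment is complete!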
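An average inference time is conceptually a mean over repeated runs. A minimal sketch of such a measurement is shown below, assuming a hypothetical `infer` callable that stands in for one forward pass. The function names and the warm-up count are illustrative and are not taken from yolov5s_infer.cc.

```python
import time


def average_latency_ms(infer, runs=100, warmup=10):
    """Call `infer` repeatedly and return the mean wall-clock time per call in ms."""
    for _ in range(warmup):  # warm-up calls are excluded from the average
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


def frames_per_second(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms
```

For example, the 3.3 ms per frame reported in this article corresponds to roughly 303 frames per second (1000 / 3.3). When timing GPU work this way, `infer` has to block until the results are ready, otherwise only the launch overhead is measured.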
0x5. Core Code for TensorRT int8 Quantization
# Calibration preprocessing must match the training preprocessing so that the data is aligned
import cv2
import numpy as np

# width and height are the network input size, defined at module level in the script
def preprocess_v1(image_raw):
    h, w, c = image_raw.shape
    image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)
    # Calculate width and height and paddings
    r_w = width / w
    r_h = height / h
    if r_h > r_w:
        tw = width
        th = int(r_w * h)
        tx1 = tx2 = 0
        ty1 = int((height - th) / 2)
        ty2 = height - th - ty1
    else:
        tw = int(r_h * w)
        th = height
        tx1 = int((width - tw) / 2)
        tx2 = width - tw - tx1
        ty1 = ty2 = 0
    # Resize the image with long side while maintaining ratio
    image = cv2.resize(image, (tw, th))
    # Pad the short side with (128,128,128); the color is passed by keyword
    # because the seventh positional argument of copyMakeBorder is dst, not value
    image = cv2.copyMakeBorder(
        image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, value=(128, 128, 128)
    )
    image = image.astype(np.float32)
    # Normalize to [0,1]
    image /= 255.0
    # HWC to CHW format:
    image = np.transpose(image, [2, 0, 1])
    # CHW to NCHW format
    #image = np.expand_dims(image, axis=0)
    # Convert the image to row-major order, also known as "C order":
    #image = np.ascontiguousarray(image)
    return image
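The resize-and-pad arithmetic in preprocess_v1 can be checked without OpenCV. The sketch below reproduces only the geometry. `letterbox_geometry` is a hypothetical helper name, and it takes the target size as arguments instead of reading module-level variables.

```python
def letterbox_geometry(w, h, width, height):
    """Return (tw, th, tx1, tx2, ty1, ty2) for fitting a w x h image into width x height."""
    r_w = width / w
    r_h = height / h
    if r_h > r_w:
        # Width is the limiting side: fill it and pad top and bottom
        tw, th = width, int(r_w * h)
        tx1 = tx2 = 0
        ty1 = int((height - th) / 2)
        ty2 = height - th - ty1
    else:
        # Height is the limiting side: fill it and pad left and right
        tw, th = int(r_h * w), height
        tx1 = int((width - tw) / 2)
        tx2 = width - tw - tx1
        ty1 = ty2 = 0
    return tw, th, tx1, tx2, ty1, ty2


print(letterbox_geometry(1280, 720, 640, 640))  # (640, 360, 0, 0, 140, 140)
print(letterbox_geometry(1280, 719, 640, 640))  # (640, 359, 0, 0, 140, 141)
```

The second call shows why the second padding value is computed as a remainder: when the leftover space is odd, the extra pixel goes to the bottom (or right) so that the padded image is always exactly width x height.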
# Build the IInt8EntropyCalibrator calibrator
import logging
import os
import pycuda.autoinit  # creates the CUDA context needed by mem_alloc
import pycuda.driver as cuda
import tensorrt as trt

logger = logging.getLogger(__name__)

class Calibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, stream, cache_file=""):
        trt.IInt8EntropyCalibrator.__init__(self)
        self.stream = stream
        # Device buffer large enough to hold one calibration batch
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        self.cache_file = cache_file
        stream.reset()
    def get_batch_size(self):
        return self.stream.batch_size
    def get_batch(self, names):
        batch = self.stream.next_batch()
        # An empty batch signals that the calibration data is exhausted
        if not batch.size:
            return None
        cuda.memcpy_htod(self.d_input, batch)
        return [int(self.d_input)]
    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                logger.info("Using calibration cache to save time: {:}".format(self.cache_file))
                return f.read()
    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            logger.info("Caching calibration data for future use: {:}".format(self.cache_file))
            f.write(cache)
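As background for what the calibrator is for: calibration lets TensorRT choose a dynamic range for each tensor, and that range is then mapped onto 8-bit integers. The sketch below illustrates symmetric int8 quantization for a given range `amax`. It is a simplified illustration and not TensorRT's internal implementation. In particular, the entropy calibrator derives the range from the activations it observes, whereas here `amax` is simply passed in, and all names are hypothetical.

```python
def quantize(x, amax):
    """Map a float to a signed 8-bit level, saturating outside [-amax, amax]."""
    q = round(x * 127.0 / amax)
    return max(-127, min(127, q))


def dequantize(q, amax):
    """Map an 8-bit level back to an approximate float value."""
    return q * amax / 127.0


amax = 1.27  # example dynamic range, chosen so that one level is about 0.01
print(quantize(0.1, amax))   # 10
print(quantize(5.0, amax))   # 127, saturated
print(quantize(-5.0, amax))  # -127, saturated
```

Values beyond the chosen range saturate, which is one reason the calibration images should resemble the data the model will see at inference time.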
# Load the ONNX model and build the TensorRT engine
import os
import tensorrt as trt

TRT_LOGGER = trt.Logger()  # logger shared by the builder, parser and runtime

def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",
               fp16_mode=False, int8_mode=False, calibration_stream=None, calibration_table_path="", save_engine=False):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine(max_batch_size, save_engine):
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        # create_network(1) sets the EXPLICIT_BATCH flag, which the ONNX parser requires
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(1) as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:

            # parse onnx model file
            if not os.path.exists(onnx_file_path):
                quit('ONNX file {} not found'.format(onnx_file_path))
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                parser.parse(model.read())
                assert network.num_layers > 0, \
                    'Failed to parse ONNX model. Please check if the ONNX model is compatible'
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))

            # build trt engine
            builder.max_batch_size = max_batch_size
            builder.max_workspace_size = 1 << 30  # 1GB
            builder.fp16_mode = fp16_mode
            if int8_mode:
                builder.int8_mode = int8_mode
                assert calibration_stream, 'Error: a calibration_stream should be provided for int8 mode'
                builder.int8_calibrator = Calibrator(calibration_stream, calibration_table_path)
                print('Int8 mode enabled')
            engine = builder.build_cuda_engine(network)
            if engine is None:
                print('Failed to create the engine')
                return None
            print("Completed creating the engine")
            if save_engine:
                with open(engine_file_path, "wb") as f:
                    f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, load it instead of building a new one.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine(max_batch_size, save_engine)
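The load-or-build control flow of get_engine is independent of TensorRT and can be sketched with plain files. In this illustration `load_or_build` is a hypothetical helper and `build` is any callable returning bytes, standing in for the engine build and serialization step.

```python
import os
import tempfile


def load_or_build(path, build, save=False):
    """Return the bytes stored at `path` if it exists, else call `build()` and optionally save."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = build()
    if data is not None and save:
        with open(path, "wb") as f:
            f.write(data)
    return data


# Example: the second call reads the cached file and does not rebuild
cache_path = os.path.join(tempfile.mkdtemp(), "model.trt")
first = load_or_build(cache_path, lambda: b"serialized-engine", save=True)
second = load_or_build(cache_path, lambda: b"something-else", save=True)
print(first == second)  # True
```

One consequence of this pattern is that an existing file always wins: after changing the ONNX model or the calibration settings, the old engine file has to be deleted or renamed before a new engine is built.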
0x6. Core Code for TensorRT Inference
// Data preprocessing is the same as the calibration preprocessing, so it is not shown here
// Parse the model's three outputs and collect the predicted bboxes
// Note: the element type of Anchors was lost in the source listing; Anchor is a
// placeholder for a struct with width and height members
void postProcessParall(const int height, const int width, int scale_idx, float postThres, tensor_t * origin_output, vector<int> Strides, vector<Anchor> Anchors, vector<Bbox> *bboxes)
{
    Bbox bbox;
    float cx, cy, w_b, h_b, score;
    int cid;
    const float *ptr = (float *)origin_output->pValue;
    // Layout per output scale: 3 anchors x height x width cells, each with 5 + NUM_CLASS floats
    for(unsigned long a=0; a<3; ++a){
        for(unsigned long h=0; h<height; ++h){
            for(unsigned long w=0; w<width; ++w){
                const float *cls_ptr =  ptr + 5;
                cid = argmax(cls_ptr, cls_ptr+NUM_CLASS);
                // Final score = objectness * best class probability
                score = sigmoid(ptr[4]) * sigmoid(cls_ptr[cid]);
                if(score>=postThres){
                    cx = (sigmoid(ptr[0]) * 2.f - 0.5f + static_cast<float>(w)) * static_cast<float>(Strides[scale_idx]);
                    cy = (sigmoid(ptr[1]) * 2.f - 0.5f + static_cast<float>(h)) * static_cast<float>(Strides[scale_idx]);
                    w_b = powf(sigmoid(ptr[2]) * 2.f, 2) * Anchors[scale_idx * 3 + a].width;
                    h_b = powf(sigmoid(ptr[3]) * 2.f, 2) * Anchors[scale_idx * 3 + a].height;
                    // Convert center/size to corners and clip to the input image
                    bbox.xmin = clip(cx - w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymin = clip(cy - h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.xmax = clip(cx + w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymax = clip(cy + h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.score = score;
                    bbox.cid = cid;
                    //std::cout << "bbox.cid : " << bbox.cid << std::endl;
                    bboxes->push_back(bbox);
                }
                ptr += 5 + NUM_CLASS;
            }
        }
    }
}
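The box decoding performed inside the loop can be reproduced for a single anchor in a few lines of Python. This is a sketch for checking the arithmetic and not a port of the C++ function: `decode_cell` is a hypothetical name, and the raw values, stride and anchor size in the usage example are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, w, h, stride, anchor_w, anchor_h):
    """Decode raw (tx, ty, tw, th) at grid cell (w, h) into (cx, cy, box_w, box_h) in input pixels."""
    tx, ty, tw, th = raw
    cx = (sigmoid(tx) * 2.0 - 0.5 + w) * stride
    cy = (sigmoid(ty) * 2.0 - 0.5 + h) * stride
    box_w = (sigmoid(tw) * 2.0) ** 2 * anchor_w
    box_h = (sigmoid(th) * 2.0) ** 2 * anchor_h
    return cx, cy, box_w, box_h


# All-zero raw outputs: sigmoid(0) = 0.5, so the center is the middle of the
# cell and the box has exactly the anchor size
print(decode_cell((0.0, 0.0, 0.0, 0.0), w=3, h=2, stride=8, anchor_w=10, anchor_h=13))
# (28.0, 20.0, 10.0, 13.0)
```

Because the sigmoid output lies strictly between 0 and 1, the center can move between -0.5 and 1.5 cells relative to the cell origin, and each box side is bounded by 4 times the corresponding anchor side.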
0x7. Prediction Results

On my 1070 graphics card, the yolov5s 4.0 model runs inference at 3.3 ms per frame after TensorRT int8 quantization!
