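【GiantPandaCV Introduction】This article is a tutorial on deploying the yolov5s 4.0 model with TensorRT int8 quantization, and all of the code is open source. It walks through four steps: setting up the TensorRT environment, converting the PyTorch model to ONNX, quantizing the ONNX model into a TensorRT int8 engine, and running inference with the quantized engine. Measured on a GTX 1070, it reaches 3.3 ms per frame! The code is available at https://github.com/Wulingtian/yolov5_tensorrt_int8_tools and https://github.com/Wulingtian/yolov5_tensorrt_int8. Stars are welcome.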

0x0. Introduction to YOLOv5

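When it comes to the most widely deployed object detection algorithms, the YOLO series is the undisputed answer, from YOLOv1 all the way to today's "YOLOv5". Although the name YOLOv5 is controversial, that has not stopped deployment engineers from loving it, because it really is both fast and accurate: it topped the leaderboard of Kaggle's Global Wheat Detection competition and passed 8k GitHub stars in less than a year, proving itself on sheer merit. In short: use it, use it, use it! (On my GTX 1070, the yolov5s 4.0 model runs inference in 3.3 ms per frame after TensorRT int8 quantization!)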

0x1. Environment setup

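Ubuntu: 18.04
CUDA: 11.0
cuDNN: 8.0
TensorRT: 7.2.1.6
OpenCV: 3.4.2
The CUDA, cuDNN, TensorRT and OpenCV installation packages (prebuilt; you can also download and build them yourself from the official sites) are available at: https://pan.baidu.com/s/1dpMRyzLivnBAca2c_DIgGw password: 0rct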
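CUDA installation

If a driver is already installed on the system, uninstall it with:
sudo apt-get purge nvidia*
Disable nouveau by running:
sudo vim /etc/modprobe.d/blacklist.conf
and appending the following line to the end of the file:
blacklist nouveau
Then regenerate the initramfs, reboot so that nouveau is no longer loaded, and run the CUDA installer:
sudo update-initramfs -u
sudo reboot
chmod +x cuda_11.0.2_450.51.05_linux.run
sudo ./cuda_11.0.2_450.51.05_linux.run
When asked whether to accept the license agreement, type: accept
Then select Install
Finally press Enter
Run vim ~/.bashrc and add the following:
export PATH=/usr/local/cuda-11.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH
Run source ~/.bashrc to activate the environment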
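After sourcing ~/.bashrc, a quick way to confirm that both variables took effect is to check that the CUDA directories really appear in them. A minimal Python sketch, assuming the default /usr/local/cuda-11.0 install prefix used above (cuda_env_ok is a made-up helper name for illustration):

```python
import os


def cuda_env_ok(env, cuda_root="/usr/local/cuda-11.0"):
    """Return (bin_ok, lib_ok): whether CUDA's bin and lib64 directories
    appear in PATH and LD_LIBRARY_PATH of the given environment mapping."""
    # Linux uses ':' to separate entries in both variables
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    bin_ok = f"{cuda_root}/bin" in path_dirs
    lib_ok = f"{cuda_root}/lib64" in lib_dirs
    return bin_ok, lib_ok


# Usage in a shell where ~/.bashrc has been sourced:
#   print(cuda_env_ok(os.environ))   # expected: (True, True)
```

Running nvcc --version afterwards is another quick check that the CUDA bin directory is on PATH.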

cuDNN installation

tar -xzvf cudnn-11.0-linux-x64-v8.0.4.30.tgz
cd cuda/include
sudo cp *.h /usr/local/cuda-11.0/include
cd ../lib64
sudo cp -P libcudnn* /usr/local/cuda-11.0/lib64
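cuDNN 8 keeps its version macros (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) in cudnn_version.h, so reading them back from the copied header is a simple way to confirm that the copy worked. A small sketch (cudnn_version is a hypothetical helper name):

```python
import re


def cudnn_version(header_text):
    """Parse the CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL macros
    from the text of cudnn_version.h and return them as a tuple."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{macro}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{macro} not found in header")
        parts.append(int(match.group(1)))
    return tuple(parts)


# Usage:
#   with open("/usr/local/cuda-11.0/include/cudnn_version.h") as f:
#       print(cudnn_version(f.read()))
```

For the v8.0.4.30 package above, this should report (8, 0, 4).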

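TensorRT and OpenCV installation

Go to your home directory, then:
tar -xzvf TensorRT-7.2.1.6.Ubuntu-18.04.x86_64-gnu.cuda-11.0.cudnn8.0.tar.gz
Add the TensorRT lib directory to the library path (for example in ~/.bashrc):
export LD_LIBRARY_PATH=$HOME/TensorRT-7.2.1.6/lib:$LD_LIBRARY_PATH
cd TensorRT-7.2.1.6/python (this directory contains TensorRT wheels for 4 Python versions)
sudo pip3 install tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl (pick the wheel that matches your Python version)
pip3 install pycuda to install PyCUDA, the Python bindings for CUDA
Go back to your home directory and extract OpenCV for the inference program to use:
unzip opencv-3.4.2.zip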
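Picking the right wheel comes down to matching the cpXY tag to the interpreter's major and minor version. A sketch that builds the file name, assuming the other wheels in TensorRT-7.2.1.6/python follow the same naming pattern as the cp37 one above (tensorrt_wheel is a hypothetical helper):

```python
import sys


def tensorrt_wheel(version_info=sys.version_info, trt_version="7.2.1.6"):
    """Build the TensorRT wheel file name for a given Python version,
    assuming the naming pattern tensorrt-<ver>-cpXY-none-linux_x86_64.whl."""
    tag = f"cp{version_info[0]}{version_info[1]}"
    return f"tensorrt-{trt_version}-{tag}-none-linux_x86_64.whl"


# Usage: print the wheel name for the interpreter running this script
#   print(tensorrt_wheel())
```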

0x2. Exporting yolov5s to ONNX

pip install onnx
pip install onnx-simplifier
git clone https://github.com/ultralytics/yolov5.git
cd yolov5/models
vim common.py
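In the BottleneckCSP class, replace the activation function with ReLU, i.e. change it to self.act = nn.ReLU(inplace=True), because TensorRT's int8 quantization of LeakyReLU is unstable (this is a deep pitfall, so be sure to avoid it).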
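If you prefer to script this edit rather than make it by hand in vim, the substitution can be restricted to the body of class BottleneckCSP so that other modules in common.py are left untouched. A minimal sketch, assuming the activation is written as an nn.LeakyReLU(...) call inside that class (patch_bottleneckcsp_act is a hypothetical helper):

```python
import re


def patch_bottleneckcsp_act(source):
    """Replace nn.LeakyReLU(...) with nn.ReLU(inplace=True), but only
    inside the body of `class BottleneckCSP` in common.py source text."""
    start = source.find("class BottleneckCSP")
    if start == -1:
        raise ValueError("class BottleneckCSP not found")
    # Assume the class body ends where the next top-level class begins
    nxt = source.find("\nclass ", start + 1)
    end = len(source) if nxt == -1 else nxt
    body = re.sub(r"nn\.LeakyReLU\([^)]*\)", "nn.ReLU(inplace=True)",
                  source[start:end])
    return source[:start] + body + source[end:]


# Usage (run from the yolov5 directory):
#   from pathlib import Path
#   path = Path("models/common.py")
#   path.write_text(patch_bottleneckcsp_act(path.read_text()))
```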
After training the model:
cd yolov5
python models/export.py --weights <path to the trained weights> --img-size <input image size used for training>
python3 -m onnxsim <exported ONNX model> yolov5s-simple.onnx produces the final simplified ONNX model
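Two numbers are worth checking at export time. yolov5 rounds --img-size up to a multiple of the largest stride (32), and each of the three detection outputs has the shape [batch, 3 anchors, H/stride, W/stride, 5 + num_classes], which is the layout that postProcessParall walks through later. A sketch of both calculations, assuming the default yolov5s strides of 8, 16 and 32 (the function names mirror yolov5's idea but are written here for illustration):

```python
import math

STRIDES = (8, 16, 32)  # assumed P3/P4/P5 strides of the default yolov5s


def check_img_size(img_size, max_stride=32):
    """Round img_size up to the nearest multiple of the largest stride."""
    return math.ceil(img_size / max_stride) * max_stride


def output_shapes(img_h, img_w, num_classes, batch=1, num_anchors=3):
    """Expected shapes of the three raw detection outputs."""
    return [(batch, num_anchors, img_h // s, img_w // s, 5 + num_classes)
            for s in STRIDES]
```

For a 640x640 input and 80 classes this gives grids of 80x80, 40x40 and 20x20, each cell carrying 85 values per anchor.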

0x3. Converting the ONNX model to an int8 TensorRT engine

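git clone https://github.com/Wulingtian/yolov5_tensorrt_int8_tools.git (stars appreciated)
cd yolov5_tensorrt_int8_tools
vim convert_trt_quant.py and modify the following parameters:

BATCH_SIZE: number of images fed to the model per calibration batch
BATCH: number of calibration batches
height width: input image height and width
CALIB_IMG_DIR: path to the training images used for calibration
onnx_model_path: path to the ONNX model

python convert_trt_quant.py saves the quantized model to the models_save directory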
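BATCH_SIZE and BATCH together determine how many images the calibrator sees: BATCH_SIZE x BATCH in total. The Calibrator class in section 0x5 only needs a stream object exposing batch_size, calibration_data, reset() and next_batch(), where next_batch() returns an empty array once the data runs out. A hypothetical NumPy sketch of such a stream (not the repo's implementation), assuming the images have already been preprocessed into CHW float32 arrays:

```python
import numpy as np


class ImageBatchStream:
    """Hypothetical calibration stream with the attributes Calibrator uses."""

    def __init__(self, images, batch_size, max_batches,
                 channels=3, height=640, width=640):
        # images: preprocessed CHW float32 arrays, e.g. outputs of preprocess_v1
        self.images = images
        self.batch_size = batch_size      # BATCH_SIZE
        self.max_batches = max_batches    # BATCH
        # One reusable NCHW host buffer; Calibrator sizes its device buffer from .nbytes
        self.calibration_data = np.zeros(
            (batch_size, channels, height, width), dtype=np.float32)
        self.batch = 0

    def reset(self):
        self.batch = 0

    def next_batch(self):
        start = self.batch * self.batch_size
        if self.batch >= self.max_batches or start + self.batch_size > len(self.images):
            # Empty array: Calibrator.get_batch sees size 0 and returns None
            return np.array([], dtype=np.float32)
        for i in range(self.batch_size):
            self.calibration_data[i] = self.images[start + i]
        self.batch += 1
        return np.ascontiguousarray(self.calibration_data)
```

Reusing one host buffer keeps memory flat during calibration; note that each call overwrites the array returned by the previous one, which is fine because the Calibrator copies it to the GPU immediately.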

0x4. TensorRT model inference

git clone https://github.com/Wulingtian/yolov5_tensorrt_int8.git (stars appreciated)
cd yolov5_tensorrt_int8
vim CMakeLists.txt
Set the USER_DIR parameter to your own home directory
vim yolov5s_infer.cc and modify the following parameters:
output_name1 output_name2 output_name3 (the yolov5 model has 3 outputs)
The output names can be looked up with Netron:
pip install netron to install Netron
vim netron_yolov5s.py and paste in the following:

import netron
netron.start('path/to/simplified_onnx_model.onnx', port=3344)

Run python netron_yolov5s.py to view the model's output names
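The output names can also be read without a browser through the onnx package, since a loaded ModelProto lists its outputs in model.graph.output. A short sketch (onnx_output_names is a hypothetical helper):

```python
def onnx_output_names(model):
    """Return the graph output names of a loaded ONNX ModelProto."""
    return [out.name for out in model.graph.output]


# Usage (requires the onnx package):
#   import onnx
#   print(onnx_output_names(onnx.load("yolov5s-simple.onnx")))
```

The three names printed for the simplified yolov5s model are what go into output_name1, output_name2 and output_name3.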
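trt_model_path: the quantized TensorRT inference engine (the file with the .trt suffix in the models_save directory)
test_img: path to the test image
INPUT_W INPUT_H: input image width and height
NUM_CLASS: number of classes the model was trained on
NMS_THRESH: NMS IoU threshold
CONF_THRESH: confidence threshold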
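CONF_THRESH drops low-scoring candidates first, and NMS_THRESH is the IoU above which a lower-scoring box of the same class is suppressed. The repo's NMS code is not shown in this article, so the following is a generic per-class greedy NMS sketch in NumPy that illustrates how the two thresholds interact, not a port of the repo's C++:

```python
import numpy as np


def iou(box, boxes):
    """IoU of one [xmin, ymin, xmax, ymax] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def nms(boxes, scores, classes, conf_thresh, nms_thresh):
    """Greedy per-class NMS; returns the indices of the boxes that survive."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    classes = np.asarray(classes)
    order = np.argsort(-scores)                  # highest score first
    order = order[scores[order] >= conf_thresh]  # CONF_THRESH filter
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        overlap = iou(boxes[best], boxes[rest])
        # Suppress same-class boxes whose IoU with the kept box exceeds NMS_THRESH
        suppress = (overlap > nms_thresh) & (classes[rest] == classes[best])
        order = rest[~suppress]
    return keep
```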
With the parameters configured, build and run:

mkdir build
cd build
cmake ..
make
./YoloV5sEngine

The program prints the average inference time and saves the prediction image to the current directory. Deployment complete!
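When reproducing the average-time figure yourself, warm-up runs matter, because the first few inferences include one-off setup cost. A generic timing sketch; run_once is a placeholder for one full inference call and must block until the GPU has finished (for example a synchronous execute, or an asynchronous one followed by a stream synchronize):

```python
import time


def average_latency_ms(run_once, warmup=10, iters=100):
    """Average wall-clock latency of run_once() in milliseconds, after warm-up."""
    for _ in range(warmup):
        run_once()                     # not timed: absorbs one-off setup cost
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) * 1000.0 / iters
```

For reference, 3.3 ms per frame corresponds to roughly 1000 / 3.3, about 303 frames per second.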

0x5. A look at the core TensorRT int8 quantization code

# Imports and loggers used by the quantization code below
import os
import logging

import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates the CUDA context on import)
import pycuda.driver as cuda
import tensorrt as trt

logger = logging.getLogger(__name__)
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Quantization preprocessing must match training preprocessing so the calibration data is aligned
# (height and width are the network input size configured in convert_trt_quant.py)
def preprocess_v1(image_raw):
    h, w, c = image_raw.shape
    image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)
    # Calculate width and height and paddings
    r_w = width / w
    r_h = height / h
    if r_h > r_w:
        tw = width
        th = int(r_w * h)
        tx1 = tx2 = 0
        ty1 = int((height - th) / 2)
        ty2 = height - th - ty1
    else:
        tw = int(r_h * w)
        th = height
        tx1 = int((width - tw) / 2)
        tx2 = width - tw - tx1
        ty1 = ty2 = 0
    # Resize the image with long side while maintaining ratio
    image = cv2.resize(image, (tw, th))
    # Pad the short side with (128,128,128)
    image = cv2.copyMakeBorder(
        image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, value=(128, 128, 128)
    )
    image = image.astype(np.float32)
    # Normalize to [0,1]
    image /= 255.0
    # HWC to CHW format:
    image = np.transpose(image, [2, 0, 1])
    # CHW to NCHW format
    #image = np.expand_dims(image, axis=0)
    # Convert the image to row-major order, also known as "C order":
    #image = np.ascontiguousarray(image)
    return image
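To make the resize-and-pad arithmetic in preprocess_v1 concrete, here is the same computation pulled out into a pure function (letterbox_params is a hypothetical name) that returns the resized size and the padding on each side:

```python
def letterbox_params(w, h, width, height):
    """Resize target (tw, th) and padding (tx1, tx2, ty1, ty2), computed
    the same way as preprocess_v1 for a w x h image and a width x height input."""
    r_w = width / w
    r_h = height / h
    if r_h > r_w:
        # Width is the limiting side: fill the width, pad top and bottom
        tw, th = width, int(r_w * h)
        tx1 = tx2 = 0
        ty1 = int((height - th) / 2)
        ty2 = height - th - ty1
    else:
        # Height is the limiting side: fill the height, pad left and right
        tw, th = int(r_h * w), height
        tx1 = int((width - tw) / 2)
        tx2 = width - tw - tx1
        ty1 = ty2 = 0
    return tw, th, tx1, tx2, ty1, ty2
```

For a 1280x720 frame and a 640x640 input, the scale is 0.5, the frame is resized to 640x360, and 140 rows of gray padding go on the top and on the bottom.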

# Build the IInt8EntropyCalibrator calibrator
class Calibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, stream, cache_file=""):
        trt.IInt8EntropyCalibrator.__init__(self)
        self.stream = stream
        # Device buffer large enough to hold one calibration batch
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        self.cache_file = cache_file
        stream.reset()

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()
        if not batch.size:
            # No more calibration data: returning None ends calibration
            return None

        cuda.memcpy_htod(self.d_input, batch)

        return [int(self.d_input)]

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                logger.info("Using calibration cache to save time: {:}".format(self.cache_file))
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            logger.info("Caching calibration data for future use: {:}".format(self.cache_file))
            f.write(cache)
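For each tensor, IInt8EntropyCalibrator picks a clipping threshold T by minimizing the KL divergence between the FP32 activation histogram and its int8 approximation; once T is known, values are mapped to int8 symmetrically with scale T / 127. A minimal NumPy sketch of that mapping only (not TensorRT's code, and without the KL threshold search):

```python
import numpy as np


def quantize_int8(x, threshold):
    """Symmetric per-tensor int8 quantization with clipping threshold T:
    scale = T / 127, q = clip(round(x / scale), -127, 127)."""
    scale = threshold / 127.0
    q = np.clip(np.round(np.asarray(x, dtype=np.float32) / scale), -127, 127)
    return q.astype(np.int8), scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale
```

With T = 1.0, the value 0.25 becomes code 32 (about 0.252 after dequantization), while 2.0 saturates to 127: values beyond the threshold are clipped, which is the trade-off the entropy calibrator balances against rounding error.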

# Load the ONNX model and build the TensorRT engine
def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",
               fp16_mode=False, int8_mode=False, calibration_stream=None, calibration_table_path="", save_engine=False):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine(max_batch_size, save_engine):
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        # Flag 1 == 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(1) as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:

            # parse onnx model file
            if not os.path.exists(onnx_file_path):
                quit('ONNX file {} not found'.format(onnx_file_path))
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                parser.parse(model.read())
                assert network.num_layers > 0, ('Failed to parse ONNX model. '
                                                'Please check if the ONNX model is compatible')
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))

            # build trt engine
            builder.max_batch_size = max_batch_size
            builder.max_workspace_size = 1 << 30  # 1GB
            builder.fp16_mode = fp16_mode
            if int8_mode:
                builder.int8_mode = int8_mode
                assert calibration_stream, 'Error: a calibration_stream should be provided for int8 mode'
                builder.int8_calibrator = Calibrator(calibration_stream, calibration_table_path)
                print('Int8 mode enabled')
            engine = builder.build_cuda_engine(network)
            if engine is None:
                print('Failed to create the engine')
                return None
            print("Completed creating the engine")
            if save_engine:
                with open(engine_file_path, "wb") as f:
                    f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, load it instead of building a new one.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine(max_batch_size, save_engine)

0x6. A look at the core TensorRT inference code

// Data preprocessing is identical to the quantization preprocessing above, so it is not shown here
// Parse the model's three outputs and return the predicted bboxes
// (sigmoid, argmax, clip, Bbox and tensor_t are defined elsewhere in the project;
//  Anchor, a struct with float width/height members, is an assumed type name)
void postProcessParall(const int height, const int width, int scale_idx, float postThres, tensor_t * origin_output, vector<int> Strides, vector<Anchor> Anchors, vector<Bbox> *bboxes)
{
    Bbox bbox;
    float cx, cy, w_b, h_b, score;
    int cid;
    const float *ptr = (float *)origin_output->pValue;
    // Output layout: [3 anchors][height][width][5 + NUM_CLASS] (x, y, w, h, obj, class scores)
    for(unsigned long a=0; a<3; ++a){
        for(unsigned long h=0; h<height; ++h){
            for(unsigned long w=0; w<width; ++w){
                const float *cls_ptr = ptr + 5;
                cid = argmax(cls_ptr, cls_ptr+NUM_CLASS);
                score = sigmoid(ptr[4]) * sigmoid(cls_ptr[cid]);
                if(score>=postThres){
                    // Center: grid offset scaled by stride; size: anchor scaled by (2*sigmoid)^2
                    cx = (sigmoid(ptr[0]) * 2.f - 0.5f + static_cast<float>(w)) * static_cast<float>(Strides[scale_idx]);
                    cy = (sigmoid(ptr[1]) * 2.f - 0.5f + static_cast<float>(h)) * static_cast<float>(Strides[scale_idx]);
                    w_b = powf(sigmoid(ptr[2]) * 2.f, 2) * Anchors[scale_idx * 3 + a].width;
                    h_b = powf(sigmoid(ptr[3]) * 2.f, 2) * Anchors[scale_idx * 3 + a].height;
                    bbox.xmin = clip(cx - w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymin = clip(cy - h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.xmax = clip(cx + w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymax = clip(cy + h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.score = score;
                    bbox.cid = cid;
                    bboxes->push_back(bbox);
                }
                ptr += 5 + NUM_CLASS;
            }
        }
    }
}
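The decoding in postProcessParall follows the yolov5 formulas: center = (2 * sigmoid(t) - 0.5 + grid index) * stride and size = (2 * sigmoid(t))^2 * anchor. The boxes it produces are in network-input pixels, so drawing them on the original frame requires undoing the letterbox from preprocess_v1. A Python sketch of both steps for a single box (function names are hypothetical; unletterbox ignores the int() truncation in preprocess_v1, so it can be off by up to about one pixel):

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, col, row, stride, anchor_w, anchor_h):
    """Decode one anchor's raw (tx, ty, tw, th) at grid cell (col, row)
    into (cx, cy, w, h) in network-input pixels, as postProcessParall does."""
    cx = (sigmoid(raw[0]) * 2.0 - 0.5 + col) * stride
    cy = (sigmoid(raw[1]) * 2.0 - 0.5 + row) * stride
    w = (sigmoid(raw[2]) * 2.0) ** 2 * anchor_w
    h = (sigmoid(raw[3]) * 2.0) ** 2 * anchor_h
    return cx, cy, w, h


def unletterbox(box, w, h, width, height):
    """Map [xmin, ymin, xmax, ymax] from the letterboxed width x height input
    back to the original w x h image (inverse of the resize + pad)."""
    r = min(width / w, height / h)      # the scale preprocess_v1 applied
    pad_x = (width - r * w) / 2
    pad_y = (height - r * h) / 2
    xmin, ymin, xmax, ymax = box
    return ((xmin - pad_x) / r, (ymin - pad_y) / r,
            (xmax - pad_x) / r, (ymax - pad_y) / r)
```

With all-zero logits, sigmoid gives 0.5, so the box sits at the center of its grid cell with exactly the anchor's size; that makes a handy sanity check when comparing against the C++ output.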

0x7. Prediction results

(Figure: prediction results)

On my GTX 1070, the yolov5s 4.0 model runs inference in 3.3 ms per frame after TensorRT int8 quantization!


Deploying the YOLOv5s 4.0 Model with TensorRT Quantization