
How to accelerate a trained PyTorch model with TensorRT


          2021-01-15 03:34


Author: 伯恩legacy
Source: https://zhuanlan.zhihu.com/p/88318324
Editor: 極市平臺(tái)

極市 Editor's Note

This post converts a PyTorch model to TensorRT in both Python and C++, to help readers who are new to TensorRT get started quickly.

1. Introduction

TensorRT is NVIDIA's framework for accelerating model inference: it makes your trained model run faster at test time. For example, if your model takes 50 ms per image, with TensorRT it might take only around 10 ms. The exact speedup is not guaranteed, but in practice it is usually substantial. The painful part of TensorRT is that some operations are not supported, or are supported only partially; for those you either write a plugin yourself or wait for an official update.

The mainstream frameworks for training deep learning models today are TensorFlow, PyTorch, MXNet, Caffe, and so on. This post only covers PyTorch. For TensorFlow, see the post on deploying deep learning models with TensorRT (https://zhuanlan.zhihu.com/p/84125533), which covers deploying TensorRT from C++. The principle is the same everywhere: a TensorFlow pb model must be converted to a uff model; a PyTorch pth model must be converted to an ONNX model; a Caffe model needs no conversion, because TensorRT can read Caffe models directly; an MXNet model also has to be converted to ONNX.

This tutorial walks through converting a PyTorch model to TensorRT in both Python and C++, so that readers who are new to TensorRT can get up to speed quickly.

2. Installing TensorRT

Installing TensorRT is not hard, and I recommend installing the latest version. Since I use CentOS, I generally follow this guide:

Guide to installing TensorRT on CentOS
          https://tbr8.org/how-to-install-tensorrt-on-centos/

After installation, check that `import tensorrt` succeeds in Python, and compile the official sampleMNIST example. If both work, the installation succeeded.

Successfully importing tensorrt in Python

Running the official MNIST sample
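As a quick sanity check from Python, something like the following should work (a minimal sketch; it assumes pycuda is installed alongside the TensorRT Python bindings, which the inference script later in this post needs anyway):

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes a CUDA context

print(trt.__version__)        # the installed TensorRT version
print(cuda.Device(0).name())  # the GPU that TensorRT will run on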

3. Converting a PyTorch model to TensorRT in Python

There are two routes for converting a PyTorch model to TensorRT in Python: one is to first convert the PyTorch pt model to ONNX and then convert the ONNX model to TensorRT; the other is to convert the pt model to TensorRT directly.

First, let's convert the pt model to an ONNX model. You need onnx installed; `pip install onnx` is enough. Taking ResNet50 as an example, the code is:

import torchvision
import torch
from torch.autograd import Variable
import onnx

print(torch.__version__)

input_name = ['input']
output_name = ['output']
input = Variable(torch.randn(1, 3, 224, 224)).cuda()
model = torchvision.models.resnet50(pretrained=True).cuda()
torch.onnx.export(model, input, 'resnet50.onnx', input_names=input_name, output_names=output_name, verbose=True)

The code above takes the pretrained resnet50 model from torchvision and converts the pt model into resnet50.onnx, with the ONNX input named 'input', the output named 'output', and an input image of 3 channels at 224x224. The batch size here is 1, but you could also use 3, 4, 5, and so on. Running this code produces a file named resnet50.onnx.
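If you would rather not bake a fixed batch size into the ONNX file, torch.onnx.export also accepts a dynamic_axes argument. A hedged sketch follows (the output filename is just illustrative, and whether the TensorRT version you use can then build an engine with a dynamic batch dimension is a separate question):

torch.onnx.export(model, input, 'resnet50_dynamic.onnx',
                  input_names=input_name, output_names=output_name,
                  dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}},
                  verbose=True)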

It is a good idea to check the generated ONNX file; the code is:

          test = onnx.load('resnet50.onnx')
          onnx.checker.check_model(test)
          print("==> Passed")

Next, let's compare the results of the PyTorch model and TensorRT:

import pycuda.autoinit
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt
import torch
import os
import time
from PIL import Image
import cv2
import torchvision

filename = 'test.jpg'
max_batch_size = 1
onnx_model_path = 'resnet50.onnx'

TRT_LOGGER = trt.Logger()  # This logger is required to build an engine


def get_img_np_nchw(filename):
    image = cv2.imread(filename)
    image_cv = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_cv = cv2.resize(image_cv, (224, 224))
    miu = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img_np = np.array(image_cv, dtype=float) / 255.
    r = (img_np[:, :, 0] - miu[0]) / std[0]
    g = (img_np[:, :, 1] - miu[1]) / std[1]
    b = (img_np[:, :, 2] - miu[2]) / std[2]
    img_np_t = np.array([r, g, b])
    img_np_nchw = np.expand_dims(img_np_t, axis=0)
    return img_np_nchw


class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        """Within this context, host_mem means the CPU memory and device_mem the GPU memory."""
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream


def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",
               fp16_mode=False, int8_mode=False, save_engine=False):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine(max_batch_size, save_engine):
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network() as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:

            builder.max_workspace_size = 1 << 30  # Your workspace size
            builder.max_batch_size = max_batch_size
            builder.fp16_mode = fp16_mode  # Default: False
            builder.int8_mode = int8_mode  # Default: False
            if int8_mode:
                # To be updated
                raise NotImplementedError

            # Parse model file
            if not os.path.exists(onnx_file_path):
                quit('ONNX file {} not found'.format(onnx_file_path))

            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                parser.parse(model.read())

            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))

            engine = builder.build_cuda_engine(network)
            print("Completed creating Engine")

            if save_engine:
                with open(engine_file_path, "wb") as f:
                    f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, load it instead of building a new one.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine(max_batch_size, save_engine)


def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer data from CPU to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]


def postprocess_the_outputs(h_outputs, shape_of_output):
    h_outputs = h_outputs.reshape(*shape_of_output)
    return h_outputs


img_np_nchw = get_img_np_nchw(filename)
img_np_nchw = img_np_nchw.astype(dtype=np.float32)

# These two modes are dependent on hardware
fp16_mode = False
int8_mode = False
trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
# Build an engine
engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode)
# Create the context for this engine
context = engine.create_execution_context()
# Allocate buffers for input and output
inputs, outputs, bindings, stream = allocate_buffers(engine)  # input, output: host; bindings

# Do inference
shape_of_output = (max_batch_size, 1000)
# Load data to the buffer
inputs[0].host = img_np_nchw.reshape(-1)
# inputs[1].host = ... for multiple inputs
t1 = time.time()
trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)  # numpy data
t2 = time.time()
feat = postprocess_the_outputs(trt_outputs[0], shape_of_output)

print('TensorRT ok')

model = torchvision.models.resnet50(pretrained=True).cuda()
resnet_model = model.eval()

input_for_torch = torch.from_numpy(img_np_nchw).cuda()
t3 = time.time()
feat_2 = resnet_model(input_for_torch)
t4 = time.time()
feat_2 = feat_2.cpu().data.numpy()
print('Pytorch ok!')

mse = np.mean((feat - feat_2)**2)
print("Inference time with the TensorRT engine: {}".format(t2 - t1))
print("Inference time with the PyTorch model: {}".format(t4 - t3))
print('MSE Error = {}'.format(mse))

print('All completed!')

The output is:

TensorRT ok
Pytorch ok!
Inference time with the TensorRT engine: 0.0037250518798828125
Inference time with the PyTorch model: 0.3574800491333008
MSE Error = 3.297184357139993e-12

It is a bit puzzling that PyTorch ResNet50 takes some 340 ms here; it feels off, but I could not find anything wrong. In any case, the TensorRT inference result is extremely close to the PyTorch forward pass. The code comes from https://github.com/RizhaoCai/PyTorch_ONNX_TensorRT
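One possible explanation for the 340 ms, which is my own guess rather than something from the original code: the PyTorch forward pass is timed on its very first call, so the measurement includes one-time CUDA/cuDNN initialization, and there is no torch.cuda.synchronize() before reading the clock. A hedged sketch of a fairer timing, reusing the resnet_model and input_for_torch variables from the script above:

import time
import torch

# Warm up: the first forward passes pay one-time CUDA/cuDNN setup costs
for _ in range(10):
    _ = resnet_model(input_for_torch)
torch.cuda.synchronize()

t3 = time.time()
feat_2 = resnet_model(input_for_torch)
torch.cuda.synchronize()  # make sure the GPU has actually finished before stopping the clock
t4 = time.time()
print("PyTorch inference time after warm-up: {}".format(t4 - t3))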

Next, let's look at converting a PyTorch model to TensorRT directly in Python. The reference code comes from NVIDIA-AI-IOT/torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt). The project is simple, easy to follow, and of high quality, and it is not hard to install; I ran it myself without trouble.
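A minimal usage sketch based on the torch2trt README, reusing the ResNet50 example from above (if your torch2trt version differs, check its README for the exact call):

import torch
import torchvision
from torch2trt import torch2trt

model = torchvision.models.resnet50(pretrained=True).eval().cuda()
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to a TensorRT-backed module and compare against the PyTorch output
model_trt = torch2trt(model, [x])
y = model(x)
y_trt = model_trt(x)
print(torch.max(torch.abs(y - y_trt)))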

For your own PyTorch model, you only need to swap your model into that code. Note that you will often hit "output tensor has no attribute _trt" while running it; this means some operation in your model does not have a converter implemented yet, and you have to implement it yourself.

4. Converting a PyTorch model to TensorRT in C++

In C++, taking sampleOnnxMNIST from TensorRT 5.1.5.0 as the starting point, we read an image with OpenCV and let TensorRT run doInference to output a (1, 1000) feature vector. The code is shown below; replace the sampleOnnxMNIST code with it, compile, and it will run.

// System headers assumed to be needed by this sample (the original include list did not survive intact):
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <sstream>
#include <opencv2/opencv.hpp>
#include "NvInfer.h"
#include "NvOnnxParser.h"
#include "argsParser.h"
#include "logger.h"
#include "common.h"
#include "image.hpp"

#define DebugP(x) std::cout << "Line" << __LINE__ << "  " << #x << "=" << x << std::endl

using namespace nvinfer1;

static const int INPUT_H = 224;
static const int INPUT_W = 224;
static const int INPUT_C = 3;
static const int OUTPUT_SIZE = 1000;

const char* INPUT_BLOB_NAME = "input";
const char* OUTPUT_BLOB_NAME = "output";

const std::string gSampleName = "TensorRT.sample_onnx_image";

samplesCommon::Args gArgs;
bool onnxToTRTModel(const std::string& modelFile, // name of the onnx model
                    unsigned int maxBatchSize,    // batch size - NB must be at least as large as the batch we want to run with
                    IHostMemory*& trtModelStream) // output buffer for the TensorRT model
{
    // create the builder
    IBuilder* builder = createInferBuilder(gLogger.getTRTLogger());
    assert(builder != nullptr);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();

    auto parser = nvonnxparser::createParser(*network, gLogger.getTRTLogger());

    // Optional - uncomment below lines to view network layer information
    //config->setPrintLayerInfo(true);
    //parser->reportParsingInfo();

    if (!parser->parseFromFile(locateFile(modelFile, gArgs.dataDirs).c_str(),
                               static_cast<int>(gLogger.getReportableSeverity())))
    {
        gLogError << "Failure while parsing ONNX file" << std::endl;
        return false;
    }

    // Build the engine
    builder->setMaxBatchSize(maxBatchSize);
    //builder->setMaxWorkspaceSize(1 << 20);
    builder->setMaxWorkspaceSize(10 << 20);
    builder->setFp16Mode(gArgs.runInFp16);
    builder->setInt8Mode(gArgs.runInInt8);

    if (gArgs.runInInt8)
    {
        samplesCommon::setAllTensorScales(network, 127.0f, 127.0f);
    }
    samplesCommon::enableDLA(builder, gArgs.useDLACore);
    ICudaEngine* engine = builder->buildCudaEngine(*network);
    assert(engine);

    // we can destroy the parser
    parser->destroy();

    // serialize the engine, then close everything down
    trtModelStream = engine->serialize();
    engine->destroy();
    network->destroy();
    builder->destroy();

    return true;
}
void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
    const ICudaEngine& engine = context.getEngine();
    // input and output buffer pointers that we pass to the engine - the engine requires exactly
    // IEngine::getNbBindings() of these, but in this case we know that there is exactly one input and one output.
    assert(engine.getNbBindings() == 2);
    void* buffers[2];

    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);
    DebugP(inputIndex);
    DebugP(outputIndex);

    // create GPU buffers and a stream
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_C * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // DMA the input to the GPU, execute the batch asynchronously, and DMA it back:
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_C * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);

    // release the stream and the buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
}
//!
//! \brief This function prints the help information for running this sample
//!
void printHelpInfo()
{
    std::cout << "Usage: ./sample_onnx_mnist [-h or --help] [-d or --datadir=<path to data directory>] [--useDLACore=<int>]\n";
    std::cout << "--help          Display help information\n";
    std::cout << "--datadir       Specify path to a data directory, overriding the default. This option can be used multiple times to add multiple directories. If no data directories are given, the default is to use (data/samples/mnist/, data/mnist/)" << std::endl;
    std::cout << "--useDLACore=N  Specify a DLA engine for layers that support DLA. Value can range from 0 to n-1, where n is the number of DLA engines on the platform." << std::endl;
    std::cout << "--int8          Run in Int8 mode.\n";
    std::cout << "--fp16          Run in FP16 mode." << std::endl;
}
int main(int argc, char** argv)
{
    bool argsOK = samplesCommon::parseArgs(gArgs, argc, argv);
    if (gArgs.help)
    {
        printHelpInfo();
        return EXIT_SUCCESS;
    }
    if (!argsOK)
    {
        gLogError << "Invalid arguments" << std::endl;
        printHelpInfo();
        return EXIT_FAILURE;
    }
    if (gArgs.dataDirs.empty())
    {
        gArgs.dataDirs = std::vector<std::string>{"data/samples/mnist/", "data/mnist/"};
    }

    auto sampleTest = gLogger.defineTest(gSampleName, argc, const_cast<const char**>(argv));

    gLogger.reportTestStart(sampleTest);

    // create a TensorRT model from the onnx model and serialize it to a stream
    IHostMemory* trtModelStream{nullptr};

    if (!onnxToTRTModel("resnet50.onnx", 1, trtModelStream))
        gLogger.reportFail(sampleTest);

    assert(trtModelStream != nullptr);
    std::cout << "Successfully parsed ONNX file!!!!" << std::endl;

    std::cout << "Start reading the input image!!!!" << std::endl;
    cv::Mat image = cv::imread(locateFile("test.jpg", gArgs.dataDirs), cv::IMREAD_COLOR);
    if (image.empty())
    {
        std::cout << "The input image is empty!!! Please check....." << std::endl;
    }
    DebugP(image.size());
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);

    cv::Mat dst = cv::Mat::zeros(INPUT_H, INPUT_W, CV_32FC3);
    cv::resize(image, dst, dst.size());
    DebugP(dst.size());

    float* data = normal(dst);

    // deserialize the engine
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    if (gArgs.useDLACore >= 0)
    {
        runtime->setDLACore(gArgs.useDLACore);
    }

    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream->data(), trtModelStream->size(), nullptr);
    assert(engine != nullptr);
    trtModelStream->destroy();
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);

    float prob[OUTPUT_SIZE];
    typedef std::chrono::high_resolution_clock Time;
    typedef std::chrono::duration<double, std::ratio<1, 1000>> ms;
    typedef std::chrono::duration<float> fsec;
    double total = 0.0;

    // run inference and cout time
    auto t0 = Time::now();
    doInference(*context, data, prob, 1);
    auto t1 = Time::now();
    fsec fs = t1 - t0;
    ms d = std::chrono::duration_cast<ms>(fs);
    total += d.count();

    // destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();

    std::cout << std::endl << "Running time of one image is:" << total << "ms" << std::endl;

    gLogInfo << "Output:\n";
    for (int i = 0; i < OUTPUT_SIZE; i++)
    {
        gLogInfo << prob[i] << " ";
    }
    gLogInfo << std::endl;

    return gLogger.reportTest(sampleTest, true);
}

The code of image.cpp is:

#include <opencv2/opencv.hpp>
#include "image.hpp"

static const float kMean[3] = { 0.485f, 0.456f, 0.406f };
static const float kStdDev[3] = { 0.229f, 0.224f, 0.225f };
static const int map_[7][3] = { {0,0,0}, {128,0,0}, {0,128,0}, {0,0,128},
                                {128,128,0}, {128,0,128}, {0,128,0} };

float* normal(cv::Mat img)
{
    //cv::Mat image(img.rows, img.cols, CV_32FC3);
    float* data;
    data = (float*)calloc(img.rows * img.cols * 3, sizeof(float));

    for (int c = 0; c < 3; ++c)
    {
        for (int i = 0; i < img.rows; ++i)
        {
            // pointer to the first pixel of row i
            cv::Vec3b* p1 = img.ptr<cv::Vec3b>(i);
            //cv::Vec3b* p2 = image.ptr<cv::Vec3b>(i);
            for (int j = 0; j < img.cols; ++j)
            {
                data[c * img.cols * img.rows + i * img.cols + j] = (p1[j][c] / 255.0f - kMean[c]) / kStdDev[c];
            }
        }
    }
    return data;
}

The content of image.hpp is:

#pragma once

#include <opencv2/core/core.hpp>  // needed for cv::Mat in the declaration below

typedef struct {
    int w;
    int h;
    int c;
    float *data;
} image;

float* normal(cv::Mat img);

The output of the C++ run:

The output for the same test.jpg in the Python environment:

You can see that the (1, 1000) features output by ResNet50 in C++ differ only slightly from feat (TensorRT) and feat_2 (PyTorch) in the Python environment.

So far we have converted PyTorch to ONNX first and let TensorRT parse the ONNX file to build the engine. How can we make TensorRT load an engine file directly? In other words, we first convert the ONNX model into a TensorRT .trt file, and then have TensorRT in C++ load the .trt file directly to build the engine.

Here we first use the onnx-tensorrt project to convert resnet50.onnx into resnet50.trt. The project is https://github.com/onnx/onnx-tensorrt, and it is not hard to install either: just install protobuf as required. Once it is installed successfully, you get the onnx2trt command used below.

Run the following command to obtain the resnet50.trt engine file:

          onnx2trt resnet50.onnx -o resnet50.trt

Note that onnx-tensorrt has a build-time option for specifying the target GPU compute capability.

You can look up the compute capability of different GPUs at https://developer.nvidia.com/cuda-gpus. For example, a .trt file generated for compute capability 7.5 cannot be parsed on a card with compute capability 6.5.
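Besides that page, you can also query the compute capability of the card you are actually running on with pycuda (already used earlier in this post); a small sketch:

import pycuda.driver as cuda
import pycuda.autoinit

major, minor = cuda.Device(0).compute_capability()
print('Compute capability of GPU 0: {}.{}'.format(major, minor))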

Also, the onnx2trt command has a -b option that sets the batch size of the generated .trt file; set it to whatever batch size you will actually use at test time, for example something like `onnx2trt resnet50.onnx -o resnet50.trt -b 8` for a batch size of 8. I remember once my .trt file had batch size 1 while my actual batch size was 8; after running, only one image had results and the other seven were all zeros.

If the .trt file is generated successfully, you can add the following function to the code to create the engine; nothing else needs to change.

bool read_TRT_File(const std::string& engineFile, IHostMemory*& trtModelStream)
{
    std::fstream file;
    std::cout << "loading filename from:" << engineFile << std::endl;
    nvinfer1::IRuntime* trtRuntime;
    //nvonnxparser::IPluginFactory* onnxPlugin = createPluginFactory(gLogger.getTRTLogger());
    file.open(engineFile, std::ios::binary | std::ios::in);
    file.seekg(0, std::ios::end);
    int length = file.tellg();
    std::cout << "length:" << length << std::endl;
    file.seekg(0, std::ios::beg);
    std::unique_ptr<char[]> data(new char[length]);
    file.read(data.get(), length);
    file.close();
    std::cout << "load engine done" << std::endl;
    std::cout << "deserializing" << std::endl;
    trtRuntime = createInferRuntime(gLogger.getTRTLogger());
    //ICudaEngine* engine = trtRuntime->deserializeCudaEngine(data.get(), length, onnxPlugin);
    ICudaEngine* engine = trtRuntime->deserializeCudaEngine(data.get(), length, nullptr);
    std::cout << "deserialize done" << std::endl;
    assert(engine != nullptr);
    std::cout << "The engine in TensorRT.cpp is not nullptr" << std::endl;
    trtModelStream = engine->serialize();
    return true;
}

If you want to save the engine file from your own code, add the following lines to generate the .trt file, and you can then load it directly next time.

nvinfer1::IHostMemory* data = engine->serialize();
std::ofstream file;
file.open(filename, std::ios::binary | std::ios::out);
cout << "writing engine file..." << endl;
file.write((const char*)data->data(), data->size());
cout << "save engine file done" << endl;
file.close();

5. Summary

Deploying TensorRT itself is not hard; the hard part is model conversion. Too many operations are not supported by TensorRT, or the ONNX exported from the PyTorch model is itself problematic; errors about unsupported expand, Gather, or reshape come up all the time. TensorRT feels especially unfriendly to PyTorch's shape manipulations; in my own conversions, the vast majority of bugs were in dimension changes. If you have questions, please leave a comment below. That's all for now; I will add more later.

TensorRT 7 has recently been released and supports quite a few operations that versions 5 and 6 could not. I also found tiny-tensorrt, which looks easy to deploy in both C++ and Python; I have not tested it yet, but I am noting it here: https://github.com/zerollzeng/tiny-tensorrt

