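Author: Wang Hao, contracted author of the 3D Vision Developer Community. He graduated from Beihang University, is a recognized high-quality creator in the field of artificial intelligence, and is a CSDN blog certified expert.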

Editor: 3D Vision Developer Community

Abstract

YOLOX: Exceeding YOLO Series in 2021

Code: https://github.com/Megvii-BaseDetection/YOLOX
Paper: https://arxiv.org/abs/2107.08430

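YOLOX is a high-performance detector open-sourced by Megvii. Megvii's researchers combined several recent advances in object detection, namely the decoupled head, strong data augmentation, the anchor-free design and an advanced label assignment strategy, with the YOLO family and proposed YOLOX. It achieves higher AP than YOLOv3, YOLOv4 and YOLOv5 while keeping a highly competitive inference speed, as shown in the figure below: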

1. Setting up the environment

My local environment:

Operating system	Win10
PyTorch version	1.8.0
CUDA version	11.1

1.1 Download the source code

GitHub: https://github.com/Megvii-BaseDetection/YOLOX. After downloading, put the project in the root of drive D and open it with PyCharm.

1.2 Install the dependencies

Click "Terminal", as shown in the figure below.

Then run the following command to install all the dependencies:

pip install -r requirements.txt

1.3 Install yolox

python setup.py install

When you see output like the following, the installation has finished.

1.4 Install apex

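APEX is an open-source NVIDIA tool with full PyTorch support that reduces a model's GPU memory usage by changing the data format. Its most valuable part is amp (Automatic Mixed Precision): most of the model's operations are computed in Float16, while some special operations still use Float32. Users can migrate their existing training code to it with only three lines of code. Experiments show that using Float16 for most operations does not reduce accuracy; in some experiments, the larger batch size it allows even improves accuracy, and training also becomes faster.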
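To make the Float16/Float32 point concrete, here is a small standard-library sketch (my own illustration, not apex code) that round-trips values through IEEE 754 half precision, the 16-bit format behind Float16, using the `struct` module's `e` format. It shows the limited precision, the underflow of very small values such as tiny gradients, and the 65504 upper limit, which is why amp keeps some operations in Float32.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (Float16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_float16(x: float) -> bool:
    """Return False if x lies outside the finite Float16 range (largest value 65504)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


print(to_float16(0.1))   # 0.0999755859375: only about 3 significant decimal digits survive
print(to_float16(1e-8))  # 0.0: a value this small underflows to zero
print(fits_in_float16(65504.0), fits_in_float16(70000.0))  # True False
```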

Installation steps:

1) Download apex from the mirror of NVIDIA's repository: mirrors / nvidia / apex · CODE CHINA (csdn.net) [1]

2) After downloading, unzip the archive and, in a shell, change into the apex-master folder.

3) Run the installation commands:

   pip install -r requirements.txt
   python setup.py install

If you see a log like the following, the installation succeeded.

1.5 Install pycocotools

 pip install pycocotools

Note: if you run into environment problems, see this blog post [2]: https://blog.csdn.net/hhhhhhhhhhwwwwwwwwww/article/details/105858384
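Once the packages above are installed, a quick check like the following sketch can confirm they are importable before you run the demo. It only uses `importlib` from the standard library; the top-level module names (`torch`, `yolox`, `apex`, `pycocotools`) are my assumption of how the packages installed above are imported.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the packages installed in 1.2 to 1.5
    for name, ok in check_packages(["torch", "yolox", "apex", "pycocotools"]).items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```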

1.6 Verify the environment

Download a pretrained model. This article uses YOLOX-s, available at: https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth

After downloading, put the pretrained model in the project root, as shown below:

Then verify the environment by running:

python tools/demo.py image -f exps/default/yolox_s.py -c ./yolox_s.pth --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result --device gpu

Parameter descriptions:

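Parameter	Description
-f	experiment description file
-c	path to the weights
--path	path to the test image
--conf	confidence threshold
--nms	IoU threshold for NMS
--tsize	size the test image is resized to
--save_result	whether to save the inference result
--device	run inference on gpu or cpu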
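To illustrate what `--conf` and `--nms` control, here is a simplified, class-agnostic sketch in plain Python: boxes scoring below the confidence threshold are dropped, then greedy NMS removes any box whose IoU with an already kept, higher-scoring box exceeds the NMS threshold. This is only an illustration; YOLOX's own post-processing works on PyTorch tensors and may differ in details (for example, running NMS per class).

```python
def iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, conf=0.3, nms=0.65):
    """Return indices of kept boxes: confidence filter first, then greedy NMS."""
    # Candidates above the confidence threshold, highest score first
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= conf),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap any kept box by more than `nms`
        if all(iou(boxes[i], boxes[k]) <= nms for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
print(filter_detections(boxes, scores))  # [0, 2]: box 1 overlaps box 0 (IoU 0.81), box 3 is below conf
```

Raising `--nms` keeps more overlapping boxes, while raising `--conf` drops more low-confidence ones.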

Check the result:

If you see the image above, the environment is working.

2. Building the dataset

We use a dataset in VOC format; the original data was annotated with Labelme. Download: https://pan.baidu.com/s/1kj-diqEK2VNVqd2n4ROa5g (extraction code: rrnz)

Create a file named labelme2voc.py:

```python
import codecs
import json
import os
import shutil
from glob import glob

import cv2
import numpy as np
from sklearn.model_selection import train_test_split

# 1. Paths
labelme_path = "LabelmeData/"  # original Labelme annotations (*.json next to *.jpg)
saved_path = "VOC2007/"  # output folder
isUseTest = True  # whether to create a test split

# 2. Create the required folders
for sub in ("Annotations/", "JPEGImages/", "ImageSets/Main/"):
    os.makedirs(saved_path + sub, exist_ok=True)

# 3. Collect the files to process (names without the .json extension)
files = glob(labelme_path + "*.json")
files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
print(files)

# 4. Read each Labelme annotation and write it as a VOC xml file
for json_file_ in files:
    json_filename = labelme_path + json_file_ + ".json"
    with open(json_filename, "r", encoding="utf-8") as f:
        json_file = json.load(f)
    # The image size comes from the .jpg with the same name as the .json
    height, width, channels = cv2.imread(labelme_path + json_file_ + ".jpg").shape
    with codecs.open(saved_path + "Annotations/" + json_file_ + ".xml", "w", "utf-8") as xml:
        xml.write('<annotation>\n')
        xml.write('\t<folder>' + 'WH_data' + '</folder>\n')
        xml.write('\t<filename>' + json_file_ + ".jpg" + '</filename>\n')
        xml.write('\t<source>\n')
        xml.write('\t\t<database>WH Data</database>\n')
        xml.write('\t\t<annotation>WH</annotation>\n')
        xml.write('\t\t<image>flickr</image>\n')
        xml.write('\t\t<flickrid>NULL</flickrid>\n')
        xml.write('\t</source>\n')
        xml.write('\t<owner>\n')
        xml.write('\t\t<flickrid>NULL</flickrid>\n')
        xml.write('\t\t<name>WH</name>\n')
        xml.write('\t</owner>\n')
        xml.write('\t<size>\n')
        xml.write('\t\t<width>' + str(width) + '</width>\n')
        xml.write('\t\t<height>' + str(height) + '</height>\n')
        xml.write('\t\t<depth>' + str(channels) + '</depth>\n')
        xml.write('\t</size>\n')
        xml.write('\t<segmented>0</segmented>\n')
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            labelName = multi["label"]
            # The box is the bounding rectangle of the shape's points
            xmin = min(points[:, 0])
            xmax = max(points[:, 0])
            ymin = min(points[:, 1])
            ymax = max(points[:, 1])
            if xmax <= xmin or ymax <= ymin:
                continue  # skip degenerate boxes
            xml.write('\t<object>\n')
            xml.write('\t\t<name>' + labelName + '</name>\n')
            xml.write('\t\t<pose>Unspecified</pose>\n')
            xml.write('\t\t<truncated>1</truncated>\n')
            xml.write('\t\t<difficult>0</difficult>\n')
            xml.write('\t\t<bndbox>\n')
            xml.write('\t\t\t<xmin>' + str(int(xmin)) + '</xmin>\n')
            xml.write('\t\t\t<ymin>' + str(int(ymin)) + '</ymin>\n')
            xml.write('\t\t\t<xmax>' + str(int(xmax)) + '</xmax>\n')
            xml.write('\t\t\t<ymax>' + str(int(ymax)) + '</ymax>\n')
            xml.write('\t\t</bndbox>\n')
            xml.write('\t</object>\n')
            print(json_filename, xmin, ymin, xmax, ymax, labelName)
        xml.write('</annotation>')

# 5. Copy the images to VOC2007/JPEGImages/
image_files = glob(labelme_path + "*.jpg")
print("copy image files to VOC2007/JPEGImages/")
for image in image_files:
    shutil.copy(image, saved_path + "JPEGImages/")

# 6. Split the ids into trainval/test, then trainval into train/val (15% each time)
total_files = glob(saved_path + "Annotations/*.xml")
total_files = [i.replace("\\", "/").split("/")[-1].split(".xml")[0] for i in total_files]
if isUseTest:
    trainval_files, test_files = train_test_split(total_files, test_size=0.15, random_state=55)
else:
    trainval_files, test_files = total_files, []
train_files, val_files = train_test_split(trainval_files, test_size=0.15, random_state=55)

txtsavepath = saved_path + "ImageSets/Main/"
for split_name, ids in (("trainval", trainval_files), ("test", test_files),
                        ("train", train_files), ("val", val_files)):
    with open(txtsavepath + split_name + ".txt", "w") as f:
        for file in ids:
            f.write(file + "\n")
```

Running the code above produces the VOC2007 dataset, as shown below:
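To spot-check the conversion, you can read a generated annotation back with the standard library's `xml.etree.ElementTree`. The helper below is a hypothetical sketch, not part of YOLOX; it expects the tag names written by labelme2voc.py (`filename`, `object`, `name`, `bndbox`).

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(source):
    """Return (filename, [(name, xmin, ymin, xmax, ymax), ...]) from a VOC xml path or file object."""
    root = ET.parse(source).getroot()
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")),
            int(bb.findtext("ymin")),
            int(bb.findtext("xmax")),
            int(bb.findtext("ymax")),
        ))
    return root.findtext("filename"), boxes
```

For example, call `read_voc_boxes` on one file in VOC2007/Annotations and compare the boxes with what Labelme shows for the same image.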

The VOC directory layout is shown below, so create a data/VOCdevkit folder and copy the result from above into it:

```text
├── data
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   │   ├── Annotations    # xml files
│   │   │   ├── JPEGImages     # images
│   │   │   ├── ImageSets
│   │   │   │   ├── Main
│   │   │   │   │   ├── test.txt
│   │   │   │   │   ├── trainval.txt
```
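Before training, it can help to verify this layout. The sketch below is a hypothetical helper, not YOLOX code: it checks that the three folders exist and that every id listed in trainval.txt and test.txt has a matching .xml in Annotations and a .jpg in JPEGImages. The .jpg extension is assumed because labelme2voc.py only copies *.jpg files.

```python
import os


def check_voc_layout(voc_root, split_names=("trainval", "test")):
    """Return a list of problems found under a VOC2007-style folder (empty list means OK)."""
    problems = []
    for sub in ("Annotations", "JPEGImages", os.path.join("ImageSets", "Main")):
        if not os.path.isdir(os.path.join(voc_root, sub)):
            problems.append(f"missing folder: {sub}")
    if problems:
        return problems
    for split in split_names:
        txt = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
        if not os.path.isfile(txt):
            problems.append(f"missing split file: {split}.txt")
            continue
        with open(txt, encoding="utf-8") as f:
            ids = [line.strip() for line in f if line.strip()]
        # Every listed id needs both an annotation and an image
        for image_id in ids:
            if not os.path.isfile(os.path.join(voc_root, "Annotations", image_id + ".xml")):
                problems.append(f"{split}: no annotation for {image_id}")
            if not os.path.isfile(os.path.join(voc_root, "JPEGImages", image_id + ".jpg")):
                problems.append(f"{split}: no image for {image_id}")
    return problems


if __name__ == "__main__":
    print(check_voc_layout("data/VOCdevkit/VOC2007") or "layout OK")
```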

At this point, the dataset is ready.

3. Modifying the data configuration

3.1 Modify the classes

File: exps/example/yolox_voc/yolox_voc_s.py. This dataset has 2 classes, so set num_classes to 2.

Open yolox/data/datasets/voc_classes.py and replace the class names with your own:
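The names in VOC_CLASSES have to match the `<name>` values that labelme2voc.py wrote (they come straight from the Labelme labels), and num_classes must equal their count. As a hypothetical helper (not part of YOLOX), the sketch below scans the Labelme JSON files and prints both. Sorting is just one way to fix an order; whatever order you use becomes the class index mapping, so keep it identical for training and testing.

```python
import glob
import json
import os


def collect_labelme_classes(labelme_dir):
    """Return the sorted tuple of distinct shape labels in a folder of Labelme JSON files."""
    labels = set()
    for path in glob.glob(os.path.join(labelme_dir, "*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        labels.update(shape["label"] for shape in data.get("shapes", []))
    return tuple(sorted(labels))


if __name__ == "__main__":
    classes = collect_labelme_classes("LabelmeData/")
    print("VOC_CLASSES =", classes)
    print("num_classes =", len(classes))
```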

3.2 Modify the dataset directory

File: exps/example/yolox_voc/yolox_voc_s.py. Change data_dir to "./data/VOCdevkit" and remove the 2012 entry from image_sets. The final result is shown below:
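As far as I can tell from YOLOX's VOC loader (yolox/data/datasets/voc.py), each (year, name) pair in image_sets is read from data_dir/VOC<year>/ImageSets/Main/<name>.txt, which is why the 2012 entry must go when only VOC2007 exists. The hypothetical helper below re-implements just that path logic so you can check which files would be opened; treat it as a sketch, not the loader itself.

```python
import os


def image_set_files(data_dir, image_sets):
    """Map (year, name) pairs to the split files a VOC-style loader would read."""
    return [
        os.path.join(data_dir, "VOC" + year, "ImageSets", "Main", name + ".txt")
        for year, name in image_sets
    ]


if __name__ == "__main__":
    for path in image_set_files("./data/VOCdevkit", [("2007", "trainval")]):
        print(path, "exists" if os.path.isfile(path) else "MISSING")
```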

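Open yolox/data/datasets/voc.py; it contains a bug. At the location boxed in the figure, remove the "%s" inside the braces; otherwise validation keeps failing with a file-not-found error.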

After making these changes, run

python setup.py install

to rebuild yolox.

4. Training

Training from the command line is recommended.

Run:

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 4 --fp16  -c yolox_s.pth

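and training starts. If you prefer not to use the command line and want to run train.py directly, you need to edit the parameters in train.py. First copy train.py from tools to the project root (recommended; otherwise many paths must be changed, which is easy for beginners to get wrong), as shown below: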

Open it and change the following parameters:

```python
parser.add_argument("-b", "--batch-size", type=int, default=4, help="batch size")
parser.add_argument("-d", "--devices", default=1, type=int, help="device for training")
parser.add_argument(
    "-f",
    "--exp_file",
    default="exps/example/yolox_voc/yolox_voc_s.py",
    type=str,
    help="plz input your expriment description file",
)
parser.add_argument("-c", "--ckpt", default="yolox_s.pth", type=str, help="checkpoint file")
parser.add_argument(
    "--fp16",
    dest="fp16",
    default=True,
    action="store_true",
    help="Adopting mix precision training.",
)
```

With the parameters configured as above, train.py can be run directly, as shown below:
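One detail about the `--fp16` edit: with `action="store_true"` and `default=True`, the value is True whether or not `--fp16` is passed, so mixed precision can no longer be turned off from the command line. That is fine for this setup. If you want an off switch without editing the file again, one option (my own addition, not in YOLOX's train.py) is a second flag, here named `--no-fp16`, that writes False to the same destination. The sketch below uses only argparse to show the behavior.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser("fp16 flag demo")
    # Same pattern as the edited train.py: store_true combined with default=True
    parser.add_argument("--fp16", dest="fp16", default=True, action="store_true")
    # Hypothetical opt-out flag writing to the same destination
    parser.add_argument("--no-fp16", dest="fp16", action="store_false")
    return parser


parser = build_parser()
print(parser.parse_args([]).fp16)             # True
print(parser.parse_args(["--fp16"]).fp16)     # True
print(parser.parse_args(["--no-fp16"]).fp16)  # False
```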

What if you have trained for a while and want to continue from the previous model? Just change these parameters in train.py:

```python
parser.add_argument("--resume", default=True, action="store_true", help="resume training")
parser.add_argument("-c", "--ckpt", default="YOLOX_outputs/yolox_voc_s/best_ckpt.pth", type=str, help="checkpoint file")
parser.add_argument(
    "-e",
    "--start_epoch",
    default=100,
    type=int,
    help="resume training start epoch",
)
```

Command line:

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 4 -c YOLOX_outputs/yolox_voc_s/latest_ckpt.pth.tar --resume --start_epoch 100
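Long options in argparse need two dashes, so the flags are `--resume` and `--start_epoch` (or its short form `-e`). The self-contained sketch below mirrors only the three resume-related arguments (a simplified stand-in, not the full train.py) and shows that the two-dash form parses while a single-dash `-resume -start_epoch=100` form is rejected.

```python
import argparse


def build_parser():
    # Only the arguments involved in resuming; a simplified stand-in for train.py
    parser = argparse.ArgumentParser("resume flags demo")
    parser.add_argument("--resume", default=False, action="store_true")
    parser.add_argument("-c", "--ckpt", default=None, type=str)
    parser.add_argument("-e", "--start_epoch", default=None, type=int)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


ok = try_parse(["-c", "latest_ckpt.pth.tar", "--resume", "--start_epoch", "100"])
print(ok.resume, ok.start_epoch)  # True 100
print(try_parse(["-c", "latest_ckpt.pth.tar", "-resume", "-start_epoch=100"]))  # None
```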

Start training again and you will see that the epoch count no longer starts from 0.

5. Testing

Edit yolox/data/datasets/__init__.py to import "VOC_CLASSES", as shown below:

In tools/demo.py, change "COCO_CLASSES" to "VOC_CLASSES".

On line 295, change the "COCO_CLASSES" passed when the Predictor class is initialized to "VOC_CLASSES", as shown below:

Now test with the trained model. Testing uses tools/demo.py; first, the command-line version:

python tools/demo.py image -f exps/example/yolox_voc/yolox_voc_s.py -c YOLOX_outputs/yolox_voc_s/latest_ckpt.pth --path ./assets/aircraft_107.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result --device gpu

Result:

If you don't want to use the command line, copy demo.py to the project root and change its parameters:

```python
parser = argparse.ArgumentParser("YOLOX Demo!")
parser.add_argument("-do", "--demo", default="image", help="demo type, eg. image, video and webcam")
parser.add_argument("-expn", "--experiment-name", type=str, default=None)
parser.add_argument("-n", "--name", type=str, default=None, help="model name")
parser.add_argument("--path", default="./assets/aircraft_107.jpg", help="path to images or video")
# exp file
parser.add_argument(
    "-f",
    "--exp_file",
    default="exps/example/yolox_voc/yolox_voc_s.py",
    type=str,
    help="pls input your expriment description file",
)
parser.add_argument("-c", "--ckpt", default="YOLOX_outputs/yolox_voc_s/best_ckpt.pth", type=str, help="ckpt for eval")
parser.add_argument(
    "--device",
    default="gpu",
    type=str,
    help="device to run our model, can either be cpu or gpu",
)
parser.add_argument("--conf", default=0.3, type=float, help="test conf")
parser.add_argument("--nms", default=0.45, type=float, help="test nms threshold")
parser.add_argument("--tsize", default=640, type=int, help="test img size")
parser.add_argument(
    "--fp16",
    dest="fp16",
    default=False,
    action="store_true",
    help="Adopting mix precision evaluating.",
)
parser.add_argument(
    "--fuse",
    dest="fuse",
    default=False,
    action="store_true",
    help="Fuse conv and bn for testing.",
)
```

Then run demo.py directly. The result is shown below:

5.2 Batch prediction

Batch prediction is simple: change the path parameter from a single image path to a folder of images. For example:

    parser.add_argument("--path", default="./assets", help="path to images or video")

This runs prediction on every image in the assets folder.
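When `--path` points to a folder, demo.py collects the image files inside it (through its `get_image_list` helper) and runs them one by one. To check in advance which files would be picked up, here is a small standard-library sketch of the same idea; the extension list is my assumption and may not match the one in demo.py exactly.

```python
import os

# Assumed extension list; demo.py keeps its own, which may differ
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_images(folder):
    """Recursively collect image paths under `folder`, sorted for a stable order."""
    paths = []
    for dirpath, _, filenames in os.walk(folder):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


if __name__ == "__main__":
    for p in list_images("./assets"):
        print(p)
```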

6. Saving the test results

demo.py only draws the results on the image and does not save them, so we need to add that feature.

Open yolox/utils/visualize.py and modify the vis function so that it also returns the results to its caller.

```python
# _COLORS, np and cv2 are already defined/imported in yolox/utils/visualize.py
def vis(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
    result_list = []  # one "x0 y0 x1 y1 class_name score" string per kept box
    for i in range(len(boxes)):
        box = boxes[i]
        cls_id = int(cls_ids[i])
        score = scores[i]
        if score < conf:
            continue
        x0 = int(box[0])
        y0 = int(box[1])
        x1 = int(box[2])
        y1 = int(box[3])

        class_name = class_names[cls_id]
        one_line = (str(x0), str(y0), str(x1), str(y1), class_name, str(float(score)))
        result_list.append(" ".join(one_line))

        color = (_COLORS[cls_id] * 255).astype(np.uint8).tolist()
        text = "{}:{:.1f}%".format(class_name, score * 100)
        txt_color = (0, 0, 0) if np.mean(_COLORS[cls_id]) > 0.5 else (255, 255, 255)
        font = cv2.FONT_HERSHEY_SIMPLEX

        txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
        cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)

        txt_bk_color = (_COLORS[cls_id] * 255 * 0.7).astype(np.uint8).tolist()
        cv2.rectangle(
            img,
            (x0, y0 + 1),
            (x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
            txt_bk_color,
            -1,
        )
        cv2.putText(img, text, (x0, y0 + txt_size[1]), font, 0.4, txt_color, thickness=1)

    return img, result_list
```

At line 178 of demo.py, receive the results from vis and return them to the caller, as shown below:

Then, at line 182, modify the image_demo function to add the logic that collects and saves the results. The code is as follows:

```python
def image_demo(predictor, vis_folder, path, current_time, save_result):
    if os.path.isdir(path):
        files = get_image_list(path)
    else:
        files = [path]
    files.sort()
    for image_name in files:
        outputs, img_info = predictor.inference(image_name)
        # Assumes visual() now returns (image, result_list). If it still returns just the
        # image when nothing is detected (output is None), return (img, []) there as well.
        result_image, result_list = predictor.visual(outputs[0], img_info, predictor.confthre)
        print(result_list)
        if save_result:
            save_folder = os.path.join(
                vis_folder, time.strftime("%Y_%m_%d_%H_%M_%S", current_time)
            )
            os.makedirs(save_folder, exist_ok=True)
            save_file_name = os.path.join(save_folder, os.path.basename(image_name))
            logger.info("Saving detection result in {}".format(save_file_name))
            # Save the detections next to the image: same name, .txt extension
            txt_name = os.path.splitext(save_file_name)[0] + ".txt"
            print(txt_name)
            with open(txt_name, "w") as f:
                for line in result_list:
                    f.write(str(line) + "\n")
            cv2.imwrite(save_file_name, result_image)
        ch = cv2.waitKey(0)
        if ch == 27 or ch == ord("q") or ch == ord("Q"):
            break
```

Then run demo.py, and the results are saved to .txt files.
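Each line in the saved .txt has the form `x0 y0 x1 y1 class_name score`, as built by the modified vis(). If you want to load these results later, for example for your own evaluation, a minimal reader could look like the sketch below. The helpers are hypothetical and assume class names contain no spaces, since the fields are space-separated.

```python
def parse_result_line(line):
    """Parse one 'x0 y0 x1 y1 class_name score' line written by the modified vis()."""
    x0, y0, x1, y1, class_name, score = line.split()
    return int(x0), int(y0), int(x1), int(y1), class_name, float(score)


def load_results(txt_path):
    """Read all detections from one saved .txt file, skipping blank lines."""
    with open(txt_path, encoding="utf-8") as f:
        return [parse_result_line(line) for line in f if line.strip()]
```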

Errors encountered

1. RuntimeError: DataLoader worker (pid(s) 9368, 12520, 6392, 7384) exited unexpectedly

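Cause: the num_workers setting of torch.utils.data.DataLoader. Setting num_workers to 0 (its default value) fixes it. num_workers sets how many worker processes load data; 0 means no extra processes are started and data is loaded in the main process.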

Open yolox/exp/yolox_base.py and set data_num_workers to 0, as shown below:

If, after setting num_workers to 0, the program fails and asks you to set the environment variable KMP_DUPLICATE_LIB_OK=TRUE, either set that environment variable on your system or set it temporarily by adding these lines at the start of the code:

```python
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```

2. RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

The error occurred when running:

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 4 --fp16 -o -c yolox_s.pth.tar

After removing "-o" (short for --occupy, which occupies GPU memory before training starts), it ran normally:

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 4 --fp16  -c yolox_s.pth.tar

References

[1] mirrors / nvidia / apex · CODE CHINA (csdn.net): https://codechina.csdn.net/mirrors/nvidia/apex?utm_source=csdn_github_accelerator
[2] Installing pycocotools on Win10, AI浩 (CSDN blog): https://blog.csdn.net/hhhhhhhhhhwwwwwwwwww/article/details/105858384

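Copyright notice: This article is an original work published with the authorization of a contracted author of the Orbbec 3D Vision Developer Community. It may not be reproduced without permission. It is shared for academic purposes only, and the copyright belongs to the original author. If any content infringes your rights, please contact us to have it removed.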


Super detailed! A step-by-step guide to object detection with YOLOX (dataset included)
