[Deep Learning] Dynamic Object Detection and Segmentation with YOLOv9 + SAM
2024-06-20 12:00
YOLOv9 + SAM: A New Approach to Custom Object Detection, Tested on the RF100 Construction-Safety-2 Dataset
In this article we combine the YOLOv9 detector with SAM (the Segment Anything Model) and evaluate the resulting custom object detection and segmentation pipeline on the RF100 Construction-Safety-2 dataset. Integrating the two models improves both the accuracy and the granularity with which objects are detected and segmented across different images, and it broadens the range of applications the pipeline can serve.
Potential use cases run from strengthening the safety systems of autonomous vehicles to sharpening diagnostic workflows in medical imaging: anywhere that reliably capturing every relevant detail in an image matters.
The key is pairing YOLOv9's efficient detection with SAM's ability to segment objects in a zero-shot manner. Because SAM needs no retraining or extra annotation to segment the objects YOLOv9 finds, the combination is both practical and easy to scale.
On the RF100 Construction-Safety-2 dataset, the pipeline segments target objects cleanly regardless of their size, shape, or color. This both validates the YOLOv9 + SAM approach and suggests directions for further research and applications.
In short, YOLOv9 + SAM is a combination worth studying in depth and applying broadly, and we expect it to prove useful in many more domains.
Introduction to YOLOv9
YOLOv9: A Major Step Forward in Real-Time Object Detection
YOLOv9 marks a significant advance in real-time object detection, standing out for its efficiency, accuracy, and adaptability. Its two key innovations are Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), and its results on the MS COCO dataset demonstrate what this combination can do.
The project carries on the collaborative spirit of the open-source community and builds on the foundations popularized by Ultralytics YOLOv5. By applying the information-bottleneck principle and reversible functions, it counters the loss of information in deep networks, ensuring that important features are preserved across layers. This keeps detection precise and detail-sensitive even in lightweight models.
The architecture is also designed to trim unnecessary parameters and computation, so every variant, from the compact YOLOv9-S to the larger YOLOv9-E, strikes a workable balance between inference speed and detection accuracy.
As a milestone in computer vision, YOLOv9 sets a new benchmark and widens the scope of what AI-based detection and segmentation systems can do in practice, and we expect it to keep delivering improvements as the technique matures and its applications expand.
Introduction to the Segment Anything Model (SAM)
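SAM (the Segment Anything Model), released by Meta AI, is a promptable segmentation model: given a prompt such as a point or a bounding box, it returns a high-quality mask for the corresponding object, and it generalizes zero-shot to objects and images it was not trained on. In this pipeline, the bounding boxes predicted by YOLOv9 serve as SAM's prompts.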
About the Dataset
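Construction-Safety-2 is version 2 of the construction-safety project from the Roboflow 100 (RF100) benchmark: construction-site imagery annotated with safety-related object classes. It is downloaded below through the Roboflow API.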
Getting Started
- Environment setup
- Downloading pretrained model weights for YOLOv9 and SAM
- Running inference on images
- Visualization and analysis
- Extracting detection results
- Segmentation with SAM
Environment Setup
GPU status check
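A minimal sketch of this check in Colab, assuming an NVIDIA GPU runtime is attached:
# Show the attached GPU, driver/CUDA version, and current memory usage
!nvidia-smi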
Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
Alternatively, install the Roboflow package and download the dataset directly:
%cd {HOME}
!pip install -q roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR API KEY")
project = rf.workspace("roboflow-100").project("construction-safety-gsnvb")
dataset = project.version(2).download("yolov7")
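Replace "YOUR API KEY" with your own Roboflow key. After the download completes, the local path of the dataset is exposed as dataset.location, which is handy when pointing training or inference scripts at the data:
# Print where Roboflow placed the downloaded dataset
print(dataset.location)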
Setting Up YOLOv9
!git clone https://github.com/SkalskiP/yolov9.git
%cd yolov9
!pip3 install -r requirements.txt -q
Display the current working directory and keep it in the HOME variable for later reference.
import os
HOME = os.getcwd()
print(HOME)
Downloading Model Weights
!mkdir -p {HOME}/weights
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c.pt
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-e.pt
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c.pt
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-e.pt
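To confirm that the four checkpoint files arrived intact, you can list the weights directory:
# List the downloaded checkpoints with their sizes
!ls -lh {HOME}/weights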
Downloading an Image for Inference
Copy the test image from Google Drive into the working directory and keep its path in SOURCE_IMAGE_PATH:
!mkdir -p {HOME}/data
!cp /content/drive/MyDrive/data/image9.jpeg {HOME}/data/
SOURCE_IMAGE_PATH = f"{HOME}/data/image9.jpeg"
Running Detection on Custom Data
detect.py runs object detection on the image with the specified arguments, setting the confidence threshold and saving the detection results. With --save-txt and --save-conf it writes a text file containing the class_id, bounding-box coordinates, and confidence score of each detection, which we will parse later.
!python detect.py --weights {HOME}/weights/gelan-c.pt --conf 0.1 --source {SOURCE_IMAGE_PATH} --device 0 --save-txt --save-conf
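For reference, each line of the saved label file follows YOLO's normalized format, with the confidence appended by --save-conf (the values below are illustrative only):
0 0.512 0.473 0.218 0.364 0.87   # class_id cx cy w h confidence, coords normalized to [0, 1]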
!pip install ultralytics
from ultralytics import YOLO
Installing the Segment Anything Model
!pip install 'git+https://github.com/facebookresearch/segment-anything.git'
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Extracting Detection Results and Confidence Scores
import cv2

# Specify the path to your image
image_path = '/content/drive/MyDrive/data/image9.jpeg'

# Read the image to get its dimensions
image = cv2.imread(image_path)
image_height, image_width, _ = image.shape

# Path to the labels file written by detect.py --save-txt --save-conf
detections_path = '/content/yolov9/runs/detect/exp/labels/image9.txt'

bboxes = []
class_ids = []
conf_scores = []

with open(detections_path, 'r') as file:
    for line in file:
        components = line.split()
        class_id = int(components[0])
        confidence = float(components[5])
        cx, cy, w, h = [float(x) for x in components[1:5]]

        # Convert from normalized [0, 1] to image scale
        cx *= image_width
        cy *= image_height
        w *= image_width
        h *= image_height

        # Convert the center x, y, width, and height to xmin, ymin, xmax, ymax
        xmin = cx - w / 2
        ymin = cy - h / 2
        xmax = cx + w / 2
        ymax = cy + h / 2

        class_ids.append(class_id)
        bboxes.append((xmin, ymin, xmax, ymax))
        conf_scores.append(confidence)

# Display the results
for class_id, bbox, conf in zip(class_ids, bboxes, conf_scores):
    print(f'Class ID: {class_id}, Confidence: {conf:.2f}, BBox coordinates: {bbox}')
Initializing SAM for Image Segmentation
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

sam_checkpoint = "/content/yolov9/sam_vit_h_4b8939.pth"
model_type = "vit_h"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
predictor = SamPredictor(sam)
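By default the model stays on the CPU; mask prediction is considerably faster on a GPU. A small optional addition, assuming the CUDA device checked earlier is available:
import torch

# Move SAM to the GPU when available, then rebuild the predictor around it
device = "cuda" if torch.cuda.is_available() else "cpu"
sam.to(device=device)
predictor = SamPredictor(sam)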
Loading the Image for Segmentation
import cv2

image = cv2.cvtColor(cv2.imread('/content/drive/MyDrive/data/image9.jpeg'), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
Visualizing the Results
import matplotlib.patches as patches
from matplotlib import pyplot as plt
import numpy as np
import yaml

# Load the class names from the COCO config shipped with the YOLOv9 repo
with open('/content/yolov9/data/coco.yaml', 'r') as file:
    coco_data = yaml.safe_load(file)
    class_names = coco_data['names']

# Assign each class a random, semi-transparent RGBA color
color_map = {}
for class_id in class_ids:
    color_map[class_id] = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)

def show_mask(mask, ax, color):
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * np.array(color).reshape(1, 1, -1)
    ax.imshow(mask_image)

def show_box(box, label, conf_score, color, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    rect = plt.Rectangle((x0, y0), w, h, edgecolor=color, facecolor='none', lw=2)
    ax.add_patch(rect)

    label_offset = 10

    # Construct the label with the class name and confidence score
    label_text = f'{label} {conf_score:.2f}'

    ax.text(x0, y0 - label_offset, label_text, color='black', fontsize=10,
            va='top', ha='left',
            bbox=dict(facecolor=color, alpha=0.7, edgecolor='none', boxstyle='square,pad=0.4'))

plt.figure(figsize=(10, 10))
ax = plt.gca()
plt.imshow(image)

# Display and process each bounding box with the corresponding mask
for class_id, bbox, conf in zip(class_ids, bboxes, conf_scores):
    class_name = class_names[class_id]
    color = color_map[class_id]
    input_box = np.array(bbox)

    # Generate the mask for the current bounding box
    masks, _, _ = predictor.predict(
        point_coords=None,
        point_labels=None,
        box=input_box,
        multimask_output=False,
    )

    show_mask(masks[0], ax, color=color)
    show_box(bbox, class_name, conf, color, ax)

# Show the final plot
plt.axis('off')
plt.show()
Finally, merge the masks of all detected objects into one binary mask and composite the segmented objects onto a white background:
aggregate_mask = np.zeros(image.shape[:2], dtype=np.uint8)

# Generate and accumulate masks for all bounding boxes
for bbox in bboxes:
    input_box = np.array(bbox).reshape(1, 4)
    masks, _, _ = predictor.predict(
        point_coords=None,
        point_labels=None,
        box=input_box,
        multimask_output=False,
    )
    aggregate_mask = np.where(masks[0] > 0.5, 1, aggregate_mask)

# Convert the aggregate segmentation mask to a binary mask
binary_mask = np.where(aggregate_mask == 1, 1, 0)

# Create a white background with the same size as the image
white_background = np.ones_like(image) * 255

# Apply the binary mask to the original image:
# where the mask is 0 (background) use white_background, otherwise keep the original pixels
new_image = white_background * (1 - binary_mask[..., np.newaxis]) + image * binary_mask[..., np.newaxis]

# Display the new image with the detections and white background
plt.figure(figsize=(10, 10))
plt.imshow(new_image.astype(np.uint8))
plt.axis('off')
plt.show()
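If you want to keep the composite, one way to write it to disk (the filename here is arbitrary) is to convert back to BGR for OpenCV:
# Save the white-background composite; OpenCV expects BGR channel order
cv2.imwrite('segmented_output.png', cv2.cvtColor(new_image.astype(np.uint8), cv2.COLOR_RGB2BGR))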