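[Machine Learning] Anomaly Detection Algorithms: HBOS (Histogram-based Outlier Score)
HBOS stands for Histogram-based Outlier Score. It is a combination of univariate methods, so it cannot model dependencies between features, but it is fast to compute and scales well to large datasets. Its basic assumption is that every dimension of the dataset is independent of the others. Each dimension is divided into intervals (bins), and the higher the density of a bin, the lower the anomaly score of the points that fall into it. Once you understand that sentence, you have essentially understood the algorithm. The two histogram variants below illustrate it.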
1. Static-width histogram
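A minimal sketch of the static-width idea for a single feature, assuming the value range is cut into `n_bins` equal-width bins and the bin heights are normalised so that the tallest bin has height 1. The function name `static_histogram_scores` and the sample data are hypothetical and chosen only for illustration; this is not PyOD's implementation.

```python
import math

def static_histogram_scores(values, n_bins=10):
    """Score one feature with an equal-width histogram (higher = more anomalous)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    counts = [0] * n_bins
    bin_of = []
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        b = min(int((v - lo) / width), n_bins - 1)
        bin_of.append(b)
        counts[b] += 1
    tallest = max(counts)
    # Normalised height = count / tallest, and score = log(1 / height).
    return [math.log(tallest / counts[b]) for b in bin_of]

# Nine values cluster near 1 and one value sits far away at 10.
values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 10.0]
scores = static_histogram_scores(values, n_bins=3)
```

Points in the densest bin get a score of 0, while the isolated value at 10.0 sits alone in its bin and receives the highest score.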


2. Dynamic-width histogram
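A minimal sketch of the dynamic-width idea for a single feature, assuming the values are sorted and grouped so that each bin holds roughly the same number of points; the bin height is then the count divided by the bin width. The function name `dynamic_histogram_scores` is hypothetical, and the sketch simplifies things: it does not keep tied values together and it handles zero-width bins only with a small constant.

```python
import math

def dynamic_histogram_scores(values, n_bins=10):
    """Score one feature with equal-frequency bins (higher = more anomalous)."""
    # Indices of the values in ascending order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    per_bin = math.ceil(len(values) / n_bins)
    density = [0.0] * len(values)
    for start in range(0, len(order), per_bin):
        members = order[start:start + per_bin]
        lo, hi = values[members[0]], values[members[-1]]
        width = (hi - lo) or 1e-9  # crude guard against zero-width bins
        for i in members:
            # Same count per bin, so a wider bin means a lower density.
            density[i] = len(members) / width
    densest = max(density)
    # Normalise by the densest bin, then score = log(1 / normalised density).
    return [math.log(densest / d) for d in density]

# Six tightly packed values and two values that are far apart.
values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5.0, 9.0]
scores = dynamic_histogram_scores(values, n_bins=4)
```

Here the last bin spans 5.0 to 9.0, so it is much wider than the others and its two points receive a clearly higher score than the tightly packed ones.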

II. Derivation of the algorithm
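As a compact reference, the score is commonly written as below. The notation is an assumption made here for illustration: $d$ is the number of features and $\mathrm{hist}_i(p)$ is the normalised height (at most 1) of the bin that point $p$ falls into for feature $i$. Because the features are assumed independent, the joint density is the product of the per-feature densities; taking the logarithm of its inverse turns the product into a sum.

```latex
\mathrm{HBOS}(p)
  = \log \prod_{i=1}^{d} \frac{1}{\mathrm{hist}_i(p)}
  = \sum_{i=1}^{d} \log\!\left(\frac{1}{\mathrm{hist}_i(p)}\right)
```

A point in the tallest bin of every feature scores 0, and each sparse bin it falls into adds a positive term, which matches the rule that denser bins yield lower anomaly scores.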





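PyOD is a scalable Python toolkit for detecting outliers in multivariate data. It gives access to roughly 20 outlier detection algorithms through a single, well-documented API.
III. Worked example
1. Basic usage
from pyod.models.hbos import HBOS
HBOS(n_bins=10, alpha=0.1, tol=0.5, contamination=0.1)
2. Model parameters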
# Import packages
from pyod.utils.data import generate_data, evaluate_print
from pyod.models import hbos
from pyod.utils.example import visualize

# Generate samples
# Note: this unpacking order matches older PyOD releases. Newer releases return
# (X_train, X_test, y_train, y_test) by default, so adjust it to your installed version.
X_train, y_train, X_test, y_test = generate_data(n_train=200, n_test=100, contamination=0.1)
X_train.shape
# (200, 2)
X_test.shape
# (100, 2)

# Train the model
clf = hbos.HBOS()
clf.fit(X_train)
y_train_pred = clf.labels_
y_train_scores = clf.decision_scores_

# Predicted labels on unseen data (0: inlier, 1: outlier)
y_test_pred = clf.predict(X_test)
# Outlier scores on unseen data (the larger the score, the more anomalous)
y_test_scores = clf.decision_function(X_test)

print(y_test_pred)
# array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

print(y_test_scores)
# array([1.94607743, 1.94607743, 1.94607743, 3.18758465, 2.99449223,
#        1.94607743, 3.18758465, 2.99449223, 1.94607743, 3.18758465,
#        1.94607743, 1.94607743, 3.18758465, 1.94607743, 1.94607743,
#        1.94607743, 3.18758465, 1.94607743, 2.99449223, 1.94607743,
#        1.94607743, 1.94607743, 1.94607743, 3.18758465, 3.18758465,
#        2.99449223, 1.94607743, 1.94607743, 1.94607743, 3.18758465,
#        1.94607743, 2.99449223, 1.94607743, 1.94607743, 1.94607743,
#        1.94607743, 2.99449223, 1.94607743, 1.94607743, 1.94607743,
#        1.94607743, 1.94607743, 3.18758465, 1.94607743, 1.94607743,
#        2.99449223, 2.99449223, 3.18758465, 2.99449223, 1.94607743,
#        1.94607743, 1.94607743, 1.94607743, 1.94607743, 3.18758465,
#        1.94607743, 3.18758465, 3.18758465, 1.94607743, 1.94607743,
#        1.94607743, 2.99449223, 3.18758465, 2.99449223, 1.94607743,
#        1.94607743, 3.18758465, 1.94607743, 1.94607743, 1.94607743,
#        1.94607743, 1.94607743, 1.94607743, 2.99449223, 1.94607743,
#        2.99449223, 1.94607743, 3.18758465, 3.18758465, 1.94607743,
#        2.99449223, 2.99449223, 1.94607743, 1.94607743, 1.94607743,
#        1.94607743, 2.99449223, 1.94607743, 3.18758465, 1.94607743,
#        6.36222028, 6.47923046, 6.5608128 , 6.52101746, 6.36222028,
#        6.52015473, 6.44010653, 5.30002108, 6.47923046, 6.51944504])

# Evaluate the model
clf_name = 'HBOS'
evaluate_print(clf_name, y_test, y_test_scores)
# HBOS ROC:1.0, precision @ rank n:1.0

# Visualize the results
visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred, y_test_pred, show_figure=True, save_figure=False)
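The 0/1 labels above come from comparing each score with a threshold tied to `contamination`. The following is a simplified sketch of that idea, assuming the threshold is the largest training score still treated as normal once the top `contamination` fraction is set aside. The helper `labels_from_scores` and the sample scores are hypothetical; PyOD itself may compute the cut-off differently (for example with an interpolated percentile).

```python
def labels_from_scores(train_scores, new_scores, contamination=0.1):
    """Flag new scores that exceed a threshold learned from training scores."""
    ranked = sorted(train_scores)
    # Expected number of outliers among the training points.
    n_outliers = min(len(ranked) - 1, round(contamination * len(ranked)))
    # Threshold: the largest training score that is still considered normal.
    threshold = ranked[len(ranked) - n_outliers - 1]
    return [1 if s > threshold else 0 for s in new_scores]

# Ten made-up training scores; with contamination=0.1 one of them is an outlier.
train_scores = [1.9, 1.9, 3.2, 3.0, 1.9, 3.2, 1.9, 3.0, 1.9, 6.4]
labels = labels_from_scores(train_scores, [1.9, 3.2, 6.5])
```

With these numbers the threshold is 3.2, so only the new score 6.5 is labelled as an outlier.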

IV. Summary