
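Source: DeepHub IMBA
This article is about 2,700 characters long; suggested reading time is 5 minutes.
This article discusses and explains several model evaluation methods, with examples in Python code.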

During training, accuracy is an important metric for evaluating a model and guiding its tuning, but it is not the best metric for judging a model overall; several other evaluation metrics are available.

This matters because most of the data we use to build models is imbalanced, and a model can overfit the data it is trained on. In this article I discuss and explain some of these metrics and give examples in Python code.
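To make the point about imbalanced data concrete, here is a minimal sketch in plain Python. The 95/5 class split and the "always predict the majority class" model are made up for illustration; they are not taken from the data used later in this article.

```python
# Hypothetical imbalanced data set: 95 negatives (0) and 5 positives (1).
y_true_imb = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
y_pred_imb = [0] * len(y_true_imb)


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


acc = accuracy(y_true_imb, y_pred_imb)
rec = recall(y_true_imb, y_pred_imb)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

The accuracy looks high even though the model never finds a single positive sample, which is why accuracy alone can be misleading.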

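Confusion Matrix

For classification models, the confusion matrix is a very good way to evaluate a model. It makes the prediction results easy to understand visually, because it shows how many positive and negative test samples were classified correctly and incorrectly, and so tells us where the model's predictions go wrong. Confusion matrices work for both binary and multiclass classification. In the binary case the matrix consists of four cells: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN).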
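As a rough illustration of what the four cells count, the sketch below tallies them by hand in plain Python. The label lists and the helper name `confusion_counts` are hypothetical; the row/column layout mirrors the one sklearn's `confusion_matrix` uses (rows are true labels, columns are predicted labels).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tn, fp, fn, tp


# Hypothetical labels, for illustration only.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true_demo, y_pred_demo)

# Rows: true label 0, then 1. Columns: predicted label 0, then 1.
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```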

#Import Libraries:
from random import random
from random import randint
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import roc_curve

#Fabricating variables:
#Creating values for FeNO with 3 classes:
FeNO_0 = np.random.normal(15,20, 1000)
FeNO_1 = np.random.normal(35,20, 1000)
FeNO_2 = np.random.normal(65, 20, 1000)

#Creating values for FEV1 with 3 classes:
FEV1_0 = np.random.normal(4.50, 1, 1000)
FEV1_1 = np.random.normal(3.75, 1.2, 1000)
FEV1_2 = np.random.normal(2.35, 1.2, 1000)

#Creating values for Broncho Dilation with 3 classes:
#(normal(mean, std, size) is used throughout; the arguments are mean/std pairs)
BD_0 = np.random.normal(150,49, 1000)
BD_1 = np.random.normal(250,50,1000)
BD_2 = np.random.normal(350, 50, 1000)

#Creating labels variable with two classes (1)Disease (0)No disease:
no_disease = np.zeros((1500,), dtype=int)
disease = np.ones((1500,), dtype=int)

#Concatenate classes into one variable:
FeNO = np.concatenate([FeNO_0, FeNO_1, FeNO_2])
FEV1 = np.concatenate([FEV1_0, FEV1_1, FEV1_2])
BD = np.concatenate([BD_0, BD_1, BD_2])
dx = np.concatenate([no_disease, disease])

#Create DataFrame:
df = pd.DataFrame()

#Add variables to DataFrame:
df['FeNO'] = FeNO.tolist()
df['FEV1'] = FEV1.tolist()
df['BD'] = BD.tolist()
df['dx'] = dx.tolist()

#Create X and y:
X = df.drop('dx', axis=1)
y = df['dx']

#Train and Test split:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

#Build the model:
logisticregression = LogisticRegression().fit(X_train, y_train)

#Print accuracy metrics:
print("training set score: %f" % logisticregression.score(X_train, y_train))
print("test set score: %f" % logisticregression.score(X_test, y_test))

Now we can build the confusion matrix and check our model:

# Predicting labels from X_test data
y_pred = logisticregression.predict(X_test)

# Create the confusion matrix
confmx = confusion_matrix(y_test, y_pred)
f, ax = plt.subplots(figsize = (8,8))
sns.heatmap(confmx, annot=True, fmt='.1f', ax = ax)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show();

As the matrix shows, the model misclassified 42 samples with label [1] and 57 samples with label [0].

The example above is the binary case; the steps for building a multiclass confusion matrix are similar.

#Fabricating variables:
#Creating values for FeNO with 3 classes:
FeNO_0 = np.random.normal(15,20, 1000)
FeNO_1 = np.random.normal(35,20, 1000)
FeNO_2 = np.random.normal(65, 20, 1000)

#Creating values for FEV1 with 3 classes:
FEV1_0 = np.random.normal(4.50, 1, 1000)
FEV1_1 = np.random.normal(3.75, 1.2, 1000)
FEV1_2 = np.random.normal(2.35, 1.2, 1000)

#Creating values for Broncho Dilation with 3 classes:
BD_0 = np.random.normal(150,49, 1000)
BD_1 = np.random.normal(250,50,1000)
BD_2 = np.random.normal(350, 50, 1000)

#Creating labels variable with three classes:
no_disease = np.zeros((1000,), dtype=int)
possible_disease = np.ones((1000,), dtype=int)
disease = np.full((1000,), 2, dtype=int)

#Concatenate classes into one variable:
FeNO = np.concatenate([FeNO_0, FeNO_1, FeNO_2])
FEV1 = np.concatenate([FEV1_0, FEV1_1, FEV1_2])
BD = np.concatenate([BD_0, BD_1, BD_2])
dx = np.concatenate([no_disease, possible_disease, disease])

#Create DataFrame:
df = pd.DataFrame()

#Add variables to DataFrame:
df['FeNO'] = FeNO.tolist()
df['FEV1'] = FEV1.tolist()
df['BD'] = BD.tolist()
df['dx'] = dx.tolist()

#Creating X and y:
X = df.drop('dx', axis=1)
y = df['dx']

#Data split into train and test:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

#Fit Logistic Regression model:
logisticregression = LogisticRegression().fit(X_train, y_train)

#Evaluate Logistic Regression model:
print("training set score: %f" % logisticregression.score(X_train, y_train))
print("test set score: %f" % logisticregression.score(X_test, y_test))

Now let's create the confusion matrix:

# Predicting labels from X_test data
y_pred = logisticregression.predict(X_test)

# Create the confusion matrix
confmx = confusion_matrix(y_test, y_pred)
f, ax = plt.subplots(figsize = (8,8))
sns.heatmap(confmx, annot=True, fmt='.1f', ax = ax)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show();

Looking at the confusion matrix, we can see that label [1] has the highest error rate and is therefore the hardest to classify.
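Reading the "hardest class" off a heatmap can also be done numerically. The sketch below computes a per-class error rate from a confusion matrix whose rows are true labels. The matrix values and the helper name `per_class_error` are invented for illustration and are not the output of the model above.

```python
# Hypothetical 3-class confusion matrix (rows: true labels, columns: predicted).
confmx_demo = [
    [180, 15, 5],
    [30, 140, 30],
    [4, 20, 176],
]


def per_class_error(matrix):
    """Share of each true class (row) that was assigned to a wrong label."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        wrong = total - row[i]  # everything off the diagonal in this row
        rates.append(wrong / total if total else 0.0)
    return rates


error_rates = per_class_error(confmx_demo)
hardest = max(range(len(error_rates)), key=lambda i: error_rates[i])
print(error_rates, "hardest class:", hardest)
```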

Evaluation Metrics

In machine learning, many different metrics are used to evaluate the performance of a classifier. The most common are:

Accuracy: how good the model is at predicting outcomes overall. It measures how closely the model's output matches the target, that is, the proportion of all samples predicted correctly.
Precision: how many of the samples we predicted as positive are actually positive, TP/(TP+FP).
Recall: how many of the actual positive samples the model found, TP/(TP+FN).
F1 Score: the harmonic mean of precision and recall.
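The four definitions above can be written directly in terms of the confusion matrix cells. This is a minimal sketch in plain Python; the function name `classification_metrics` and the counts passed to it are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from the four cell counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, F1 stays low whenever either precision or recall is low.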

We reuse the data and models built in the confusion matrix examples above. Printing the evaluation metrics for a model is very simple with sklearn, so here we use the existing function classification_report directly:

# Printing the model scores:
print(classification_report(y_test, y_pred))

As we can see, label [0] has the higher precision and label [1] has the higher f1 score. In the binary confusion matrix we also saw that label [1] had fewer misclassified samples.

For multiclass classification:

# Printing the model scores:
print(classification_report(y_test, y_pred))

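The confusion matrix showed that label [1] is the hardest to classify, and the precision, recall and f1 score for label [1] tell the same story.

ROC and AUC

The ROC curve is a graphical representation of how a binary classifier performs as its discrimination threshold is varied. The area under the ROC curve (AUC) is commonly used to measure how useful a test is, with a larger area meaning a more useful test. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR).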
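In the multiclass report, each class is scored by treating it as the positive class and everything else as negative. The sketch below shows that calculation on a confusion matrix in plain Python; the matrix values and the helper name `per_class_precision_recall` are made up for illustration.

```python
# Hypothetical 3-class confusion matrix (rows: true labels, columns: predicted).
confmx_multi = [
    [50, 8, 2],
    [10, 35, 15],
    [0, 7, 53],
]


def per_class_precision_recall(matrix):
    """Return {class_index: (precision, recall)} using a one-vs-rest view."""
    n = len(matrix)
    results = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results[k] = (precision, recall)
    return results


scores_by_class = per_class_precision_recall(confmx_multi)
for label, (p, r) in scores_by_class.items():
    print(f"label {label}: precision={p:.3f} recall={r:.3f}")
```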

#Get the values of FPR and TPR (requires the binary model and binary y_test):
fpr, tpr, thresholds = roc_curve(y_test, logisticregression.decision_function(X_test))

#Plot the ROC curve:
plt.plot(fpr, tpr, label="ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR (recall)")
plt.title("roc_curve")

# find threshold closest to zero:
close_zero = np.argmin(np.abs(thresholds))
plt.plot(fpr[close_zero], tpr[close_zero], 'o', markersize=10,
         label="threshold zero", fillstyle="none", c='k', mew=2)
plt.legend(loc=4)
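To show what a threshold sweep does under the hood, here is a minimal sketch in plain Python that builds ROC points by hand and estimates the area with the trapezoid rule. The helper names, labels and scores are hypothetical, and the sketch assumes 0/1 labels with at least one sample of each class.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step marks more samples as positive.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and classifier scores, for illustration only.
y_true_roc = [0, 0, 1, 1]
scores_roc = [0.1, 0.4, 0.35, 0.8]

points_roc = roc_points(y_true_roc, scores_roc)
auc_value = auc_trapezoid(points_roc)
print(points_roc)
print("AUC:", auc_value)
```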

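PR (Precision-Recall) Curve

In the P-R curve plotted here, Precision is on the horizontal axis and Recall on the vertical axis. For a ROC curve, the closer the curve bulges toward the upper-left corner the better; for a P-R curve, the closer it bulges toward the upper-right corner the better. Whether a P-R curve indicates a good model depends on the situation: some projects require high recall, others high precision. A P-R curve is drawn the same way as a ROC curve: different thresholds give different Precision and Recall values, which yields a series of points; plotting them and connecting them in order gives the P-R plot.

A PR curve is simply a plot of Precision = TP/(TP+FP) against Recall = TP/(TP+FN). Many references draw it with Precision on the y-axis and Recall on the x-axis; the curve carries the same information either way.

A ROC curve is a plot with FPR = FP/(FP+TN) on the x-axis and Recall = TPR = TP/(TP+FN) on the y-axis. It does not show the false positive rate against the false negative rate; it plots the true positive rate against the false positive rate.

PR curves are generally more common in problems involving information retrieval. Different scenarios favor ROC or PR curves differently, so the choice should be made case by case.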

#Get precision and recall thresholds:
precision, recall, thresholds = precision_recall_curve(y_test,logisticregression.decision_function(X_test))

# find threshold closest to zero:
close_zero = np.argmin(np.abs(thresholds))

#Plot curve:
plt.plot(precision[close_zero],    
        recall[close_zero],
        'o',
        markersize=10,
        label="threshold zero",
        fillstyle="none",
        c='k',
        mew=2)
plt.plot(precision, recall, label="precision recall curve")
plt.xlabel("precision")
plt.ylabel("recall")
plt.title("precision_recall_curve");
plt.legend(loc="best")
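The same threshold sweep can be sketched for precision and recall in plain Python. The helper name `pr_points` and the labels and scores are hypothetical; the sketch assumes 0/1 labels with at least one positive sample.

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score threshold."""
    pos = sum(y_true)
    points = []
    for thr in sorted(set(scores)):
        # True labels of the samples predicted positive at this threshold.
        # The list is never empty because thr is itself one of the scores.
        predicted_pos = [t for t, s in zip(y_true, scores) if s >= thr]
        tp = sum(predicted_pos)
        points.append((thr, tp / len(predicted_pos), tp / pos))
    return points


# Hypothetical labels and classifier scores, for illustration only.
y_true_pr = [0, 0, 1, 1]
scores_pr = [0.1, 0.4, 0.35, 0.8]

points_pr = pr_points(y_true_pr, scores_pr)
for thr, precision_value, recall_value in points_pr:
    print(f"threshold={thr:.2f} precision={precision_value:.3f} recall={recall_value:.3f}")
```

Raising the threshold generally trades recall for precision, which is the trade-off the P-R plot makes visible.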

Editor: 王菁
Proofreader: 林亦霖

A Summary of Some Metrics for Evaluating and Selecting the Best Learning Model
