
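This chapter discusses the application of machine learning techniques to image processing. We first define machine learning and its two main types of algorithms, supervised and unsupervised; we then look at applications of some popular unsupervised techniques to problems such as clustering and image segmentation.

We also study supervised machine learning techniques applied to problems such as image classification and object detection. The algorithms are implemented with the very popular scikit-learn library, together with scikit-image and Python-OpenCV (cv2). The chapter aims to give readers a deeper understanding of machine learning algorithms and the problems they solve.

This chapter covers the following topics:

Supervised versus unsupervised learning;

Unsupervised machine learning: clustering, PCA, and eigenfaces;

Supervised machine learning: image classification on handwritten digit datasets;

Supervised machine learning: object detection.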

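Part 1

Supervised versus unsupervised learning

Machine learning algorithms fall into two main types.

(1) Supervised learning: we are given an input dataset together with the correct labels, and need to learn the relationship between input and output (as a function). Handwritten digit classification is an example of a supervised (classification) problem.

(2) Unsupervised learning: we have little or no idea what the output should look like. The structure of the data can be derived without knowing the effect of the variables. Clustering (which can also be viewed as segmentation) is a good example: in image processing, it is not known in advance which pixel belongs to which segment.

A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

For example, suppose we have a set of handwritten digit images and their labels (digits from 0 to 9), and need to write a Python program that learns the association between images and labels (experience E) and then automatically labels a new set of handwritten digit images.

In this example, task T is assigning labels to images (that is, classifying or identifying the digit images), and the performance measure P is the proportion of new images that the program identifies correctly (accuracy). Such a program can be said to be a learning program.

This chapter describes some image processing problems that can be solved with machine learning algorithms (unsupervised or supervised). We start with applications of some unsupervised machine learning techniques to image processing problems.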

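Part 2

Unsupervised machine learning

This section discusses some popular machine learning algorithms and their applications in image processing. We start with some clustering algorithms and their use in color quantization and image segmentation, implemented with the scikit-learn library.

K-means clustering for image segmentation with color quantization

This section demonstrates how to perform pixel-wise vector quantization (VQ) of the pepper image, reducing the number of colors needed to display it from 250 to 4 while preserving the overall appearance. Pixels are represented in a 3D space, and k-means is used to find 4 color clusters.

In the image processing literature, the codebook obtained from k-means (the cluster centers) is called the palette. With a palette, a single byte can address up to 256 colors, whereas RGB encoding needs 3 bytes per pixel. The GIF file format uses such a palette. For comparison, we also show images quantized with a random codebook (randomly picked colors).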
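The storage argument behind a palette can be checked with a few lines of NumPy. This is a minimal sketch on a made-up 4×4 image with a made-up 4-color palette (both are assumptions, not data from the article): each pixel stores a 1-byte index, and the RGB image is recovered by looking the index up in the codebook.

```python
import numpy as np

# Hypothetical palette (codebook) of 4 RGB colors, one row per color.
palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]],
                   dtype=np.uint8)

# Hypothetical 4x4 indexed image: one 1-byte palette index per pixel.
rng = np.random.default_rng(0)
indices = rng.integers(0, len(palette), size=(4, 4)).astype(np.uint8)

# Decoding is a table lookup: every index is replaced by its palette color.
rgb_image = palette[indices]

rgb_bytes = rgb_image.nbytes                      # 3 bytes per pixel
indexed_bytes = indices.nbytes + palette.nbytes   # 1 byte per pixel + palette
print(rgb_bytes, indexed_bytes)
```

For such a tiny image the palette itself is a noticeable overhead; for a real image the per-pixel saving dominates.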

Before segmenting the image with k-means clustering, load the required libraries and the input image, as shown in the following code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
from skimage.io import imread
from sklearn.utils import shuffle
from skimage import img_as_float
from time import time

pepper = imread("../images/pepper.jpg")

# Display the original image.
# Note: np.unique(pepper) counts distinct intensity values across all
# channels, not distinct RGB triples.
plt.figure(1), plt.clf()
ax = plt.axes([0, 0, 1, 1])
plt.axis('off')
plt.title('Original image (%d colors)' % len(np.unique(pepper)))
plt.imshow(pepper)

The original input pepper image is shown in Figure 1.

Figure 1. Pepper image

Now apply k-means clustering to segment the image, as shown in the following code:

n_colors = 64

# Convert to floats instead of the default 8-bit integer coding. Dividing by
# 255 is important so that plt.imshow works well on float data
# (values need to be in the range [0, 1]).
pepper = np.array(pepper, dtype=np.float64) / 255

# Transform the image into a 2D numpy array (one row per pixel).
w, h, d = original_shape = tuple(pepper.shape)
assert d == 3
image_array = np.reshape(pepper, (w * h, d))

def recreate_image(codebook, labels, w, h):
    """Recreate the (compressed) image from the codebook and labels."""
    # Look up each pixel's color in the codebook, then restore the image shape.
    return codebook[labels].reshape(w, h, -1)

# Display all results alongside the original image.
plt.figure(1)
plt.clf()
ax = plt.axes([0, 0, 1, 1])
plt.axis('off')
plt.title('Original image (96,615 colors)')
plt.imshow(pepper)

# Quantize with k-means codebooks of decreasing size.
plt.figure(2, figsize=(10, 10))
plt.clf()
for i, k in enumerate([64, 32, 16, 4], start=1):
    t0 = time()
    plt.subplot(2, 2, i)
    plt.axis('off')
    # Fit on a random sample of 1000 pixels to keep training fast.
    image_array_sample = shuffle(image_array, random_state=0)[:1000]
    kmeans = KMeans(n_clusters=k, random_state=0).fit(image_array_sample)
    print("done in %0.3fs." % (time() - t0))
    # Get labels for all points.
    print("Predicting color indices on the full image (k-means)")
    t0 = time()
    labels = kmeans.predict(image_array)
    print("done in %0.3fs." % (time() - t0))
    plt.title('Quantized image (' + str(k) + ' colors, K-Means)')
    plt.imshow(recreate_image(kmeans.cluster_centers_, labels, w, h))
plt.show()

# Quantize with random codebooks of the same sizes, for comparison.
plt.figure(3, figsize=(10, 10))
plt.clf()
for i, k in enumerate([64, 32, 16, 4], start=1):
    plt.subplot(2, 2, i)
    plt.axis('off')
    # Pick k pixels at random as the codebook (the original sliced k + 1,
    # one more color than the plot title reports).
    codebook_random = shuffle(image_array, random_state=0)[:k]
    print("Predicting color indices on the full image (random)")
    t0 = time()
    labels_random = pairwise_distances_argmin(codebook_random, image_array, axis=0)
    print("done in %0.3fs." % (time() - t0))
    plt.title('Quantized image (' + str(k) + ' colors, Random)')
    plt.imshow(recreate_image(codebook_random, labels_random, w, h))
plt.show()

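Running the code above produces the output shown in Figure 2. In terms of preserved image quality, k-means clustering gives better color quantization than the random codebook at every palette size shown.

Figure 2. Pepper image segmentation and color quantization with k-means clustering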
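The visual comparison can also be expressed as a number. The sketch below is an illustration only: it uses random points as a stand-in for the image's pixel array and a plain NumPy version of Lloyd's iterations rather than scikit-learn's KMeans (all names here are hypothetical). Because the iterations start from the random codebook and each step cannot increase the quantization error, the refined codebook is never worse than the random one on the same data.

```python
import numpy as np

def assign(pixels, codebook):
    """Index of the nearest codebook color for every pixel."""
    d2 = ((pixels[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def quantization_mse(pixels, codebook):
    """Mean squared error after replacing each pixel by its nearest color."""
    labels = assign(pixels, codebook)
    return float(((pixels - codebook[labels]) ** 2).sum(axis=1).mean())

def lloyd(pixels, codebook, n_iter=10):
    """Plain Lloyd iterations: reassign pixels, then move centers to the mean."""
    codebook = codebook.copy()
    for _ in range(n_iter):
        labels = assign(pixels, codebook)
        for j in range(len(codebook)):
            members = pixels[labels == j]
            if len(members):          # an empty cluster keeps its old center
                codebook[j] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
pixels = rng.random((2000, 3))        # stand-in for the (n_pixels, 3) array
random_codebook = pixels[rng.choice(len(pixels), 4, replace=False)]
kmeans_codebook = lloyd(pixels, random_codebook)

mse_random = quantization_mse(pixels, random_codebook)
mse_kmeans = quantization_mse(pixels, kmeans_codebook)
print(mse_random, mse_kmeans)
```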

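Spectral clustering for image segmentation

This section demonstrates how spectral clustering can be used for image segmentation. In this setting, spectral clustering solves a problem known as normalized graph cuts: the image is viewed as a graph of connected pixels, and the algorithm chooses graph cuts that define regions while minimizing the ratio of the gradient along the cut to the volume of the region. SpectralClustering() from scikit-learn's cluster module is used to segment the image into foreground and background.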
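To make the normalized-cut idea concrete, here is a minimal NumPy sketch on a toy graph. It is not what SpectralClustering() does internally in detail; the two point groups, the Gaussian affinity, and its width are assumptions chosen for illustration. The sign of the second eigenvector of the normalized graph Laplacian splits the graph into two parts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical, well-separated groups of 2-D points (10 points each).
points = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(10, 2)),
                    rng.normal([4.0, 0.0], 0.3, size=(10, 2))])

# Gaussian affinity: nearby points get weights near 1, distant ones near 0.
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)

# Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
deg = W.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)
L = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# eigh returns eigenvalues in ascending order. The eigenvector of the second
# smallest eigenvalue, rescaled by D^{-1/2}, is the relaxed normalized-cut
# indicator; thresholding it at 0 gives a two-way partition.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = d_inv_sqrt * eigvecs[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.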

The segmentation obtained with spectral clustering is compared with the binary segmentation obtained with k-means clustering, as shown in the following code:

import numpy as np
from sklearn import cluster
from skimage.io import imread
from skimage.transform import resize
import matplotlib.pylab as pylab

# scipy.misc.imresize has been removed from SciPy; skimage.transform.resize is
# used instead and returns a float image with values in [0, 1].
im = resize(imread('../images/me14.jpg'), (100, 100), anti_aliasing=True)
k = 2  # binary segmentation, with 2 output clusters / segments
X = np.reshape(im, (-1, im.shape[-1]))

# K-means segmentation.
two_means = cluster.MiniBatchKMeans(n_clusters=k, random_state=10)
two_means.fit(X)
y_pred = two_means.predict(X)
labels = np.reshape(y_pred, im.shape[:2])

pylab.figure(figsize=(20, 20))
pylab.subplot(221), pylab.imshow(labels)
pylab.title('k-means segmentation (k=2)', size=30)
pylab.subplot(222), pylab.imshow(im)
# Draw the boundary of cluster 0 as a single contour line.
pylab.contour((labels == 0).astype(float), levels=[0.5], colors='red')
pylab.axis('off'), pylab.title('k-means contour (k=2)', size=30)

# Spectral clustering segmentation.
spectral = cluster.SpectralClustering(n_clusters=k, eigen_solver='arpack',
                                      affinity="nearest_neighbors",
                                      n_neighbors=100, random_state=10)
spectral.fit(X)
y_pred = spectral.labels_.astype(int)  # np.int has been removed from NumPy
labels = np.reshape(y_pred, im.shape[:2])

pylab.subplot(223), pylab.imshow(labels)
pylab.title('spectral segmentation (k=2)', size=30)
pylab.subplot(224), pylab.imshow(im)
pylab.contour((labels == 0).astype(float), levels=[0.5], colors='red')
pylab.axis('off'), pylab.title('spectral contour (k=2)', size=30)
pylab.tight_layout()
pylab.show()

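Running the code above produces the output shown in Figure 3. For this image, spectral clustering segments better than k-means clustering.

Figure 3. Comparison of segmentation results from spectral clustering and k-means clustering

PCA and eigenfaces

Principal component analysis (PCA) is a statistical, unsupervised machine learning method that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into values of a set of linearly uncorrelated variables, thereby finding the directions of maximum variance in the dataset (the principal components).

It can be used for (linear) dimensionality reduction (in most cases a few dominant principal components capture almost all of the variance in the dataset) and for visualizing datasets with many dimensions (in two dimensions). One application of PCA is eigenfaces: finding a set of eigenfaces that can (in theory) represent any face as a linear combination of them.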
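The orthogonal transformation described above can be sketched directly with an SVD. This is a minimal illustration on made-up correlated data, not scikit-learn's implementation; the function name pca_svd and the toy dataset are assumptions.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the centered data matrix (rows are samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # orthonormal PC directions
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    projected = Xc @ components.T                # coordinates along the PCs
    return mean, components, ratio[:n_components], projected

rng = np.random.default_rng(0)
# Hypothetical correlated 3-D data that varies mostly along one direction.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

mean, components, ratio, projected = pca_svd(X, n_components=2)
print(ratio)
```

The projected columns are uncorrelated with each other, and the first component carries almost all of the variance of this toy dataset, which is the situation in which dropping the remaining components loses little information.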

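1. Dimensionality reduction and visualization with PCA

In this section we use scikit-learn's digits dataset, which contains 1797 images of handwritten digits (each 8×8 pixels). Each row of the data matrix represents one image. The following code loads the dataset and displays 25 randomly selected digits: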

import numpy as np
import matplotlib.pylab as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

digits = load_digits()
# print(digits.keys())
print(digits.data.shape)  # (1797, 64)

np.random.seed(1)
fig = plt.figure(figsize=(3, 3))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# Show 25 randomly chosen digits in a 5x5 grid.
for j, i in enumerate(np.random.choice(digits.data.shape[0], 25), start=1):
    plt.subplot(5, 5, j)
    plt.imshow(np.reshape(digits.data[i, :], (8, 8)), cmap='binary')
    plt.axis('off')
plt.show()

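Running the code above displays 25 handwritten digits sampled at random from the dataset, as shown in Figure 4.

Figure 4. 25 digits from the dataset

2D projection and visualization. The loaded dataset is 64-dimensional. We first use scikit-learn's PCA() to find the two leading principal components and project the dataset onto them; we then use Matplotlib to draw a scatter plot of the projected data, in which each point represents an image (digit) and each digit label is drawn in a distinct color, as shown in the following code: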

pca_digits = PCA(2)
digits.data_proj = pca_digits.fit_transform(digits.data)
print(np.sum(pca_digits.explained_variance_ratio_))
# 0.28509364823696987

plt.figure(figsize=(15, 10))
# One color per digit label; plt.get_cmap replaces plt.cm.get_cmap, which
# newer Matplotlib releases no longer provide.
plt.scatter(digits.data_proj[:, 0], digits.data_proj[:, 1], lw=0.25,
            c=digits.target, edgecolor='k', s=100,
            cmap=plt.get_cmap('cubehelix', 10))
plt.xlabel('PC1', size=20), plt.ylabel('PC2', size=20)
plt.title('2D Projection of handwritten digits with PCA', size=25)
plt.colorbar(ticks=range(10), label='digit value')
plt.clim(-0.5, 9.5)
plt.show()

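Running the code above produces the output shown in Figure 5. In the 2D projection along PC1 and PC2, the digits are separated to some degree (with some overlap), and samples of the same digit appear close together in clusters.

Figure 5. Color scatter plot of the 2D PCA projection of the handwritten digits

2. Eigenfaces with PCA

Load the Olivetti faces dataset from scikit-learn, which contains 400 face images of 64×64 pixels each. The following code displays some random faces from the dataset: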

from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces().data
print(faces.shape)  # 400 faces, each with 64x64 = 4096 pixels

fig = plt.figure(figsize=(5, 5))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# Plot 25 random faces.
np.random.seed(0)
for j, i in enumerate(np.random.choice(range(faces.shape[0]), 25), start=1):
    ax = fig.add_subplot(5, 5, j, xticks=[], yticks=[])
    ax.imshow(np.reshape(faces[i, :], (64, 64)), cmap=plt.cm.bone,
              interpolation='nearest')
plt.show()

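Running the code above displays 25 face images randomly selected from the dataset, as shown in Figure 6.

Figure 6. Face images randomly selected from the dataset

Next, preprocess the dataset: before applying PCA to the images, perform z-score normalization (subtract the mean face from every face, then divide by the standard deviation), which is a necessary step. Then use PCA() to compute the principal components, keeping only 64 (rather than 4096), and project the dataset onto the PC directions, as shown in the following code. The code also visualizes how much of the dataset's variance is captured as more and more principal components are kept.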

import numpy as np
import matplotlib.pylab as pylab
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

n_comp = 64
# z-score scaling followed by PCA with 64 components.
pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=n_comp))])
faces_proj = pipeline.fit_transform(faces)
print(faces_proj.shape)
# (400, 64)

mean_face = np.reshape(pipeline.named_steps['scaling'].mean_, (64, 64))
sd_face = np.reshape(np.sqrt(pipeline.named_steps['scaling'].var_), (64, 64))

# Cumulative explained variance as more principal components are kept.
pylab.figure(figsize=(8, 6))
pylab.plot(np.cumsum(pipeline.named_steps['pca'].explained_variance_ratio_),
           linewidth=2)
pylab.grid(), pylab.axis('tight')
pylab.xlabel('n_components')
pylab.ylabel('cumulative explained_variance_ratio_')
pylab.show()

# Mean face and per-pixel standard deviation.
pylab.figure(figsize=(10, 5))
pylab.subplot(121), pylab.imshow(mean_face, cmap=pylab.cm.bone)
pylab.axis('off'), pylab.title('Mean face')
pylab.subplot(122), pylab.imshow(sd_face, cmap=pylab.cm.bone)
pylab.axis('off'), pylab.title('SD face')
pylab.show()

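Running the code above produces the output shown in Figure 7. About 90% of the variance is captured by the first 64 principal components.

Figure 7. Cumulative proportion of variance explained by the 64 principal components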
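Reading a threshold off the cumulative curve can also be done programmatically. The helper below is a small sketch with a hypothetical name and made-up ratios; with the article's pipeline one would pass the PCA step's explained_variance_ratio_ instead.

```python
import numpy as np

def components_for_variance(explained_variance_ratio, threshold=0.90):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(cumulative))   # cap if the threshold is never reached

# Hypothetical ratios, sorted in decreasing order as PCA reports them.
ratios = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
n_needed = components_for_variance(ratios, 0.90)
print(n_needed)
```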

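The mean and standard deviation of the face images computed from the dataset are shown in Figure 8.

Figure 8. Mean and standard deviation images of the face dataset

(1) Eigenfaces. The PC directions computed by PCA are mutually orthogonal; each PC contains 4096 pixels and can be reshaped into a 64×64-pixel image. These principal components are called eigenfaces (because they are also eigenvectors).

As can be seen, eigenfaces capture certain attributes of faces. The following code displays some of the computed eigenfaces: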

fig = plt.figure(figsize=(5, 2))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# Plot the first 10 eigenfaces.
for i in range(10):
    ax = fig.add_subplot(2, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(np.reshape(pipeline.named_steps['pca'].components_[i, :], (64, 64)),
              cmap=plt.cm.bone, interpolation='nearest')
plt.show()

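Running the code above displays the first 10 eigenfaces, as shown in Figure 9.

Figure 9. The first 10 eigenfaces obtained from the principal components

(2) Reconstruction. The following code demonstrates how to approximate each face as a linear combination of these 64 leading eigenfaces. scikit-learn's inverse_transform() function maps back to the original space, but using only these 64 eigenfaces and discarding all the others.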

# Face reconstruction: map the 64-D projections back to the 4096-D
# (standardized) pixel space.
faces_inv_proj = pipeline.named_steps['pca'].inverse_transform(faces_proj)
# Reshape into 400 images of 64x64 pixels.
faces_inv_proj = np.reshape(faces_inv_proj, (400, 64, 64))

fig = plt.figure(figsize=(5, 5))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
# Plot the same 25 random faces; undo the z-score scaling before display.
np.random.seed(0)
for j, i in enumerate(np.random.choice(range(faces.shape[0]), 25), start=1):
    ax = fig.add_subplot(5, 5, j, xticks=[], yticks=[])
    ax.imshow(mean_face + sd_face * faces_inv_proj[i], cmap=plt.cm.bone,
              interpolation='nearest')
plt.show()

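Running the code above displays 25 randomly selected faces reconstructed from the 64 eigenfaces, as shown in Figure 10. They look very much like the original faces (without many obvious errors).

Figure 10. Face images reconstructed from eigenfaces

The following code takes a closer look at one original face and compares it with its reconstruction; the output is shown in Figure 11. The reconstructed face approximates the original but shows some distortion.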

orig_face = np.reshape(faces[0, :], (64, 64))
# Linear combination of the 64 eigenfaces, weighted by the projection coefficients.
reconst_face = np.reshape(faces_proj[0, :] @ pipeline.named_steps['pca'].components_,
                          (64, 64))
# Undo the z-score scaling.
reconst_face = mean_face + sd_face * reconst_face

plt.figure(figsize=(10, 5))
plt.subplot(121), plt.imshow(orig_face, cmap=plt.cm.bone, interpolation='nearest')
plt.axis('off'), plt.title('original', size=20)
plt.subplot(122), plt.imshow(reconst_face, cmap=plt.cm.bone, interpolation='nearest')
plt.axis('off'), plt.title('reconstructed', size=20)
plt.show()

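Figure 11. Reconstructed face compared with the original

(3) Eigen decomposition. Each face can be represented as a linear combination of the 64 eigenfaces. Each eigenface has a different weight (loading) for different face images. Figure 12 shows how a face is represented with eigenfaces, along with the first few corresponding weights. The implementation is left to the reader as an exercise.

Figure 12. Reconstructing a face image as a linear combination of eigenfaces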
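One possible starting point for this exercise is sketched below. It is not the book's solution: random data stands in for the standardized face matrix, and the eigenfaces come from a plain SVD rather than the article's pipeline. The point it illustrates is that the weights of a face are its inner products with the eigenfaces, and the approximation is the weighted sum of those eigenfaces.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for standardized faces: 50 "faces" of 16 pixels each.
faces_std = rng.normal(size=(50, 16))
faces_std -= faces_std.mean(axis=0)

# Rows of Vt play the role of eigenfaces (the PCA components).
_, _, Vt = np.linalg.svd(faces_std, full_matrices=False)
eigenfaces = Vt[:8]                      # keep the 8 leading eigenfaces

face = faces_std[0]
weights = eigenfaces @ face              # one weight (loading) per eigenface

# Rebuild the face as an explicit weighted sum of eigenfaces.
approx = np.zeros_like(face)
for w, eigenface in zip(weights, eigenfaces):
    approx += w * eigenface

print(np.round(weights[:3], 3))
```

With the real data, the weights of face i are simply row i of the projected matrix, and the mean and standard deviation images must be applied afterwards to return to pixel intensities.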

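Part 3

Supervised machine learning

This section discusses the image classification problem. The input dataset is MNIST, a classic machine learning dataset consisting of 28×28-pixel grayscale images of handwritten digits.

The original training dataset contains 60,000 samples (handwritten digit images with labels, used to train machine learning models), and the test dataset contains 10,000 samples (handwritten digit images with labels as ground truth, used to test the accuracy of the learned model). Given a set of handwritten digit images and their labels (0 to 9), the goal is to learn a model that automatically recognizes the digit in an unseen image and assigns it a label (0 to 9). The steps are as follows.

(1) First, train some supervised (multiclass classification) models (classifiers) on the training dataset.

(2) Next, use them to predict the labels of images from the test dataset.

(3) Finally, compare the predicted labels with the ground-truth labels to evaluate the classifiers.

The steps for training, predicting with, and evaluating a basic classification model are shown in Figure 13. When several different models are trained on the training dataset (using different algorithms, or the same algorithm with different hyperparameter values), a third dataset is needed to choose the best one: the validation dataset (the training dataset is split into two parts, one for training and one held out for validation), which is used for model selection and hyperparameter tuning.

Figure 13. Workflow of supervised machine learning for image classification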
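Splitting the training data into a training part and a held-out validation part can be sketched as follows. The function name, the 20% fraction, and the toy arrays are assumptions for illustration; scikit-learn's train_test_split offers the same functionality.

```python
import numpy as np

def train_validation_split(data, labels, validation_fraction=0.2, seed=0):
    """Shuffle the sample indices, then hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_val = int(len(data) * validation_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return data[train_idx], labels[train_idx], data[val_idx], labels[val_idx]

# Hypothetical stand-in for the training images and labels.
data = np.arange(100).reshape(50, 2)
labels = np.arange(50) % 10
x_tr, y_tr, x_val, y_val = train_validation_split(data, labels)
print(x_tr.shape, x_val.shape)
```

Candidate models are then fitted on the training part and compared on the validation part, so the test dataset is touched only once, for the final evaluation.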

As before, first import the required libraries, as shown in the following code:

# Jupyter notebook magic; omit this line when running as a plain script.
%matplotlib inline
import gzip, os, sys
import numpy as np
from scipy.stats import multivariate_normal
from urllib.request import urlretrieve
import matplotlib.pyplot as pylab

Downloading the MNIST (handwritten digits) dataset

We start by downloading the MNIST dataset. The following code shows how to download the training and test datasets:

# Function that downloads a specified MNIST data file from Yann LeCun's website
def download(filename, source='http://yann.lecun.com/exdb/mnist/'):
    print("Downloading %s" % filename)
    urlretrieve(source + filename, filename)

# Invokes download() if necessary, then reads in images
def load_mnist_images(filename):
    if not os.path.exists(filename):
        download(filename)
    with gzip.open(filename, 'rb') as f:
        # Skip the 16-byte header, then read one byte per pixel.
        data = np.frombuffer(f.read(), np.uint8, offset=16)
    data = data.reshape(-1, 784)
    return data

def load_mnist_labels(filename):
    if not os.path.exists(filename):
        download(filename)
    with gzip.open(filename, 'rb') as f:
        # Skip the 8-byte header, then read one byte per label.
        data = np.frombuffer(f.read(), np.uint8, offset=8)
    return data

## Load the training set
train_data = load_mnist_images('train-images-idx3-ubyte.gz')
train_labels = load_mnist_labels('train-labels-idx1-ubyte.gz')
## Load the testing set
test_data = load_mnist_images('t10k-images-idx3-ubyte.gz')
test_labels = load_mnist_labels('t10k-labels-idx1-ubyte.gz')

print(train_data.shape)
# (60000, 784) ## 60k 28x28 handwritten digits
print(test_data.shape)
# (10000, 784) ## 10k 28x28 handwritten digits
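The offsets 16 and 8 in the loaders come from the IDX file header: a 4-byte magic number whose last byte gives the number of dimensions, followed by one big-endian 32-bit size per dimension. The sketch below parses such a header from small in-memory buffers built for illustration (no download involved; the helper name is hypothetical).

```python
import struct

def parse_idx_header(buffer):
    """Return (magic, dims) read from the start of an IDX byte buffer."""
    magic, = struct.unpack(">I", buffer[:4])
    n_dims = magic & 0xFF      # last byte of the magic number = number of dimensions
    dims = struct.unpack(">" + "I" * n_dims, buffer[4:4 + 4 * n_dims])
    return magic, dims

# Hypothetical in-memory files: 2 images of 28x28 pixels, and 2 labels.
image_file = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
label_file = struct.pack(">II", 2049, 2) + bytes([7, 2])

image_magic, image_dims = parse_idx_header(image_file)
label_magic, label_dims = parse_idx_header(label_file)

# Header length = 4 bytes of magic + 4 bytes per dimension.
header_bytes_images = 4 + 4 * len(image_dims)
header_bytes_labels = 4 + 4 * len(label_dims)
print(image_dims, header_bytes_images, label_dims, header_bytes_labels)
```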

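Visualizing the dataset

Each data point is stored as a 784-dimensional vector. To visualize a data point, reshape it into a 28×28-pixel image. The following code shows how to display handwritten digits from the test dataset: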

## Define a function that displays a digit given its vector representation
def show_digit(x, label):
    pylab.axis('off')
    pylab.imshow(x.reshape((28, 28)), cmap=pylab.cm.gray)
    pylab.title('Label ' + str(label))

# Show the first 25 test digits in a 5x5 grid.
pylab.figure(figsize=(10, 10))
for i in range(25):
    pylab.subplot(5, 5, i + 1)
    show_digit(test_data[i,], test_labels[i])
pylab.tight_layout()
pylab.show()

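Figure 14 shows the first 25 handwritten digits from the test dataset with their ground-truth labels. A KNN classifier trained on the training dataset predicts labels for this unseen test dataset, and the predicted labels are compared with the ground-truth labels to evaluate the classifier's accuracy.

Figure 14. The first 25 handwritten digits of the test dataset with their ground-truth labels

Classifying MNIST by training KNN, Gaussian Bayes, and SVM models

The following classifiers are implemented with scikit-learn library functions: a k-nearest-neighbors classifier, a Gaussian Bayes classifier (generative model), and a support vector machine classifier.

We start with the k-nearest-neighbors classifier.

1. K-nearest-neighbors classifier

This section builds a classifier that takes an image of a handwritten digit and outputs a label (0 to 9) using a particularly simple strategy called nearest-neighbor classification. Predicting the label of an unseen test image is straightforward. First, find the k instances in the training dataset that are closest to the test image; then use a majority vote to compute the label: the label held by most of the k nearest training points is assigned to the test image (ties are broken arbitrarily).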
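The two steps (find the k nearest training points, then take a majority vote) fit in a few lines. This is a brute-force sketch on a made-up 2-D dataset, not the ball-tree approach used later in the article; the function name and data are assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Label one query point by majority vote among its k nearest neighbors."""
    d2 = ((train_x - query) ** 2).sum(axis=1)   # squared Euclidean distances
    nearest = np.argsort(d2)[:k]                # indices of the k closest points
    votes = np.bincount(train_y[nearest])       # count votes per label
    return int(votes.argmax())                  # ties go to the smallest label

# Hypothetical toy training set with two classes.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
train_y = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(train_x, train_y, np.array([0.3, 0.3]))
pred_b = knn_predict(train_x, train_y, np.array([4.5, 4.7]))
print(pred_a, pred_b)
```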

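(1) Squared Euclidean distance. To find nearest neighbors in a dataset, we must compute distances between data points. A natural distance function is the Euclidean distance; for two vectors $x, y \in \mathbb{R}^d$ it is defined as:

$$\|x - y\| = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$

The square root is usually omitted and only the squared Euclidean distance is computed. For nearest-neighbor computations the two are equivalent: for three vectors $x, y, z \in \mathbb{R}^d$, $\|x - y\| \le \|x - z\|$ holds if and only if $\|x - y\|^2 \le \|x - z\|^2$. Therefore only the squared Euclidean distance needs to be computed.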
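The equivalence can be checked numerically. The sketch below computes all pairwise squared distances with the expansion ‖a‖² − 2a·b + ‖b‖² on random data (the arrays and the helper name are assumptions) and confirms that taking the square root does not change which training point is nearest.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between all rows of A and all rows of B."""
    sq = ((A ** 2).sum(axis=1)[:, None]
          - 2.0 * A @ B.T
          + (B ** 2).sum(axis=1)[None, :])
    return np.maximum(sq, 0.0)       # clip tiny negatives caused by rounding

rng = np.random.default_rng(0)
test_pts = rng.random((5, 8))        # hypothetical test points
train_pts = rng.random((20, 8))      # hypothetical training points

sq = pairwise_sq_dists(test_pts, train_pts)
nearest_sq = sq.argmin(axis=1)               # nearest neighbor via squared distance
nearest_root = np.sqrt(sq).argmin(axis=1)    # nearest neighbor via true distance
print(nearest_sq, nearest_root)
```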

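(2) Computing nearest neighbors. A naive implementation of k-nearest neighbors scans every training image for every test image. Nearest-neighbor classification done this way requires a full pass over the training set to classify a single point. With N training points in $\mathbb{R}^d$, this takes O(Nd) time per query, which is very slow. Fortunately, there are faster ways to perform nearest-neighbor lookups if we are willing to spend some time preprocessing the training set. scikit-learn provides fast implementations of two useful nearest-neighbor data structures: the ball tree and the k-d tree. The following code shows how to build a ball tree at training time and then use it for fast nearest-neighbor computation when testing 1-NN (k=1):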

import time
from sklearn.neighbors import BallTree

## Build nearest neighbor structure on training data
t_before = time.time()
ball_tree = BallTree(train_data)
t_after = time.time()

## Compute training time
t_training = t_after - t_before
print("Time to build data structure (seconds): ", t_training)

## Get nearest neighbor predictions on testing data
t_before = time.time()
test_neighbors = np.squeeze(ball_tree.query(test_data, k=1, return_distance=False))
test_predictions = train_labels[test_neighbors]
t_after = time.time()

## Compute testing time
t_testing = t_after - t_before
print("Time to classify test set (seconds): ", t_testing)
# Time to build data structure (seconds): 20.65474772453308
# Time to classify test set (seconds): 532.3929145336151

(3) Evaluating the classifier. Next we evaluate the classifier on the test dataset. The following code shows how:

# Evaluate the classifier: fraction of test images labeled correctly.
t_accuracy = sum(test_predictions == test_labels) / float(len(test_labels))
t_accuracy
# 0.96909999999999996

import pandas as pd
import seaborn as sn
from sklearn import metrics

# Rows are true labels, columns are predicted labels.
cm = metrics.confusion_matrix(test_labels, test_predictions)
df_cm = pd.DataFrame(cm, range(10), range(10))
sn.set(font_scale=1.2)  # for label size
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt="g")

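Running the code above outputs the confusion matrix shown in Figure 15. Although the overall accuracy on the test dataset reaches 96.9%, some test images are still misclassified.

Figure 15. Confusion matrix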
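What the confusion matrix contains, and how accuracy follows from it, can be seen in a small NumPy sketch. The labels below are made up for illustration, and the helper mirrors the row/column convention of metrics.confusion_matrix (rows are true labels, columns are predicted labels).

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """cm[i, j] counts samples whose true label is i and predicted label is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

# Hypothetical labels for 7 samples and 3 classes.
true_labels = np.array([0, 0, 1, 1, 2, 2, 2])
predicted_labels = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(true_labels, predicted_labels, 3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

Off-diagonal entries show which digits are confused with which, something the single accuracy figure hides.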

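Figure 16 shows a success case, where the 1-NN predicted label and the true label are both 0, and a failure case, where the 1-NN predicted label is 2 but the true label is 3.

Figure 16. Success and failure cases of digit prediction

The code for displaying the success and failure cases is left to the reader as an exercise.

2. Bayes classifier (Gaussian generative model)

As seen in the previous subsection, the 1-NN classifier has a test error rate of 3.09% on the MNIST handwritten digits dataset. We now build a Gaussian generative model that does almost as well while being significantly faster and more compact. As before, first load the MNIST training and test datasets, then fit the Gaussian generative model to the training dataset.

(1) Training the generative model: computing maximum likelihood estimates of the Gaussian parameters. We define a function fit_generative_model() that takes a training set (data x and labels y) as input and fits a Gaussian generative model to it. For each label j = 0, 1, ..., 9, it returns the following parameters of the generative model.

πj: the frequency of the label (that is, the prior);

μj: the 784-dimensional mean vector;

Σj: the 784×784 covariance matrix.

This means that π is 10×1, μ is 10×784, and Σ is 10×784×784. The maximum likelihood estimates (MLE) are the empirical estimates, as shown in Figure 17.

Figure 17. Maximum likelihood estimates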
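Since the figure itself is not reproduced here, the estimates can be written out as follows. This is a reconstruction consistent with the code in this section (which uses the biased covariance, bias=1), with the notation $n$ for the number of training points and $n_j$ for the number with label $j$ introduced here as an assumption:

```latex
\hat{\pi}_j = \frac{n_j}{n}, \qquad
\hat{\mu}_j = \frac{1}{n_j} \sum_{i:\, y_i = j} x_i, \qquad
\hat{\Sigma}_j = \frac{1}{n_j} \sum_{i:\, y_i = j}
    (x_i - \hat{\mu}_j)(x_i - \hat{\mu}_j)^{\top}
```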

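The empirical covariances are very likely to be singular (or close to singular), which means they cannot be used directly in the computation, so it is important to regularize these matrices. The standard way to do this is to add c*I, where c is a constant and I is the 784-dimensional identity matrix (in other words, first compute the empirical covariances, then add a constant c to their diagonal elements).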
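The effect of adding c*I is easy to verify on a small example. In this sketch (toy sizes and the value of c are assumptions), three samples in five dimensions give a rank-deficient covariance; after regularization every eigenvalue is at least c, so the matrix can be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x = rng.normal(size=(3, d))                  # only 3 samples in 5 dimensions
cov = np.cov(x, rowvar=False, bias=True)     # empirical covariance, rank <= 2

c = 0.1                                      # hypothetical regularization constant
cov_reg = cov + c * np.eye(d)                # adds c to every diagonal element

rank_before = np.linalg.matrix_rank(cov)
min_eig_after = np.linalg.eigvalsh(cov_reg).min()
cov_reg_inv = np.linalg.inv(cov_reg)         # succeeds after regularization
print(rank_before, min_eig_after)
```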

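For any c > 0, however small, this modification guarantees a nonsingular covariance matrix. c now becomes a (regularization) parameter, and setting it appropriately can improve the model's performance, so a good value of c should be chosen. Crucially, this must be done using the training set alone, either by holding out part of the training set as a validation set or by some form of cross-validation. This is left to the reader as an exercise. The display_char() function is used to visualize the Gaussian means of the first three digits, as shown in the following code: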

import matplotlib.pyplot as plt

def display_char(image):
    plt.imshow(np.reshape(image, (28, 28)), cmap=plt.cm.gray)
    plt.axis('off')
    plt.show()

def fit_generative_model(x, y):
    k = 10  # labels 0,1,...,k-1
    d = (x.shape)[1]  # number of features
    mu = np.zeros((k, d))
    sigma = np.zeros((k, d, d))
    pi = np.zeros(k)
    c = 3500  # regularization constant (other values noted: 10000, 1000, 100, 10, 0.1, 1e9)
    for label in range(k):
        indices = (y == label)
        pi[label] = sum(indices) / float(len(y))       # class prior
        mu[label] = np.mean(x[indices, :], axis=0)     # class mean
        # Regularized empirical covariance of the class.
        sigma[label] = np.cov(x[indices, :], rowvar=0, bias=1) + c * np.eye(d)
    return mu, sigma, pi

mu, sigma, pi = fit_generative_model(train_data, train_labels)
display_char(mu[0])
display_char(mu[1])
display_char(mu[2])

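Running the code above displays the maximum likelihood estimates of the means of the first three digits, as shown in Figure 18.

Figure 18. Maximum likelihood estimates of the means of the first three digits

(2) Computing posterior probabilities to predict on the test data and evaluate the model. To predict the label of a new image x, find the label j whose posterior probability Pr(y = j|x) is largest. It can be computed with Bayes' rule, as shown in Figure 19.

Figure 19. Bayes' rule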
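Written out, the rule and the resulting decision take the following form. This is a reconstruction consistent with the prediction code in this section, where $P_j$ denotes the Gaussian density $N(\mu_j, \Sigma_j)$ fitted for label $j$ (a symbol introduced here for illustration). The denominator is the same for every label, so it can be dropped when comparing scores, and working with logarithms avoids numerical underflow:

```latex
\Pr(y = j \mid x) = \frac{\pi_j \, P_j(x)}{\sum_{i=0}^{9} \pi_i \, P_i(x)},
\qquad
\hat{y}(x) = \arg\max_{j} \; \bigl[ \log \pi_j + \log P_j(x) \bigr]
```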

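The following code shows how to use the generative model to predict the labels of the test dataset and how to count the errors the model makes on it. The accuracy on the test dataset is 95.6%, slightly lower than that of the 1-NN classifier.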

# Compute log Pr(label|image), up to an additive constant, for each
# [test image, label] pair.
k = 10
score = np.zeros((len(test_labels), k))
for label in range(0, k):
    rv = multivariate_normal(mean=mu[label], cov=sigma[label])
    for i in range(0, len(test_labels)):
        score[i, label] = np.log(pi[label]) + rv.logpdf(test_data[i, :])
# Predict the label with the highest score.
test_predictions = np.argmax(score, axis=1)

# Finally, tally up score
errors = np.sum(test_predictions != test_labels)
print("The generative model makes " + str(errors) + " errors out of 10000")
# The generative model makes 438 errors out of 10000
t_accuracy = sum(test_predictions == test_labels) / float(len(test_labels))
t_accuracy
# 0.95620000000000005

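3. SVM classifier

This section trains a (multiclass) support vector machine (SVM) classifier on the MNIST training dataset and then uses it to predict the labels of images from the MNIST test dataset.

An SVM is a fairly sophisticated binary classifier that uses quadratic programming to maximize the margin around the separating hyperplane. A binary SVM classifier is extended to multiclass problems with the one-vs-all or one-vs-one technique. We use scikit-learn's SVC() implementation with a polynomial kernel (degree 2) to fit (train) a soft-margin (kernelized) SVM classifier on the training dataset, then use predict() to label the test images and score() to measure accuracy on them.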
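Two of the ingredients mentioned above can be illustrated without training anything. The sketch below assumes the polynomial kernel has the form (gamma·⟨a, b⟩ + coef0)^degree, which is how scikit-learn documents its 'poly' kernel; the parameter values and the random data are placeholders. It also counts how many binary classifiers each multiclass scheme needs for the 10 digit classes.

```python
import numpy as np
from itertools import combinations

def poly_kernel(A, B, degree=2, gamma=1.0, coef0=0.0):
    """Polynomial kernel (gamma * <a, b> + coef0) ** degree for all row pairs."""
    return (gamma * A @ B.T + coef0) ** degree

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))           # hypothetical feature vectors
K = poly_kernel(X, X)                 # 6x6 kernel (Gram) matrix

# One-vs-one trains one binary classifier per unordered pair of classes;
# one-vs-all trains one per class.
classes = range(10)
n_one_vs_one = len(list(combinations(classes, 2)))
n_one_vs_all = len(classes)
print(n_one_vs_one, n_one_vs_all)
```

A valid kernel matrix is symmetric and positive semidefinite, which is what lets the quadratic program behind the SVM be solved in terms of K alone.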

The following code shows how to train, predict with, and evaluate an SVM classifier on the MNIST dataset. With this classifier, the accuracy on the test dataset rises to about 98%.

from sklearn.svm import SVC

# Soft-margin SVM with a degree-2 polynomial kernel.
clf = SVC(C=1, kernel='poly', degree=2)
clf.fit(train_data, train_labels)
print(clf.score(test_data, test_labels))
# 0.9806

test_predictions = clf.predict(test_data)
cm = metrics.confusion_matrix(test_labels, test_predictions)
df_cm = pd.DataFrame(cm, range(10), range(10))
sn.set(font_scale=1.2)
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt="g")

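Running the code above outputs the confusion matrix shown in Figure 20.

Figure 20. Confusion matrix

Next, find the test images for which the SVM classifier predicted the wrong label (different from the true label).

The following code shows how to find such an image and display it together with its predicted and true labels: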

# Boolean mask of misclassified test images.
wrong_indices = test_predictions != test_labels
wrong_digits, wrong_preds, correct_labs = (test_data[wrong_indices],
                                           test_predictions[wrong_indices],
                                           test_labels[wrong_indices])
print(len(wrong_preds))
# 194
pylab.title('predicted: ' + str(wrong_preds[1]) + ', actual: ' + str(correct_labs[1]))
display_char(wrong_digits[1])

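Running the code above produces the output shown in Figure 21. The test image has the true label 2, but it looks more like a 7, so the SVM predicts 7.

Figure 21. A case predicted as 7 whose true label is 2

This article is excerpted from 《Python圖像處理實戰》.

Reposted from 程序員薦書.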


Classical machine learning methods in image processing