[Machine Learning] Clustering Code Exercise
This is the companion exercise code for the "Clustering" chapter of the Machine Learning course on the Chinese University MOOC platform.
Course page:
https://www.icourse163.org/course/WZU-1464096179
Complete course code:
https://github.com/fengdu78/WZU-machine-learning-course
Code adapted and annotated by: Huang Haiguang (黃海廣), [email protected]
In this exercise we will implement K-means clustering and use it to compress an image. We will start with a simple 2D dataset to see how K-means works, then apply it to image compression. We will also experiment with principal component analysis and see how it can be used to find a low-dimensional representation of face images.
K-means clustering
We will implement and apply K-means to a simple 2D dataset to get some intuition for how it works. K-means is an iterative, unsupervised clustering algorithm that groups similar instances into clusters. The algorithm starts with an initial guess for each cluster's centroid, then repeatedly assigns each instance to its nearest cluster and recomputes that cluster's centroid. The first piece we implement is a function that finds the closest centroid for every instance in the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
def find_closest_centroids(X, centroids):
    m = X.shape[0]
    k = centroids.shape[0]
    idx = np.zeros(m)

    for i in range(m):
        min_dist = 1000000
        for j in range(k):
            dist = np.sum((X[i, :] - centroids[j, :])**2)
            if dist < min_dist:
                min_dist = dist
                idx[i] = j

    return idx
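As an aside, the double loop can be replaced by NumPy broadcasting. A minimal sketch (the helper name find_closest_centroids_vec is ours, not part of the exercise):
def find_closest_centroids_vec(X, centroids):
    # (m, 1, n) minus (k, n) broadcasts to (m, k, n); sum over the feature axis
    dists = np.sum((X[:, np.newaxis, :] - centroids) ** 2, axis=2)
    return np.argmin(dists, axis=1)  # index of the nearest centroid per sample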
Let's test the function to make sure it works correctly. We will use the test case provided in the exercise.
data2 = pd.read_csv('data/ex7data2.csv')
data2.head()
|   | X1 | X2 |
|---|---|---|
| 0 | 1.842080 | 4.607572 |
| 1 | 5.658583 | 4.799964 |
| 2 | 6.352579 | 3.290854 |
| 3 | 2.904017 | 4.612204 |
| 4 | 3.231979 | 4.939894 |
X = data2.values
initial_centroids = np.array([[3, 3], [6, 2], [8, 5]])
idx = find_closest_centroids(X, initial_centroids)
idx[0:3]
array([0., 2., 1.])
The output matches the expected values in the text (remember that our arrays are zero-indexed rather than one-indexed, so the values are one lower than in the exercise). Next, we need a function to compute the centroid of a cluster. A cluster's centroid is simply the mean of all the samples currently assigned to it.
sb.set(context="notebook", style="white")
sb.lmplot(x='X1', y='X2', data=data2, fit_reg=False)
plt.show()

def compute_centroids(X, idx, k):
    m, n = X.shape
    centroids = np.zeros((k, n))

    for i in range(k):
        indices = np.where(idx == i)
        centroids[i, :] = (np.sum(X[indices, :], axis=1) /
                           len(indices[0])).ravel()

    return centroids
compute_centroids(data2.values, idx, 3)
array([[2.42830111, 3.15792418],
       [5.81350331, 2.63365645],
       [7.11938687, 3.6166844 ]])
This output also matches the expected values in the exercise. The next part involves actually running the algorithm for some number of iterations and visualizing the result. Since this step is not complicated, I will build it from scratch. To run the algorithm, we just alternate between assigning samples to their nearest cluster and recomputing the cluster centroids.
def run_k_means(X, initial_centroids, max_iters):
    m, n = X.shape
    k = initial_centroids.shape[0]
    idx = np.zeros(m)
    centroids = initial_centroids

    for i in range(max_iters):
        idx = find_closest_centroids(X, centroids)
        centroids = compute_centroids(X, idx, k)

    return idx, centroids
idx, centroids = run_k_means(X, initial_centroids, 10)
cluster1 = X[np.where(idx == 0)[0], :]
cluster2 = X[np.where(idx == 1)[0], :]
cluster3 = X[np.where(idx == 2)[0], :]

fig, ax = plt.subplots(figsize=(15, 10))
ax.scatter(cluster1[:, 0], cluster1[:, 1], s=30, color='r', label='Cluster 1')
ax.scatter(cluster2[:, 0], cluster2[:, 1], s=30, color='g', label='Cluster 2')
ax.scatter(cluster3[:, 0], cluster3[:, 1], s=30, color='b', label='Cluster 3')
ax.legend()
plt.show()

One step we skipped is the initialization of the centroids, which can affect the algorithm's convergence. Our task here is to write a function that selects random samples and uses them as the initial centroids.
def init_centroids(X, k):
    m, n = X.shape
    centroids = np.zeros((k, n))
    idx = np.random.randint(0, m, k)

    for i in range(k):
        centroids[i, :] = X[idx[i], :]

    return centroids
init_centroids(X, 3)
array([[1.52334113, 4.87916159],
       [3.06192918, 1.5719211 ],
       [1.75164337, 0.68853536]])
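Because a single random initialization can converge to a poor local optimum, a common remedy is to run K-means several times and keep the solution with the lowest distortion (the sum of squared distances from each sample to its assigned centroid). A minimal sketch built on the functions above; best_of_n_runs is a hypothetical helper, not part of the exercise:
def best_of_n_runs(X, k, max_iters=10, n_runs=5):
    best_cost, best = np.inf, None
    for _ in range(n_runs):
        # note: init_centroids may pick the same row twice; a robust version would resample
        idx, centroids = run_k_means(X, init_centroids(X, k), max_iters)
        cost = np.sum((X - centroids[idx.astype(int)]) ** 2)  # distortion
        if cost < best_cost:
            best_cost, best = cost, (idx, centroids)
    return best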
Choosing k
We use the "elbow method" to pick k.
from sklearn.cluster import KMeans

# use the SSE (sum of squared errors) to choose k
SSE = []  # holds the SSE for each value of k
for k in range(1, 9):
    estimator = KMeans(n_clusters=k)  # build the clusterer
    estimator.fit(data2)
    SSE.append(estimator.inertia_)
ks = range(1, 9)
plt.figure(figsize=(15, 10))
plt.xlabel('k')
plt.ylabel('SSE')
plt.plot(ks, SSE, 'o-')
plt.show()

The figure shows that the elbow is at k=3, so we choose k=3.
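As a cross-check on the elbow plot, we can also compare silhouette scores across candidate values of k (higher is better). A short sketch reusing the KMeans estimator from above; for this dataset we would expect the score to peak near k=3, agreeing with the elbow:
from sklearn.metrics import silhouette_score

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(data2)
    print(k, round(silhouette_score(data2, labels), 4))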
Image compression with K-means
Our next task is to apply K-means to image compression. As the demo below shows, we can use clustering to find a small number of the most representative colors, then use the cluster assignments to map the original 24-bit colors into a lower-dimensional color space.
Below is the image we are going to compress.
from IPython.display import Image
Image(filename='data/bird_small.png')

The raw pixel data has been pre-loaded for us so let's pull it in.
image_data = loadmat('data/bird_small.mat')
# image_data
A = image_data['A']
A.shape
(128, 128, 3)
Now we need to apply some preprocessing to the data and feed it to the K-means algorithm.
# normalize value ranges
A = A / 255.

# reshape the array
X = np.reshape(A, (A.shape[0] * A.shape[1], A.shape[2]))
X.shape
(16384, 3)
# randomly initialize the centroids
initial_centroids = init_centroids(X, 16)

# run the algorithm
idx, centroids = run_k_means(X, initial_centroids, 10)

# get the closest centroids one last time
idx = find_closest_centroids(X, centroids)

# map each pixel to the centroid value
X_recovered = centroids[idx.astype(int), :]
X_recovered.shape
(16384, 3)
# reshape to the original dimensions
X_recovered = np.reshape(X_recovered, (A.shape[0], A.shape[1], A.shape[2]))
X_recovered.shape
(128, 128, 3)
plt.imshow(X_recovered)
plt.show()
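A quick back-of-the-envelope accounting of the savings (a sketch that ignores any file-format overhead): the original image stores 24 bits per pixel, while the compressed representation needs only a 4-bit index into a 16-color palette per pixel.
n_pixels = 128 * 128
original_bits = n_pixels * 24       # 24-bit RGB per pixel
palette_bits = 16 * 24              # the 16 centroid colors
index_bits = n_pixels * 4           # 4 bits are enough to address 16 colors
print(original_bits / (palette_bits + index_bits))  # roughly a 6x reduction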

You can see that we compressed the image, but its main features are still present. That's K-means. Next, let's implement K-means with scikit-learn.
from skimage import io

# cast to float; otherwise the colors look strange after clustering
pic = io.imread('data/bird_small.png') / 255.
io.imshow(pic)
plt.show()

pic.shape
(128, 128, 3)
# flatten the image into an (n_pixels, 3) array
data = pic.reshape(128*128, 3)
data.shape
(16384, 3)
from sklearn.cluster import KMeans  # import the KMeans estimator

model = KMeans(n_clusters=16, n_init=100)
model.fit(data)
KMeans(n_clusters=16, n_init=100)
centroids = model.cluster_centers_
print(centroids.shape)
C = model.predict(data)
print(C.shape)
(16, 3)
(16384,)
centroids[C].shape
(16384, 3)
compressed_pic = centroids[C].reshape((128, 128, 3))
fig, ax = plt.subplots(1, 2)
ax[0].imshow(pic)
ax[1].imshow(compressed_pic)
plt.show()

Density-based clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. Unlike partitioning and hierarchical methods, it defines a cluster as a maximal set of density-connected points, so it can carve clusters out of regions of sufficiently high density and discover clusters of arbitrary shape in noisy spatial data.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font capable of rendering CJK labels
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
Create the sample data:
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)
Standardize the data:
X = StandardScaler().fit_transform(X)
DBSCAN uses two hyperparameters, a scan radius (eps) and a minimum number of points (minPts), to derive the number of clusters rather than guessing it:
(1) Scan radius (eps): the distance used to locate points and check the density of any point's neighborhood.
(2) Minimum number of points (minPts): the minimum number of points (a threshold) that must lie close together for a region to be considered dense.
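A common heuristic for choosing eps, sketched below under the assumption that minPts is fixed first: sort every point's distance to its minPts-th nearest neighbor and look for the "knee" of the curve.
from sklearn.neighbors import NearestNeighbors

min_pts = 10
nbrs = NearestNeighbors(n_neighbors=min_pts).fit(X)
dists, _ = nbrs.kneighbors(X)    # distances to each point's nearest neighbors
k_dist = np.sort(dists[:, -1])   # sorted distance to the min_pts-th neighbor
plt.plot(k_dist)
plt.ylabel("distance to %d-th nearest neighbor" % min_pts)
plt.show()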
We define a function plot_dbscan(MyEps, MiniSample), where MyEps stands for eps and MiniSample stands for minPts.
def plot_dbscan(MyEps, MiniSample):
    db = DBSCAN(eps=MyEps, min_samples=MiniSample).fit(X)
    core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
    core_samples_mask[db.core_sample_indices_] = True
    labels = db.labels_

    # Number of clusters in labels, ignoring noise (label -1) if present.
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise_ = list(labels).count(-1)

    print("Estimated number of clusters: %d" % n_clusters_)
    print("Estimated number of noise points: %d" % n_noise_)
    print("Homogeneity: %0.4f" %
          metrics.homogeneity_score(labels_true, labels))
    print("Completeness: %0.4f" %
          metrics.completeness_score(labels_true, labels))
    print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
    print("ARI (Adjusted Rand Index): %0.4f" %
          metrics.adjusted_rand_score(labels_true, labels))
    print("AMI (Adjusted Mutual Information): %0.4f" %
          metrics.adjusted_mutual_info_score(labels_true, labels))
    print("Silhouette Coefficient: %0.4f" %
          metrics.silhouette_score(X, labels))

    # #############################################################################
    # Plot the result
    # Black points are noise
    unique_labels = set(labels)
    colors = [
        plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))
    ]
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Black used for noise.
            col = [0, 0, 0, 1]

        class_member_mask = labels == k

        xy = X[class_member_mask & core_samples_mask]
        plt.plot(
            xy[:, 0],
            xy[:, 1],
            "o",
            markerfacecolor=tuple(col),
            markeredgecolor="k",
            markersize=14,
        )

        xy = X[class_member_mask & ~core_samples_mask]
        plt.plot(
            xy[:, 0],
            xy[:, 1],
            "o",
            markerfacecolor=tuple(col),
            markeredgecolor="k",
            markersize=6,
        )

    plt.title("Number of clusters: %d" % n_clusters_, fontsize=18)
#     plt.savefig(str(MyEps) + str(MiniSample) + '.png')  # save the figure
    plt.show()
plot_dbscan(0.3, 10)
Estimated number of clusters: 3
Estimated number of noise points: 18
Homogeneity: 0.9530
Completeness: 0.8832
V-measure: 0.917
ARI (Adjusted Rand Index): 0.9517
AMI (Adjusted Mutual Information): 0.9165
Silhouette Coefficient: 0.6255

plot_dbscan(0.1, 10)
Estimated number of clusters: 12
Estimated number of noise points: 516
Homogeneity: 0.3128
Completeness: 0.2489
V-measure: 0.277
ARI (Adjusted Rand Index): 0.0237
AMI (Adjusted Mutual Information): 0.2673
Silhouette Coefficient: -0.3659

plot_dbscan(0.4, 10)
Estimated number of clusters: 1
Estimated number of noise points: 2
Homogeneity: 0.0010
Completeness: 0.0586
V-measure: 0.002
ARI (Adjusted Rand Index): -0.0000
AMI (Adjusted Mutual Information): -0.0011
Silhouette Coefficient: 0.0611

plot_dbscan(0.3, 6)
Estimated number of clusters: 2
Estimated number of noise points: 13
Homogeneity: 0.5365
Completeness: 0.8263
V-measure: 0.651
ARI (Adjusted Rand Index): 0.5414
AMI (Adjusted Mutual Information): 0.6495
Silhouette Coefficient: 0.3845

As you can see, the evaluation metrics are highest when the scan radius (eps) is 0.3 and the minimum number of points (minPts) is 10.
Hierarchical clustering
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font capable of rendering CJK labels
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
from scipy.cluster.hierarchy import dendrogram
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering


def plot_dendrogram(model, **kwargs):
    # Build the linkage matrix, then plot the dendrogram.

    # Count the samples under each node.
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

    # Plot the corresponding dendrogram.
    dendrogram(linkage_matrix, **kwargs)
iris = load_iris()
X = iris.data

# Setting distance_threshold=0 ensures the full tree is computed.
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)
model = model.fit(X)

plt.title("Hierarchical Clustering Dendrogram")
# plot the top three levels of the dendrogram
plot_dendrogram(model, truncate_mode="level", p=3)
plt.xlabel("Number of points in node (or index of point if no parenthesis)")
plt.show()
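To obtain flat cluster labels from the same hierarchy, the estimator can simply be refit with a fixed number of clusters. A short usage sketch (n_clusters=3 matches the three iris species):
flat_model = AgglomerativeClustering(n_clusters=3).fit(X)
print(np.bincount(flat_model.labels_))  # size of each flat cluster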

References
Prof. Andrew Ng. Machine Learning. Stanford University.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
https://scikit-learn.org/stable/auto_examples/cluster