Math Derivation + Pure Python Implementation of Machine Learning Algorithms, Part 15: GBDT
After a hiatus of more than half a year, this series on deriving machine learning algorithms is finally being updated again. In the previous 14 installments, we covered essentially all of the major single-model algorithms for supervised learning. Over the next 10 or so installments, the plan is to work through ensemble learning models, represented by GBDT; probabilistic graphical models, represented by the EM algorithm, CRFs, and hidden Markov models; and unsupervised learning algorithms, represented by clustering and dimensionality reduction.
GBDT (Gradient Boosting Decision Tree) is an ensemble model that builds an additive sequence of regression trees: each new tree is fit to the negative gradient of the loss function evaluated at the current model's predictions, so each boosting round takes an approximate gradient-descent step in function space. Given a training set $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$ and a differentiable loss function $L(y, f(x))$, the algorithm for regression proceeds as follows.

(1) Initialize the learner with the constant that minimizes the loss:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

(2) For $t = 1, 2, \dots, T$:

(a) For each sample $i = 1, 2, \dots, N$, compute the negative gradient, i.e. the residual:

$$r_{ti} = -\left[ \frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)} \right]_{f(x) = f_{t-1}(x)}$$

(b) Treat the residuals obtained in the previous step as the samples' new target values, and use the data $(x_i, r_{ti})$, $i = 1, 2, \dots, N$, as the training data for the next tree, yielding a new regression tree whose leaf regions are $R_{tj}$, $j = 1, 2, \dots, J$, where $J$ is the number of leaf nodes of tree $t$. For each leaf region $j = 1, 2, \dots, J$, compute the best-fit value:

$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L\big(y_i, f_{t-1}(x_i) + c\big)$$

(c) Update the strong learner:

$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj} \, I\big(x \in R_{tj}\big)$$

(3) Obtain the final learner:

$$f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{tj} \, I\big(x \in R_{tj}\big)$$
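To make the residual step concrete: for squared-error loss, differentiating shows that the negative gradient is exactly the ordinary residual, which is why step (a) is often described as "fitting the residuals":

$$L\big(y, f(x)\big) = \frac{1}{2}\big(y - f(x)\big)^2 \quad\Longrightarrow\quad r_{ti} = -\frac{\partial L}{\partial f(x_i)}\bigg|_{f = f_{t-1}} = y_i - f_{t-1}(x_i)$$

This is precisely what the SquareLoss.gradient method in the implementation below returns (with the opposite sign).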

The implementation below is pure Python plus numpy, and it follows the algorithm above. It starts from the tree data structures: a TreeNode holds either a split (feature index and threshold, with two branches) or a leaf value.

class TreeNode():
    def __init__(self, feature_i=None, threshold=None,
                 value=None, true_branch=None, false_branch=None):
        self.feature_i = feature_i        # Index of the feature the node splits on
        self.threshold = threshold        # Threshold the feature is compared against
        self.value = value                # Prediction stored at a leaf node
        self.true_branch = true_branch    # Subtree for samples satisfying the split
        self.false_branch = false_branch  # Subtree for the remaining samples
class Tree(object):
    def __init__(self, min_samples_split=2, min_impurity=1e-7,
                 max_depth=float("inf"), loss=None):
        self.root = None  # Root node of the decision tree
        # Minimum number of samples required to justify a split
        self.min_samples_split = min_samples_split
        # Minimum impurity reduction required to justify a split
        self.min_impurity = min_impurity
        # Maximum depth to grow the tree to
        self.max_depth = max_depth
        # Function that measures split quality
        # (classification => information gain / Gini, regression => variance reduction)
        self._impurity_calculation = None
        # Function that determines the prediction y at a leaf
        # (classification tree: majority class, regression tree: mean of the values)
        self._leaf_value_calculation = None
        # Whether y is one-hot encoded (multi-dim) or not (one-dim)
        self.one_dim = None
        # Loss function, used when the tree is part of a gradient boosting model
        self.loss = loss

    def fit(self, X, y, loss=None):
        """Build the decision tree."""
        pass

    def _build_tree(self, X, y, current_depth=0):
        """Recursively build the decision tree, splitting X and the corresponding y."""
        pass

    def predict_value(self, x, tree=None):
        """Recursively search down the tree and predict the data sample
        from the value of the leaf we end up at."""
        pass

    def predict(self, X):
        """Predict samples one by one and return the set of labels."""
        pass

    def print_tree(self, tree=None, indent=" "):
        pass
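The class above is shown only as a skeleton; the full implementation is omitted in this post. As one illustration of how the pieces fit together, here is a minimal sketch of what the recursive predict_value method might look like, assuming numeric features and the TreeNode conventions above (true_branch holds samples with feature >= threshold). It plugs into the skeleton; treat it as an assumption about the omitted code, not the author's exact implementation:

    def predict_value(self, x, tree=None):
        # Start from the root on the first call
        if tree is None:
            tree = self.root
        # A leaf stores a prediction: return it
        if tree.value is not None:
            return tree.value
        # Otherwise compare the sample's feature against the split threshold
        feature_value = x[tree.feature_i]
        branch = tree.false_branch
        if feature_value >= tree.threshold:
            branch = tree.true_branch
        # Recurse into the chosen subtree
        return self.predict_value(x, branch)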
class RegressionTree(Tree):
    # Measure split quality by variance reduction
    def _calculate_variance_reduction(self, y, y1, y2):
        # calculate_variance is a helper from the author's utilities:
        # it returns the variance of each column of its argument
        var_tot = calculate_variance(y)
        var_1 = calculate_variance(y1)
        var_2 = calculate_variance(y2)
        frac_1 = len(y1) / len(y)
        frac_2 = len(y2) / len(y)
        # Calculate the variance reduction
        variance_reduction = var_tot - (frac_1 * var_1 + frac_2 * var_2)
        return sum(variance_reduction)

    # Use the mean of y as the leaf value
    def _mean_of_y(self, y):
        value = np.mean(y, axis=0)  # y is expected to be 2-D: (n_samples, n_outputs)
        return value if len(value) > 1 else value[0]

    # Fit the regression tree
    def fit(self, X, y):
        self._impurity_calculation = self._calculate_variance_reduction
        self._leaf_value_calculation = self._mean_of_y
        super(RegressionTree, self).fit(X, y)
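The helper calculate_variance is not shown in this post. A minimal numpy version consistent with how it is called, plus a quick check that a good split yields a large variance reduction, might look like this (the helper body and shapes are assumptions, not part of the original code):

import numpy as np

def calculate_variance(X):
    """Variance of each column of X (assumed behavior of the helper)."""
    mean = np.ones(np.shape(X)) * X.mean(0)
    n_samples = np.shape(X)[0]
    return (1 / n_samples) * np.diag((X - mean).T.dot(X - mean))

# Toy targets: two well-separated groups
y = np.array([[1.0], [1.1], [0.9], [5.0], [5.1], [4.9]])
y1, y2 = y[:3], y[3:]  # A "perfect" split between the groups
var_tot = calculate_variance(y)
reduction = var_tot - (len(y1) / len(y) * calculate_variance(y1)
                       + len(y2) / len(y) * calculate_variance(y2))
print(sum(reduction))  # ~4.0: the split removes nearly all of the variance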
class Loss(object):
    def loss(self, y_true, y_pred):
        raise NotImplementedError()

    def gradient(self, y, y_pred):
        raise NotImplementedError()

    def acc(self, y, y_pred):
        return 0

class SquareLoss(Loss):
    def __init__(self):
        pass

    def loss(self, y, y_pred):
        return 0.5 * np.power((y - y_pred), 2)

    def gradient(self, y, y_pred):
        # Gradient of the squared loss w.r.t. the prediction: the negative residual
        return -(y - y_pred)
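A quick numeric check that this matches the derivation above, i.e. that the negative gradient of SquareLoss is the ordinary residual:

import numpy as np

loss = SquareLoss()
y = np.array([3.0, -1.0, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])
print(-loss.gradient(y, y_pred))  # [ 0.5 -1.   0. ] == y - y_pred, the residuals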
class GBDT(object):
    def __init__(self, n_estimators, learning_rate, min_samples_split,
                 min_impurity, max_depth, regression):
        # Basic hyperparameters
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.min_samples_split = min_samples_split
        self.min_impurity = min_impurity
        self.max_depth = max_depth
        self.regression = regression
        self.loss = SquareLoss()
        if not self.regression:
            # Classification can also use regression trees,
            # learning the class probabilities via the residuals
            self.loss = SoftMaxLoss()  # Softmax/cross-entropy loss, defined elsewhere
        self.estimators = []
        for i in range(self.n_estimators):
            self.estimators.append(RegressionTree(min_samples_split=self.min_samples_split,
                                                  min_impurity=self.min_impurity,
                                                  max_depth=self.max_depth))

    # Fitting
    def fit(self, X, y):
        # Let the first tree fit the raw targets
        self.estimators[0].fit(X, y)
        y_pred = self.estimators[0].predict(X)
        for i in range(1, self.n_estimators):
            # Each subsequent tree fits the gradient of the loss
            # at the current prediction
            gradient = self.loss.gradient(y, y_pred)
            self.estimators[i].fit(X, gradient)
            # Subtracting the predicted gradient moves y_pred a
            # learning-rate-sized step toward the targets
            y_pred -= np.multiply(self.learning_rate, self.estimators[i].predict(X))

    # Prediction
    def predict(self, X):
        y_pred = self.estimators[0].predict(X)
        for i in range(1, self.n_estimators):
            y_pred -= np.multiply(self.learning_rate, self.estimators[i].predict(X))
        if not self.regression:
            # Turn the scores into a probability distribution (softmax)
            y_pred = np.exp(y_pred) / np.expand_dims(np.sum(np.exp(y_pred), axis=1), axis=1)
            # Set the label to the value that maximizes probability
            y_pred = np.argmax(y_pred, axis=1)
        return y_pred
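Since the RegressionTree here is only sketched, the boosting loop itself can be checked end to end by swapping in sklearn's DecisionTreeRegressor as the base learner. This is a minimal, self-contained illustration of the same fit logic, not the author's implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

learning_rate, n_estimators = 0.5, 50
trees = []

# The first tree fits the raw targets, as in GBDT.fit above
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
trees.append(tree)
y_pred = tree.predict(X)

for _ in range(n_estimators - 1):
    # Fit the next tree to the squared-loss gradient -(y - y_pred) ...
    gradient = -(y - y_pred)
    tree = DecisionTreeRegressor(max_depth=3).fit(X, gradient)
    trees.append(tree)
    # ... and step the prediction against that gradient
    y_pred -= learning_rate * tree.predict(X)

print("train MSE:", np.mean((y - y_pred) ** 2))  # Shrinks toward the noise level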
# Regression wrapper
class GBDTRegressor(GBDT):
    def __init__(self, n_estimators=200, learning_rate=0.5, min_samples_split=2,
                 min_var_red=1e-7, max_depth=4, debug=False):
        super(GBDTRegressor, self).__init__(n_estimators=n_estimators,
                                            learning_rate=learning_rate,
                                            min_samples_split=min_samples_split,
                                            min_impurity=min_var_red,
                                            max_depth=max_depth,
                                            regression=True)

# Classification wrapper
class GBDTClassifier(GBDT):
    def __init__(self, n_estimators=200, learning_rate=.5, min_samples_split=2,
                 min_info_gain=1e-7, max_depth=2, debug=False):
        super(GBDTClassifier, self).__init__(n_estimators=n_estimators,
                                             learning_rate=learning_rate,
                                             min_samples_split=min_samples_split,
                                             min_impurity=min_info_gain,
                                             max_depth=max_depth,
                                             regression=False)

    def fit(self, X, y):
        # One-hot encode the labels so each class gets its own score column
        y = to_categorical(y)
        super(GBDTClassifier, self).fit(X, y)
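to_categorical (used by GBDTClassifier.fit) and shuffle_data (used in the test below) come from the author's utility module and are not shown in this post. Minimal versions consistent with how they are called might look like the following; treat these as assumptions about the helpers' behavior:

import numpy as np

def to_categorical(x, n_classes=None):
    """One-hot encode an integer label vector."""
    if n_classes is None:
        n_classes = np.amax(x) + 1
    one_hot = np.zeros((x.shape[0], n_classes))
    one_hot[np.arange(x.shape[0]), x] = 1
    return one_hot

def shuffle_data(X, y, seed=None):
    """Shuffle X and y with the same random permutation."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(X.shape[0])
    return X[idx], y[idx]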
Finally, a quick test of the regressor on the Boston housing data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

boston = datasets.load_boston()
X, y = shuffle_data(boston.data, boston.target, seed=13)
X = X.astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

model = GBDTRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Color map
cmap = plt.get_cmap('viridis')
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Plot the results
m1 = plt.scatter(range(X_test.shape[0]), y_test, color=cmap(0.5), s=10)
m2 = plt.scatter(range(X_test.shape[0]), y_pred, color='black', s=10)
plt.suptitle("Regression Tree")
plt.title("MSE: %.2f" % mse, fontsize=10)
plt.xlabel('sample')
plt.ylabel('house price')
plt.legend((m1, m2), ("Test data", "Prediction"), loc='lower right')
plt.show()
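Note: load_boston was removed from scikit-learn in version 1.2, so on a recent install the same script can be run by substituting another regression dataset, for example:

from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X, y = shuffle_data(housing.data, housing.target, seed=13)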
