
Source: Jason Brownlee; compiled by 數據派THU

This article is about 3,300 characters long; suggested reading time is 10 minutes.

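This article shows how to use XGBoost for time series forecasting. It covers transforming a time series into a supervised learning problem and evaluating the model with walk-forward validation, and it provides working code examples.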

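XGBoost is an efficient implementation of gradient boosting for classification and regression problems.

It is both fast and efficient. It performs well, if not best, on a wide range of predictive modeling tasks, and it is a favorite among winners of data science competitions such as those on Kaggle.

XGBoost can also be used for time series forecasting, although the time series dataset must first be transformed into a supervised learning problem. Evaluating the model also requires a specialized technique called walk-forward validation. Evaluating the model with k-fold cross-validation would produce optimistically biased results.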

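In this tutorial, you will discover how to develop an XGBoost model for time series forecasting.

After completing this tutorial, you will know:

XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression.
Time series datasets can be transformed into supervised learning using a sliding-window representation.
How to fit, evaluate, and make predictions with an XGBoost model on a time series forecasting problem.

Let's get started!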

Tutorial Overview

This tutorial is divided into three parts:

1. XGBoost Ensemble

2. Time Series Data Preparation

3. XGBoost for Time Series Forecasting

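1. XGBoost Ensemble

XGBoost is short for Extreme Gradient Boosting. It is an efficient implementation of the stochastic gradient boosting machine learning algorithm.

The stochastic gradient boosting algorithm is also called gradient boosting machines or tree boosting. It is a powerful machine learning technique that performs well, or even best, on a wide range of challenging machine learning problems.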

Tree boosting has been shown to give state-of-the-art results on many standard classification benchmarks.

XGBoost: A Scalable Tree Boosting System, 2016.

https://arxiv.org/abs/1603.02754

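It is an ensemble of decision trees in which each new tree corrects the errors of the trees already in the model. Trees are added until the model reaches satisfactory performance.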
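The sketch below makes "each new tree corrects the existing ensemble" concrete. It implements plain gradient boosting with squared-error loss, so every new tree is fit to the residuals of the current ensemble. It uses scikit-learn's DecisionTreeRegressor on a synthetic sine curve. The helper names `boost` and `predict`, the learning rate, and the tree depth are illustrative choices. XGBoost itself adds regularization, second-order gradient information, and many engineering optimizations that are not shown here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Fit shallow trees one after another, each to the residuals of the
    ensemble built so far (gradient boosting with squared-error loss)."""
    base = float(np.mean(y))                 # initial prediction: mean of y
    pred = np.full(len(y), base)
    trees, train_mse = [], []
    for _ in range(n_trees):
        residual = y - pred                  # negative gradient of 0.5 * (y - pred)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)   # new tree corrects the ensemble
        trees.append(tree)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return base, trees, train_mse


def predict(base, trees, X, learning_rate=0.1):
    """Base value plus the scaled contribution of every tree."""
    return base + learning_rate * sum(t.predict(X) for t in trees)


# synthetic 1-D regression problem (illustrative only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

base, trees, mse = boost(X, y)
print(f"train MSE after 1 tree: {mse[0]:.4f}, after {len(trees)} trees: {mse[-1]:.4f}")
```

Each tree is fit with a least-squares criterion and only a fraction of its output (the learning rate) is added. As a result, the training error shrinks step by step instead of being fixed in one jump.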

XGBoost is an efficient implementation of stochastic gradient boosting. It exposes a range of hyperparameters that give fine-grained control over the model training procedure.

The most important factor behind the success of XGBoost is its scalability in all scenarios. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.

XGBoost: A Scalable Tree Boosting System, 2016.

https://arxiv.org/abs/1603.02754

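XGBoost is designed for classification and regression on tabular datasets, although it can also be used for time series forecasting.

For more on gradient boosting (GBDT) and the XGBoost implementation, see the tutorial:

A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning

Link: https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/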

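First, XGBoost needs to be installed. You can install it with pip: pip install xgboost

Once it is installed, you can confirm that the installation succeeded and check which version is installed with the following code: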
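A minimal check is shown below, assuming XGBoost was installed with a command such as pip install xgboost. It prints the installed version, or a hint if the import fails. The helper name installed_xgboost_version is illustrative.

```python
def installed_xgboost_version():
    """Return the installed XGBoost version string, or None if it is missing."""
    try:
        import xgboost
    except ImportError:
        return None
    return xgboost.__version__


version = installed_xgboost_version()
if version is None:
    print("XGBoost is not installed; try: pip install xgboost")
else:
    print("xgboost", version)
```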

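Running this code prints the installed version number; your version may be higher.

Although the XGBoost library has its own Python API, you can also use XGBoost models through the scikit-learn API via the XGBRegressor wrapper class.

An instance of the model can be created and used for model evaluation just like any other scikit-learn estimator. For example: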

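Now that we are familiar with XGBoost, let's look at how to prepare a time series dataset for supervised learning.

2. Time Series Data Preparation

Time series data can be phrased as supervised learning.

Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem. To do this, we use the values at previous time steps as input variables and the value at the next time step as the output variable.

Let's make this concrete with an example. Imagine we have a time series as follows:

We can restructure this time series dataset as a supervised learning problem by using the value at the previous time step to predict the value at the next time step.

Reorganizing the time series dataset this way, the data would look as follows:

Note that the time column is dropped. Note also that some rows of data, such as the first and the last, cannot be used to train a model.

This representation is called a sliding window. The window of inputs and expected outputs is shifted forward through time to create new "samples" for a supervised learning model.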
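The lag-1 framing described above can be reproduced with pandas shift(). The five values below are a hypothetical series chosen only for illustration. Shifting the measurement column down by one step gives the input X for each row, and the time column is not used as a feature. The first row gets NaN because it has no previous value, so it is dropped before training. The last observation (120) has no following value yet, so it can only serve as the input for forecasting the next, unknown step.

```python
from pandas import DataFrame

# hypothetical measurements at time steps 1..5 (illustrative values only)
df = DataFrame({"time": [1, 2, 3, 4, 5], "measure": [100, 110, 108, 115, 120]})

# X is the value at the previous time step, y is the value at the current step;
# the time column itself is not used as a feature
framed = DataFrame({"X": df["measure"].shift(1), "y": df["measure"]})
print(framed)

# the first row has no previous value (NaN), so drop it before training
train_rows = framed.dropna()
print(train_rows)
```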

For more on the sliding window approach to preparing time series forecasting data, see the tutorial:

Time Series Forecasting as Supervised Learning

Link: https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

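We can use the shift() function in pandas to automatically create new framings of a time series problem, given the desired lengths of the input and output sequences.

This is a useful tool because it lets us explore different framings of a time series problem with machine learning algorithms and see which might result in better-performing models.

The function below takes a time series as a NumPy array with one or more columns. It transforms the series into a supervised learning problem with the specified number of inputs and outputs.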
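This is the series_to_supervised() function from the complete example later in this tutorial, reformatted for readability and with its unused n_vars variable omitted. A small usage example on a hypothetical ten-step series follows it. n_in sets the number of lag (input) columns, n_out sets the number of output columns, and rows containing NaN are dropped by default.

```python
from numpy import arange
from pandas import DataFrame, concat


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """Frame a time series (list or NumPy array) as a supervised learning dataset.

    Returns a NumPy array whose columns are t-n_in, ..., t-1 (inputs)
    followed by t, t+1, ..., t+n_out-1 (outputs).
    """
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values


# usage: hypothetical series 0..9 framed with 3 lag inputs and 1 output
data = series_to_supervised(arange(10), n_in=3)
print(data[:2])
```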

We can use this function to prepare a time series dataset for XGBoost.

For more on the step-by-step development of this function, see the tutorial:

How to Convert a Time Series to a Supervised Learning Problem in Python

Link: https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

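Once the dataset is prepared, we must be careful about how it is used to fit and evaluate a model.

For example, it would not be valid to fit the model on data from the future and have it predict the past. The model must be trained on the past and predict the future.

This means that methods that randomize the dataset during evaluation, such as k-fold cross-validation, cannot be used. Instead, we must use a technique called walk-forward validation.

In walk-forward validation, the dataset is first split into train and test sets by selecting a cut point. For example, all data except the last 12 months is used for training, and the last 12 months is used for testing.

Suppose we are interested in making a one-step forecast, such as one month ahead. We can evaluate the model by training on the training dataset and predicting the first step in the test dataset. We can then add the real observation from the test set to the training dataset, refit the model, and have the model predict the second step in the test dataset.

Repeating this process for the entire test dataset gives a one-step prediction for every test step. From these predictions, an error measure can be calculated to evaluate the skill of the model.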
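The walk-forward loop just described is sketched below, assuming a one-step horizon. To keep the sketch independent of XGBoost, the "model" is a persistence forecast that predicts the last observed value. It is supplied through a hypothetical forecast_fn parameter, so any function that maps the history to a one-step forecast could be plugged in instead. The series values are made up for illustration.

```python
def walk_forward(series, n_test, forecast_fn):
    """One-step walk-forward validation.

    The first len(series) - n_test values form the initial history. At each
    test step the model forecasts one value, then the true observation is
    appended to the history before the next forecast.
    """
    history = list(series[:-n_test])
    test = list(series[-n_test:])
    predictions = []
    for actual in test:
        yhat = forecast_fn(history)      # forecast the next step from all data seen so far
        predictions.append(yhat)
        history.append(actual)           # reveal the real observation
    mae = sum(abs(a - p) for a, p in zip(test, predictions)) / n_test
    return mae, test, predictions


def persistence(history):
    """Naive baseline: the next value equals the last observed value."""
    return history[-1]


series = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]   # illustrative values
mae, actual, predicted = walk_forward(series, n_test=4, forecast_fn=persistence)
print("actual:", actual, "predicted:", predicted, "MAE: %.2f" % mae)
```

A model that actually learns from the data would be refit inside forecast_fn on the growing history at every step. That refitting is what the XGBoost version in this tutorial does.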

For more on walk-forward validation, see the tutorial:

How To Backtest Machine Learning Models for Time Series Forecasting

Link: https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/

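The walk_forward_validation() function, defined in the complete example in Section 3, performs walk-forward validation.

It takes two arguments: the entire supervised learning version of the time series dataset and the number of rows to use as the test set.

It then steps through the test set, calling the xgboost_forecast() function to make a one-step forecast at each step. An error measure is calculated, and the details are returned for analysis.

The train_test_split() function splits the dataset into train and test sets. It can be defined as follows: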
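This is the train_test_split() function from the complete example, reformatted. It assumes data is a 2D NumPy array of supervised rows (inputs followed by the output) and holds out the last n_test rows, preserving time order. Note that this is not scikit-learn's sklearn.model_selection.train_test_split(), which shuffles rows by default and would break time order. The small array in the usage lines is illustrative.

```python
from numpy import arange


def train_test_split(data, n_test):
    """Split a 2D array of supervised rows into train/test sets by time:
    the last n_test rows become the test set and nothing is shuffled."""
    return data[:-n_test, :], data[-n_test:, :]


# usage: 6 rows of (input, output) pairs, hold out the last 2 rows
data = arange(12).reshape(6, 2)
train, test = train_test_split(data, 2)
print(train.shape, test.shape)
```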

We can use the XGBRegressor class to make a one-step forecast. The xgboost_forecast() function takes the training dataset and a test input row as input, fits a model, and makes a one-step prediction.

Now that we know how to prepare time series data for forecasting and how to evaluate an XGBoost model, we can apply XGBoost to a real dataset.

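3. XGBoost for Time Series Forecasting

In this section, we will explore how to use XGBoost for time series forecasting.

We will use a standard univariate time series dataset and use the model to make one-step forecasts.

You can use the code in this section as the starting point for your own project. It can be adapted for multivariate inputs, multivariate forecasts, and multi-step forecasts.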

Download the dataset from the link below and save it in your current working directory with the filename "daily-total-female-births.csv".

Dataset (daily-total-female-births.csv)
Link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv
Description (daily-total-female-births.names)
Link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.names

The first few rows of the dataset look as follows:

First, let's load and plot the dataset. The complete example is listed below:
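The sketch below loads and plots the dataset. It assumes daily-total-female-births.csv has been saved in the working directory with a header row and the date in the first column. The read_csv arguments mirror those in the complete example later in this tutorial. The load_series() helper and the file-existence check are illustrative additions.

```python
import os

from matplotlib import pyplot
from pandas import read_csv

FILENAME = "daily-total-female-births.csv"


def load_series(path):
    """Load the CSV, using the first column (the date) as the index."""
    return read_csv(path, header=0, index_col=0)


if os.path.exists(FILENAME):
    series = load_series(FILENAME)
    print(series.head())       # show the first few rows
    series.plot()              # line plot of daily births over time
    pyplot.show()
else:
    print("Download %s into the working directory first." % FILENAME)
```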

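Running the example creates a line plot of the dataset. We can see that there is no obvious trend or seasonality.

A persistence model achieves a mean absolute error (MAE) of about 6.7 births when predicting the last 12 days. This provides a performance baseline; a model that does better than it may be considered skillful.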
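The persistence baseline needs no training. The forecast for each step is simply the previous observation, so its walk-forward MAE over the last n_test values can be computed directly. The sketch below uses a hypothetical helper, persistence_mae(), and an illustrative series in the usage line. The .ravel() call lets it accept the (N, 1) array returned by series.values. Applied to the births data with n_test=12, it computes the kind of baseline quoted above.

```python
import numpy as np


def persistence_mae(values, n_test):
    """MAE of a persistence (naive) forecast over the last n_test values.

    The prediction for step t is the observation at step t-1.
    """
    values = np.asarray(values, dtype=float).ravel()
    actual = values[-n_test:]
    predicted = values[-n_test - 1:-1]    # the same values shifted by one step
    return float(np.mean(np.abs(actual - predicted)))


print(persistence_mae([10, 12, 11, 13, 15, 14, 16, 18, 17, 19], n_test=4))
```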

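Next, we can evaluate the XGBoost model on the dataset by making one-step forecasts for the last 12 days of data.

We will use only the previous three time steps as input to the model, along with default model hyperparameters. The exceptions are that we change the loss to 'reg:squarederror' (to avoid a warning message) and use 1,000 trees in the ensemble (to avoid underfitting).

The complete example is listed below: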

```python
# forecast daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from matplotlib import pyplot


# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values


# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]


# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]


# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error; the target is the last column of each row
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions


# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=3)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()
```

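Running the example reports the expected and predicted values for each step in the test set, then the MAE across all predictions.

In the original run, the model achieved an MAE of about 5.3 births, better than the persistence model's 6.7. Your exact results may differ.

Can you do better?

You can try different XGBoost hyperparameters and different numbers of input time steps to see whether you can achieve better performance. Share your results in the comments.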

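Running the example also creates a line plot comparing the expected and predicted values for the last 12 days of the dataset. The plot gives a visual sense of how well the model performed on the test set.

Once a final XGBoost model configuration is chosen, a model can be finalized and used to make predictions on new data.

This is called an out-of-sample forecast, that is, a prediction beyond the training dataset. It is done the same way as the predictions made during model evaluation. We always want to evaluate a model with the same procedure we expect to use when the model makes predictions on new data.

The example below fits a final XGBoost model on all available data and makes a one-step prediction beyond the end of the dataset.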

```python
# finalize model and make a prediction for daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from xgboost import XGBRegressor


# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values


# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=3)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model on all available data
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction from the last three observations
row = values[-3:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))
```

Running the example fits an XGBoost model on all available data.

A new input row is prepared from the last three days of known data. The model then predicts the next day beyond the end of the dataset.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Related Tutorials

A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/
Time Series Forecasting as Supervised Learning
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
How to Convert a Time Series to a Supervised Learning Problem in Python
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
How To Backtest Machine Learning Models for Time Series Forecasting
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/

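Summary

In this tutorial, you discovered how to develop an XGBoost model for time series forecasting.

Specifically, you learned:

XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression.
Time series datasets can be transformed into supervised learning using a sliding-window representation.
How to fit, evaluate, and make predictions with an XGBoost model for time series forecasting.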

Original title:

How to Use XGBoost for Time Series Forecasting

Original link:

https://machinelearningmastery.com/xgboost-for-time-series-forecasting/


How to Do Time Series Forecasting with XGBoost?
