
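Introduction: This article shows how to use XGBoost for time series forecasting. It covers how to transform a time series into a supervised learning problem, how to evaluate a model with walk-forward validation, and working code examples.

Author: Jason Brownlee

Translator: wwl

Source: 數據派THU (ID: DatapiTHU)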

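XGBoost is an efficient implementation of gradient boosting for classification and regression problems. It is both fast and effective, performs well on a wide range of predictive modeling tasks, and is a favorite among winners of data science competitions such as those on Kaggle.

XGBoost can also be used for time series forecasting, although the time series dataset must first be transformed into a supervised learning problem. It also requires a specialized technique for evaluating the model, called walk-forward validation, because evaluating the model with k-fold cross-validation would give optimistically biased results.

In this tutorial, you will discover how to develop an XGBoost model for time series forecasting.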

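After completing this tutorial, you will know:

XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression.
Time series datasets can be transformed into supervised learning using a sliding-window representation.
How to fit, evaluate, and make predictions with an XGBoost model for time series forecasting.

Let's get started.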

This tutorial is divided into three parts:

01 XGBoost Ensemble

02 Time Series Data Preparation

03 XGBoost for Time Series Forecasting

01 XGBoost Ensemble

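XGBoost is short for Extreme Gradient Boosting and is an efficient implementation of stochastic gradient boosting.

Stochastic gradient boosting (also called gradient boosting machines or tree boosting) is a powerful machine learning technique that performs well, and often best, on a wide range of challenging machine learning problems.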

Tree boosting has been shown to give state-of-the-art results on many standard classification benchmarks.

- XGBoost: A Scalable Tree Boosting System, 2016.
https://arxiv.org/abs/1603.02754

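It is an ensemble of decision trees in which each new tree corrects the errors of the trees already in the model. Trees are added until the model stops improving.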
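The idea that each new tree corrects the errors of the existing ones can be sketched in a few lines. The code below is a minimal illustration of plain gradient boosting with squared error, using one-split "stumps" written in NumPy. It is not how XGBoost is implemented (XGBoost adds regularization, row and column subsampling, and many optimizations), and the helper names fit_stump, predict_stump, and boost are made up for this sketch.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the one-threshold split on x that best fits the residual (least squares)."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        pred = np.where(left, left_value, right_value)
        sse = float(((residual - pred) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_trees=50, learning_rate=0.3):
    """Add stumps one at a time, each fit to the current residual errors."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps, losses = [], []
    for _ in range(n_trees):
        residual = y - pred                      # what the ensemble still gets wrong
        stump = fit_stump(x, residual)           # new tree models those errors
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        losses.append(float(np.mean((y - pred) ** 2)))
    return base, stumps, losses

# toy data (an assumption for illustration only)
x = np.linspace(0.0, 6.0, 40)
y = np.sin(x)
base, stumps, losses = boost(x, y)
print('MSE after 1 tree: %.4f, after %d trees: %.4f' % (losses[0], len(losses), losses[-1]))
```

The training error never increases as stumps are added, which is why the number of trees and the learning rate are usually tuned against held-out data rather than training error.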

XGBoost provides a highly efficient implementation of stochastic gradient boosting, along with a range of model hyperparameters that give control over the training process.

The most important factor behind the success of XGBoost is its scalability in all scenarios. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.

- XGBoost: A Scalable Tree Boosting System, 2016.
https://arxiv.org/abs/1603.02754

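XGBoost is designed for classification and regression on tabular datasets, although it can also be used for time series forecasting.

For more on gradient boosting and the XGBoost implementation, see the tutorial:

A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
Link: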
https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/

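First, XGBoost must be installed. You can install it with pip as follows:

sudo pip install xgboost

Once installed, you can confirm that the installation succeeded and check the installed version with the following code:

# check the installed xgboost version
import xgboost
print("xgboost", xgboost.__version__)

Running the code prints the version number below, or a higher one:

xgboost 1.0.1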

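Although the XGBoost library has its own Python API, you can also use XGBoost through the scikit-learn API via the XGBRegressor wrapper class.

An instance of the model can be created and used for model evaluation just like any other scikit-learn class. For example:

...
# define model
model = XGBRegressor()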

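Now that we are familiar with XGBoost, let's look at how to prepare a time series dataset for supervised learning.

02 Time Series Data Preparation

Time series data can be framed as supervised learning.

Given a sequence of numbers from a time series, we can restructure the data so that it looks like a supervised learning problem. We do this by using previous time steps as input variables and the next time step as the output variable.

Let's make this concrete with an example. Imagine we have the following time series: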

time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

We can restructure this time series as a supervised learning problem by using the value at the previous time step to predict the value at the next time step.

Reorganized this way, the data looks as follows:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

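Note that the time column is dropped and that some rows cannot be used to train a model, such as the first and the last.

This representation is called a sliding window, because the window of inputs and expected outputs is shifted forward through time to create new "samples" for a supervised learning model.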
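The table above can be reproduced with the pandas shift() method. This is a small sketch on the five example values; the column names X and y mirror the table, and the variable names are made up for illustration.

```python
import pandas as pd

# the five example measurements from the table above
measure = pd.Series([100, 110, 108, 115, 120])

# shift(1) moves every value down one row, so each row pairs t-1 (X) with t (y)
frame = pd.DataFrame({'X': measure.shift(1), 'y': measure})
print(frame)

# the first row has no previous value (NaN) and cannot be used for training
supervised = frame.dropna()
print(supervised)
```

The final "120, ?" row of the table does not appear here, because the series has no sixth observation to act as its output; that row is the input you would use to forecast the next, unseen value.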

For more on the sliding-window approach to preparing time series forecasting data, see the tutorial:

Time Series Forecasting as Supervised Learning
Link:
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

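We can use the shift() method in pandas to create new framings of a time series problem, given the desired lengths of the input and output sequences.

This is a useful tool because it lets us explore different framings of a time series problem with machine learning algorithms and see which might produce better models.

The function below takes a time series as a NumPy array with one or more columns and transforms it into a supervised learning problem with the specified number of inputs and outputs.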

from pandas import DataFrame
from pandas import concat

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

We can use this function to prepare a time series dataset for XGBoost.
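For a univariate series with a single output step, the same framing can also be built directly in NumPy. The sketch below is an alternative, not the tutorial's method: it assumes NumPy 1.20 or later (for sliding_window_view), and the function name windows_to_supervised is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_to_supervised(values, n_in=1):
    """Return rows of n_in lag values followed by the value to predict."""
    values = np.asarray(values).ravel()
    # each window holds n_in inputs plus one output; copy() makes the result writable
    return sliding_window_view(values, n_in + 1).copy()

data = windows_to_supervised([100, 110, 108, 115, 120], n_in=2)
print(data)
```

Each row holds two inputs followed by the output, for example 100 and 110 leading to 108. Unlike the pandas version, incomplete rows are never created, so there is nothing to drop afterwards.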

For more on the step-by-step development of this function, see the tutorial:

How to Convert a Time Series to a Supervised Learning Problem in Python
Link:
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

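Once the dataset is prepared, we must be careful about how it is used to fit and evaluate a model.

For example, it is not valid to fit a model on data from the future and have it predict the past. The model must be trained on the past and predict the future.

This means that evaluation methods that randomly split the dataset, such as k-fold cross-validation, cannot be used. Instead, we must use a technique called walk-forward validation.

In walk-forward validation, the dataset is first split into train and test sets by choosing a cut point, e.g. all data except the last 12 months is used for training and the last 12 months is used for testing.

If we are interested in a one-step forecast, e.g. one month, we can evaluate the model by training on the training dataset and predicting the first step in the test dataset. We then add the real observation from the test set to the training dataset, refit the model, and have the model predict the second step in the test dataset.

Repeating this process over the entire test set yields a one-step forecast for every test step, from which an error measure can be calculated to evaluate the skill of the model.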
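The procedure can be sketched independently of any particular model. The example below is a minimal illustration that plugs in a persistence forecast (predict the most recent observed value) in place of a trained model; the function names and the six toy values are assumptions made for this sketch.

```python
import numpy as np

def persistence_forecast(history, test_x):
    # naive baseline: predict the most recent input value; history is unused here,
    # but a real model would be refit on it at every step
    return test_x[-1]

def walk_forward(data, n_test, forecast):
    """Step through the last n_test rows, forecasting one row at a time."""
    train, test = data[:-n_test], data[-n_test:]
    history = [row for row in train]
    predictions = []
    for row in test:
        test_x = row[:-1]
        predictions.append(forecast(history, test_x))
        # the true observation becomes available before the next forecast
        history.append(row)
    errors = np.abs(test[:, -1] - np.asarray(predictions))
    return float(errors.mean()), predictions

# toy supervised data: one lag column as input, the next value as output
series = np.array([100, 110, 108, 115, 120, 118], dtype=float)
data = np.column_stack([series[:-1], series[1:]])
mae, predictions = walk_forward(data, n_test=2, forecast=persistence_forecast)
print('MAE: %.1f' % mae)
```

Passing the forecast function as an argument keeps the evaluation loop fixed while the model changes, which makes it easy to compare a baseline and a trained model on exactly the same test steps.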

For more on walk-forward validation, see the tutorial:

How To Backtest Machine Learning Models for Time Series Forecasting
Link:
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/

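The function below performs walk-forward validation.

It takes the entire supervised learning version of the time series dataset and the number of rows to use as the test set.

It then steps through the test set, calling the xgboost_forecast() function to make a one-step forecast. An error measure is calculated and the details are returned for analysis.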

from sklearn.metrics import mean_absolute_error

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    # return the error, the expected values (last column), and the predictions
    return error, test[:, -1], predictions

The train_test_split() function splits the dataset into train and test sets. It can be defined as follows:

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]

We can use the XGBRegressor class to make a one-step forecast. The xgboost_forecast() function below takes the training dataset and a test input row, fits a model, and makes a one-step prediction.

from numpy import asarray
from xgboost import XGBRegressor

# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction (wrap the row in a 2D array, as in the full example)
    yhat = model.predict(asarray([testX]))
    return yhat[0]

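Now that we know how to prepare a time series dataset for forecasting and how to evaluate an XGBoost model, we can apply XGBoost to a real dataset.

03 XGBoost for Time Series Forecasting

In this section, we will explore how to use XGBoost for time series forecasting.

We will use a standard univariate time series dataset, with the aim of using the model to make a one-step forecast.

You can use the code in this section as the starting point for your own project. It is easily adapted for multivariate inputs, multivariate forecasts, and multi-step forecasts.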
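As a rough sketch of what the multi-step adaptation looks like at the data-preparation stage, the example below frames a univariate series with three input steps and two output steps. It uses a compact copy of the tutorial's series_to_supervised() so the block runs on its own; the toy series 0 to 9 and the n_in/n_out values are assumptions for illustration. How the multiple output columns are then modeled (for example, one model per output step) is a separate design choice not covered here.

```python
import numpy as np
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1):
    """Compact version of the tutorial's function: lag columns, then output columns."""
    df = DataFrame(data)
    cols = [df.shift(i) for i in range(n_in, 0, -1)]
    cols += [df.shift(-i) for i in range(n_out)]
    return concat(cols, axis=1).dropna().values

# toy univariate series 0..9 as a single-column array
values = np.arange(10, dtype=float).reshape(-1, 1)
n_in, n_out = 3, 2
data = series_to_supervised(values, n_in=n_in, n_out=n_out)

# for a univariate series, the first n_in columns are inputs and the rest are outputs
X, Y = data[:, :n_in], data[:, n_in:]
print(X.shape, Y.shape)
print(X[0], Y[0])
```

For a multivariate series, each shift contributes one column per variable, so the input/output column slicing would need to account for the number of variables.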

Download the dataset from the link below and save it in your current working directory with the filename "daily-total-female-births.csv".

Dataset (daily-total-female-births.csv)
Link:
https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv

Description (daily-total-female-births.names)
Link:
https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.names

The first few lines of the dataset look as follows:

"Date","Births"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44
...

First, let's load and plot the dataset. The complete example is listed below:

# load and plot the time series dataset
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# plot dataset
pyplot.plot(values)
pyplot.show()

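Running the example creates a line plot of the dataset. There is no obvious trend or seasonality.

A persistence model achieves a mean absolute error (MAE) of about 6.7 births when predicting the last 12 days. This provides a baseline that a model must beat to be considered skillful.

Next, we evaluate the XGBoost model on this dataset by making one-step forecasts for the last 12 days of data.

We will use only the previous 6 time steps as input to the model and the default model hyperparameters, except that we change the loss to 'reg:squarederror' (to avoid a warning message) and use 1,000 trees in the ensemble (to avoid underfitting).

The complete example is listed below: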

# forecast daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from matplotlib import pyplot

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]

# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions

# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=6)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()

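Running the example reports the expected and predicted values for each step in the test set, followed by the MAE across all predictions.

We can see that the model performs better than the persistence model, achieving an MAE of about 6.0 births (5.957 in the output below) compared to 6.7 births.

Can you do better?

Try different XGBoost hyperparameters and different numbers of input time steps to see whether you can get a better model, and share your results in the comments.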

>expected=42.0, predicted=44.5
>expected=53.0, predicted=42.5
>expected=39.0, predicted=40.3
>expected=40.0, predicted=32.5
>expected=38.0, predicted=41.1
>expected=44.0, predicted=45.3
>expected=34.0, predicted=40.2
>expected=37.0, predicted=35.0
>expected=52.0, predicted=32.5
>expected=48.0, predicted=41.4
>expected=55.0, predicted=46.6
>expected=50.0, predicted=47.2
MAE: 5.957
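As a quick check on how the MAE is derived, it can be recomputed by hand from the values printed above. This is only a sketch: the printed predictions are rounded to one decimal place, so the result differs slightly from the reported 5.957, presumably because that figure was computed from the unrounded predictions.

```python
import numpy as np

# expected and predicted values copied from the printed output above
expected = np.array([42.0, 53.0, 39.0, 40.0, 38.0, 44.0,
                     34.0, 37.0, 52.0, 48.0, 55.0, 50.0])
predicted = np.array([44.5, 42.5, 40.3, 32.5, 41.1, 45.3,
                      40.2, 35.0, 32.5, 41.4, 46.6, 47.2])

# MAE is the mean of the absolute differences
abs_errors = np.abs(expected - predicted)
mae = abs_errors.mean()
print('MAE from rounded values: %.3f' % mae)
print('Largest miss: %.1f births at test step %d' % (abs_errors.max(), abs_errors.argmax() + 1))
```

Looking at the individual errors as well as the average shows that a few large misses, such as the ninth test step, account for much of the total error.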

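The example also creates a line plot comparing the expected and predicted values for the last 12 days of the dataset, which gives a visual sense of how well the model performed on the test set.

Once a final XGBoost model configuration is chosen, the model can be finalized and used to make predictions on new data.

This is called an out-of-sample forecast, i.e. a prediction beyond the training dataset. It works the same way as making a prediction during model evaluation, because a model should always be evaluated with the same procedure that it will use to make predictions on new data.

The example below demonstrates fitting a final XGBoost model on all available data and making a one-step prediction beyond the end of the dataset.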

# finalize model and make a prediction for daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from xgboost import XGBRegressor

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=6)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction
row = values[-6:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))

Running the example fits an XGBoost model on all available data.

A new input row is built from the last 6 days of known data, and the model predicts the next day beyond the end of the dataset.

Input: [34 37 52 48 55 50], Predicted: 42.708

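04 Further Reading

This section provides more resources on the topic if you want to go deeper.

Related Tutorials

A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning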
https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/

Time Series Forecasting as Supervised Learning
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

How to Convert a Time Series to a Supervised Learning Problem in Python
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

How To Backtest Machine Learning Models for Time Series Forecasting
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/

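05 Summary

In this tutorial, you discovered how to develop an XGBoost model for time series forecasting.

Specifically, you learned:

XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression.
Time series datasets can be transformed into supervised learning using a sliding-window representation.
How to fit, evaluate, and make predictions with an XGBoost model for time series forecasting.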

Original title:

How to Use XGBoost for Time Series Forecasting

Original link:

https://machinelearningmastery.com/xgboost-for-time-series-forecasting/

About the translator: Wang Weili, a BI practitioner in the elderly care and healthcare industry. Always learning.
