
XGBoost is an efficient implementation of gradient boosting for classification and regression problems.

It is fast and efficient, performs well, if not the best, on a wide range of predictive modeling tasks, and is a favorite among winners of data science competitions such as those on Kaggle.

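XGBoost can also be used for time series forecasting, although it requires that the time series dataset first be transformed into a supervised learning problem. It also requires a specialized technique for evaluating the model called walk-forward validation, because evaluating the model with k-fold cross-validation would give optimistically biased results.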

In this tutorial, you will discover how to develop an XGBoost model for time series forecasting. After completing this tutorial, you will know:

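1. XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression.

2. A time series dataset can be transformed into a supervised learning problem using a sliding-window representation.

3. How to fit, evaluate, and make predictions with an XGBoost model for time series forecasting.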

Tutorial Overview

This tutorial is divided into three parts; they are:

1. XGBoost Ensemble

2. Time Series Data Preparation

3. XGBoost for Time Series Forecasting

XGBoost Ensemble

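XGBoost, short for Extreme Gradient Boosting, is an efficient implementation of the stochastic gradient boosting machine learning algorithm. The stochastic gradient boosting algorithm, also called gradient boosting machines or tree boosting, is a powerful machine learning technique that performs well, or even best, on a wide range of challenging machine learning problems.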

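It is an ensemble of decision trees in which each new tree corrects the errors made by the trees already in the model. Trees are added until no further improvement can be made to the model. XGBoost provides a highly efficient implementation of stochastic gradient boosting, along with a set of model hyperparameters designed to give control over the training process.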
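To make the idea of "new trees correcting the errors of earlier trees" concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single feature, in plain Python, using one-split "stumps" as the trees. It is an illustration under simplifying assumptions (one feature, stumps, a fixed learning rate); fit_stump() and gradient_boost() are hypothetical helper names, and this is not how XGBoost is implemented internally.

```python
# Minimal gradient boosting sketch (illustrative only, NOT XGBoost's algorithm):
# squared-error loss, a single numeric feature, and one-split "stumps" as trees.


def fit_stump(x, residuals):
    """Return a one-split regression stump fitted to the residuals by least squares."""
    best = None
    # the largest x value is skipped because it would leave the right side empty
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # all x values are identical: fall back to predicting the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda xi: mean
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Fit an additive ensemble in which each new stump models the current residuals."""
    base = sum(y) / len(y)  # initial prediction: the mean of the targets
    trees = []

    def predict(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)

    for _ in range(n_trees):
        # for squared error, the residuals are proportional to the negative gradient
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        # the new tree is fitted to the errors of the trees already in the model
        trees.append(fit_stump(x, residuals))
    return predict


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
model = gradient_boost(x, y, n_trees=50)
print([round(model(xi), 2) for xi in x])
```

Each added stump nudges the predictions toward the targets, so training error falls as trees are added. XGBoost builds on this basic idea with regularization, second-order gradient information, and many engineering optimizations.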

XGBoost is designed for classification and regression on tabular datasets, although it can also be used for time series forecasting.

First, the XGBoost library must be installed. You can install it using pip, as follows:

sudo pip install xgboost

Once it is installed, you can confirm that the installation succeeded and that you are using a modern version by running the following code:

# check the installed xgboost version
import xgboost
print("xgboost", xgboost.__version__)

Running the code, you should see the following version number or higher.

xgboost 1.0.1

Although the XGBoost library has its own Python API, we can use XGBoost models with the scikit-learn API via the XGBRegressor wrapper class.

An instance of the model can be created and used just like any other scikit-learn class for model evaluation. For example:

from xgboost import XGBRegressor
# define model
model = XGBRegressor()

Now that we are familiar with XGBoost, let's look at how to prepare a time series dataset for supervised learning.

Time Series Data Preparation

Time series data can be phrased as supervised learning. Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem. We do this by using previous time steps as input variables and the next time step as the output variable. Let's make this concrete with an example. Imagine we have a time series as follows:

time, measure
1, 100
2, 110
3, 108
4, 115
5, 120

We can restructure this time series dataset as a supervised learning problem by using the value at the previous time step to predict the value at the next time step. Reorganized this way, the data looks as follows:

X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?

Note that the time column has been dropped and that some rows, such as the first and the last, cannot be used to train the model.

This representation is called a sliding window, because the window of inputs and expected outputs is shifted forward through time to create new "samples" for a supervised learning model.


Given the desired lengths of the input and output sequences, we can use the shift() function in Pandas to automatically create new framings of a time series problem.

This is a useful tool, because it allows us to explore different framings of a time series problem with machine learning algorithms to see which might result in better-performing models.
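As a small illustration of how shift() produces the table above, the sketch below frames the five example measurements with a single lag. It assumes pandas is installed; the column names X and y are chosen only to match the table.

```python
import pandas as pd

# the five example measurements from the table above
df = pd.DataFrame({'measure': [100, 110, 108, 115, 120]})

# shift(1) moves every value down one row, so row t holds the value from t-1
framed = pd.concat([df['measure'].shift(1), df['measure']], axis=1)
framed.columns = ['X', 'y']
print(framed)

# the first row has no previous value (NaN), so it is dropped before training
train_rows = framed.dropna()
print(train_rows)
```

The last row of the earlier table, (120, ?), does not appear here: its target has not been observed yet, so it can only serve as the input for the next forecast.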

The function below takes a time series as a NumPy array with one or more columns and transforms it into a supervised learning problem with the specified number of inputs and outputs.

from pandas import DataFrame
from pandas import concat
# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
 n_vars = 1 if type(data) is list else data.shape[1]
 df = DataFrame(data)
 cols = list()
 # input sequence (t-n, ... t-1)
 for i in range(n_in, 0, -1):
  cols.append(df.shift(i))
 # forecast sequence (t, t+1, ... t+n)
 for i in range(0, n_out):
  cols.append(df.shift(-i))
 # put it all together
 agg = concat(cols, axis=1)
 # drop rows with NaN values
 if dropnan:
  agg.dropna(inplace=True)
 return agg.values
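For a single variable stored in a plain Python list, the same framing can be sketched without pandas, which makes the column order easy to see: each row is [t-n_in, ..., t-1, t, ..., t+n_out-1]. The helper name sliding_window() is hypothetical, and it assumes the dropnan=True behavior, so incomplete windows are simply skipped.

```python
def sliding_window(series, n_in=1, n_out=1):
    """Univariate sketch of the framing done by series_to_supervised(dropnan=True).

    series must be a plain Python list. Each returned row holds n_in lagged
    inputs followed by n_out outputs; windows that would run off either end
    of the series are skipped, mirroring the rows that would contain NaN.
    """
    rows = []
    for t in range(n_in, len(series) - n_out + 1):
        rows.append(series[t - n_in:t] + series[t:t + n_out])
    return rows


measure = [100, 110, 108, 115, 120]
for row in sliding_window(measure, n_in=2):
    print(row)  # two lagged inputs, then the target
```

With n_in=1 this reproduces the usable rows of the earlier X, y table; larger n_in and n_out values widen the window in the same way the shift() calls do.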

We can use this function to prepare a time series dataset for XGBoost.

Once the dataset is prepared, we must be careful in how we use it to fit and evaluate a model.

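For example, it would not be valid to fit the model on data from the future and have it predict the past. The model must be trained on the past and predict the future. This means that methods that randomize the dataset during evaluation, such as k-fold cross-validation, cannot be used. Instead, we must use a technique called walk-forward validation. In walk-forward validation, the dataset is first split into train and test sets by selecting a cut point; for example, all data except the last 12 months is used for training, and the last 12 months is used for testing.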

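If we are interested in making a one-step forecast, such as one month ahead, we can evaluate the model by training it on the training dataset and predicting the first step in the test dataset. We can then add the real observation from the test set to the training dataset, refit the model, and have the model predict the second step in the test dataset. Repeating this process across the entire test dataset yields a one-step prediction for every test step, from which an error measure can be calculated to evaluate the skill of the model.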
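The bookkeeping behind this procedure can be sketched with row indices alone. The hypothetical generator below yields, for each test step, the rows available for training and the row to predict; it assumes one-step forecasts and an expanding training window, which is the scheme used by the walk_forward_validation() function below.

```python
def walk_forward_splits(n_samples, n_test):
    """Yield (train_indices, test_index) pairs for one-step walk-forward validation.

    The training window grows by one observation per step, and every
    training index is strictly earlier than the index being predicted.
    """
    first_test = n_samples - n_test
    for test_index in range(first_test, n_samples):
        yield list(range(test_index)), test_index


for train_idx, test_idx in walk_forward_splits(n_samples=10, n_test=3):
    print('train on rows 0..%d -> predict row %d' % (train_idx[-1], test_idx))
```

Unlike k-fold cross-validation, no future row ever appears in a training set.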

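The function below performs walk-forward validation. It takes as arguments the entire supervised learning version of the time series dataset and the number of rows to use as the test set. It then steps through the test set, calling the xgboost_forecast() function to make a one-step forecast at each step. An error measure is calculated, and the details are returned for analysis.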

from sklearn.metrics import mean_absolute_error
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
 predictions = list()
 # split dataset
 train, test = train_test_split(data, n_test)
 # seed history with training dataset
 history = [x for x in train]
 # step over each time-step in the test set
 for i in range(len(test)):
  # split test row into input and output columns
  testX, testy = test[i, :-1], test[i, -1]
  # fit model on history and make a prediction
  yhat = xgboost_forecast(history, testX)
  # store forecast in list of predictions
  predictions.append(yhat)
  # add actual observation to history for the next loop
  history.append(test[i])
  # summarize progress
  print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
 # estimate prediction error
 error = mean_absolute_error(test[:, -1], predictions)
 return error, test[:, -1], predictions

The train_test_split() function is called to split the dataset into train and test sets. We can define this function as follows.

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
 return data[:-n_test, :], data[-n_test:, :]
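A quick check of what this function returns, using NumPy and a hypothetical 8-row supervised array (the two-line function is repeated so the snippet runs on its own). It also exposes a pitfall of negative slicing: n_test must be at least 1, because data[:-0] is the same as data[:0], which is empty.

```python
import numpy as np


def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]


# hypothetical supervised array: 8 rows of (2 lag inputs, 1 target)
data = np.arange(24).reshape(8, 3)
train, test = train_test_split(data, 3)
print(train.shape, test.shape)    # (5, 3) (3, 3): the last 3 rows are held out

# pitfall: with n_test=0 the training set is EMPTY and the "test" set is everything
train0, test0 = train_test_split(data, 0)
print(train0.shape, test0.shape)  # (0, 3) (8, 3)
```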

We can use the XGBRegressor class to make a one-step forecast. The xgboost_forecast() function below does this: it takes the training dataset and a test input row, fits a model, and makes a one-step prediction.

from numpy import asarray
from xgboost import XGBRegressor
# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
 # transform list into array
 train = asarray(train)
 # split into input and output columns
 trainX, trainy = train[:, :-1], train[:, -1]
 # fit model
 model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
 model.fit(trainX, trainy)
 # make a one-step prediction
 yhat = model.predict(asarray([testX]))
 return yhat[0]
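Wrapping the test row as asarray([testX]) matters because testX is a single 1-D row, while scikit-learn-style predict() methods expect a 2-D array of shape (n_samples, n_features); passing a NumPy array rather than a plain list also avoids input types that some XGBoost versions may not accept. The NumPy sketch below, with hypothetical values, shows the shapes involved; reshape(1, -1) is an equivalent alternative.

```python
import numpy as np

# hypothetical test row with 6 lag inputs, as produced by test[i, :-1]
testX = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
print(testX.shape)   # (6,): a single 1-D row

# wrap the row in a list to form a batch containing exactly one sample
batch = np.asarray([testX])
print(batch.shape)   # (1, 6): 1 sample, 6 features

# predict() returns one value per sample, which is why the code returns yhat[0]
```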

Now that we know how to prepare time series data for forecasting and how to evaluate an XGBoost model, we can look at using XGBoost on a real dataset.

XGBoost for Time Series Forecasting

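In this section, we will explore how to use XGBoost for time series forecasting. We will use a standard univariate time series dataset and use the model to make one-step forecasts. You can use the code in this section as a starting point for your own project and easily adapt it for multivariate inputs, multivariate forecasts, and multi-step forecasts. We will use the daily female births dataset, which records the number of female births per day over one year (1959).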

You can download the dataset from the link below and place it in your current working directory with the filename "daily-total-female-births.csv".

Dataset (daily-total-female-births.csv):

https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv

Description (daily-total-female-births.names):

https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.names

The first few lines of the dataset look as follows:

"Date","Births"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44
...

First, let's load and plot the dataset. The complete example is listed below.

# load and plot the time series dataset
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# plot dataset
pyplot.plot(values)
pyplot.show()

Running the example creates a line plot of the dataset. We can see that there is no obvious trend or seasonality.

When forecasting the last 12 days, a persistence model achieves an MAE of about 6.7 births. This provides a performance baseline: a model that beats it can be considered skillful.
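A persistence (naive) forecast simply predicts that the next value equals the most recent observed value. The sketch below computes its MAE over the last n_test observations of a plain Python list; persistence_mae() is a hypothetical helper, and the toy values are made up rather than taken from the births dataset (applying it to that dataset would mean passing the loaded values as a list with n_test=12).

```python
def persistence_mae(series, n_test):
    """MAE of a persistence forecast over the last n_test observations.

    Assumes 0 < n_test < len(series), so every test step has a previous value.
    """
    errors = []
    for t in range(len(series) - n_test, len(series)):
        forecast = series[t - 1]  # persistence: the next value equals the last one
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


# hypothetical toy values, not the births dataset
toy = [10, 12, 11, 15, 14]
print(persistence_mae(toy, n_test=2))  # (|15 - 11| + |14 - 15|) / 2 = 2.5
```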

Next, we can evaluate the XGBoost model on the dataset when making one-step forecasts for the last 12 days of data.

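We will use only the previous 6 time steps as input to the model, with default model hyperparameters, except that we change the loss to 'reg:squarederror' (to avoid a warning message) and use 1,000 trees in the ensemble (to avoid underfitting).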

The complete example is listed below.

# forecast daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from matplotlib import pyplot
 
# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
 n_vars = 1 if type(data) is list else data.shape[1]
 df = DataFrame(data)
 cols = list()
 # input sequence (t-n, ... t-1)
 for i in range(n_in, 0, -1):
  cols.append(df.shift(i))
 # forecast sequence (t, t+1, ... t+n)
 for i in range(0, n_out):
  cols.append(df.shift(-i))
 # put it all together
 agg = concat(cols, axis=1)
 # drop rows with NaN values
 if dropnan:
  agg.dropna(inplace=True)
 return agg.values
 
# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
 return data[:-n_test, :], data[-n_test:, :]
 
# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
 # transform list into array
 train = asarray(train)
 # split into input and output columns
 trainX, trainy = train[:, :-1], train[:, -1]
 # fit model
 model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
 model.fit(trainX, trainy)
 # make a one-step prediction
 yhat = model.predict(asarray([testX]))
 return yhat[0]
 
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
 predictions = list()
 # split dataset
 train, test = train_test_split(data, n_test)
 # seed history with training dataset
 history = [x for x in train]
 # step over each time-step in the test set
 for i in range(len(test)):
  # split test row into input and output columns
  testX, testy = test[i, :-1], test[i, -1]
  # fit model on history and make a prediction
  yhat = xgboost_forecast(history, testX)
  # store forecast in list of predictions
  predictions.append(yhat)
  # add actual observation to history for the next loop
  history.append(test[i])
  # summarize progress
  print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
 # estimate prediction error
 error = mean_absolute_error(test[:, -1], predictions)
 return error, test[:, -1], predictions
 
# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=6)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()

Running the example reports the expected and predicted values for each step in the test set, followed by the MAE across all predictions.

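Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.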

We can see that the model outperforms the persistence model, achieving an MAE of about 5.96 compared with about 6.7.

>expected=42.0, predicted=44.5
>expected=53.0, predicted=42.5
>expected=39.0, predicted=40.3
>expected=40.0, predicted=32.5
>expected=38.0, predicted=41.1
>expected=44.0, predicted=45.3
>expected=34.0, predicted=40.2
>expected=37.0, predicted=35.0
>expected=52.0, predicted=32.5
>expected=48.0, predicted=41.4
>expected=55.0, predicted=46.6
>expected=50.0, predicted=47.2
MAE: 5.957
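As a sanity check on the reported error, the MAE can be recomputed from the twelve expected/predicted pairs printed above. Because the printed predictions are rounded to one decimal place, this gives 71.7 / 12 = 5.975 rather than exactly 5.957; the small difference comes only from that rounding.

```python
# expected/predicted pairs copied from the printed output above (rounded to 0.1)
expected = [42.0, 53.0, 39.0, 40.0, 38.0, 44.0, 34.0, 37.0, 52.0, 48.0, 55.0, 50.0]
predicted = [44.5, 42.5, 40.3, 32.5, 41.1, 45.3, 40.2, 35.0, 32.5, 41.4, 46.6, 47.2]

# mean absolute error: the average of |expected - predicted|
mae = sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)
print('MAE from rounded values: %.4f' % mae)
```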

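A line plot is created comparing the expected values with the predicted values for the last 12 days of the dataset. This gives a visual indication of how well the model performed on the test set.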

Figure 2: Expected vs. predicted values on the test set

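Once a final XGBoost model configuration is chosen, the model can be finalized and used to make predictions on new data. This is called an out-of-sample forecast, i.e., predicting beyond the training dataset. It is identical to making a prediction during model evaluation, because we always want to evaluate a model using the same procedure we expect to use when the model makes predictions on new data. The example below demonstrates fitting a final XGBoost model on all available data and making a one-step prediction beyond the end of the dataset.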

# finalize model and make a prediction for daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from xgboost import XGBRegressor
 
# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
 n_vars = 1 if type(data) is list else data.shape[1]
 df = DataFrame(data)
 cols = list()
 # input sequence (t-n, ... t-1)
 for i in range(n_in, 0, -1):
  cols.append(df.shift(i))
 # forecast sequence (t, t+1, ... t+n)
 for i in range(0, n_out):
  cols.append(df.shift(-i))
 # put it all together
 agg = concat(cols, axis=1)
 # drop rows with NaN values
 if dropnan:
  agg.dropna(inplace=True)
 return agg.values
 
# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=6)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction
row = values[-6:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))

Running the example fits an XGBoost model on all available data. A new input row is prepared from the last 6 days of known data, and the value for the next day beyond the end of the dataset is predicted.

Input: [34 37 52 48 55 50], Predicted: 42.708
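Why values[-6:] is the right input: series_to_supervised() orders the input columns from oldest (t-6) to newest (t-1), so the new row must hold the six most recent observations in that same order (flatten() only turns the (6, 1) column slice into a flat row). The pure-Python sketch below, on hypothetical observations 1 to 10, compares the last training row with the new input row.

```python
n_in = 6
series = list(range(1, 11))  # hypothetical observations 1..10

# training rows: the n_in values before t are the inputs, the value at t is the target
train_rows = [(series[t - n_in:t], series[t]) for t in range(n_in, len(series))]
last_inputs, last_target = train_rows[-1]
print(last_inputs, '->', last_target)  # [4, 5, 6, 7, 8, 9] -> 10

# the new input row is the n_in most recent observations, oldest first,
# i.e. the last training window slid forward by one step
new_row = series[-n_in:]
print(new_row, '-> ?')                 # [5, 6, 7, 8, 9, 10] -> ?
```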

Author: 沂水寒城, CSDN blog expert; research interests: machine learning, deep learning, NLP, CV

Blog: http://yishuihancheng.blog.csdn.net

