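How to Use XGBoost for Time Series Forecasting
This article is about 3,300 words; suggested reading time is 10 minutes.
XGBoost is an implementation of the gradient boosting ensemble method for classification and regression problems. Time series datasets can be adapted for supervised learning by using a sliding time-window representation. This article shows how to fit, evaluate, and make predictions with an XGBoost model on a time series forecasting problem.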
"A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning"
Link: https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/
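As a rough illustration of the idea behind gradient boosting, the sketch below fits a sequence of one-split regression trees (stumps) to the residuals of the current ensemble under squared error. It is a teaching sketch only, not XGBoost's actual algorithm, which adds regularization and second-order gradient information among other things. The function names (`fit_stump`, `fit_boosting`, `predict_boosting`) and the toy sine data are assumptions made for this example.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-split regression tree (a stump) to the residuals."""
    best = None
    # Try each distinct x value (except the largest) as a split point
    # and keep the split with the lowest squared error.
    for threshold in np.unique(x)[:-1]:
        left, right = residual[x <= threshold], residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Return (base, learning_rate, stumps) for a squared-error boosted ensemble."""
    base = y.mean()
    prediction = np.full(len(y), base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residual = y - prediction
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    prediction = np.full(len(x), base, dtype=float)
    for stump in stumps:
        prediction = prediction + learning_rate * predict_stump(stump, x)
    return prediction

# Toy data (an assumption for this sketch): a sampled sine wave.
x = np.arange(20, dtype=float)
y = np.sin(x / 3.0)
model = fit_boosting(x, y)
baseline_mse = float(np.mean((y - y.mean()) ** 2))
boosted_mse = float(np.mean((y - predict_boosting(model, x)) ** 2))
print('baseline MSE: %.4f, boosted MSE: %.4f' % (baseline_mse, boosted_mse))
```

Each round shrinks the training error a little because the new stump is fit to whatever the ensemble still gets wrong; the learning rate controls how much of each correction is applied.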






"Time Series Forecasting as Supervised Learning"
Link: https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
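To make the sliding-window representation concrete, here is a minimal sketch that turns a univariate series into input/output pairs, where each row of inputs holds the previous `n_in` observations and the output is the next observation. The helper name `sliding_window` and the toy series are assumptions for illustration.

```python
import numpy as np

def sliding_window(series, n_in):
    """Split a 1-D series into (X, y): n_in lag values predict the next value."""
    series = np.asarray(series, dtype=float)
    # Each window holds n_in inputs followed by one target value.
    rows = [series[i:i + n_in + 1] for i in range(len(series) - n_in)]
    data = np.array(rows)
    return data[:, :-1], data[:, -1]

# Toy series (an assumption for this sketch).
X, y = sliding_window([10, 20, 30, 40, 50, 60], n_in=3)
for inputs, target in zip(X, y):
    print(inputs, '->', target)
```

With six observations and three lags, only three complete windows exist, so the first three observations never appear as targets.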

"How to Convert a Time Series to a Supervised Learning Problem in Python"
Link: https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
"How To Backtest Machine Learning Models for Time Series Forecasting"
Link: https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
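Walk-forward validation, the backtesting approach referenced above, can be sketched independently of any particular model. The example below assumes the data is already in supervised form (lag columns followed by a target column) and uses a naive persistence forecast as a stand-in model; `walk_forward` and `persistence_forecast` are hypothetical names chosen for this sketch.

```python
import numpy as np

def persistence_forecast(history, test_x):
    """Stand-in model: predict the most recent lag value.
    A real model would be refit on `history` here."""
    return test_x[-1]

def walk_forward(data, n_test, forecast_fn):
    """Evaluate one-step forecasts over the last n_test rows of `data`."""
    train, test = data[:-n_test], data[-n_test:]
    history = list(train)
    predictions = []
    for row in test:
        test_x = row[:-1]
        # Forecast using only observations available up to this step.
        predictions.append(forecast_fn(np.asarray(history), test_x))
        # Reveal the true observation before moving to the next step.
        history.append(row)
    actual = test[:, -1]
    predictions = np.asarray(predictions, dtype=float)
    mae = float(np.mean(np.abs(actual - predictions)))
    return mae, actual, predictions

# Toy supervised data (an assumption): two lags and a target, from the series 1..10.
data = np.array([[i, i + 1, i + 2] for i in range(1, 9)], dtype=float)
mae, actual, predictions = walk_forward(data, n_test=3, forecast_fn=persistence_forecast)
print('MAE: %.3f' % mae)
```

The key property is that the model never sees a test observation before it has forecast it, which mirrors how the model would be used in practice.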



Dataset (daily-total-female-births.csv)
Link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv
Description (daily-total-female-births.names)
Link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.names
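Loading the dataset can be sketched as follows. The helper `load_series` is a hypothetical name, and the in-memory sample uses synthetic rows (not the real data) in an assumed two-column layout with a `Date` and a `Births` column; in practice the path to the downloaded `daily-total-female-births.csv` would be passed instead.

```python
from io import StringIO
from pandas import read_csv

def load_series(source):
    """Load a two-column CSV (date, value) and return the values as a 2-D array."""
    # The first column becomes the index, leaving one column of observations.
    series = read_csv(source, header=0, index_col=0)
    return series.values

# Synthetic rows in the assumed layout (not the real data).
sample = StringIO('"Date","Births"\n"2000-01-01",1\n"2000-01-02",2\n"2000-01-03",3\n')
values = load_series(sample)
print(values.shape)
```

The result has one row per observation and a single column, which is the shape that a function converting the series into a supervised dataset can consume directly.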



# forecast daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from matplotlib import pyplot

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]

# fit an xgboost model and make a one-step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error against the target (last) column
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions

# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=3)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()


# finalize model and make a prediction for daily births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from xgboost import XGBRegressor

# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values

# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=3)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction from the last three observations
row = values[-3:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))

A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/
Time Series Forecasting as Supervised Learning
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
How to Convert a Time Series to a Supervised Learning Problem in Python
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
How To Backtest Machine Learning Models for Time Series Forecasting
https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
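XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression. A time series dataset can be transformed into a supervised learning problem using a sliding-window representation. An XGBoost model can then be fit, evaluated, and used to make predictions for time series forecasting.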
Original title:
How to Use XGBoost for Time Series Forecasting
Original link:
https://machinelearningmastery.com/xgboost-for-time-series-forecasting/
