
[Time Series] Automating Time Series Forecasting with Auto-TS


          2022-03-10 22:28

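Auto-TS belongs to the AutoML family: it automates some components of the machine learning pipeline. Libraries of this kind help non-experts train basic machine learning models without deep knowledge of the field. In this article we walk through how to use the Auto-TS library to automate time series forecasting.

What is Auto-TS?

It is an open-source Python library used mainly to automate time series forecasting. With a single line of code it trains multiple time series models, which helps us pick the best model for our problem statement.

In the open-source Python library Auto-TS, auto_ts.auto_timeseries() is the main function, and it is called with the training data. We can then choose the kind of model we want, for example stats, ml, or FB Prophet-based models. We can also tune the parameters; the best model is selected automatically according to the scoring parameter we specify. It returns the best model and a dictionary containing forecasts for the requested number of forecast periods (default = 2).

auto_timeseries is a complex model-building utility for time series data. Because it automates many of the tasks involved in this complex work, it assumes many intelligent defaults, but we can change them. auto_timeseries quickly builds forecasting models based on Statsmodels ARIMA, Seasonal ARIMA, and Scikit-Learn ML, and automatically selects the model that gives the best value of the specified score.

auto_timeseries helps us build and select among multiple time series models using techniques such as ARIMA, SARIMAX, VAR, decomposable (trend + seasonality + residual) models, and ensemble machine learning models.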
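
The "select the best model by score" step can be pictured with a small sketch. This is not the library's own code; `pick_best` is a hypothetical helper, and the scores are the mean cross-validation RMSE values from the leaderboard printed later in this article. For an error metric such as RMSE, lower is better.

```python
def pick_best(scores, lower_is_better=True):
    """Return the name of the model with the best score.

    `scores` maps model name -> score. For error metrics such as RMSE the
    smallest value wins; pass lower_is_better=False for metrics where
    larger is better.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    chooser = min if lower_is_better else max
    return chooser(scores, key=scores.get)


# Mean CV RMSE per model, copied from this article's leaderboard output
cv_rmse = {"Prophet": 93.532440, "auto_SARIMAX": 73.922971, "ML": 246.630613}
best = pick_best(cv_rmse)
print(best)
```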

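Features of the Auto-TS library

• It uses genetic programming optimization to find the best time series forecasting model.
• It trains naive, statistical, machine learning, and deep learning models, with all possible hyperparameter configurations and cross-validation.
• It performs data transformations to handle messy data by learning the best NaN imputation and outlier removal.
• It lets you choose a combination of metrics for model selection.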

Installation

pip install auto-ts
# or
pip install git+https://github.com/AutoViML/Auto_TS.git

Dependencies. The following packages need to be installed beforehand:

dask
scikit-learn
FB Prophet
statsmodels
pmdarima
XGBoost
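
Before running the install, it can help to see which of these dependencies are already importable. The sketch below uses only the standard library. The import names in `REQUIRED` are assumptions about how each package is imported (for example scikit-learn as `sklearn`, and Prophet as either `prophet` or `fbprophet`); adjust them to your environment.

```python
from importlib.util import find_spec

# Import names (not pip names) for the packages listed above. Prophet has
# been published under two import names, so either one counts as installed.
REQUIRED = {
    "dask": ["dask"],
    "scikit-learn": ["sklearn"],
    "FB Prophet": ["prophet", "fbprophet"],
    "statsmodels": ["statsmodels"],
    "pmdarima": ["pmdarima"],
    "XGBoost": ["xgboost"],
}


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def check_dependencies(required=REQUIRED):
    """Map each package label to True if any of its import names is found."""
    return {
        label: any(is_importable(name) for name in names)
        for label, names in required.items()
    }


if __name__ == "__main__":
    for label, ok in check_dependencies().items():
        print(f"{label:14s} {'ok' if ok else 'MISSING'}")
```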

Importing the library

from auto_ts import auto_timeseries

Pitfall warning

When you follow the installation steps above, there is a good chance you will run into an error like this:

Running setup.py clean for fbprophet
Failed to build fbprophet
Installing collected packages: fbprophet
  Running setup.py install for fbprophet ... error
 ......
  from pystan import StanModel
ModuleNotFoundError: No module named 'pystan'

The natural reaction is to install pystan with pip install pystan. Even after that, the same error still appears. If this happens to you, don't panic: the author has already hit this pitfall for you.

Reference solution (Mac/Anaconda):

1. Install Ephem:

conda install -c anaconda ephem

2. Install Pystan:

conda install -c conda-forge pystan

3. Install Fbprophet:

(this step can take 4+ hours)

conda install -c conda-forge fbprophet

4. Finally, install:

pip install prophet
pip install fbprophet

5. Continue until you see:

Successfully installed cmdstanpy-0.9.5 fbprophet-0.7.1 holidays-0.13

If this still does not work, first try restarting Anaconda. If it still fails, install gcc first:

conda install gcc

Then go through the steps above once more.

The whole process may take up to a day!

Finally, try the import again. Success!

from auto_ts import auto_timeseries
Imported auto_timeseries version:0.0.65. Call by using:
model = auto_timeseries(score_type='rmse',
    time_interval='M',
    non_seasonal_pdq=None,
    seasonality=False,
    seasonal_period=12,
    model_type=['best'],
    verbose=2,
    dask_xgboost_flag=0)
model.fit(traindata,
    ts_column,target)
model.predict(testdata, model='best')

Parameters available in auto_timeseries

model = auto_timeseries(
    score_type='rmse',
    time_interval='Month',
    non_seasonal_pdq=None,
    seasonality=False,
    seasonal_period=12,
    model_type=['Prophet'],
    verbose=2)

You can adjust these parameters and analyze how model performance changes. For more details on the parameters, see the auto-ts documentation [1].

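Dataset used

This article uses the Amazon stock price [2] dataset covering January 2006 to January 2018, downloaded from Kaggle. The library only covers training time series forecasting models, and the dataset must contain a column in time or date format.

First, load the time series dataset along with its time/date column:

import pandas as pd

df = pd.read_csv(
    "Amazon_Stock_Price.csv",
    usecols=['Date', 'Close'])
df['Date'] = pd.to_datetime(df['Date'])  # parse strings into timestamps
df = df.sort_values('Date')              # keep rows in chronological order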
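
Since the library expects a proper date column, it can be worth validating the data right after loading. The following is a minimal sketch, not part of Auto-TS: `load_price_series` is a hypothetical helper, and the in-memory CSV is made-up sample data used in place of the Kaggle file.

```python
import io

import pandas as pd


def load_price_series(source, date_col="Date", target_col="Close"):
    """Read a CSV, parse the date column, and return rows sorted by date."""
    df = pd.read_csv(source, usecols=[date_col, target_col])
    # to_datetime raises on malformed dates, so bad rows fail loudly here
    df[date_col] = pd.to_datetime(df[date_col])
    if df[date_col].duplicated().any():
        raise ValueError(f"duplicate timestamps in column {date_col!r}")
    return df.sort_values(date_col).reset_index(drop=True)


# Tiny made-up sample, deliberately out of chronological order
sample_csv = io.StringIO(
    "Date,Open,Close\n"
    "2006-01-05,47.2,47.7\n"
    "2006-01-03,47.5,47.6\n"
    "2006-01-04,47.5,47.3\n"
)
sample_df = load_price_series(sample_csv)
print(sample_df)
```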

Now split the data into training and test sets:

train_df = df.iloc[:2800]
test_df = df.iloc[2800:]
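
For time series the split has to be chronological: every training row must come before every test row. A small sketch of the same positional split with an explicit leakage check follows. The helper names and the toy frame are illustrative assumptions, not part of Auto-TS.

```python
import pandas as pd


def chronological_split(df, n_train):
    """Split a date-sorted frame into the first n_train rows and the rest."""
    if not 0 < n_train < len(df):
        raise ValueError("n_train must leave at least one row on each side")
    return df.iloc[:n_train], df.iloc[n_train:]


def assert_no_leakage(train, test, date_col="Date"):
    """Raise if any training timestamp is not strictly before the test period."""
    if train[date_col].max() >= test[date_col].min():
        raise ValueError("training data overlaps the test period")


# Toy daily series standing in for the stock data
toy = pd.DataFrame({
    "Date": pd.date_range("2006-01-03", periods=10, freq="D"),
    "Close": range(10),
})
train_toy, test_toy = chronological_split(toy, 7)
assert_no_leakage(train_toy, test_toy)
```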

Now visualize the train/test split:

import matplotlib.pyplot as plt

train_df.Close.plot(
      figsize=(15, 8),
      title='AMZN Stock Price', fontsize=14,
      label='Train')
test_df.Close.plot(
      figsize=(15, 8),
      title='AMZN Stock Price', fontsize=14,
      label='Test')
plt.legend()  # show the Train/Test labels
plt.show()

Next, initialize the Auto-TS model object and fit it to the training data:

model = auto_timeseries(
  forecast_period=219,
  score_type='rmse',
  time_interval='D',
  model_type='best')
model.fit(traindata=train_df,
    ts_column="Date",
    target="Close")

Now compare the accuracy of the different models:

model.get_leaderboard()
model.plot_cv_scores()

This produces the following output:

Start of Fit.....
    Target variable given as = Close
Start of loading of data.....
    Inputs: ts_column = Date, sep = ,, target = ['Close']
    Using given input: pandas dataframe...
    Date column exists in given train data...
    train data shape = (2800, 1)
Alert: Could not detect strf_time_format of Date. Provide strf_time format during "setup" for better results.

Running Augmented Dickey-Fuller test with paramters:
    maxlag: 31 regression: c autolag: BIC
Data is stationary after one differencing
There is 1 differencing needed in this datasets for VAR model
No time series plot since verbose = 0. Continuing
Time Interval is given as D
    Correct Time interval given as a valid Pandas date-range frequency...
WARNING: Running best models will take time... Be Patient...


==================================================
Building Prophet Model
==================================================

Running Facebook Prophet Model...
  Starting Prophet Fit
      No seasonality assumed since seasonality flag is set to False
  Starting Prophet Cross Validation
Max. iterations using expanding window cross validation = 5

Fold Number: 1 --> Train Shape: 1705 Test Shape: 219
    RMSE = 30.01
    Std Deviation of actuals = 19.52
    Normalized RMSE (as pct of std dev) = 154%
Cross Validation window: 1 completed

Fold Number: 2 --> Train Shape: 1924 Test Shape: 219
    RMSE = 45.33
    Std Deviation of actuals = 34.21
    Normalized RMSE (as pct of std dev) = 132%
Cross Validation window: 2 completed

Fold Number: 3 --> Train Shape: 2143 Test Shape: 219
    RMSE = 65.61
    Std Deviation of actuals = 39.85
    Normalized RMSE (as pct of std dev) = 165%
Cross Validation window: 3 completed

Fold Number: 4 --> Train Shape: 2362 Test Shape: 219
    RMSE = 178.53
    Std Deviation of actuals = 75.28
    Normalized RMSE (as pct of std dev) = 237%
Cross Validation window: 4 completed

Fold Number: 5 --> Train Shape: 2581 Test Shape: 219
    RMSE = 148.18
    Std Deviation of actuals = 57.62
    Normalized RMSE (as pct of std dev) = 257%
Cross Validation window: 5 completed

-------------------------------------------
Model Cross Validation Results:
-------------------------------------------
    MAE (Mean Absolute Error = 85.20
    MSE (Mean Squared Error = 12218.34
    MAPE (Mean Absolute Percent Error) = 17%
    RMSE (Root Mean Squared Error) = 110.5366
    Normalized RMSE (MinMax) = 18%
    Normalized RMSE (as Std Dev of Actuals)= 60%
Time Taken = 13 seconds
  End of Prophet Fit



==================================================
Building Auto SARIMAX Model
==================================================

Running Auto SARIMAX Model...
    Using smaller parameters for larger dataset with greater than 1000 samples
    Using smaller parameters for larger dataset with greater than 1000 samples
    Using smaller parameters for larger dataset with greater than 1000 samples
    Using smaller parameters for larger dataset with greater than 1000 samples
    Using smaller parameters for larger dataset with greater than 1000 samples

SARIMAX RMSE (all folds): 73.9230
SARIMAX Norm RMSE (all folds): 35%

-------------------------------------------
Model Cross Validation Results:
-------------------------------------------
    MAE (Mean Absolute Error = 64.24
    MSE (Mean Squared Error = 7962.95
    MAPE (Mean Absolute Percent Error) = 12%
    RMSE (Root Mean Squared Error) = 89.2354
    Normalized RMSE (MinMax) = 14%
    Normalized RMSE (as Std Dev of Actuals)= 48%
    Using smaller parameters for larger dataset with greater than 1000 samples
Refitting data with previously found best parameters
    Best aic metric = 18805.2
                               SARIMAX Results
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2800
Model:               SARIMAX(2, 2, 0)   Log Likelihood               -9397.587
Date:                Mon, 28 Feb 2022   AIC                          18805.174
Time:                        19:45:31   BIC                          18834.854
Sample:                             0   HQIC                         18815.888
                               - 2800
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     -0.0033      0.557     -0.006      0.995      -1.094       1.088
drift       3.618e-06      0.000      0.015      0.988      -0.000       0.000
ar.L1         -0.6405      0.008    -79.601      0.000      -0.656      -0.625
ar.L2         -0.2996      0.009    -32.618      0.000      -0.318      -0.282
sigma2        48.6323      0.456    106.589      0.000      47.738      49.527
===================================================================================
Ljung-Box (L1) (Q):                  14.84   Jarque-Bera (JB):             28231.48
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):              19.43   Skew:                             0.56
Prob(H) (two-sided):                  0.00   Kurtosis:                        18.53
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

===============================================
Skipping VAR Model since dataset is > 1000 rows and it will take too long
===============================================



==================================================
Building ML Model
==================================================

Creating 2 lagged variables for Machine Learning model...
    You have set lag = 3 in auto_timeseries setup to feed prior targets. You cannot set lags > 10 ...
### Be careful setting dask_xgboost_flag to True since dask is unstable and doesn't work sometime's ###

########### Single-Label Regression Model Tuning and Training Started ####

Fitting ML model
    11 variables used in training ML model = ['Close(t-1)', 'Date_hour', 'Date_minute', 'Date_dayofweek', 'Date_quarter', 'Date_month', 'Date_year', 'Date_dayofyear', 'Date_dayofmonth', 'Date_weekofyear', 'Date_weekend']

Running Cross Validation using XGBoost model..
    Max. iterations using expanding window cross validation = 2
train fold shape (2519, 11), test fold shape = (280, 11)
### Number of booster rounds = 250 for XGBoost which can be set during setup ####
    Hyper Param Tuning XGBoost with CPU parameters. This will take time. Please be patient...
Cross-validated Score = 31.896 in num rounds = 249
Time taken for Hyper Param tuning of XGBoost (in minutes) = 0.0
Top 10 features:
['Date_year', 'Close(t-1)', 'Date_quarter', 'Date_month', 'Date_weekofyear', 'Date_dayofyear', 'Date_dayofmonth', 'Date_dayofweek']
    Time taken for training XGBoost on entire train data (in minutes) = 0.0
Returning the following:
    Model =
    Scaler = Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('simpleimputer',
                                                  SimpleImputer(),
                                                  ['Close(t-1)', 'Date_hour',
                                                   'Date_minute',
                                                   'Date_dayofweek',
                                                   'Date_quarter', 'Date_month',
                                                   'Date_year',
                                                   'Date_dayofyear',
                                                   'Date_dayofmonth',
                                                   'Date_weekofyear',
                                                   'Date_weekend'])])),
                ('maxabsscaler', MaxAbsScaler())])
    (3) sample predictions:[359.8374  356.59747 355.447  ]
XGBoost model tuning completed
Target = Close...CV results:
    RMSE = 246.63
    Std Deviation of actuals = 94.60
    Normalized RMSE (as pct of std dev) = 261%

Fitting model on entire train set. Please be patient...
    Time taken to train model (in seconds) = 0

Best Model is: auto_SARIMAX
    Best Model (Mean CV) Score: 73.92

--------------------------------------------------
Total time taken: 52 seconds.
--------------------------------------------------

Leaderboard with best model on top of list:
           name        rmse
1  auto_SARIMAX   73.922971
0       Prophet   93.532440
2            ML  246.630613
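
Two quantities in this output are easy to reproduce by hand. The sketch below is an illustration under stated assumptions, not the library's implementation: `expanding_window_folds` assumes each fold's test window has the length of the forecast period and that the last window ends at the final training row, which happens to reproduce the Prophet fold shapes printed above. `normalized_rmse_pct` assumes the "Normalized RMSE (as pct of std dev)" line is simply RMSE divided by the standard deviation of the actuals.

```python
def expanding_window_folds(n_rows, horizon, n_folds):
    """Yield (train_size, test_size) pairs for expanding-window CV.

    Each fold trains on every row before its test window, and the last
    test window ends at the final row.
    """
    first_train = n_rows - horizon * n_folds
    if first_train <= 0:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        yield first_train + k * horizon, horizon


def normalized_rmse_pct(rmse, std_actuals):
    """RMSE expressed as a percentage of the std deviation of the actuals."""
    return 100.0 * rmse / std_actuals


# 2800 training rows, forecast_period=219, 5 folds, as in the run above
folds = list(expanding_window_folds(n_rows=2800, horizon=219, n_folds=5))
for number, (train_size, test_size) in enumerate(folds, start=1):
    print(f"Fold {number}: train={train_size} test={test_size}")

# Fold 1 of the Prophet run: RMSE = 30.01, std of actuals = 19.52
fold1_pct = normalized_rmse_pct(30.01, 19.52)
print(f"Normalized RMSE = {fold1_pct:.0f}%")
```

A normalized RMSE above 100% means the forecast error is larger than the spread of the actual values within that fold, which is why these scores look poor even for the winning model.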

Now test the model on the test data:

future_predictions = model.predict(testdata=219)
# or
model.predict(
    testdata=test_df.Close)

Here the forecast period of 219 is passed to the auto_SARIMAX model to produce the forecast:

future_predictions

[Figure: preview of the future_predictions output]

Finally, visualize the test values against the predictions:

import matplotlib.pyplot as plt

pred_df = pd.concat(
    [test_df, future_predictions],
    axis=1)
fig, ax = plt.subplots(figsize=(15, 8))  # create the axes to draw on
ax.plot('Date', 'Close', 'b',
         data=pred_df,
         label='Test')
ax.plot('Date', 'yhat', 'r',
         data=pred_df,
         label='Predictions')
ax.legend()
plt.show()
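
Besides the plot, the forecast can be scored numerically against the held-out values. The sketch below uses only the standard library; `rmse` and `mape` are illustrative helpers, and the numbers are made-up placeholders rather than results from the run above. In practice you would pass the test set's Close values and the forecast column.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actuals must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up placeholder values, for illustration only
actual_close = [100.0, 200.0]
forecast = [110.0, 180.0]
print(f"RMSE = {rmse(actual_close, forecast):.2f}")
print(f"MAPE = {mape(actual_close, forecast):.1f}%")
```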

Parameters available in auto_timeseries:

model = auto_timeseries(
    score_type='rmse',
    time_interval='Month',
    non_seasonal_pdq=None,
    seasonality=False,
    seasonal_period=12,
    model_type=['Prophet'],
    verbose=2)

Parameters available in model.fit():

model.fit(traindata=train_data,
    ts_column=ts_column,
    target=target,
    cv=5, sep=",")

Parameters available in model.predict():

predictions = model.predict(
    testdata=test_data,  # a DataFrame, or an integer giving the forecast period
    model='best')        # or any other string naming a trained model

You can experiment with all of these parameters and analyze how the model performs, then choose the most suitable model for your problem statement. See the auto-ts documentation [3] for a detailed description of each parameter.

Final thoughts

This article showed how to automate time series modeling with a single line of Python code. Auto-TS preprocesses the data: it removes outliers and handles messy data by learning the best NaN imputation.

Once an Auto-TS object is initialized and fitted to the training data, it automatically trains multiple time series models such as ARIMA, SARIMAX, FB Prophet, and VAR, and reports the best-performing one. The results depend to some extent on the size of the dataset; with more data, the results should generally improve.

References

[1] auto-ts documentation: https://pypi.org/project/auto-ts/

[2] Amazon stock price dataset: https://www.kaggle.com/szrlee/stock-time-series-20050101-to-20171231?select=AMZN_2006-01-01_to_2018-01-01.csv

[3] auto-ts documentation: https://pypi.org/project/auto-ts/
