時(shí)間序列常用方法總結(jié)!(附代碼)
算法干貨
作者:戳戳龍 轉(zhuǎn)自:Datawhale
前
?? 平時(shí)工作中每天都在和時(shí)間序列打交道,對時(shí)間序列分析進(jìn)行研究是有必要的
?? 分享和交流一些自己的在時(shí)序處理方面的心得,提供一些思路
?? 介紹時(shí)序的發(fā)展情況,以及目前業(yè)界常用的方法
?? 代碼希望能模板化,能直接復(fù)制過去使用
時(shí)序方法發(fā)展


時(shí)間序列特征
??series = trend + seasons + dependence+ error
趨勢
?? 時(shí)間序列的趨勢分量表示該序列均值的持續(xù)的、長期的變化


Df['ma20'] = Df['amt'].rolling(20).mean()
周期性(季節(jié)性)

季節(jié)時(shí)序圖
def plot_season(Df):
df = Df.copy()
# 計(jì)算每周屬于哪一年
df['year'] = df['date'].dt.year
# 計(jì)算每周為一年當(dāng)中的第幾周
df['week_of_year'] = df['date'].dt.weekofyear
for year in df['year'].unique():
tmp_df = df[df['year'] == year]
plt.plot(tmp_df['week_of_year'], tmp_df['amt'], '.-', label=str(year))
plt.legend()
plt.show()

周期判斷
??如果每隔h個(gè)單位,ACF值有一個(gè)局部高峰,則數(shù)據(jù)存在以h為單位的周期性
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(Df['amt'], lags=500).show()

自相關(guān)性
自相關(guān)
??自相關(guān)函數(shù) autocorrelation function 有序的隨機(jī)變量序列與其自身相比較 自相關(guān)函數(shù)反映了同一序列在不同時(shí)序的取值之間的相關(guān)性
from statsmodels.graphics.tsaplots import plot_acf
_ = plot_acf(Df['amt'], lags=50)


偏自相關(guān)
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(Df['amt'], lags=5)

殘差
外部變量 殘差
Prophet
??官方文檔:https://facebook.github.io/prophet/docs/quick_start.html#python-api
原理
模型結(jié)構(gòu)
??模型結(jié)構(gòu)——關(guān)于時(shí)間的廣義線性模型
g(t):trend,用分段線性函數(shù)或邏輯增長曲線(logistic)擬合s(t):seasonality,用傅里葉級數(shù)擬合。可以疊加多個(gè)季節(jié)性,如weekly,yearly (s = s1+s2……)h(t):regressor,用線性函數(shù)擬合。可以疊加多個(gè)外部變量,如節(jié)假日、溫度、活動(dòng)(h = h1+h2+……):模型殘差 不用擬合 以上方程也可以寫成乘法形式:
乘法形式和加法形式可以相互轉(zhuǎn)換,乘法形式兩邊取對數(shù)就是加法形式

趨勢
分段線性函數(shù)
??線性趨勢函數(shù)
分段線性趨勢函數(shù)
超參數(shù),由用戶給出
分幾段 參數(shù),根據(jù)歷史數(shù)據(jù)擬合
k:曲線增長速率m:曲線的截距
邏輯增長曲線

??函數(shù)展示:https://www.desmos.com/calculator/8pnqou9ojy?lang=zh-CN
超參數(shù) C:漸近線 一共分幾段 參數(shù) k:曲線增長速率 m:拐點(diǎn)對應(yīng)時(shí)間
周期性
??任何周期性函數(shù)都可以表示成傅里葉級數(shù)
超參數(shù):由用戶給定 參數(shù):由歷史數(shù)據(jù)擬合
?? 函數(shù)展示:(https://www.desmos.com/calculator/5prck2beq1?lang=zh-CN

外部因素
: 模型輸入, 外部因素在時(shí)刻的取值
Z可以是0-1變量 (e.g.是否是法定假日,是否是春節(jié),是否有促銷)
也可以是連續(xù)變量 (e.g.產(chǎn)品價(jià)格, 溫度,降雨量)
:線性回歸系數(shù)
算法流程


1?? 先設(shè)定表達(dá)式(超參數(shù))
2?? 根據(jù)訓(xùn)練集數(shù)據(jù)求解參數(shù)
實(shí)踐
發(fā)電耗煤預(yù)測
df_train = Df[ (Df['date']<'2022-01-01') & (Df['date']>='2018-01-01') ]
df_test = Df[ (Df['date']>='2022-01-01')]
def FB(data):
df = pd.DataFrame({
'ds': data.date,
'y': data.amt,
})
# df['cap'] = data.amt.values.max()
# df['floor'] = data.amt.values.min()
m = prophet.Prophet(
changepoint_prior_scale=0.05,
daily_seasonality=False,
yearly_seasonality=True, #年周期性
weekly_seasonality=True, #周周期性
# growth="logistic",
)
m.add_seasonality(name='monthly', period=30.5, fourier_order=5, prior_scale=0.1)#月周期性
m.add_country_holidays(country_name='CN')#中國所有的節(jié)假日
m.fit(df)
future = m.make_future_dataframe(periods=30, freq='D')#預(yù)測時(shí)長
# future['cap'] = data.amt.values.max()
# future['floor'] = data.amt.values.min()
forecast = m.predict(future)
fig = m.plot_components(forecast)
fig1 = m.plot(forecast)
a = add_changepoints_to_plot(fig1.gca(), m, forecast)
return forecast,m
forecast,m = FB(df_train)



def FPPredict(data,m):
df = pd.DataFrame({
'ds': data.date,
'y': data.amt,
})
df_predict = m.predict(df)
df['yhat'] = df_predict['yhat'].values
df = df.set_index('ds')
df.plot()
return df
df = FPPredict(df_test.tail(200),m)

申購贖回金額預(yù)測
kaggle notebook[1]
Purchase Redemption Data.zip
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import prophet
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics
from prophet.plot import plot_cross_validation_metric
import warnings
warnings.filterwarnings('ignore')
data_user = pd.read_csv('../input/purchase-redemption/Purchase Redemption Data/user_balance_table.csv')
data_user['report_date'] = pd.to_datetime(data_user['report_date'], format='%Y%m%d')
data_user.head()
data_user_byday = data_user.groupby(['report_date'])['total_purchase_amt','total_redeem_amt'].sum().sort_values(['report_date']).reset_index()
data_user_byday.head()
申購
#定義模型
def FB(data: pd.DataFrame):
df = pd.DataFrame({
'ds': data.report_date,
'y': data.total_purchase_amt,
})
# df['cap'] = data.total_purchase_amt.values.max()
# df['floor'] = data.total_purchase_amt.values.min()
m = prophet.Prophet(
changepoint_prior_scale=0.05,
daily_seasonality=False,
yearly_seasonality=True, #年周期性
weekly_seasonality=True, #周周期性
# growth="logistic",
)
# m.add_seasonality(name='monthly', period=30.5, fourier_order=5, prior_scale=0.1)#月周期性
m.add_country_holidays(country_name='CN')#中國所有的節(jié)假日
m.fit(df)
future = m.make_future_dataframe(periods=30, freq='D')#預(yù)測時(shí)長
# future['cap'] = data.total_purchase_amt.values.max()
# future['floor'] = data.total_purchase_amt.values.min()
forecast = m.predict(future)
fig = m.plot_components(forecast)
fig1 = m.plot(forecast)
return forecast,m
result_purchase,purchase_model = FB(data_user_byday.iloc[:-30])


def FPPredict(data,m):
df = pd.DataFrame({
'ds': data.report_date,
'y': data.total_purchase_amt,
})
# df['cap'] = data.total_purchase_amt.values.max()
# df['floor'] = data.total_purchase_amt.values.min()
df_predict = m.predict(df)
df['yhat'] = df_predict['yhat'].values
df = df.set_index('ds')
df.plot()
return df
purchase_df = FPPredict(data_user_byday.iloc[-30:],purchase_model)

贖回
#定義模型
def FB(data: pd.DataFrame):
df = pd.DataFrame({
'ds': data.report_date,
'y': data.total_redeem_amt,
})
df['cap'] = data.total_purchase_amt.values.max()
df['floor'] = data.total_purchase_amt.values.min()
m = prophet.Prophet(
changepoint_prior_scale=0.05,
daily_seasonality=False,
yearly_seasonality=True, #年周期性
weekly_seasonality=True, #周周期性
growth="logistic",
)
# m.add_seasonality(name='monthly', period=30.5, fourier_order=5, prior_scale=0.1)#月周期性
m.add_country_holidays(country_name='CN')#中國所有的節(jié)假日
m.fit(df)
future = m.make_future_dataframe(periods=30, freq='D')#預(yù)測時(shí)長
future['cap'] = data.total_purchase_amt.values.max()
future['floor'] = data.total_purchase_amt.values.min()
forecast = m.predict(future)
fig = m.plot_components(forecast)
fig1 = m.plot(forecast)
return forecast
result_redeem = FB(data_user_byday)


Bonus 時(shí)間序列特征工程
https://www.heywhale.com/mw/project/63904f5658e3bea6a3e52800
EDA
import sweetviz as sv
def eda(df, name, target=None):
sweet_report = sv.analyze(df, target_feat=target)
sweet_report.show_html(f'{name}.html')
def eda_compare(df1, df2, name, feature, target):
feature_config = sv.FeatureConfig(force_text=feature, force_cat=feature)
sweet_report = sv.compare(df1, df2, feat_cfg=feature_config, target_feat=target)
sweet_report.show_html(f'{name}_compare.html')
完整版請?jiān)L問:https://www.wolai.com/stupidccl/5dqha79nnrPMf5xTAs6jUu
參考資料
kaggle notebook: https://www.kaggle.com/code/stupidccl/time-serious-analysis-1/edit/run/107631286
