Summary: a complete solution write-up for the DCIC algorithm analysis competition!
DCIC2020

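Task description: Taxis are an important part of a city's passenger transport system and are popular with residents because they are efficient, convenient, and flexible. Daily taxi operations generate a large amount of pick-up and drop-off location data. Linking and mining these data in a sound way, and comparing the spatial distribution of taxi data and how it changes across working days, rest days, and holidays, is valuable for research on taxi waiting bays, fleet management and dispatch, and residents' commuting patterns.
Taxis / ride-hailing: mining pick-up and drop-off locations;
Taxis / ride-hailing: spatial changes across different dates;
Taxis / ride-hailing: parking and dispatch problems;
Use statistical analysis to quantify the temporal and spatial operating characteristics of the provided cruising-taxi and ride-hailing data, including:
For each of the two years, compute daily averages over working days, non-working days, and holidays; the time-varying distribution of each of these three averages; and the average spatial distribution of each of the three day types on a grid (grid granularity chosen by the contestant);
Compare the provided ride-hailing and cruising-taxi data and, for each of the two years, compute the time-varying characteristics of the three day types (working-day, non-working-day, and holiday daily averages): daily-average empty-driving rate, average order distance, average order duration, pick-up/drop-off point density, and so on;
Based on the spatio-temporal operating characteristics of cruising taxis and ride-hailing vehicles, propose recommendations for their integrated development. The analysis must use, but is not limited to, the provided data; contestants may add their own data, but must state its source and ensure it is used legally and in compliance with regulations;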
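The day-type split and the time-varying distribution described above can be sketched as follows. This is a minimal sketch, not the author's method: `day_type` and `hourly_profile` are hypothetical helpers, the input is assumed to be a Series of order timestamps (for example the `GETON_DATE` column of the cruising-taxi orders), and the holiday set assumes the mainland-China Dragon Boat Festival holidays inside the two data windows (2019-06-07 to 06-09 and 2020-06-25 to 06-27). Orders are counted per date and hour, then averaged over the dates of each day type, so each row is a daily average.

```python
import pandas as pd

# ASSUMPTION: Dragon Boat Festival holiday dates inside the two competition windows
HOLIDAYS = {pd.Timestamp(d) for d in [
    '2019-06-07', '2019-06-08', '2019-06-09',
    '2020-06-25', '2020-06-26', '2020-06-27',
]}


def day_type(ts) -> str:
    """Classify a date as 'holiday', 'weekend' (non-working day) or 'workday'.

    Make-up working days that fall on weekends are not handled here.
    """
    day = pd.Timestamp(ts).normalize()
    if day in HOLIDAYS:
        return 'holiday'
    if day.dayofweek >= 5:  # Saturday = 5, Sunday = 6
        return 'weekend'
    return 'workday'


def hourly_profile(times: pd.Series) -> pd.DataFrame:
    """Average orders per hour of day (columns 0-23) for each day type (rows)."""
    t = pd.to_datetime(times)
    frame = pd.DataFrame({'date': t.dt.normalize(), 'hour': t.dt.hour})
    # Orders per (date, hour); hours without orders are filled with 0
    counts = frame.groupby(['date', 'hour']).size().unstack('hour', fill_value=0)
    counts = counts.reindex(columns=range(24), fill_value=0)
    # Average the per-date rows within each day type
    return counts.groupby(counts.index.map(day_type)).mean()
```

Filling empty hours with 0 before averaging matters: otherwise an hour with no orders on some dates would be averaged only over the dates where it appears, overstating demand.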

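Based on their understanding of the task and analysis of the data, contestants need to answer the questions above:
For cruising taxis and ride-hailing vehicles, for each year, under the three daily averages (working days, non-working days, and holidays):
Operating-time patterns: when vehicles start operating and how long they run;
Spatial patterns: distribution across the city and distribution of orders;
Daily-average empty-driving rate: the share of a vehicle's total mileage driven without passengers;
Average order distance: the mean distance per order;
Average order duration: the mean duration per order;
Pick-up/drop-off density: the spatial distribution of pick-up and drop-off locations;
Recommendations for dispatch and the integrated development of cruising taxis and ride-hailing:
How should orders be dispatched? Identify locations where it is hard to get a ride;
How should parking locations be recommended?
How do the orders of the two services differ?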
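For the gridded spatial distribution and the pick-up/drop-off density, one simple option is to bin coordinates into fixed-size cells. A minimal sketch, assuming hypothetical longitude/latitude Series (the real coordinate column names come from the field description) and a cell size in degrees; `grid_density` is a hypothetical helper. About 0.01° of latitude is roughly 1.1 km, and because degree cells narrow east-west away from the equator, a projected grid would give equal-area cells.

```python
import numpy as np
import pandas as pd


def grid_density(lon: pd.Series, lat: pd.Series, cell_deg: float = 0.01) -> pd.Series:
    """Count points per square grid cell of size cell_deg x cell_deg degrees.

    Returns a Series indexed by (col, row) cell indices, sorted by count (descending).
    """
    col = np.floor(lon / cell_deg).astype(int)  # east-west cell index
    row = np.floor(lat / cell_deg).astype(int)  # north-south cell index
    cells = pd.DataFrame({'col': col, 'row': row})
    return cells.groupby(['col', 'row']).size().sort_values(ascending=False)
```

To get a daily-average density for one day type, divide the counts by the number of dates of that type included in the input.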
Competition data description:
https://data.xm.gov.cn/opendata-competition/#/contest_explain
The competition data falls into four categories:
Cruising-taxi GPS data (2019, 2020);
Cruising-taxi order data (2019, 2020);
Ride-hailing GPS data (2019, 2020);
Ride-hailing order data (2019, 2020);
The data fields are described as follows:

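The task "Comparative analysis of the operating characteristics of urban cruising taxis and ride-hailing vehicles" provides GPS data and order data for ride-hailing vehicles and cruising taxis in City A over 20 days in two years (2019.05.31 to 2019.06.09 and 2020.06.18 to 2020.06.27), plus road-network vector data for City A: more than 100 million records in total.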

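Compute the daily-average empty-driving rate, order distance, and order duration for cruising taxis;
Compute the daily-average empty-driving rate, order distance, and order duration for ride-hailing vehicles;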
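import glob  # imported in the original; not used below

import numpy as np
import pandas as pd


# Ride-hailing (wyc) order statistics
def cal_wyc(df):
    # Keep only the needed columns and drop rows with missing values
    df = df[['DEST_TIME', 'DEP_TIME', 'WAIT_MILE', 'DRIVE_MILE']].dropna()
    if df['DEST_TIME'].dtype != np.int64:
        # Timestamps read as text: keep only clean 14-digit YYYYmmddHHMMSS values
        df = df[df['DEST_TIME'].apply(len) == 14]
        df = df[df['DEST_TIME'].apply(lambda x: x.isdigit())]
    # Drop malformed mileage values containing '-', '|' or '路' (road)
    df = df[df['DRIVE_MILE'].apply(lambda x: '-' not in str(x) and '|' not in str(x) and '路' not in str(x))]
    df = df.copy()  # work on a copy to avoid SettingWithCopyWarning
    # astype(str) so integer-typed timestamps also parse with the explicit format
    df['DEP_TIME'] = pd.to_datetime(df['DEP_TIME'].astype(str), format='%Y%m%d%H%M%S')
    df['DEST_TIME'] = pd.to_datetime(df['DEST_TIME'].astype(str), format='%Y%m%d%H%M%S')
    df['DRIVE_MILE'] = df['DRIVE_MILE'].astype(float)
    df['WAIT_MILE'] = df['WAIT_MILE'].astype(float)
    # Per-order empty ratio, averaged over orders; +0.01 guards against division by zero
    print('Empty-driving rate:', (df['WAIT_MILE'] / (df['WAIT_MILE'] + df['DRIVE_MILE'] + 0.01)).mean())
    print('Average order distance:', df['DRIVE_MILE'].mean())
    # Note: .dt.seconds is only the seconds component of each Timedelta (whole days are dropped);
    # .dt.total_seconds() would give the full duration
    print('Average order duration (min):', ((df['DEST_TIME'] - df['DEP_TIME']).dt.seconds / 60.0).mean())


# Cruising-taxi order statistics
def cal_taxi(df):
    df['GETON_DATE'] = pd.to_datetime(df['GETON_DATE'])
    df['GETOFF_DATE'] = pd.to_datetime(df['GETOFF_DATE'])
    # Per-order ratio of unoccupied mileage to total mileage, averaged over orders
    print('Empty-driving rate:', (df['NOPASS_MILE'] / (df['NOPASS_MILE'] + df['PASS_MILE'])).mean())
    print('Average order distance:', df['PASS_MILE'].mean())
    print('Average order duration (min):', ((df['GETOFF_DATE'] - df['GETON_DATE']).dt.seconds / 60.0).mean())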
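Both `cal_wyc` and `cal_taxi` report the mean of per-order ratios, while the task defines the empty-driving rate as empty mileage as a share of total mileage, which reads as a ratio of totals. The two differ whenever orders vary in length, because a short order with a long empty leg weighs as much as a long trip. A small sketch comparing them; the helper names are hypothetical, and which definition the organisers expect is an assumption to check against the scoring notes.

```python
import pandas as pd


def empty_rate_mean_of_orders(empty: pd.Series, loaded: pd.Series) -> float:
    """Average of per-order empty ratios (what cal_taxi prints)."""
    return float((empty / (empty + loaded)).mean())


def empty_rate_of_totals(empty: pd.Series, loaded: pd.Series) -> float:
    """Total empty mileage divided by total mileage over all orders."""
    total = empty.sum() + loaded.sum()
    return float(empty.sum() / total) if total > 0 else float('nan')


# Two orders: 1 km empty + 1 km loaded, and 1 km empty + 9 km loaded
empty = pd.Series([1.0, 1.0])
loaded = pd.Series([1.0, 9.0])
print(empty_rate_mean_of_orders(empty, loaded))  # mean of 0.5 and 0.1
print(empty_rate_of_totals(empty, loaded))       # 2 / 12
```

With the taxi columns this would be called as `empty_rate_of_totals(df['NOPASS_MILE'], df['PASS_MILE'])`.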
2019 Dragon Boat Festival (holiday) data:
INPUT_PATH = '../input/'

# Cruising taxis
df = pd.concat([pd.read_csv(INPUT_PATH + x) for x in
                ['taxiOrder20190607.csv', 'taxiOrder20190608.csv', 'taxiOrder20190609.csv']],
               ignore_index=True)
cal_taxi(df)

# Ride-hailing vehicles
df = pd.concat([pd.read_csv(INPUT_PATH + x) for x in
                ['wycOrder20190607.csv', 'wycOrder20190608.csv', 'wycOrder20190609.csv']],
               ignore_index=True)
cal_wyc(df)
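Cruising taxis
Empty-driving rate: 0.2997949500443629
Average order distance: 6.501010225346955
Average order duration (min): 13.055927380570695
Ride-hailing vehicles
Empty-driving rate: 0.056048033587246776
Average order distance: 9.065422897306478
Average order duration (min): 111.21042580624874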
2019 working-day data:
INPUT_PATH = '../input/'

# Cruising taxis
df = pd.concat([pd.read_csv(INPUT_PATH + x) for x in
                ['taxiOrder20190531.csv', 'taxiOrder20190603.csv', 'taxiOrder20190604.csv',
                 'taxiOrder20190605.csv', 'taxiOrder20190606.csv']],
               ignore_index=True)
cal_taxi(df)

# Ride-hailing vehicles
df = pd.concat([pd.read_csv(INPUT_PATH + x) for x in
                ['wycOrder20190531.csv', 'wycOrder20190603.csv', 'wycOrder20190604.csv',
                 'wycOrder20190605.csv', 'wycOrder20190606.csv']],
               ignore_index=True)
cal_wyc(df)

Cruising taxis
Empty-driving rate: 0.28597477408680505
Average order distance: 6.463312988754979
Average order duration (min): 13.897280639095992
Ride-hailing vehicles
Empty-driving rate: 0.0451398589440301
Average order distance: 8.678716893803035
Average order duration (min): 113.34003128482045
2019 weekend data:
INPUT_PATH = '../input/'

# Cruising taxis
df = pd.concat([pd.read_csv(INPUT_PATH + x) for x in
                ['taxiOrder20190601.csv', 'taxiOrder20190602.csv']],
               ignore_index=True)
cal_taxi(df)

# Ride-hailing vehicles
df = pd.concat([pd.read_csv(INPUT_PATH + x) for x in
                ['wycOrder20190601.csv', 'wycOrder20190602.csv']],
               ignore_index=True)
cal_wyc(df)

Cruising taxis
Empty-driving rate: 0.2871319581401905
Average order distance: 6.289113628823901
Average order duration (min): 13.1542066691464
Ride-hailing vehicles
Empty-driving rate: 0.049881413163707276
Average order distance: 8.514400548965787
Average order duration (min): 113.50896480737183
Note that the 2020 data uses exactly the same computation; just change the file names to run it. Simple, right?
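Rather than typing the 2020 file names by hand, the lists can be built from dates. A sketch assuming the 2020 files follow the same `taxiOrderYYYYmmdd.csv` / `wycOrderYYYYmmdd.csv` naming and `'../input/'` directory as 2019; `order_files`, `load_orders`, and `DAYS_2020` are hypothetical names, and the day-type split assumes the 2020 Dragon Boat Festival holiday ran from 06-25 to 06-27.

```python
import pandas as pd

INPUT_PATH = '../input/'  # same directory as in the 2019 examples

# ASSUMPTION: 2020 day types (Dragon Boat Festival holiday 2020-06-25 to 06-27)
DAYS_2020 = {
    'workday': ['2020-06-18', '2020-06-19', '2020-06-22', '2020-06-23', '2020-06-24'],
    'weekend': ['2020-06-20', '2020-06-21'],
    'holiday': ['2020-06-25', '2020-06-26', '2020-06-27'],
}


def order_files(prefix, dates):
    """Build paths like '../input/taxiOrder20200618.csv' (naming assumed to match 2019)."""
    return [INPUT_PATH + prefix + pd.Timestamp(d).strftime('%Y%m%d') + '.csv' for d in dates]


def load_orders(prefix, dates):
    """Read and stack the order files for the given dates."""
    return pd.concat([pd.read_csv(f) for f in order_files(prefix, dates)], ignore_index=True)
```

With `cal_taxi` and `cal_wyc` defined as earlier, the 2020 holiday run would then be `cal_taxi(load_orders('taxiOrder', DAYS_2020['holiday']))` followed by `cal_wyc(load_orders('wycOrder', DAYS_2020['holiday']))`, and likewise for the other two day types.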

Scoring notes are as follows:

