
LightGBM, the Competition Secret Weapon: A Summary of Common Operations!

7,344 words, about a 15-minute read

2020-10-28 14:27

Datawhale Insights
Author: Ashui, Beihang University, Datawhale member
LightGBM is a fast, parallelizable tree-model framework that builds on ideas from XGBoost. It integrates several ensemble learning techniques and improves on XGBoost's node-splitting implementation, giving it lower memory usage and faster training.

LightGBM docs: https://lightgbm.readthedocs.io/en/latest/

Parameter reference: https://lightgbm.readthedocs.io/en/latest/Parameters.html

This article covers the topics below; see the end of the article for how to get the original code.

• 1 Installation

• 2 Usage

  • 2.1 Defining the dataset

  • 2.2 Model training

  • 2.3 Saving and loading the model

  • 2.4 Viewing feature importance

  • 2.5 Continuing training

  • 2.6 Adjusting hyperparameters during training

  • 2.7 Custom loss functions

• 3 Parameter tuning

  • Manual tuning

  • Grid search

  • Bayesian optimization

1 Installation

Installing LightGBM is straightforward, and on Linux GPU training is easy to enable. Try installing from pip first; if that fails, fall back to building from source.

• Installation: from source
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build ; cd build
cmake ..

# enable MPI communication for faster training
# cmake -DUSE_MPI=ON ..

# GPU version, faster training
# cmake -DUSE_GPU=1 ..
make -j4
• Installation: via pip
# default version
pip install lightgbm

# MPI version
pip install lightgbm --install-option=--mpi

# GPU version
pip install lightgbm --install-option=--gpu

2 Usage

In Python, LightGBM can be called in two ways: through its native API or through the scikit-learn API. Both support training and validation. The native API is more flexible, so the choice comes down to personal preference.
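For comparison, here is a minimal sketch of the scikit-learn-style API, assuming the X_train/y_train/X_test/y_test variables defined in section 2.1 below:

import lightgbm as lgb

# scikit-learn-style wrapper: same engine, estimator-style interface
clf = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.05, n_estimators=100)
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)])
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

The rest of this article uses the native API.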

2.1 Defining the dataset

import json
import numpy as np
import pandas as pd
import lightgbm as lgb

df_train = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.train', header=None, sep='\t')
df_test = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.test', header=None, sep='\t')
W_train = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.test.weight', header=None)[0]

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)
num_train, num_feature = X_train.shape

# create dataset for lightgbm
# if you want to re-use data, remember to set free_raw_data=False
lgb_train = lgb.Dataset(X_train, y_train,
                        weight=W_train, free_raw_data=False)

lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,
                       weight=W_test, free_raw_data=False)
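If loading the raw data is slow, a constructed Dataset can also be saved in LightGBM's binary format so later runs skip re-parsing. A minimal sketch (the 'train.bin' filename is an arbitrary choice):

# save the Dataset in LightGBM's binary format for faster reloading
lgb_train.save_binary('train.bin')
lgb_train_bin = lgb.Dataset('train.bin')  # reload directly from the binary file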

2.2 Model training

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# generate feature names
feature_name = ['feature_' + str(col) for col in range(num_feature)]
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                valid_sets=lgb_train,  # eval training data
                feature_name=feature_name,
                categorical_feature=[21])
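Once trained, predictions come from Booster.predict; for the 'binary' objective the native API returns probabilities of the positive class. A minimal sketch:

# predict probabilities on the test set and threshold for hard labels
y_prob = gbm.predict(X_test)
y_label = (y_prob > 0.5).astype(int)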

2.3 Saving and loading the model

# save model to file
gbm.save_model('model.txt')

print('Dumping model to JSON...')
model_json = gbm.dump_model()

with open('model.json', 'w+') as f:
    json.dump(model_json, f, indent=4)
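To load the saved model back, lgb.Booster accepts a model file path. A minimal sketch:

# load the model file back into a Booster and predict with it
bst = lgb.Booster(model_file='model.txt')
y_prob = bst.predict(X_test)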

2.4 Viewing feature importance

# feature names
print('Feature names:', gbm.feature_name())

# feature importances
print('Feature importances:', list(gbm.feature_importance()))
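For a visual view, LightGBM also ships lgb.plot_importance (requires matplotlib). A minimal sketch:

import matplotlib.pyplot as plt

# plot the top 10 features by split count (the default importance type)
lgb.plot_importance(gbm, max_num_features=10)
plt.show()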

2.5 Continuing training

# continue training
# init_model accepts:
# 1. model file name
# 2. Booster()
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model='model.txt',
                valid_sets=lgb_eval)
print('Finished 10 - 20 rounds with model file...')
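When continuing training against a validation set, it is common to combine this with early stopping. A minimal sketch using the early_stopping_rounds argument that lgb.train accepted in the LightGBM versions current when this article was written:

# keep training, but stop if the validation metric does not improve
# for 5 consecutive rounds
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=100,
                init_model=gbm,
                valid_sets=lgb_eval,
                early_stopping_rounds=5)
print('Best iteration:', gbm.best_iteration)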

2.6 Adjusting hyperparameters during training

# decay learning rates
# learning_rates accepts:
# 1. list/tuple with length = num_boost_round
# 2. function(curr_iter)
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                learning_rates=lambda iter: 0.05 * (0.99 ** iter),
                valid_sets=lgb_eval)
print('Finished 20 - 30 rounds with decay learning rates...')

# change other parameters during training
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                valid_sets=lgb_eval,
                callbacks=[lgb.reset_parameter(bagging_fraction=[0.7] * 5 + [0.6] * 5)])
print('Finished 30 - 40 rounds with changing bagging_fraction...')

2.7 Custom loss functions

# self-defined objective function
# f(preds: array, train_data: Dataset) -> grad: array, hess: array
# log likelihood loss
def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1. - preds)
    return grad, hess

# self-defined eval metric
# f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
# binary error
# NOTE: with a customized objective, the prediction value is the raw margin.
# This may make built-in evaluation metrics calculate wrong results.
# For example, with log likelihood loss the prediction is the score before
# the logistic transformation. Keep this in mind when using customization.
def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    return 'error', np.mean(labels != (preds > 0.5)), False

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                fobj=loglikelihood,
                feval=binary_error,
                valid_sets=lgb_eval)
print('Finished 40 - 50 rounds with self-defined objective function and eval metric...')
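As the NOTE above says, with a custom objective the booster outputs raw margins rather than probabilities, so apply the sigmoid yourself before thresholding. A minimal sketch:

# predictions are raw scores under a custom objective; map them to
# probabilities with the sigmoid before thresholding
raw_scores = gbm.predict(X_test)
probs = 1. / (1. + np.exp(-raw_scores))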

3 Parameter tuning

Manual tuning

The official docs suggest the following rules of thumb; a sketch combining several of them follows these lists.

For Faster Speed

• Use bagging by setting bagging_fraction and bagging_freq
• Use feature sub-sampling by setting feature_fraction
• Use small max_bin
• Use save_binary to speed up data loading in future learning
• Use parallel learning; refer to the Parallel Learning Guide in the official docs

For Better Accuracy

• Use large max_bin (may be slower)
• Use small learning_rate with large num_iterations
• Use large num_leaves (may cause over-fitting)
• Use bigger training data
• Try dart

Deal with Over-fitting

• Use small max_bin
• Use small num_leaves
• Use min_data_in_leaf and min_sum_hessian_in_leaf
• Use bagging by setting bagging_fraction and bagging_freq
• Use feature sub-sampling by setting feature_fraction
• Use bigger training data
• Try lambda_l1, lambda_l2 and min_gain_to_split for regularization
• Try max_depth to avoid growing deep trees
• Try extra_trees
• Try increasing path_smooth
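To make the lists above concrete, here is an illustrative configuration combining several of the anti-over-fitting tips; the specific values are assumptions for demonstration, not recommendations from the official docs:

# an illustrative, conservative parameter set (values are assumptions)
params_regularized = {
    'objective': 'binary',
    'num_leaves': 15,          # smaller trees
    'max_depth': 5,            # cap tree depth
    'min_data_in_leaf': 50,    # require more samples per leaf
    'feature_fraction': 0.8,   # feature sub-sampling
    'bagging_fraction': 0.8,   # bagging
    'bagging_freq': 5,
    'lambda_l1': 0.1,          # L1 regularization
    'lambda_l2': 0.1,          # L2 regularization
}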

Grid search

from sklearn.model_selection import GridSearchCV

lg = lgb.LGBMClassifier(silent=False)
param_dist = {"max_depth": [4, 5, 7],
              "learning_rate": [0.01, 0.05, 0.1],
              "num_leaves": [300, 900, 1200],
              "n_estimators": [50, 100, 150]
              }

grid_search = GridSearchCV(lg, n_jobs=-1, param_grid=param_dist, cv=5, scoring="roc_auc", verbose=5)
grid_search.fit(X_train, y_train)  # X_train/y_train from section 2.1
grid_search.best_estimator_, grid_search.best_score_

Bayesian optimization

import warnings
import time
warnings.filterwarnings("ignore")
from bayes_opt import BayesianOptimization

def lgb_eval(max_depth, learning_rate, num_leaves, n_estimators):
    params = {
        "metric": 'auc'
    }
    params['max_depth'] = int(max(max_depth, 1))
    params['learning_rate'] = np.clip(learning_rate, 0, 1)
    params['num_leaves'] = int(max(num_leaves, 1))
    params['n_estimators'] = int(max(n_estimators, 1))
    cv_result = lgb.cv(params, lgb_train, nfold=5, seed=0, verbose_eval=200, stratified=False)
    return 1.0 * np.array(cv_result['auc-mean']).max()

lgbBO = BayesianOptimization(lgb_eval, {'max_depth': (4, 8),
                                        'learning_rate': (0.05, 0.2),
                                        'num_leaves': (20, 1500),
                                        'n_estimators': (5, 200)},
                             random_state=0)

lgbBO.maximize(init_points=5, n_iter=50, acq='ei')
print(lgbBO.max)
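lgbBO.max holds the best target value and the raw (float) parameters, so cast the integer-valued ones back before fitting a final model. A minimal sketch (mapping n_estimators to num_boost_round is our interpretation):

# extract the best parameters found and cast the integer-valued ones
best = lgbBO.max['params']
final_params = {
    'metric': 'auc',
    'max_depth': int(best['max_depth']),
    'num_leaves': int(best['num_leaves']),
    'learning_rate': best['learning_rate'],
}
final_model = lgb.train(final_params, lgb_train,
                        num_boost_round=int(best['n_estimators']))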


To get the code for this article, reply [lgb] in the official account backend to download the notebook!
