
[Machine Learning] The LightGBM Operations You Should Know!


          2022-11-01 12:35

LightGBM is a tree-model framework in the lineage of XGBoost that supports fast parallel training. It integrates several ensemble-learning ideas and, in its implementation, improves on XGBoost's node-splitting scheme, giving lower memory usage and faster training.

LightGBM website: https://lightgbm.readthedocs.io/en/latest/

Parameter reference: https://lightgbm.readthedocs.io/en/latest/Parameters.html

This article covers the following topics.

• 1 Installation

• 2 Usage

  • 2.1 Defining the dataset

  • 2.2 Model training

  • 2.3 Saving and loading models

  • 2.4 Inspecting feature importance

  • 2.5 Continuing training

  • 2.6 Adjusting hyperparameters during training

  • 2.7 Custom loss functions

  • 2.8 Tuning methods

    • Manual tuning

    • Grid search

    • Bayesian optimization

1 Installation

Installing LightGBM is straightforward, and on Linux it is easy to enable GPU training. Prefer installing from pip first; if that fails, build from source.

• Installation: building from source

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build ; cd build
cmake ..

# enable the MPI communication layer for faster training
# cmake -DUSE_MPI=ON ..

# GPU build, faster training
# cmake -DUSE_GPU=1 ..
make -j4

• Installation: via pip

# default build
pip install lightgbm

# MPI build
pip install lightgbm --install-option=--mpi

# GPU build
pip install lightgbm --install-option=--gpu
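
After installing, a quick sanity check (a minimal sketch; it only assumes the package itself):

import lightgbm

# confirm the package imports and report which version was installed
print(lightgbm.__version__)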

2 Usage

In Python, LightGBM offers two interfaces: the native API and the scikit-learn API. Both support training and validation; the native API is more flexible, so pick whichever suits your habits.
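
For comparison, here is a minimal sketch of the scikit-learn-style interface, using the X_train/y_train data defined in section 2.1 below; the hyperparameter values are illustrative only. The remainder of this article uses the native API.

import lightgbm as lgb

# scikit-learn style: construct an estimator, then fit / predict
clf = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.05, n_estimators=100)
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)])
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class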

2.1 Defining the dataset

import json

import numpy as np
import pandas as pd
import lightgbm as lgb

df_train = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.train', header=None, sep='\t')
df_test = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.test', header=None, sep='\t')
W_train = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('https://cdn.coggle.club/LightGBM/examples/binary_classification/binary.test.weight', header=None)[0]

          y_train = df_train[0]
          y_test = df_test[0]
          X_train = df_train.drop(0, axis=1)
          X_test = df_test.drop(0, axis=1)
          num_train, num_feature = X_train.shape

          # create dataset for lightgbm
          # if you want to re-use data, remember to set free_raw_data=False

          lgb_train = lgb.Dataset(X_train, y_train,
                                  weight=W_train, free_raw_data=False)

          lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,
                                 weight=W_test, free_raw_data=False)

2.2 Model training

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

          # generate feature names
          feature_name = ['feature_' + str(col) for col in range(num_feature)]
          gbm = lgb.train(params,
                          lgb_train,
                          num_boost_round=10,
                          valid_sets=lgb_train,  # eval training data
                          feature_name=feature_name,
                          categorical_feature=[21])

2.3 Saving and loading the model

          # save model to file
          gbm.save_model('model.txt')

          print('Dumping model to JSON...')
          model_json = gbm.dump_model()

with open('model.json', 'w+') as f:
              json.dump(model_json, f, indent=4)
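
The text file saved above can be loaded back into a Booster for prediction (a minimal sketch):

# load model from file and predict on the test set
bst = lgb.Booster(model_file='model.txt')
y_pred = bst.predict(X_test)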

2.4 Inspecting feature importance

          # feature names
          print('Feature names:', gbm.feature_name())

          # feature importances
          print('Feature importances:', list(gbm.feature_importance()))
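
To make the importances easier to read, one option is to pair them with the feature names and sort (a small sketch; note that the default importance_type is 'split', i.e. how often a feature is used):

# rank features by how often they are used for splitting
imp = pd.Series(gbm.feature_importance(), index=gbm.feature_name())
print(imp.sort_values(ascending=False).head(10))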

2.5 Continuing training

          # continue training
          # init_model accepts:
          # 1. model file name
          # 2. Booster()
          gbm = lgb.train(params,
                          lgb_train,
                          num_boost_round=10,
                          init_model='model.txt',
                          valid_sets=lgb_eval)
          print('Finished 10 - 20 rounds with model file...')

2.6 Adjusting hyperparameters during training

          # decay learning rates
          # learning_rates accepts:
          # 1. list/tuple with length = num_boost_round
          # 2. function(curr_iter)
          gbm = lgb.train(params,
                          lgb_train,
                          num_boost_round=10,
                          init_model=gbm,
                          learning_rates=lambda iter: 0.05 * (0.99 ** iter),
                          valid_sets=lgb_eval)
          print('Finished 20 - 30 rounds with decay learning rates...')

          # change other parameters during training
          gbm = lgb.train(params,
                          lgb_train,
                          num_boost_round=10,
                          init_model=gbm,
                          valid_sets=lgb_eval,
                          callbacks=[lgb.reset_parameter(bagging_fraction=[0.7] * 5 + [0.6] * 5)])
          print('Finished 30 - 40 rounds with changing bagging_fraction...')

2.7 Custom loss functions

          # self-defined objective function
          # f(preds: array, train_data: Dataset) -> grad: array, hess: array
          # log likelihood loss
          def loglikelihood(preds, train_data):
              labels = train_data.get_label()
              preds = 1. / (1. + np.exp(-preds))
              grad = preds - labels
              hess = preds * (1. - preds)
              return grad, hess

          # self-defined eval metric
          # f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
          # binary error
# NOTE: when you use a customized loss function, the default prediction value is the margin
# This may make built-in evaluation metrics calculate wrong results
# For example, with log likelihood loss the prediction is the score before the logistic transformation
# Keep this in mind when you use the customization
          def binary_error(preds, train_data):
              labels = train_data.get_label()
              preds = 1. / (1. + np.exp(-preds))
              return 'error', np.mean(labels != (preds > 0.5)), False

          gbm = lgb.train(params,
                          lgb_train,
                          num_boost_round=10,
                          init_model=gbm,
                          fobj=loglikelihood,
                          feval=binary_error,
                          valid_sets=lgb_eval)
          print('Finished 40 - 50 rounds with self-defined objective function and eval metric...')
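
Because the booster above was trained with a custom objective, gbm.predict() returns raw margin scores; apply the sigmoid yourself to recover probabilities, mirroring the transformation inside binary_error (a sketch):

# raw scores -> probabilities for the custom-objective model
raw_pred = gbm.predict(X_test)
prob_pred = 1. / (1. + np.exp(-raw_pred))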

2.8 Tuning methods

Manual tuning

          For Faster Speed

          • Use bagging by setting bagging_fraction and bagging_freq
          • Use feature sub-sampling by setting feature_fraction
          • Use small max_bin
          • Use save_binary to speed up data loading in future learning
• Use parallel learning; refer to the Parallel Learning Guide in the official docs

          For Better Accuracy

          • Use large max_bin (may be slower)
          • Use small learning_rate with large num_iterations
          • Use large num_leaves (may cause over-fitting)
          • Use bigger training data
          • Try dart

Deal with Over-fitting (a sketch of these settings follows this list)

• Use small max_bin
• Use small num_leaves
• Use min_data_in_leaf and min_sum_hessian_in_leaf
• Use bagging by setting bagging_fraction and bagging_freq
• Use feature sub-sampling by setting feature_fraction
• Use bigger training data
• Try lambda_l1, lambda_l2 and min_gain_to_split for regularization
• Try max_depth to avoid growing deep trees
• Try extra_trees
• Try increasing path_smooth
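
As an illustration of the checklist above, a sketch of a more conservative parameter dict; every value here is an assumption chosen for demonstration, not a recommendation:

params_regularized = {
    'objective': 'binary',
    'num_leaves': 15,          # smaller trees
    'max_depth': 5,            # cap tree depth
    'min_data_in_leaf': 50,    # require more samples per leaf
    'feature_fraction': 0.8,   # feature sub-sampling
    'bagging_fraction': 0.8,   # row sub-sampling ...
    'bagging_freq': 5,         # ... refreshed every 5 iterations
    'lambda_l1': 0.1,          # L1 regularization
    'lambda_l2': 0.1,          # L2 regularization
}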

Grid search

from sklearn.model_selection import GridSearchCV

lg = lgb.LGBMClassifier(silent=False)
param_dist = {"max_depth": [4, 5, 7],
              "learning_rate": [0.01, 0.05, 0.1],
              "num_leaves": [300, 900, 1200],
              "n_estimators": [50, 100, 150]
             }

grid_search = GridSearchCV(lg, n_jobs=-1, param_grid=param_dist, cv=5, scoring="roc_auc", verbose=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_estimator_, grid_search.best_score_)

Bayesian optimization

import warnings
warnings.filterwarnings("ignore")
from bayes_opt import BayesianOptimization

def lgb_eval(max_depth, learning_rate, num_leaves, n_estimators):
    params = {
        "metric": 'auc'
    }
    # the optimizer passes floats, so cast/clamp to valid values
    params['max_depth'] = int(max(max_depth, 1))
    params['learning_rate'] = np.clip(learning_rate, 0, 1)
    params['num_leaves'] = int(max(num_leaves, 1))
    params['n_estimators'] = int(max(n_estimators, 1))
    cv_result = lgb.cv(params, lgb_train, nfold=5, seed=0, verbose_eval=200, stratified=False)
    return np.array(cv_result['auc-mean']).max()

lgbBO = BayesianOptimization(lgb_eval, {'max_depth': (4, 8),
                                        'learning_rate': (0.05, 0.2),
                                        'num_leaves': (20, 1500),
                                        'n_estimators': (5, 200)}, random_state=0)

lgbBO.maximize(init_points=5, n_iter=50, acq='ei')
print(lgbBO.max)
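
lgbBO.max holds the best score and the raw (float-valued) parameters; a sketch of casting the integer-valued ones back before training a final model:

# best parameters come back as floats; cast the integer-valued ones
best = lgbBO.max['params']
final_params = {
    'metric': 'auc',
    'max_depth': int(best['max_depth']),
    'learning_rate': best['learning_rate'],
    'num_leaves': int(best['num_leaves']),
}
final_model = lgb.train(final_params, lgb_train,
                        num_boost_round=int(best['n_estimators']))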


