【Machine Learning】Random Forest Is My Favorite Model
TensorFlow Decision Forests (TF-DF) is now open source. The library bundles many state-of-the-art (SOTA) algorithms and requires no input-feature preprocessing: it handles numeric and categorical features natively, saving developers a great deal of time.





For beginners, decision forest models are easier to develop and interpret. You do not need to explicitly list or preprocess input features (decision forests naturally handle numeric and categorical attributes), specify an architecture (for example, by trying different layer combinations, as with a neural network), or worry about the model diverging. Once your model is trained, you can plot it directly or analyze it with easily interpretable statistics.
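The reason decision forests need no feature preprocessing can be seen in a pure-Python sketch of a single CART-style split search (illustrative only, not TF-DF code; all function names here are hypothetical): a split condition operates directly on raw values, using a threshold for numeric features and an equality test for categorical ones.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Scan every feature for the split minimizing weighted impurity.
    Numeric features use a threshold test; categorical ones use equality,
    so no scaling or one-hot encoding is needed."""
    best = None
    for i in range(len(rows[0])):
        for v in {r[i] for r in rows}:
            if isinstance(v, (int, float)):
                mask = [r[i] <= v for r in rows]   # threshold split
            else:
                mask = [r[i] == v for r in rows]   # categorical split
            left = [y for y, m in zip(labels, mask) if m]
            right = [y for y, m in zip(labels, mask) if not m]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, i, v)
    return best  # (weighted impurity, feature index, split value)

# Mixed raw features: (bill_length_mm, island) -- no preprocessing applied
rows = [(39.1, "Torgersen"), (46.5, "Dream"), (49.9, "Biscoe"), (38.6, "Torgersen")]
labels = ["Adelie", "Chinstrap", "Gentoo", "Adelie"]
print(best_split(rows, labels))
```

A real learner recurses on each side of the chosen split; the point is that raw numeric and string values are consumed as-is.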
Advanced users will benefit from models with very fast inference (sub-microsecond per example in many cases). The library also offers a great deal of composability for model experimentation and research; in particular, it is easy to combine neural networks with decision forests.

TF-DF provides a collection of SOTA decision forest training and serving algorithms, such as random forests, CART, (Lambda)MART, and DART.
Tree-based models integrate easily with the various TensorFlow tools, libraries, and platforms (such as TFX), so the TF-DF library can serve as a bridge into the rich TensorFlow ecosystem.
For neural network users, decision forests are a simple way to get started with TensorFlow, from which you can go on to explore neural networks.

Project repository: https://github.com/tensorflow/decision-forests
TF-DF website: https://www.tensorflow.org/decision_forests
Google I/O 2021 talk: https://www.youtube.com/watch?v=5qgk9QJ4rdQ

Training a model takes only a few lines of code:

# Install TensorFlow Decision Forests
!pip install tensorflow_decision_forests

# Load TensorFlow Decision Forests
import tensorflow_decision_forests as tfdf

# Load the training dataset using pandas
import pandas
train_df = pandas.read_csv("penguins_train.csv")

# Convert the pandas dataframe into a TensorFlow dataset
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="species")

# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
Evaluating the model and exporting it as a SavedModel are just as simple:

# Load the testing dataset
test_df = pandas.read_csv("penguins_test.csv")

# Convert it to a TensorFlow dataset
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="species")

# Evaluate the model
model.compile(metrics=["accuracy"])
print(model.evaluate(test_ds))
# >> 0.979311
# Note: Cross-validation would be more suited on this small dataset.
# See also the "Out-of-bag evaluation" below.

# Export the model to a TensorFlow SavedModel
model.save("project/my_first_model")
You can plot an individual tree of the trained model directly in a Colab notebook:

tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0)
Beyond plots, model statistics answer questions such as:
How many times is each feature used?
How fast did the model train (number of trees and wall time)?
How are nodes distributed across the tree structure (for instance, the length of most branches)?
# Print all the available information about the model
model.summary()
>> Input Features (7):
>>   bill_depth_mm
>>   bill_length_mm
>>   body_mass_g
>>   ...
>> Variable Importance:
>>   1. "bill_length_mm" 653.000000 ################
>>   ...
>> Out-of-bag evaluation: accuracy:0.964602 logloss:0.102378
>> Number of trees: 300
>> Total number of nodes: 4170
>> ...

# Get feature importance as an array
model.make_inspector().variable_importances()["MEAN_DECREASE_IN_ACCURACY"]
>> [("flipper_length_mm", 0.149),
>>  ("bill_length_mm", 0.096),
>>  ("bill_depth_mm", 0.025),
>>  ("body_mass_g", 0.018),
>>  ("island", 0.012)]
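The "out-of-bag evaluation" in the summary falls out of bagging itself: each tree is trained on a bootstrap sample, and the roughly one third of rows a tree never saw act as a free validation set for it. The following is a pure-Python sketch of that bookkeeping, not TF-DF code; each "tree" is a trivial stand-in that predicts its bootstrap sample's majority label.

```python
import random

def oob_accuracy(labels, n_trees=200, seed=0):
    """Estimate accuracy from out-of-bag votes, as in bagged ensembles."""
    rng = random.Random(seed)
    n = len(labels)
    votes = [{} for _ in range(n)]          # out-of-bag votes per example
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(bag)
        # Stand-in "tree": predicts the majority label of its bootstrap sample.
        counts = {}
        for i in bag:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        majority = max(counts, key=counts.get)
        for i in range(n):
            if i not in in_bag:             # vote only on unseen examples
                votes[i][majority] = votes[i].get(majority, 0) + 1
    correct = sum(1 for i in range(n)
                  if votes[i] and max(votes[i], key=votes[i].get) == labels[i])
    return correct / n

print(oob_accuracy(["a"] * 7 + ["b"] * 3))
```

Because every example is scored only by trees that never trained on it, the estimate comes at no extra cost, which is why cross-validation is often unnecessary for random forests.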
# List all the other available learning algorithms
tfdf.keras.get_all_models()
>> [tensorflow_decision_forests.keras.RandomForestModel,
>>  tensorflow_decision_forests.keras.GradientBoostedTreesModel,
>>  tensorflow_decision_forests.keras.CartModel]

# Display the hyper-parameters of the Gradient Boosted Trees model
? tfdf.keras.GradientBoostedTreesModel
>> A GBT (Gradient Boosted [Decision] Tree) is a set of shallow decision trees trained sequentially. Each tree is trained to predict and then "correct" for the errors of the previously trained trees (more precisely each tree predicts the gradient of the loss relative to the model output).
>> ...
>> Attributes:
>>   num_trees: Maximum number of decision trees. The effective number of trained trees can be smaller if early stopping is enabled. Default: 300.
>>   max_depth: Maximum depth of the tree. `max_depth=1` means that all trees will be roots. Negative values are ignored. Default: 6.
>> ...

# Create another model with specified hyper-parameters
model = tfdf.keras.GradientBoostedTreesModel(
    num_trees=500,
    growing_strategy="BEST_FIRST_GLOBAL",
    max_depth=8,
    split_axis="SPARSE_OBLIQUE",
)

# Evaluate the model
model.compile(metrics=["accuracy"])
print(model.evaluate(test_ds))
# >> 0.986851
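The docstring's point that "each tree predicts the gradient of the loss relative to the model output" can be made concrete with a toy pure-Python sketch (illustrative only, not TF-DF code): for squared loss the negative gradient is just the residual, and each boosting stage fits a one-split "stump" to those residuals.

```python
def fit_gbt(xs, ys, n_stages=50, lr=0.3):
    """Boost threshold stumps on a tiny 1-D regression problem."""
    pred = [0.0] * len(ys)
    stumps = []
    for _ in range(n_stages):
        # Negative gradient of 0.5*(y - pred)^2 w.r.t. pred is the residual.
        residual = [y - p for y, p in zip(ys, pred)]
        # Fit the best threshold stump to the residuals.
        best = None
        for t in xs:
            left = [r for x, r in zip(xs, residual) if x <= t]
            right = [r for x, r in zip(xs, residual) if x > t]
            lmean = sum(left) / len(left) if left else 0.0
            rmean = sum(right) / len(right) if right else 0.0
            err = sum((r - (lmean if x <= t else rmean)) ** 2
                      for x, r in zip(xs, residual))
            if best is None or err < best[0]:
                best = (err, t, lmean, rmean)
        _, t, lmean, rmean = best
        stumps.append((t, lmean, rmean))
        # Shrink each stump's correction by the learning rate.
        pred = [p + lr * (lmean if x <= t else rmean)
                for x, p in zip(xs, pred)]
    return stumps, pred

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]
_, pred = fit_gbt(xs, ys)
print([round(p, 2) for p in pred])
```

Each stage shrinks the remaining residual by a factor of (1 - lr), which is why the ensemble's predictions converge toward the targets over the stages; real GBT libraries use depth-limited trees instead of stumps and support other losses.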
