[ML] Add num_top_feature_importance_values param to regression and classi… (#50914)

Adds a new parameter, num_top_feature_importance_values, to regression and
classification. When set, feature importance is computed for the most important
features. The computation is based on the SHAP (SHapley Additive exPlanations) method.
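
To illustrate what "importance for the top most important features" means, the sketch below computes exact Shapley values for a toy model by brute-force enumeration and keeps the N largest by absolute value. It is only an illustration under stated assumptions: the helper names, the toy model, and the zero baseline are hypothetical, and the implementation behind this commit is not shown in this diff and may use a different (tree-specific) algorithm.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, row, baseline):
    """Exact Shapley values for one row, by enumerating feature coalitions.

    Features outside a coalition take their baseline value (an assumption
    made for this sketch; other SHAP variants average over a data set).
    """
    names = list(row)
    n = len(names)

    def value(coalition):
        mixed = {f: (row[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    result = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


def top_feature_importance(importances, num_top_feature_importance_values):
    """Keep the N features with the largest absolute importance."""
    ranked = sorted(importances.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:num_top_feature_importance_values]


def toy_model(x):
    # Hypothetical linear model, used only to make the example checkable.
    return 3.0 * x["a"] - 2.0 * x["b"] + 0.5 * x["c"]


row = {"a": 1.0, "b": 2.0, "c": 4.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}

importances = shapley_values(toy_model, row, baseline)
top = top_feature_importance(importances, 2)
print(top)
```

For a linear model and a zero baseline, each Shapley value reduces to the coefficient times the feature value, so the ranking is easy to verify by hand. The values also sum to the difference between the prediction for the row and the prediction for the baseline.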
Dimitris Athanasiou 2020-01-14 15:01:47 +02:00 committed by GitHub
parent 360f954816
commit 4d2be9bd32
19 changed files with 266 additions and 80 deletions

@@ -117,10 +117,11 @@ include-tagged::{doc-tests-file}[{api}-classification]
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
-<7> The name of the prediction field in the results object.
-<8> The percentage of training-eligible rows to be used in training. Defaults to 100%.
-<9> The seed to be used by the random generator that picks which rows are used in training.
-<10> The number of top classes to be reported in the results. Defaults to 2.
+<7> If set, feature importance for the top most important features will be computed.
+<8> The name of the prediction field in the results object.
+<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
+<10> The seed to be used by the random generator that picks which rows are used in training.
+<11> The number of top classes to be reported in the results. Defaults to 2.
===== Regression
@@ -137,9 +138,10 @@ include-tagged::{doc-tests-file}[{api}-regression]
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
-<7> The name of the prediction field in the results object.
-<8> The percentage of training-eligible rows to be used in training. Defaults to 100%.
-<9> The seed to be used by the random generator that picks which rows are used in training.
+<7> If set, feature importance for the top most important features will be computed.
+<8> The name of the prediction field in the results object.
+<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
+<10> The seed to be used by the random generator that picks which rows are used in training.
==== Analyzed fields