Add MlClientDocumentationIT tests for classification. (#47569)

2025-04-24 23:27:25 -04:00 · 2019-10-11 08:21:45 +02:00 · 2019-10-11 08:21:45 +02:00 · 9b5770da0e
commit 9b5770da0e
parent 95a3da21da
4 changed files with 272 additions and 40 deletions
--- a/docs/java-rest/high-level/ml/evaluate-data-frame.asciidoc
+++ b/docs/java-rest/high-level/ml/evaluate-data-frame.asciidoc
@@ -20,14 +20,52 @@ include-tagged::{doc-tests-file}[{api}-request]
 <1> Constructing a new evaluation request
 <2> Reference to an existing index
 <3> The query with which to select data from indices
-<4> Kind of evaluation to perform
-<5> Name of the field in the index. Its value denotes the actual (i.e. ground truth) label for an example. Must be either true or false
-<6> Name of the field in the index. Its value denotes the probability (as per some ML algorithm) of the example being classified as positive
-<7> The remaining parameters are the metrics to be calculated based on the two fields described above.
-<8> https://en.wikipedia.org/wiki/Precision_and_recall[Precision] calculated at thresholds: 0.4, 0.5 and 0.6
-<9> https://en.wikipedia.org/wiki/Precision_and_recall[Recall] calculated at thresholds: 0.5 and 0.7
-<10> https://en.wikipedia.org/wiki/Confusion_matrix[Confusion matrix] calculated at threshold 0.5
-<11> https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve[AuC ROC] calculated and the curve points returned
+<4> Evaluation to be performed
+
+==== Evaluation
+
+Evaluation to be performed.
+Currently, supported evaluations include: +BinarySoftClassification+, +Classification+, +Regression+.
+
+===== Binary soft classification
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-evaluation-softclassification]
+--------------------------------------------------
+<1> Constructing a new evaluation
+<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) label for an example. Must be either true or false.
+<3> Name of the field in the index. Its value denotes the probability (as per some ML algorithm) of the example being classified as positive.
+<4> The remaining parameters are the metrics to be calculated based on the two fields described above
+<5> https://en.wikipedia.org/wiki/Precision_and_recall#Precision[Precision] calculated at thresholds: 0.4, 0.5 and 0.6
+<6> https://en.wikipedia.org/wiki/Precision_and_recall#Recall[Recall] calculated at thresholds: 0.5 and 0.7
+<7> https://en.wikipedia.org/wiki/Confusion_matrix[Confusion matrix] calculated at threshold 0.5
+<8> https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve[AuC ROC] calculated and the curve points returned
+
+===== Classification
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-evaluation-classification]
+--------------------------------------------------
+<1> Constructing a new evaluation
+<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) class the example belongs to.
+<3> Name of the field in the index. Its value denotes the predicted (as per some ML algorithm) class of the example.
+<4> The remaining parameters are the metrics to be calculated based on the two fields described above
+<5> Multiclass confusion matrix of size 3
+
+===== Regression
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-evaluation-regression]
+--------------------------------------------------
+<1> Constructing a new evaluation
+<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) value for an example.
+<3> Name of the field in the index. Its value denotes the predicted (as per some ML algorithm) value for the example.
+<4> The remaining parameters are the metrics to be calculated based on the two fields described above
+<5> https://en.wikipedia.org/wiki/Mean_squared_error[Mean squared error]
+<6> https://en.wikipedia.org/wiki/Coefficient_of_determination[R squared]

 include::../execution.asciidoc[]

@@ -41,7 +79,40 @@ The returned +{response}+ contains the requested evaluation metrics.
 include-tagged::{doc-tests-file}[{api}-response]
 --------------------------------------------------
 <1> Fetching all the calculated metrics results
-<2> Fetching precision metric by name
-<3> Fetching precision at a given (0.4) threshold
-<4> Fetching confusion matrix metric by name
-<5> Fetching confusion matrix at a given (0.5) threshold
+
+==== Results
+
+===== Binary soft classification
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-results-softclassification]
+--------------------------------------------------
+
+<1> Fetching precision metric by name
+<2> Fetching precision at a given (0.4) threshold
+<3> Fetching confusion matrix metric by name
+<4> Fetching confusion matrix at a given (0.5) threshold
+
+===== Classification
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-results-classification]
+--------------------------------------------------
+
+<1> Fetching multiclass confusion matrix metric by name
+<2> Fetching the contents of the confusion matrix
+<3> Fetching the number of classes that were not included in the matrix
+
+===== Regression
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-results-regression]
+--------------------------------------------------
+
+<1> Fetching mean squared error metric by name
+<2> Fetching the actual mean squared error value
+<3> Fetching R squared metric by name
+<4> Fetching the actual R squared value
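
Reviewer note (not part of the commit): the callouts above name several metrics without defining them. The following is a minimal plain-Java sketch of what those metrics compute. It does not use the Elasticsearch client; the class `EvaluationMetricsSketch` and its method names are hypothetical, and two details are assumptions rather than documented behaviour: an example counts as predicted positive when its probability is greater than or equal to the threshold, and precision or recall with an empty denominator is reported as 0.

```java
// Illustrative only: textbook definitions of the metrics named in the callouts.
// This is NOT the Elasticsearch client API; all names here are hypothetical.
final class EvaluationMetricsSketch {

    // Precision at a threshold: TP / (TP + FP).
    // Assumption: "predicted positive" means probability >= threshold.
    static double precisionAt(boolean[] actual, double[] probability, double threshold) {
        int tp = 0;
        int fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (probability[i] >= threshold) {
                if (actual[i]) {
                    tp++;
                } else {
                    fp++;
                }
            }
        }
        // Assumption: report 0 when nothing was predicted positive.
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Recall at a threshold: TP / (TP + FN).
    static double recallAt(boolean[] actual, double[] probability, double threshold) {
        int tp = 0;
        int fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i]) {
                if (probability[i] >= threshold) {
                    tp++;
                } else {
                    fn++;
                }
            }
        }
        // Assumption: report 0 when there are no actual positives.
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Mean squared error: average of squared differences between actual and predicted values.
    static double meanSquaredError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    // R squared: 1 - (residual sum of squares / total sum of squares).
    static double rSquared(double[] actual, double[] predicted) {
        double mean = 0.0;
        for (double value : actual) {
            mean += value;
        }
        mean /= actual.length;

        double residual = 0.0;
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            residual += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            total += (actual[i] - mean) * (actual[i] - mean);
        }
        return 1.0 - residual / total;
    }

    public static void main(String[] args) {
        // Made-up sample data, chosen only to exercise the methods.
        boolean[] actual = {true, false, true, false};
        double[] probability = {0.9, 0.6, 0.45, 0.1};
        System.out.println(precisionAt(actual, probability, 0.5));
        System.out.println(recallAt(actual, probability, 0.5));

        double[] actualValues = {1.0, 2.0, 3.0};
        double[] predictedValues = {1.0, 2.0, 4.0};
        System.out.println(meanSquaredError(actualValues, predictedValues));
        System.out.println(rSquared(actualValues, predictedValues));
    }
}
```

Lowering the threshold from 0.5 to 0.4 in the sample data admits one more true positive, so recall rises while precision changes with it. This trade-off is presumably why the documented request evaluates the metrics at several thresholds.
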
--- a/docs/java-rest/high-level/ml/put-data-frame-analytics.asciidoc
+++ b/docs/java-rest/high-level/ml/put-data-frame-analytics.asciidoc
@@ -76,7 +76,7 @@ include-tagged::{doc-tests-file}[{api}-dest-config]
 ==== Analysis

 The analysis to be performed.
-Currently, the supported analyses include : +OutlierDetection+, +Regression+.
+Currently, the supported analyses include: +OutlierDetection+, +Classification+, +Regression+.

 ===== Outlier detection

@@ -101,6 +101,24 @@ include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
 <6> The proportion of the data set that is assumed to be outlying prior to outlier detection
 <7> Whether to apply standardization to feature values

+===== Classification
+
++Classification+ analysis requires to set which is the +dependent_variable+ and
+has a number of other optional parameters:
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-classification]
+--------------------------------------------------
+<1> Constructing a new Classification builder object with the required dependent variable
+<2> The lambda regularization parameter. A non-negative double.
+<3> The gamma regularization parameter. A non-negative double.
+<4> The applied shrinkage. A double in [0.001, 1].
+<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
+<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
+<7> The name of the prediction field in the results object.
+<8> The percentage of training-eligible rows to be used in training. Defaults to 100%.
+
 ===== Regression

 +Regression+ analysis requires to set which is the +dependent_variable+ and