[ML] Introduce randomize_seed setting for regression and classification (#49990)

This adds a new `randomize_seed` setting for regression and classification. When it is not explicitly set, the seed is generated randomly. Reusing the seed in a similar job ensures that the same docs are picked for training.
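The sketch below is a minimal plain-Java illustration of why reusing the seed reproduces the training set. It does not use the Elasticsearch client and does not reproduce the actual ML implementation. `SeededTrainingSplit` and `pickTrainingRows` are hypothetical names. Treating the training percentage as a per-row selection probability is also an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededTrainingSplit {

    // Hypothetical helper: returns the indices of rows selected for training.
    // Each row is kept with probability trainingPercent / 100, driven by a
    // generator seeded with `seed`, so the same seed gives the same selection.
    static List<Integer> pickTrainingRows(int rowCount, double trainingPercent, long seed) {
        Random random = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            // nextDouble() is in [0, 1), so 100% keeps every row and 0% keeps none.
            if (random.nextDouble() * 100.0 < trainingPercent) {
                picked.add(row);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // When no seed is given, draw one at random, mirroring the default behavior described above.
        long seed = new Random().nextLong();
        List<Integer> firstJob = pickTrainingRows(20, 50.0, seed);
        // A second "similar job" that reuses the seed picks exactly the same rows.
        List<Integer> secondJob = pickTrainingRows(20, 50.0, seed);
        System.out.println("seed=" + seed);
        System.out.println("same rows: " + firstJob.equals(secondJob));
    }
}
```

Running `main` prints the generated seed and `same rows: true`. Recording the printed seed and passing it to a later run is the analogue of reusing `randomize_seed` across jobs.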
2025-04-24 23:27:25 -04:00 · 2019-12-10 10:22:53 +02:00 · 2019-12-10 10:22:53 +02:00 · 269425b54d
commit 269425b54d
parent a6351d63ad
24 changed files with 460 additions and 76 deletions
--- a/docs/java-rest/high-level/ml/put-data-frame-analytics.asciidoc
+++ b/docs/java-rest/high-level/ml/put-data-frame-analytics.asciidoc
@@ -119,7 +119,8 @@ include-tagged::{doc-tests-file}[{api}-classification]
 <6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
 <7> The name of the prediction field in the results object.
 <8> The percentage of training-eligible rows to be used in training. Defaults to 100%.
-<9> The number of top classes to be reported in the results. Defaults to 2.
+<9> The seed to be used by the random generator that picks which rows are used in training.
+<10> The number of top classes to be reported in the results. Defaults to 2.

 ===== Regression

@@ -138,6 +139,7 @@ include-tagged::{doc-tests-file}[{api}-regression]
 <6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
 <7> The name of the prediction field in the results object.
 <8> The percentage of training-eligible rows to be used in training. Defaults to 100%.
+<9> The seed to be used by the random generator that picks which rows are used in training.

 ==== Analyzed fields