[ML] Explain data frame analytics API (#49455)

This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why.
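
As a rough illustration of the two pieces of information described above, the following minimal sketch models an explain result as plain Java objects. The class names, field names, and sample values (`ExplainSketch`, `FieldSelection`, `MemoryEstimation`, "unsupported type", "1gb", "128mb") are hypothetical stand-ins chosen for this example; they are not taken from the commit and should not be read as the client's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the two parts of an explain result.
// These are NOT the client's real classes.
class ExplainSketch {

    /** One entry per available field: was it selected, and if not, why. */
    static final class FieldSelection {
        final String name;
        final boolean included;
        final String reason; // null when the field is included

        FieldSelection(String name, boolean included, String reason) {
            this.name = name;
            this.included = included;
            this.reason = reason;
        }
    }

    /** Two estimates: fully in memory, and with overflow to disk allowed. */
    static final class MemoryEstimation {
        final String withoutDisk;
        final String withDisk;

        MemoryEstimation(String withoutDisk, String withDisk) {
            this.withoutDisk = withoutDisk;
            this.withDisk = withDisk;
        }
    }

    /** Collects "field: reason" for every field left out of the analysis. */
    static List<String> describeExcluded(List<FieldSelection> selection) {
        List<String> lines = new ArrayList<>();
        for (FieldSelection field : selection) {
            if (!field.included) {
                lines.add(field.name + ": " + field.reason);
            }
        }
        return lines;
    }

    /** Made-up sample data, for illustration only. */
    static List<FieldSelection> sampleSelection() {
        return Arrays.asList(
            new FieldSelection("price", true, null),
            new FieldSelection("notes", false, "unsupported type"));
    }

    public static void main(String[] args) {
        MemoryEstimation memory = new MemoryEstimation("1gb", "128mb");
        for (String line : describeExcluded(sampleSelection())) {
            System.out.println("excluded " + line);
        }
        System.out.println("in-memory estimate: " + memory.withoutDisk);
        System.out.println("disk-overflow estimate: " + memory.withDisk);
    }
}
```

A caller would typically inspect the excluded fields first, then use one of the two estimates when choosing a value for `model_memory_limit`.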
2025-06-28 17:34:17 -04:00 · 2019-11-22 20:08:14 +02:00 · 2019-11-22 20:08:14 +02:00 · 0390ec3627
commit 0390ec3627
parent 197d5e7768
46 changed files with 2315 additions and 854 deletions
--- a/docs/java-rest/high-level/ml/estimate-memory-usage.asciidoc
+++ b/docs/java-rest/high-level/ml/estimate-memory-usage.asciidoc
@@ -1,36 +0,0 @@
--
-:api: estimate-memory-usage
-:request: PutDataFrameAnalyticsRequest
-:response: EstimateMemoryUsageResponse
--
-[role="xpack"]
-[id="{upid}-{api}"]
-=== Estimate memory usage API
-
-Estimates memory usage of {dfanalytics}.
-Estimation results can be used when deciding the appropriate value for `model_memory_limit` setting later on.
-
-The API accepts an +{request}+ object and returns an +{response}+.
-
-[id="{upid}-{api}-request"]
-==== Estimate memory usage request
-
-["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
-include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
-<1> Constructing a new request containing a {dataframe-analytics-config} for which memory usage estimation should be performed
-
-include::../execution.asciidoc[]
-
-[id="{upid}-{api}-response"]
-==== Response
-
-The returned +{response}+ contains the memory usage estimates.
-
-["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
-include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
-<1> Estimated memory usage under the assumption that the whole {dfanalytics} should happen in memory (i.e. without overflowing to disk).
-<2> Estimated memory usage under the assumption that overflowing to disk is allowed during {dfanalytics}.
--- a/docs/java-rest/high-level/ml/explain-data-frame-analytics.asciidoc
+++ b/docs/java-rest/high-level/ml/explain-data-frame-analytics.asciidoc
@@ -0,0 +1,48 @@
+--
+:api: explain-data-frame-analytics
+:request: ExplainDataFrameAnalyticsRequest
+:response: ExplainDataFrameAnalyticsResponse
+--
+[role="xpack"]
+[id="{upid}-{api}"]
+=== Explain {dfanalytics} API
+
+Explains the following about a {dataframe-analytics-config}:
+
+* field selection: which fields are included or not in the analysis
+* memory estimation: how much memory is estimated to be required. The estimate can be used later when choosing an appropriate value for the `model_memory_limit` setting.
+
+The API accepts an +{request}+ object and returns an +{response}+.
+
+[id="{upid}-{api}-request"]
+==== Explain {dfanalytics} request
+
+The request can be constructed with the id of an existing {dfanalytics-job}.
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-id-request]
+--------------------------------------------------
+<1> Constructing a new request with the id of an existing {dfanalytics-job}
+
+It can also be constructed with a {dataframe-analytics-config} to explain it before creating it.
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-config-request]
+--------------------------------------------------
+<1> Constructing a new request containing a {dataframe-analytics-config}
+
+include::../execution.asciidoc[]
+
+[id="{upid}-{api}-response"]
+==== Response
+
+The returned +{response}+ contains the field selection and the memory usage estimation.
+
+["source","java",subs="attributes,callouts,macros"]
+--------------------------------------------------
+include-tagged::{doc-tests-file}[{api}-response]
+--------------------------------------------------
+<1> A list where each item explains whether a field was selected for analysis or not
+<2> The memory estimation for the {dfanalytics-job}
--- a/docs/java-rest/high-level/supported-apis.asciidoc
+++ b/docs/java-rest/high-level/supported-apis.asciidoc
@@ -300,7 +300,7 @@ The Java High Level REST Client supports the following Machine Learning APIs:
 * <<{upid}-start-data-frame-analytics>>
 * <<{upid}-stop-data-frame-analytics>>
 * <<{upid}-evaluate-data-frame>>
-* <<{upid}-estimate-memory-usage>>
+* <<{upid}-explain-data-frame-analytics>>
 * <<{upid}-get-trained-models>>
 * <<{upid}-put-filter>>
 * <<{upid}-get-filters>>
@@ -353,7 +353,7 @@ include::ml/delete-data-frame-analytics.asciidoc[]
 include::ml/start-data-frame-analytics.asciidoc[]
 include::ml/stop-data-frame-analytics.asciidoc[]
 include::ml/evaluate-data-frame.asciidoc[]
-include::ml/estimate-memory-usage.asciidoc[]
+include::ml/explain-data-frame-analytics.asciidoc[]
 include::ml/get-trained-models.asciidoc[]
 include::ml/put-filter.asciidoc[]
 include::ml/get-filters.asciidoc[]