[ML] Explain data frame analytics API (#49455)

This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation was previously provided by the
_estimate_memory_usage API and is moved here.

Field selection explanation is a new feature that tells the
user whether each available field was selected for inclusion
in the analysis. If a field was not included, it also gives
the reason.
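
The two parts of the explanation described above can be pictured with a
minimal, self-contained Java sketch. It does not use the real client
classes: ExplainSketch, FieldSelection, MemoryEstimation and all of their
members are hypothetical stand-ins invented for illustration, and the
byte values are made up. It assumes only what this commit states: each
field is reported as included or not, an excluded field carries a
reason, and memory is estimated both with and without overflowing to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplainSketch {

    // Hypothetical stand-in for one item of the field selection list.
    public static final class FieldSelection {
        public final String name;
        public final boolean included;
        public final String reason; // null when the field is included

        public FieldSelection(String name, boolean included, String reason) {
            this.name = name;
            this.included = included;
            this.reason = reason;
        }
    }

    // Hypothetical stand-in for the memory estimation, in bytes.
    public static final class MemoryEstimation {
        public final long withoutDiskBytes;
        public final long withDiskBytes;

        public MemoryEstimation(long withoutDiskBytes, long withDiskBytes) {
            this.withoutDiskBytes = withoutDiskBytes;
            this.withDiskBytes = withDiskBytes;
        }
    }

    // Returns one "name: reason" line per field that was left out of the analysis.
    public static List<String> describeExcluded(List<FieldSelection> selections) {
        List<String> lines = new ArrayList<>();
        for (FieldSelection selection : selections) {
            if (!selection.included) {
                lines.add(selection.name + ": " + selection.reason);
            }
        }
        return lines;
    }

    // Picks the estimate that matches whether overflowing to disk is acceptable.
    public static long estimateFor(MemoryEstimation estimation, boolean allowDisk) {
        return allowDisk ? estimation.withDiskBytes : estimation.withoutDiskBytes;
    }

    public static void main(String[] args) {
        List<FieldSelection> selections = Arrays.asList(
            new FieldSelection("price", true, null),
            new FieldSelection("comment", false, "unsupported type"));
        MemoryEstimation estimation = new MemoryEstimation(4096L, 1024L);

        describeExcluded(selections).forEach(System.out::println);
        System.out.println("in-memory estimate (bytes): " + estimateFor(estimation, false));
    }
}
```

A caller would typically review the excluded fields first, then use the
chosen estimate as a guide when setting model_memory_limit on the job.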
Dimitris Athanasiou 2019-11-22 20:08:14 +02:00 committed by GitHub
parent 197d5e7768
commit 0390ec3627
46 changed files with 2315 additions and 854 deletions

@ -1,36 +0,0 @@
--
:api: estimate-memory-usage
:request: PutDataFrameAnalyticsRequest
:response: EstimateMemoryUsageResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Estimate memory usage API
Estimates memory usage of {dfanalytics}.
The estimates can be used later when choosing an appropriate value for the `model_memory_limit` setting.
The API accepts an +{request}+ object and returns an +{response}+.
[id="{upid}-{api}-request"]
==== Estimate memory usage request
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> Constructing a new request containing a {dataframe-analytics-config} for which memory usage estimation should be performed
include::../execution.asciidoc[]
[id="{upid}-{api}-response"]
==== Response
The returned +{response}+ contains the memory usage estimates.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
<1> Estimated memory usage under the assumption that the whole {dfanalytics} should happen in memory (i.e. without overflowing to disk).
<2> Estimated memory usage under the assumption that overflowing to disk is allowed during {dfanalytics}.

@ -0,0 +1,48 @@
--
:api: explain-data-frame-analytics
:request: ExplainDataFrameAnalyticsRequest
:response: ExplainDataFrameAnalyticsResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Explain {dfanalytics} API
Explains the following about a {dataframe-analytics-config}:
* field selection: which fields are included in or excluded from the analysis
* memory estimation: how much memory the analysis is estimated to require. The estimate can be used later when choosing an appropriate value for the `model_memory_limit` setting.
The API accepts an +{request}+ object and returns an +{response}+.
[id="{upid}-{api}-request"]
==== Explain {dfanalytics} request
The request can be constructed with the id of an existing {dfanalytics-job}.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-id-request]
--------------------------------------------------
<1> Constructing a new request with the id of an existing {dfanalytics-job}
The request can also be constructed with a {dataframe-analytics-config}, which explains the configuration before the job is created.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config-request]
--------------------------------------------------
<1> Constructing a new request containing a {dataframe-analytics-config}
include::../execution.asciidoc[]
[id="{upid}-{api}-response"]
==== Response
The returned +{response}+ contains the field selection and the memory usage estimation.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
<1> A list where each item explains whether a field was selected for analysis or not
<2> The memory estimation for the {dfanalytics-job}

@ -300,7 +300,7 @@ The Java High Level REST Client supports the following Machine Learning APIs:
* <<{upid}-start-data-frame-analytics>>
* <<{upid}-stop-data-frame-analytics>>
* <<{upid}-evaluate-data-frame>>
* <<{upid}-estimate-memory-usage>>
* <<{upid}-explain-data-frame-analytics>>
* <<{upid}-get-trained-models>>
* <<{upid}-put-filter>>
* <<{upid}-get-filters>>
@ -353,7 +353,7 @@ include::ml/delete-data-frame-analytics.asciidoc[]
include::ml/start-data-frame-analytics.asciidoc[]
include::ml/stop-data-frame-analytics.asciidoc[]
include::ml/evaluate-data-frame.asciidoc[]
include::ml/estimate-memory-usage.asciidoc[]
include::ml/explain-data-frame-analytics.asciidoc[]
include::ml/get-trained-models.asciidoc[]
include::ml/put-filter.asciidoc[]
include::ml/get-filters.asciidoc[]