--
:api: put-data-frame-analytics
:request: PutDataFrameAnalyticsRequest
:response: PutDataFrameAnalyticsResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Put {dfanalytics-jobs} API

beta::[]

Creates a new {dfanalytics-job}.
The API accepts a +{request}+ object as a request and returns a +{response}+.

[id="{upid}-{api}-request"]
==== Put {dfanalytics-jobs} request

A +{request}+ requires the following argument:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> The configuration of the {dfanalytics-job} to create
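
As an illustrative sketch, the request simply wraps an already built `DataFrameAnalyticsConfig`
(the `config` variable below is assumed to have been constructed as described in the
following sections):

["source","java"]
--------------------------------------------------
// Wrap the job configuration in a request object.
PutDataFrameAnalyticsRequest request = new PutDataFrameAnalyticsRequest(config);
--------------------------------------------------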

[id="{upid}-{api}-config"]
==== {dfanalytics-cap} configuration

The `DataFrameAnalyticsConfig` object contains all the details about the {dfanalytics-job}
configuration and takes the following arguments:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config]
--------------------------------------------------
<1> The {dfanalytics-job} ID
<2> The source index and query from which to gather data
<3> The destination index
<4> The analysis to be performed
<5> The fields to be included in / excluded from the analysis
<6> The memory limit for the model created as part of the analysis process
<7> Optionally, a human-readable description
<8> The maximum number of threads to be used by the analysis. Defaults to 1.
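
For example, the configuration could be assembled along these lines (a sketch only: the ID,
description and memory limit are placeholders, the `sourceConfig`, `destConfig`, `analysis` and
`analyzedFields` objects are built as shown in the following sections, and the setter names
should be verified against the client version in use):

["source","java"]
--------------------------------------------------
DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setId("my-analytics-job")                                    // job ID
    .setSource(sourceConfig)                                      // source index and query
    .setDest(destConfig)                                          // destination index
    .setAnalysis(analysis)                                        // e.g. outlier detection
    .setAnalyzedFields(analyzedFields)                            // optional field selection
    .setModelMemoryLimit(new ByteSizeValue(100, ByteSizeUnit.MB)) // model memory limit
    .setDescription("my first analytics job")                     // optional description
    .setMaxNumThreads(1)                                          // optional number of threads
    .build();
--------------------------------------------------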

[id="{upid}-{api}-query-config"]
==== SourceConfig

The index and the query from which to collect data.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsSource
<2> The source index
<3> The query from which to gather the data. If query is not set, a `match_all` query is used by default.
<4> Runtime mappings that will be added to the destination index mapping.
<5> Source filtering to select which fields will exist in the destination index.
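
A sketch of building the source, assuming a placeholder index name and a purely illustrative
runtime field definition supplied as a `Map`:

["source","java"]
--------------------------------------------------
// A runtime field that will also be added to the destination index mapping.
Map<String, Object> runtimeMappings = Collections.singletonMap("my_runtime_field",
    Collections.singletonMap("type", "keyword"));

DataFrameAnalyticsSource sourceConfig = DataFrameAnalyticsSource.builder()
    .setIndex("my-source-index")                                    // source index
    .setQueryConfig(new QueryConfig(QueryBuilders.matchAllQuery())) // optional query
    .setRuntimeMappings(runtimeMappings)                            // optional runtime mappings
    .setSourceFiltering(new FetchSourceContext(true,
        new String[] { "included_field" },
        new String[] { "excluded_field" }))                         // optional source filtering
    .build();
--------------------------------------------------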

===== QueryConfig

The query with which to select data from the source.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-config]
--------------------------------------------------
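
For instance, a query config that selects every document in the source index (any other
`QueryBuilder` could be used instead):

["source","java"]
--------------------------------------------------
QueryConfig queryConfig = new QueryConfig(QueryBuilders.matchAllQuery());
--------------------------------------------------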

==== DestinationConfig

The index to which data should be written by the {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-dest-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsDest
<2> The destination index
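
A minimal sketch with a placeholder index name:

["source","java"]
--------------------------------------------------
DataFrameAnalyticsDest destConfig = DataFrameAnalyticsDest.builder()
    .setIndex("my-dest-index") // destination index
    .build();
--------------------------------------------------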

==== Analysis

The analysis to be performed.
Currently, the supported analyses are +OutlierDetection+, +Classification+, and +Regression+.

===== Outlier detection

+OutlierDetection+ analysis can be created in one of two ways:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-default]
--------------------------------------------------
<1> Constructing a new OutlierDetection object with the default strategy to determine outliers

or
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
--------------------------------------------------
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> Number of neighbors taken into account during analysis
<4> The min `outlier_score` required to compute feature influence
<5> Whether to compute feature influence
<6> The proportion of the data set that is assumed to be outlying prior to outlier detection
<7> Whether to apply standardization to feature values
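
The two variants could look roughly as follows (a sketch; the parameter values are
arbitrary and the setter names should be checked against the client version in use):

["source","java"]
--------------------------------------------------
// Default strategy to determine outliers.
OutlierDetection outlierDetection = OutlierDetection.createDefault();

// Fully customized outlier detection.
OutlierDetection customizedOutlierDetection = OutlierDetection.builder()
    .setMethod(OutlierDetection.Method.DISTANCE_KNN) // analysis method
    .setNNeighbors(5)                                // number of neighbors
    .setFeatureInfluenceThreshold(0.1)               // min outlier_score for feature influence
    .setComputeFeatureInfluence(true)                // compute feature influence
    .setOutlierFraction(0.05)                        // assumed proportion of outliers
    .setStandardizationEnabled(true)                 // standardize feature values
    .build();
--------------------------------------------------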

===== Classification

+Classification+ analysis requires the +dependent_variable+ to be set and
has a number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-classification]
--------------------------------------------------
<1> Constructing a new Classification builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The optimization objective to target when assigning class labels. Defaults to `maximize_minimum_recall`.
<12> The number of top classes (or -1 which denotes all classes) to be reported in the results. Defaults to 2.
<13> Custom feature processors that will create new features for analysis from the included document
fields. Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimization rounds used for hyperparameter optimization per parameter. An integer in [0, 20].
<20> Whether to enable early stopping, which finishes the training process early if it is not finding better models.
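
As an illustration, a classification analysis setting only a few of the optional parameters
might look like this (a sketch; field names are placeholders and the setter names should be
checked against the client version in use):

["source","java"]
--------------------------------------------------
Classification classification = Classification.builder("my_dependent_variable") // required dependent variable
    .setLambda(1.0)                          // lambda regularization
    .setEta(0.5)                             // shrinkage
    .setMaxTrees(50)                         // maximum number of trees
    .setPredictionFieldName("my_prediction") // prediction field name
    .setTrainingPercent(50.0)                // percentage of rows used for training
    .setNumTopClasses(2)                     // number of top classes to report
    .build();
--------------------------------------------------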

===== Regression

+Regression+ analysis requires the +dependent_variable+ to be set and
has a number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-regression]
--------------------------------------------------
<1> Constructing a new Regression builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The loss function used for regression. Defaults to `mse`.
<12> An optional parameter to the loss function.
<13> Custom feature processors that will create new features for analysis from the included document
fields. Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimization rounds used for hyperparameter optimization per parameter. An integer in [0, 20].
<20> Whether to enable early stopping, which finishes the training process early if it is not finding better models.
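
Similarly, a regression analysis might be sketched as follows (placeholder field names; the
setter names and the loss function enum should be checked against the client version in use):

["source","java"]
--------------------------------------------------
Regression regression = Regression.builder("my_dependent_variable") // required dependent variable
    .setLambda(1.0)                               // lambda regularization
    .setEta(0.5)                                  // shrinkage
    .setMaxTrees(50)                              // maximum number of trees
    .setPredictionFieldName("my_prediction")      // prediction field name
    .setTrainingPercent(50.0)                     // percentage of rows used for training
    .setLossFunction(Regression.LossFunction.MSE) // loss function
    .build();
--------------------------------------------------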

==== Analyzed fields

A `FetchSourceContext` object containing the fields to be included in / excluded from the analysis.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analyzed-fields]
--------------------------------------------------
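
For example, to include two fields and exclude one (placeholder field names):

["source","java"]
--------------------------------------------------
FetchSourceContext analyzedFields = new FetchSourceContext(true,
    new String[] { "included_field_1", "included_field_2" },
    new String[] { "excluded_field" });
--------------------------------------------------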

include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the newly created {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
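
A sketch of executing the request synchronously and reading the created configuration back
from the response (`client` is assumed to be an already initialized `RestHighLevelClient`):

["source","java"]
--------------------------------------------------
PutDataFrameAnalyticsResponse response = client.machineLearning()
    .putDataFrameAnalytics(request, RequestOptions.DEFAULT);

// The response echoes the stored configuration, including any server-side defaults.
DataFrameAnalyticsConfig createdConfig = response.getConfig();
--------------------------------------------------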