10 commits

Author SHA1 Message Date
Dimitris Athanasiou
bad07b76f7
[ML] Add optional source filtering during data frame reindexing (#49690)
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` reuses the structure
of a `FetchSourceContext`, as `analyzed_fields` does. Specifying
includes and excludes for the source selects which fields
get reindexed and are available in the destination index.
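
As an illustration of the includes/excludes behaviour described above, here is a minimal Python sketch. The index names, field names, and the `filter_source` helper are hypothetical, and the glob matching via `fnmatch` is only an approximation of how the server resolves patterns, not the actual implementation.

```python
from fnmatch import fnmatch

# Hypothetical config: index names and field patterns are illustrative only.
config = {
    "source": {
        "index": "my-source-index",
        "_source": {
            "includes": ["user.*", "bytes"],
            "excludes": ["user.password"],
        },
    },
    "dest": {"index": "my-dest-index"},
    "analysis": {"outlier_detection": {}},
}


def filter_source(doc, includes, excludes):
    """Keep flat (dotted) field names that match an include pattern
    and no exclude pattern. An empty includes list keeps every field.
    This is an assumed approximation of the filtering semantics."""
    kept = {}
    for field, value in doc.items():
        included = not includes or any(fnmatch(field, p) for p in includes)
        excluded = any(fnmatch(field, p) for p in excludes)
        if included and not excluded:
            kept[field] = value
    return kept


source_filter = config["source"]["_source"]
doc = {"user.name": "a", "user.password": "x", "bytes": 10, "host": "h"}
reindexed = filter_source(doc, source_filter["includes"], source_filter["excludes"])
```

In this sketch only `user.name` and `bytes` would reach the destination index.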

Closes #49531
2019-11-29 14:20:31 +02:00
Przemysław Witek
99c912b79f
Make num_top_classes parameter's default value equal to 2 (#48119) 2019-10-17 17:59:22 +02:00
Przemysław Witek
9b5770da0e
Add MlClientDocumentationIT tests for classification. (#47569) 2019-10-11 08:21:45 +02:00
Dimitris Athanasiou
e99435a7f6
[ML] Additional outlier detection parameters (#47600)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values
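
A small Python sketch of how these three parameters might be assembled into an `outlier_detection` analysis object. The `outlier_detection_analysis` helper is hypothetical, the range check on `outlier_fraction` is an assumption based on it being a proportion, and unset parameters are simply omitted because the commit message does not state their defaults.

```python
def outlier_detection_analysis(compute_feature_influence=None,
                               outlier_fraction=None,
                               standardization_enabled=None):
    """Build an `outlier_detection` analysis dict (hypothetical helper).
    Parameters left as None are omitted so server-side defaults apply."""
    params = {}
    if compute_feature_influence is not None:
        params["compute_feature_influence"] = bool(compute_feature_influence)
    if outlier_fraction is not None:
        # Assumption: a proportion must lie between 0 and 1 inclusive.
        if not 0.0 <= outlier_fraction <= 1.0:
            raise ValueError("outlier_fraction must be in [0, 1]")
        params["outlier_fraction"] = float(outlier_fraction)
    if standardization_enabled is not None:
        params["standardization_enabled"] = bool(standardization_enabled)
    return {"outlier_detection": params}


analysis = outlier_detection_analysis(
    compute_feature_influence=True,
    outlier_fraction=0.05,
    standardization_enabled=False,
)
```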
2019-10-07 15:28:21 +03:00
Lisa Cawley
b1bbed84eb
[DOCS] Fixes data frame analytics job terminology in HLRC (#46758) 2019-09-16 10:00:44 -07:00
Lisa Cawley
b3dfd6e6d0
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00
Lisa Cawley
1e63105e30
[DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:26:56 -07:00
Dimitris Athanasiou
eab64250eb
[ML][HLRC] Add data frame analytics regression analysis (#46024) 2019-08-28 08:12:10 +03:00
Dimitris Athanasiou
8af319481e
[ML] Add description to DF analytics (#45774) 2019-08-21 19:58:09 +03:00
Dimitris Athanasiou
5fa36dad0b
[ML] Machine learning data frame analytics (#43544)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. Initially, outlier detection is supported.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}
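
For illustration, the endpoints listed above can be captured in a small lookup table that turns an action and a job id into a request line. This is a sketch only: the action names and the `request_line` helper are hypothetical, and the paths are copied verbatim from this commit message.

```python
# Paths copied from the list above; the action keys are hypothetical labels.
ENDPOINTS = {
    "put": ("PUT", "_ml/data_frame/analysis/{id}"),
    "get": ("GET", "_ml/data_frame/analysis/{id}"),
    "stats": ("GET", "_ml/data_frame/analysis/{id}/_stats"),
    "start": ("POST", "_ml/data_frame/analysis/{id}/_start"),
    "stop": ("POST", "_ml/data_frame/analysis/{id}/_stop"),
    "delete": ("DELETE", "_ml/data_frame/analysis/{id}"),
}


def request_line(action, job_id):
    """Return 'METHOD path' for the given action and job id."""
    method, template = ENDPOINTS[action]
    return f"{method} {template.format(id=job_id)}"


# Typical lifecycle of a job with the hypothetical id "my-job".
lifecycle = [request_line(a, "my-job") for a in ("put", "start", "stats", "stop", "delete")]
```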

When a data frame analytics job is started, a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer C++ process
3. merge the results of the process back into the destination index
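
The three steps above can be sketched with in-memory dictionaries standing in for indices. This is a toy model, not the actual task: the `analyze` function replaces the external process with a made-up score (distance from the mean of a `value` field), and the `ml` results field name is an assumption.

```python
def reindex(source_docs):
    """Step 1: copy every source document into a new destination mapping."""
    return {doc_id: dict(doc) for doc_id, doc in source_docs.items()}


def analyze(dest_docs):
    """Step 2: stand-in for the external analyzer process.
    Toy score: absolute distance of `value` from the mean."""
    values = [doc["value"] for doc in dest_docs.values()]
    mean = sum(values) / len(values)
    return {doc_id: {"score": abs(doc["value"] - mean)}
            for doc_id, doc in dest_docs.items()}


def merge_results(dest_docs, results, results_field="ml"):
    """Step 3: write each result back onto its destination document."""
    for doc_id, result in results.items():
        dest_docs[doc_id][results_field] = result
    return dest_docs


def run_job(source_docs):
    dest_docs = reindex(source_docs)
    results = analyze(dest_docs)
    return merge_results(dest_docs, results)


source = {"a": {"value": 1.0}, "b": {"value": 3.0}}
dest = run_job(source)
```

Because the analysis runs on the reindexed copy, the source documents are left untouched and only the destination gains the results field.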

In addition, an evaluation API is added which packages commonly used metrics
for evaluating various types of analysis:

- POST _ml/data_frame/_evaluate
2019-06-25 10:48:27 +03:00