Commit graph

46 commits

Author SHA1 Message Date
David Kyle
5ad1d0d2cc
Fix hardcoded version replacement in put-dfanalytics.asciidoc (#51056) 2020-01-16 10:06:45 +00:00
Przemysław Witek
999884d8fb
Add missing docs for new evaluation metrics (#50967) 2020-01-15 14:23:37 +01:00
Dimitris Athanasiou
4d2be9bd32
[ML] Add num_top_feature_importance_values param to regression and classi… (#50914)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.
2020-01-14 15:01:47 +02:00
István Zoltán Szabó
b3457154a3
[DOCS] Fine-tunes data frame analytics API docs formatting. (#50799) 2020-01-09 16:21:01 +01:00
István Zoltán Szabó
b683f96e23
[DOCS] Moves analysis resources to PUT DFA API docs (#50704)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-09 13:57:11 +01:00
István Zoltán Szabó
bc21500201
[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2020-01-09 10:44:07 +01:00
Dimitris Athanasiou
af0ce426cc
[ML] Implement force deleting a data frame analytics job (#50553)
Adds a `force` parameter to the delete data frame analytics
request. When `force` is `true`, the action force-stops the
jobs and then proceeds to the deletion. This can be used in
order to delete a non-stopped job with a single request.

Closes #48124
2020-01-03 12:01:41 +02:00
István Zoltán Szabó
fd50169c74
[DOCS] Specifies the possible data types of classification dependent_variable (#50582) 2020-01-03 10:41:38 +01:00
István Zoltán Szabó
50e26d40a2
[DOCS] Adds GET, GET stats and DELETE inference APIs (#50224)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-18 09:10:12 +01:00
István Zoltán Szabó
3857e3d94f
[DOCS] Moves data frame analytics job resource definitions into APIs (#50021) 2019-12-12 10:59:37 +01:00
Dimitris Athanasiou
269425b54d
[ML] Introduce randomize_seed setting for regression and classification (#49990)
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.
2019-12-10 10:22:53 +02:00
Lisa Cawley
0f51bc2f72
[DOCS] Move anomaly detection job resource definitions into APIs (#49700)
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
2019-12-06 15:32:07 -08:00
István Zoltán Szabó
e5d512a8ed
[DOCS] Fixes classification evaluation example response. (#49905) 2019-12-06 13:24:22 +01:00
István Zoltán Szabó
f7a5b73972
[DOCS] Adds an example of preprocessing actions to the PUT DFA API docs (#49831) 2019-12-05 14:15:19 +01:00
Dimitris Athanasiou
bad07b76f7
[ML] Add optional source filtering during data frame reindexing (#49690)
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531
2019-11-29 14:20:31 +02:00
lcawl
3b3f3ca925 [DOCS] Fixes typo in ML resources 2019-11-26 10:28:18 -08:00
lcawl
63b944c00f [DOCS] Fixes data type formatting 2019-11-26 08:21:39 -08:00
Dimitris Athanasiou
0390ec3627
[ML] Explain data frame analytics API (#49455)
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.
2019-11-22 20:08:14 +02:00
István Zoltán Szabó
7180b90646
[DOCS] Removes best practice about fields that are highly correlated to the dependent variable. (#48935) 2019-11-11 10:00:11 -05:00
István Zoltán Szabó
e9cec6e1f7
[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307) 2019-11-11 09:53:59 -05:00
István Zoltán Szabó
6c3fed8d4d
[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241) 2019-11-06 07:40:27 -05:00
István Zoltán Szabó
fe92cd0a26
[DOCS] Adds classification type evaluation docs to the DFA evaluation API (#47657) 2019-11-06 07:37:14 -05:00
David Roberts
fd83c18cc1
[ML] Add lazy assignment job config option (#47726)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-14 12:13:01 +01:00
István Zoltán Szabó
448d19f0ca
[DOCS] Adds supported fields section to the PUT DFA API description (#47842) 2019-10-10 12:34:39 +02:00
István Zoltán Szabó
ab08c0cd76
[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791) 2019-10-09 18:13:33 +02:00
Dimitris Athanasiou
e99435a7f6
[ML] Additional outlier detection parameters (#47600)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values
2019-10-07 15:28:21 +03:00
Lisa Cawley
4e4990c6a0
[DOCS] Cleans up links to security content (#47610) 2019-10-04 16:10:26 -07:00
István Zoltán Szabó
4977baf63a
[DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs (#46966)
* [DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs.

* [DOCS] Removes extra lines from examples.

* Update docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* Update docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* [DOCS] Explains examples.
2019-10-02 10:26:20 +02:00
István Zoltán Szabó
4073499f43
[DOCS] Fixes typos in the PUT dfa and the evaluate dfa documentation. (#47348) 2019-10-02 09:49:59 +02:00
István Zoltán Szabó
a6c517a96e
[DOCS] Changes wording to move away from data frame terminology in the ES repo (#47093)
* [DOCS] Changes wording to move away from data frame terminology in the ES repo.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-10-01 08:04:06 +02:00
István Zoltán Szabó
14227106b0
[DOCS] Adds regression analytics resources and examples to the data frame analytics APIs and the evaluation API (#46176)
* [DOCS] Adds regression analytics resources and examples to the data frame analytics APIs.
Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
2019-09-19 09:10:11 +02:00
István Zoltán Szabó
bd4d46c416
[DOCS] Adds outlier detection params to the data frame analytics resources (#46323)
* [DOCS] Adds outlier detection params to the data frame analytics resources.
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-09-16 14:21:50 +02:00
James Rodewig
5c78f606c2
[DOCS] Change // CONSOLE comments to [source,console] (#46440) 2019-09-09 10:45:37 -04:00
James Rodewig
e43be90e6c
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) 2019-09-06 14:05:36 -04:00
István Zoltán Szabó
e39cdd63c3
[DOCS] Adds progress parameter description to the GET stats data frame analytics API doc. (#46434) 2019-09-06 15:17:18 +02:00
James Rodewig
466c59a4a7
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) 2019-09-05 16:47:18 -04:00
István Zoltán Szabó
626bbccd6e
[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)
* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.
2019-08-29 14:38:14 +02:00
Dimitris Athanasiou
f6a97decac
[ML] Improve progress reportings for DF analytics (#45856)
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.

This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.

In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.

This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.

When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
2019-08-23 17:31:36 +03:00
Przemysław Witek
31f6e78acd
Allow the user to specify 'query' in Evaluate Data Frame request (#45775) 2019-08-22 08:27:38 +02:00
Dimitris Athanasiou
8af319481e
[ML] Add description to DF analytics (#45774) 2019-08-21 19:58:09 +03:00
Przemysław Witek
c6a25a818d
Add docs for HLRC for Estimate memory usage API (#45538) 2019-08-21 12:52:17 +02:00
Przemysław Witek
7107c221a7
Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) 2019-08-13 20:59:35 +02:00
István Zoltán Szabó
84793476ba
[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)
This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
2019-07-26 11:39:59 +02:00
Lisa Cawley
dbe7a48e82
[DOCS] Fixes query default value (#44572) 2019-07-18 08:15:28 -07:00
Lisa Cawley
4fd8e34662
[DOCS] Moves content to ML anomaly-detection folder (#44520) 2019-07-17 13:48:12 -07:00
Lisa Cawley
146be77ec3
[DOCS] Separates data frame analytics APIs (#44451)
* [DOCS] Separates data frame analytics APIs

* [DOCS] Adds links between new pages
2019-07-16 13:22:27 -07:00