151 commits

Author SHA1 Message Date
ymao1
b2feedf4ee
[Docs] Update cross-document links to Kibana Alerting docs (#74034) (#74091)
* Updating cross-document links

* PR fixes
2021-06-14 12:50:31 -04:00
Benjamin Trent
43cd27d339
[ML] adding running_state to datafeed stats object (#73926) (#74002)
It is useful to know the following information when reading datafeed stats:

 - Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time
 - Has the datafeed processed all past data available at the time of starting.

This object is only available if the datafeed task has been created.

It has the form:

```
"running_state": {
  "is_real_time": <boolean>,
  "look_back_finished": <boolean>
}
```
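As a rough illustration, a client could read this object as shown below. The sample stats document and the helper name `describe_running_state` are invented for this sketch; only the `running_state` field names come from this commit.

```python
from typing import Optional

# Hypothetical stats for one datafeed; only the "running_state" field
# names are taken from the commit message.
sample_stats = {
    "datafeed_id": "datafeed-example",
    "state": "started",
    "running_state": {"is_real_time": True, "look_back_finished": False},
}


def describe_running_state(stats: dict) -> Optional[str]:
    """Summarize running_state, or return None if the object is absent."""
    running = stats.get("running_state")
    if running is None:
        # The object only exists once the datafeed task has been created.
        return None
    kind = "real-time" if running["is_real_time"] else "bounded"
    progress = "finished" if running["look_back_finished"] else "in progress"
    return f"{kind} datafeed, lookback {progress}"


print(describe_running_state(sample_stats))
```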
2021-06-10 11:35:27 -04:00
István Zoltán Szabó
eb35869886
[DOCS] Updates datafeed related runtime field examples (#73725) (#73886) 2021-06-08 11:39:36 +02:00
Lisa Cawley
59c37a1cda
[DOCS] Adds defaults to get ML results APIs (#73540) (#73735) 2021-06-03 12:03:30 -07:00
István Zoltán Szabó
c5c8ef208c
[DOCS] Removes Kibana charts-related advice about agg interval and bucket span. (#73673) (#73677) 2021-06-02 17:16:43 +02:00
David Roberts
8cf1fdcd05
[ML] Make ml_standard tokenizer the default for new categorization jobs (#73605)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.

If no categorization_analyzer is specified but categorization_filters
are specified then the categorization filters are converted to char
filters that are applied after first_non_blank_line.
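
A minimal sketch of an `analysis_config` that opts out of both new defaults might look like the following. The bucket span, field name, and detector are placeholder values chosen for illustration; the point is only that an explicit `categorization_analyzer` names the tokenizer and lists no `first_non_blank_line` char filter.

```python
import json

# Illustrative analysis_config: an explicit categorization_analyzer selects
# ml_classic and an empty char_filter list, so first_non_blank_line is not
# applied. All other values are placeholders.
analysis_config = {
    "bucket_span": "15m",
    "categorization_field_name": "message",
    "categorization_analyzer": {
        "char_filter": [],
        "tokenizer": "ml_classic",
    },
    "detectors": [{"function": "count", "by_field_name": "mlcategory"}],
}

print(json.dumps(analysis_config, indent=2))
```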

Backport of #72805
2021-06-02 07:04:16 +01:00
István Zoltán Szabó
ca98fbe744
[DOCS] Revises required privileges info in Anomaly Detection API docs (#72483) (#72608) 2021-05-03 11:21:38 +02:00
Benjamin Trent
6ca6dd06f0
[ML] increase the default value of xpack.ml.max_open_jobs from 20 to 512 for autoscaling improvements (#72487) (#72549)
This commit increases the xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.

If a node does not have a view into its native memory, we ignore it for assignment.

This effectively fixes a bug with autoscaling. Autoscaling relies on memory being the constraint that decides whether jobs can be assigned to nodes. If assignment is instead blocked by xpack.ml.max_open_jobs, scaling decisions are hampered.
2021-04-30 09:56:36 -04:00
István Zoltán Szabó
422679f205
[DOCS] Adds anomaly detection rule advanced settings to docs (#72072) (#72202)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-26 10:11:02 +02:00
István Zoltán Szabó
36440f1dfd
[DOCS] Alters examples in anomaly detection page to use runtime mappings (#71745) (#71821) 2021-04-19 14:16:29 +02:00
Benjamin Trent
a41e0e2625
[ML] adding ability to update runtime_mappings via datafeed config update API (#71707) (#71748)
Adds runtime_mappings as an updatable field via datafeed config update.
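
For illustration, an update body that changes only the runtime mappings could be built as below. The runtime field name and script are made up, and the target endpoint in the comment is an assumption that is not stated in this commit.

```python
import json

# Hypothetical update body: a single runtime field derived from two
# indexed fields. Field names and script are invented for this sketch.
update_body = {
    "runtime_mappings": {
        "total_error_count": {
            "type": "long",
            "script": {
                "source": "emit(doc['error_count'].value + doc['fatal_count'].value)"
            },
        }
    }
}

# Assumed target: POST _ml/datafeeds/<datafeed_id>/_update
print(json.dumps(update_body, indent=2))
```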

closes: #71702
2021-04-15 11:05:52 -04:00
István Zoltán Szabó
5cafd48d73
[DOCS] Clarifies that custom rules are job rules in Kibana (#71678) (#71722)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-15 09:52:16 +02:00
James Rodewig
c757f9e4e7
[DOCS] Fix double spaces (#71082) (#71120) 2021-03-31 11:43:34 -04:00
Benjamin Trent
abb182d95c
[7.x] [ML] adding support for composite aggs in anomaly detection (#69970) (#71052)
* [ML] adding support for composite aggs in anomaly detection (#69970)

This commit allows for composite aggregations in datafeeds.

Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations.

The restrictions for this support are as follows:

- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).

Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary.
- Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is.
- I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.
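
The restrictions listed above can be sketched as a client-side check. This is not the validation code from this commit; the function name and the example field names (`time`, `airline`, `responsetime`) are hypothetical, and the sibling pipeline agg rule is covered only indirectly by requiring a single top-level agg.

```python
def check_composite_datafeed_aggs(aggs: dict) -> list:
    """Return the restrictions that `aggs` appears to violate (empty if none)."""
    if len(aggs) != 1:
        return ["composite must be the only top-level agg"]
    agg = next(iter(aggs.values()))
    composite = agg.get("composite")
    if composite is None:
        return ["top-level agg is not a composite agg"]
    problems = []
    # Collect the field of every date_histogram source in the composite.
    date_fields = [
        body["date_histogram"].get("field")
        for source in composite.get("sources", [])
        for body in source.values()
        if "date_histogram" in body
    ]
    if len(date_fields) != 1:
        problems.append("exactly one date_histogram source is required")
    sub_aggs = agg.get("aggregations") or agg.get("aggs") or {}
    if len(date_fields) == 1 and not any(
        sub.get("max", {}).get("field") == date_fields[0]
        for sub in sub_aggs.values()
    ):
        problems.append("a max sub-agg on the date_histogram field is required")
    if any("composite" in sub or "date_histogram" in sub for sub in sub_aggs.values()):
        problems.append("composite and date_histogram sub-aggs are not allowed")
    return problems


example_aggs = {
    "buckets": {
        "composite": {
            "size": 1000,  # controls speed and resource usage
            "sources": [
                {"time_bucket": {"date_histogram": {"field": "time", "fixed_interval": "360s"}}},
                {"airline": {"terms": {"field": "airline"}}},
            ],
        },
        "aggregations": {
            "time": {"max": {"field": "time"}},
            "responsetime": {"avg": {"field": "responsetime"}},
        },
    }
}

print(check_composite_datafeed_aggs(example_aggs))
```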
2021-03-30 12:04:54 -04:00
Benjamin Trent
2545cc8ec4
[ML] Allow datafeed and job configs for datafeed preview API (#70836) (#70927)
Previously, a datafeed and job must already exist for the `_preview` API to work.

With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them.
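
A preview request body that carries both configurations inline might look like the sketch below. The top-level keys `datafeed_config` and `job_config` are an assumption about the request shape (the commit message does not name them), and the index name and detector are placeholders.

```python
import json

# Assumed request shape: both configs are supplied inline, so neither the
# datafeed nor the job needs to exist. All values are placeholders.
preview_body = {
    "datafeed_config": {
        "indices": ["my-index"],
        "query": {"match_all": {}},
    },
    "job_config": {
        "analysis_config": {
            "bucket_span": "1h",
            "detectors": [{"function": "count"}],
        },
        "data_description": {"time_field": "@timestamp"},
    },
}

print(json.dumps(preview_body, indent=2))
```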

closes https://github.com/elastic/elasticsearch/issues/70264
2021-03-26 13:58:52 -04:00
István Zoltán Szabó
bc035168f2
[DOCS] Updates anomaly detection alert docs with the new alerting terminology (#70486) (#70576)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-03-22 14:24:23 +01:00
James Rodewig
302341a526
[DOCS] Replace put with create or update in API names (#70330) (#70421)
Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-03-15 17:16:13 -04:00
Lisa Cawley
bf8024b9e7
[DOCS] Edits machine learning settings (#69947) (#70174)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2021-03-09 11:35:29 -08:00
István Zoltán Szabó
24bf46dcf9
[DOCS] Expands anomaly detection alert type docs (#70026) (#70137)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Dima Arnautov <arnautov.dima@gmail.com>
2021-03-09 16:39:38 +01:00
István Zoltán Szabó
e4f74c8b5b
[DOCS] Adds beta tag to anomaly detection alert docs. (#70013) (#70054) 2021-03-08 12:11:58 +01:00
István Zoltán Szabó
4088f85002
[DOCS] Adds anomaly detection alert documentation (#68923) (#69417)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-02-23 10:48:24 +01:00
Dan Hermann
b311906009
Unmute more memory-related tests after the fix in #68542 2021-02-19 08:06:50 -06:00
James Rodewig
b55249507e
[DOCS] Fix typos for duplicate words (#69125) (#69132) 2021-02-17 11:16:58 -05:00
Benjamin Trent
88a1bfa516
[ML] [DOCS] update find-structure reference docs (#67586) (#67592)
The text structure finder API documentation had many references to "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" with the more generic word "text".
2021-01-15 13:22:59 -05:00
Lisa Cawley
da5662495b
[DOCS] Move find file structure to a new API endpoint (#67314) (#67389) 2021-01-12 12:41:39 -08:00
Benjamin Trent
65690eef67
[7.x] [ML] move find file structure to a new API endpoint (#67123) (#67251)
* [ML] move find file structure to a new API endpoint (#67123)

This introduces a new `text-structure` plugin. This is the new home of the find file structure API.

The old REST URL is still available but is deprecated.

The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.

Changes to the high-level REST client and docs will be in a separate commit.

related to: https://github.com/elastic/elasticsearch/issues/67001
2021-01-11 10:39:39 -05:00
Lisa Cawley
2cbb6123db
[DOCS] Clarify impact of delayed data in anomaly detection (#66816) (#67039)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2021-01-05 13:09:30 -08:00
Gordon Brown
09f823b1db
Mute tests failing on Debian 8 due to memory reporting (#66648)
See https://github.com/elastic/elasticsearch/issues/66629 for details
2020-12-18 15:15:43 -07:00
Dimitris Athanasiou
97cfc8fdf8
[7.x][ML] Add log_time to AD data_counts and decide current based on it (#66343) (#66384)
This commit is fixing a potential bug if we support anomaly detection
results index rollover in the future.

In particular, we determine the current `data_counts` by sorting on the
latest record time. However, this is not correct if the job reverts
to an older model snapshot. To fix this we add `log_time` to `data_counts`
(similarly to `model_size_stats`) and sort on `log_time` to figure
out the current counts for the job.
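
A small sketch of why the sort key matters, using invented documents: after a revert, the newest counts carry an earlier record time but a later `log_time`. The field name `latest_record_timestamp` stands in for the "latest record time" mentioned above and is an assumption.

```python
# Invented data_counts documents. The job first processed records up to
# t=300, then reverted to an older snapshot and wrote new counts whose
# latest record time is earlier but whose log_time is later.
docs = [
    {"latest_record_timestamp": 300, "log_time": 1000, "processed_record_count": 30},
    {"latest_record_timestamp": 200, "log_time": 2000, "processed_record_count": 20},
]

# Old approach: pick by latest record time (selects the stale document).
by_record_time = max(docs, key=lambda d: d["latest_record_timestamp"])

# New approach: pick by log_time (selects the counts written after the revert).
by_log_time = max(docs, key=lambda d: d["log_time"])

print(by_record_time["processed_record_count"], by_log_time["processed_record_count"])
```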

Backport of #66343
2020-12-16 12:27:32 +02:00
David Roberts
8284b93dcb
[ML] Deprecate anomaly detection post data endpoint (#66398)
There is little evidence of this endpoint being used
and there is quite a lot of code complexity associated
with the various formats that can be used to upload
data and the different errors that can occur when direct
data upload is open to end users.

In a future release we can make this endpoint internal
so that only datafeeds can use it, and remove all the
options and formats that are not used by datafeeds.

End users will have to store their input data for
anomaly detection in Elasticsearch indices (which we
believe all do today) and use a datafeed to feed it
to anomaly detection jobs.

Backport of #66347
2020-12-15 21:39:47 +00:00
István Zoltán Szabó
80ffeabbbc
[DOCS] Adds note about data_counts values to Revert snapshot API docs. (#66085) (#66089) 2020-12-09 11:48:16 +01:00
István Zoltán Szabó
b1c11ebdbe
[DOCS] Adds empty snapshot_id description to revert snapshot API docs (#66036) (#66084) 2020-12-09 10:23:42 +01:00
David Kyle
5fec2538ca
[ML] Docs and HRLC for datafeed runtime mappings (#65810) (#66007)
For the changes in #65606
2020-12-08 11:04:21 +00:00
David Roberts
cbb3886bbb
[ML] Adding assignment_memory_basis to model_size_stats (#65875)
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.

The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats.  An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.

Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory.  To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x.  However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.
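
To make the three options concrete, here is a hedged sketch of how a consumer might pick the memory figure from the reported basis. The enum spellings and the function are hypothetical and are not taken from the actual Java or C++ code.

```python
from enum import Enum


class AssignmentMemoryBasis(Enum):
    # Hypothetical spellings of the three options described above.
    MODEL_MEMORY_LIMIT = "model_memory_limit"
    CURRENT_MODEL_MEMORY = "current_model_memory"
    PEAK_MODEL_MEMORY = "peak_model_memory"


def memory_for_assignment(basis, limit_bytes, current_bytes, peak_bytes):
    """Pick the figure used to decide how much memory a job requires."""
    if basis is AssignmentMemoryBasis.MODEL_MEMORY_LIMIT:
        return limit_bytes
    if basis is AssignmentMemoryBasis.CURRENT_MODEL_MEMORY:
        return current_bytes
    return peak_bytes


print(memory_for_assignment(AssignmentMemoryBasis.CURRENT_MODEL_MEMORY, 1024, 300, 450))
```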

Backport of #65561
2020-12-04 11:34:36 +00:00
István Zoltán Szabó
ee114e7c90
[DOCS] Fixes typo in Aggregating data for faster performance. (#65354) (#65356) 2020-11-23 13:03:20 +01:00
István Zoltán Szabó
53c64d594b
[DOCS] Adds UI related limitation to configuring aggs docs (#65184) (#65327)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-11-23 09:40:26 +01:00
István Zoltán Szabó
971b6d95c4
[DOCS] Makes the screenshot larger on the custom URLs page. (#65269) (#65297) 2020-11-20 15:25:15 +01:00
David Roberts
d8f549c21f
[ML] Add total ML memory to ML info (#65214)
This change adds an extra piece of information,
limits.total_ml_memory, to the ML info response.
This returns the total amount of memory that ML
is permitted to use for native processes across
all ML nodes in the cluster.  Some of this may
already be in use; the value returned is total,
not available ML memory.

Backport of #65195
2020-11-18 18:19:37 +00:00
Lisa Cawley
a5f4da693b
[DOCS] Adds new snapshot upgrade API (#65095) (#65100) 2020-11-17 12:21:01 -08:00
Benjamin Trent
39f5f39dc2
[7.x] [ML] add new snapshot upgrader API for upgrading older snapshots (#64665) (#65010)
* [ML] add new snapshot upgrader API for upgrading older snapshots (#64665)

This new API provides a way for users to upgrade their own anomaly job
model snapshots.

To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- Load the snapshot to the process
- Write the snapshot again from the native task (now updated via the
  native process)

relates #64154
2020-11-17 11:30:47 -05:00
István Zoltán Szabó
ea9022551c
[DOCS] Fixes example aggregation syntax in datafeed aggregations. (#64936) (#64942) 2020-11-11 17:45:30 +01:00
James Rodewig
aea83909d9
[DOCS] Fix case for 'Boolean' (#64299) (#64341) 2020-10-29 10:04:20 -04:00
Benjamin Trent
b9dc522cb4
[7.x] [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)(#63092) (#63177)
* [ML] adding for_export flag for ml plugin GET resource APIs (#63092)

This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detectors/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

closes https://github.com/elastic/elasticsearch/issues/63055

* [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)

When exporting and cloning ML configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin, especially as the number of these fields changes
from version to version.

This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.

APIs supporting this flag:
- GET _ml/anomaly_detectors/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
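
The manual cleanup that this flag makes unnecessary can be sketched as follows. The helper name is hypothetical, the field list uses only the examples named above, and a client cannot really tell whether `chunking_config` was a calculated default, which is part of why a server-side flag is useful.

```python
import copy


def strip_generated(config: dict) -> dict:
    """Return a copy of `config` without the example generated fields."""
    cleaned = copy.deepcopy(config)
    # Not user settable, or (possibly) a calculated default.
    for field in ("version", "create_time", "chunking_config"):
        cleaned.pop(field, None)
    # Set automatically by another Elastic Stack process.
    custom = cleaned.get("custom_settings")
    if isinstance(custom, dict):
        custom.pop("created_by", None)
    return cleaned


# Invented configuration used only to exercise the helper.
exported = {
    "job_id": "my-job",
    "version": "7.10.0",
    "create_time": 1603200000000,
    "custom_settings": {"created_by": "ml-module", "team": "ops"},
}

print(strip_generated(exported))
```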

relates to #63055
2020-10-20 12:42:52 -04:00
David Roberts
076ddbf7e1
[ML] Change docs test mute comment (#63866)
The original comment mentioned issue #48583, but issue #48941
is specifically open for this mute.  However, this is
inappropriate, as the underlying reason the test cannot be
unmuted is the same as for all the other tests skipped with the
comment "Kibana sample data": issues #51572, #51576 and #51678.

Closes #48941
2020-10-19 10:27:52 +01:00
Lisa Cawley
4de6104dae
[DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
Benjamin Trent
0f142c6afc
[ML] allow multiple wildcard values for GET Calendars, Events, and DELETE forecasts (#62563) (#62629)
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.

- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
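
As a loose illustration of the matching behaviour, the sketch below resolves an ID expression against known IDs. It approximates the server with Python's `fnmatch`, which also accepts `?` and `[]` patterns, so treat it as an analogy and not the actual implementation; the function name and calendar IDs are invented.

```python
from fnmatch import fnmatchcase


def expand_ids(expression: str, known_ids: list) -> list:
    """Resolve `_all`, a single ID, or comma-separated wildcard patterns."""
    if expression == "_all":
        return sorted(known_ids)
    patterns = expression.split(",")
    return sorted(
        known for known in known_ids
        if any(fnmatchcase(known, pattern) for pattern in patterns)
    )


calendars = ["planned-outages", "holidays-us", "holidays-uk", "planned-maintenance"]
print(expand_ids("planned-*", calendars))
```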
2020-09-18 11:06:07 -04:00
David Roberts
969a1c558b
[ML] Include the "properties" layer in find_file_structure mappings (#62158)
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing.  The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects.  However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects.  As a first
step it makes sense to move the returned mappings closer
to the standard format.
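
The shape change can be sketched with invented field mappings: the old response listed fields directly under `mappings`, while the new one nests them under `properties`, so the value can be passed to the create index endpoint unchanged.

```python
# Invented field mappings used only to show the two shapes.
fields = {
    "@timestamp": {"type": "date"},
    "message": {"type": "text"},
}

# Before: fields sat directly under "mappings".
old_response = {"mappings": fields}

# After: the "properties" layer is included.
new_response = {"mappings": {"properties": fields}}

# The new value is usable as-is in a create index request body.
create_index_body = {"mappings": new_response["mappings"]}

print(create_index_body)
```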

This is a small building block towards fixing #55616
2020-09-10 09:33:42 +01:00
Lisa Cawley
2789b8e6c4
[DOCS] Refresh machine learning custom URL example (#61826) (#61950) 2020-09-04 09:44:55 -07:00
Lisa Cawley
6d6f5d4acc
[DOCS] Per-partition categorization (#61506) 2020-08-26 17:10:01 -07:00
lcawl
5fa839b906
[DOCS] Fix typo in update anomaly detection job API 2020-08-25 17:13:38 -07:00