190 commits

Author SHA1 Message Date
David Roberts
6a20678517
[ML] Correct the update datafeed docs (#92227) (#92229)
These docs previously implied that you could update datafeed
properties while the datafeed was running, but then would have
to stop and restart it for the changes to take effect.

In fact datafeed updates can only be made while the datafeed is
stopped (and this has been the case for many years, if not forever).
2022-12-08 05:13:24 -05:00
István Zoltán Szabó
31fe12ae76
[DOCS] Resizes anomaly detection screenshot properly. (#89544) (#89549) 2022-08-23 16:57:51 +02:00
Lisa Cawley
cb42fcb4ad
[DOCS] Typo in time functions (#87373) (#87377) 2022-06-03 09:53:12 -07:00
James Rodewig
53ed187d63
[DOCS] Fix typos (#83895) (#83975)
Co-authored-by: Tobias Stadler <ts.stadler@gmx.de>
2022-02-15 13:05:01 -05:00
István Zoltán Szabó
5ad8ea20b1
[DOCS] Fixes geo function field names. (#83198) (#83200) 2022-01-27 12:20:24 +01:00
Lisa Cawley
2fb3c46ad4
[DOCS] Add missing HTML anchors to CCR and ML (#80287) (#83182)
Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
2022-01-26 11:39:38 -08:00
István Zoltán Szabó
803c8efc58
[DOCS] Fixes field names in ML sum functions. (#83048) (#83051) 2022-01-25 15:54:19 +01:00
David Roberts
0d403365bd
[7.16] [ML] Model snapshot upgrade needs a stats endpoint (#81706)
* [7.16] [ML] Model snapshot upgrade needs a stats endpoint

Previously the ML model snapshot upgrade endpoint did not
provide a way to reliably monitor progress. This could lead
to the upgrade assistant UI thinking that a model snapshot
upgrade had finished when it actually hadn't.

This change adds a new "stats" API that allows external
interested parties to find out the status of each model
snapshot upgrade and which node (if any) each is running on.

Backport of #81641

* Fixing compilation
2021-12-14 04:50:07 -05:00
Lisa Cawley
376c499f95
[DOCS] Fixes query parameters for get buckets API (#80643) (#80917) 2021-11-22 12:18:10 -08:00
Lisa Cawley
1c855f5ed4
[DOCS] Adds missing query parameters to ML APIs (#80863) (#80910) 2021-11-22 10:09:50 -08:00
Lisa Cawley
ad2b02bcee
[DOCS] Adds missing query parameters in get influencer and get snapshot APIs (#80801) (#80844) 2021-11-18 08:52:42 -08:00
Lisa Cawley
061b6fd0e6
[DOCS] Add query parameters to update datafeed API (#80777) (#80798) 2021-11-17 08:12:15 -08:00
Lisa Cawley
0a97f7440e
[DOCS] Clarify parameters in delete expired data, forecast, and flush job APIs (#80517) (#80569) 2021-11-09 15:20:43 -08:00
Lisa Cawley
416d5bb8e6
[DOCS] Edits stop and start datafeed APIs (#80461) (#80567) 2021-11-09 15:08:52 -08:00
James Rodewig
07ac8818b6
[DOCS] Remove testenv annotations from doc snippet tests (#80023) (#80458)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
# Conflicts:
#	docs/reference/ml/df-analytics/apis/get-trained-model-deployment-stats.asciidoc
#	docs/reference/ml/df-analytics/apis/infer-trained-model-deployment.asciidoc
#	docs/reference/ml/df-analytics/apis/put-trained-model-definition-part.asciidoc
#	docs/reference/ml/df-analytics/apis/put-trained-model-vocabulary.asciidoc
#	docs/reference/ml/df-analytics/apis/start-trained-model-deployment.asciidoc
#	docs/reference/ml/df-analytics/apis/stop-trained-model-deployment.asciidoc
#	docs/reference/slm/apis/slm-delete.asciidoc
#	docs/reference/slm/apis/slm-execute-retention.asciidoc
#	docs/reference/slm/apis/slm-execute.asciidoc
#	docs/reference/slm/apis/slm-get-status.asciidoc
#	docs/reference/slm/apis/slm-get.asciidoc
#	docs/reference/slm/apis/slm-start.asciidoc
#	docs/reference/slm/apis/slm-stats.asciidoc
#	docs/reference/slm/apis/slm-stop.asciidoc
#	docs/reference/sql/endpoints/client-apps/tableau-desktop.asciidoc
#	docs/reference/sql/endpoints/client-apps/tableau-server.asciidoc
2021-11-05 19:41:54 -04:00
István Zoltán Szabó
f124a986a3
[7.16] [DOCS] Adds missing query params to GET category and GET influencer APIs (#79448) (#80430) 2021-11-05 17:27:31 +01:00
Lisa Cawley
8b7f8ee5e2
[DOCS] Adds deprecated allow_no_jobs and allow_no_datafeeds ML API properties (#80163) 2021-11-02 08:30:24 -07:00
Lisa Cawley
01c557d639
[DOCS] Fixes typo in preview datafeed API (#79863) (#79879) 2021-10-26 18:24:15 -07:00
Lisa Cawley
7beedaf7e1
[DOCS] Fixes typo in calendar API example (#78867) (#78868) 2021-10-07 18:04:46 -07:00
Lisa Cawley
6695c7ceca
[DOCS] Fixes ML get calendars API (#78808) (#78854) 2021-10-07 14:17:00 -07:00
Lisa Cawley
2dc4ee3413
[DOCS] Fixes ML get scheduled events API (#78809) (#78843) 2021-10-07 10:39:03 -07:00
Benjamin Trent
d3b68b32dc
[ML] add new default char filter first_line_with_letters for machine learning categorization (#77457) (#77503)
The char filter replaces the previous default of `first_non_blank_line`.

`first_non_blank_line` worked well to figure out what line had characters at all, but log lines
like the following were handled poorly:
```
--------------------------------------------------------------------------------

Alias 'foo' already exists and this prevents setting up ILM for logs

--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, the first line was used:
```
--------------------------------------------------------------------------------
```
This line has no valid tokens for our standard tokenizer. Consequently, no tokens were found by the `ml_standard` tokenizer.

The new filter, `first_line_with_letters`, returns the first line that contains any letter character (i.e. any character for which `Character#isLetter` returns true).

Given the previously poorly handled log, when combining with our `ml_standard` tokenizer, we get the following, more appropriate, tokens:

```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
2021-09-09 11:36:55 -04:00
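The difference between the two char filters described in the commit above can be sketched in a few lines of Python. This is only an illustration of the selection rule, not the Elasticsearch implementation: the function names mirror the filter names, and `str.isalpha()` is used as a rough stand-in for Java's `Character#isLetter`, which is an assumption.

```python
def first_non_blank_line(message: str) -> str:
    """Return the first line that is not empty or whitespace-only."""
    for line in message.splitlines():
        if line.strip():
            return line
    return ""


def first_line_with_letters(message: str) -> str:
    """Return the first line containing at least one letter, or "" if none."""
    for line in message.splitlines():
        # str.isalpha() on one character approximates Character#isLetter;
        # the two may disagree on unusual Unicode edge cases.
        if any(ch.isalpha() for ch in line):
            return line
    return ""


# The log message from the commit description: a rule, a blank line,
# the real text, another blank line, and a closing rule.
log = "\n".join([
    "-" * 80,
    "",
    "Alias 'foo' already exists and this prevents setting up ILM for logs",
    "",
    "-" * 80,
])

print(repr(first_non_blank_line(log)[:10]))  # starts with dashes
print(first_line_with_letters(log))          # the "Alias 'foo' ..." line
```

With the old rule the dashed line is selected, which leaves the tokenizer nothing to work with; the new rule skips it and selects the line that carries the message.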
István Zoltán Szabó
d2e60ef987
[DOCS] Adds anomaly job health alert type docs (#76659) (#77027)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-08-30 16:22:48 +02:00
Lisa Cawley
40f72fd75c
[DOCS] Update datafeed details in ML docs (#76854) (#76948) 2021-08-25 15:15:40 -07:00
István Zoltán Szabó
ae234cdc68
[DOCS] Fixes a syntax error in datafeed runtime field example. (#76917) (#76918) 2021-08-25 12:34:47 +02:00
David Roberts
c70ba3c768
[ML] Use results retention time for deleting system annotations (#76113)
* [ML] Use results retention time for deleting system annotations

In #75617 a new setting, system_annotations_retention_days, was
added to control how long system annotations are retained for.
We now feel that this setting is redundant and that system
annotations should be retained for the same period as results.
This is intuitive and defensible, as system annotations can be
considered a type of result.

Backport of #76096

* Fix one more merge clash
2021-08-04 13:53:06 -04:00
David Roberts
17581d1232
[ML] Deleting a job now deletes the datafeed if necessary (#76064)
Previously attempting to delete a job that had a datafeed
would return an exception. However, this was unnecessarily
pedantic - the user would always want to delete both job
and datafeed together, and would react by deleting the
datafeed and then subsequently deleting the job again.

This change makes the delete job API automatically delete
a datafeed associated with the job. The same level of
force is used for this delete datafeed request as was used
on the delete job request. This means that it's possible
to force-delete an open job with a started datafeed (since
force-delete datafeed will automatically stop a started
datafeed). It's still not possible to delete an opened job
without using force.

Backport of #76010
2021-08-04 05:14:54 -04:00
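The force semantics described in the commit above can be modelled with a small in-memory sketch. The `Job` and `Datafeed` classes and both functions are hypothetical and exist only to make the rules concrete; they are not taken from the Elasticsearch code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datafeed:
    started: bool = False


@dataclass
class Job:
    opened: bool = False
    datafeed: Optional[Datafeed] = None
    deleted: bool = False


def delete_datafeed(job: Job, force: bool) -> None:
    """Delete the job's datafeed, stopping it first only when forced."""
    feed = job.datafeed
    if feed is None:
        return
    if feed.started:
        if not force:
            raise RuntimeError("cannot delete a started datafeed without force")
        feed.started = False  # force-delete stops a started datafeed
    job.datafeed = None


def delete_job(job: Job, force: bool = False) -> None:
    """Delete a job and, with the same level of force, its datafeed."""
    if job.opened and not force:
        raise RuntimeError("cannot delete an opened job without force")
    delete_datafeed(job, force)  # same force level as the job request
    job.opened = False
    job.deleted = True


# A closed job with a stopped datafeed no longer needs two calls.
closed = Job(opened=False, datafeed=Datafeed(started=False))
delete_job(closed)

# An open job with a started datafeed can be removed in one forced call.
running = Job(opened=True, datafeed=Datafeed(started=True))
delete_job(running, force=True)
```

In this model a non-forced delete of an open job still fails, which matches the last sentence of the commit message.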
James Rodewig
4d881f57e1
[DOCS] Correct spelling for geo terms (#76028) (#76032)
Changes:
* Use "geopoint" when not referring to the literal field type
* Use "geoshape" when not referring to the literal field type or query type
* Use "GeoJSON" consistently
# Conflicts:
#	docs/reference/ingest/processors/enrich.asciidoc
2021-08-03 10:08:52 -04:00
Ed Savage
582b634117
[7.x][ML] Add 'model_prune_window' field to AD job config (#75741) (#75999)
Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.

Relates to ml-cpp/#1962
Backports #75741
2021-08-03 11:57:36 +01:00
Przemysław Witek
de732a4432
[7.x] [ML] Delete expired annotations (#75617) (#75841) 2021-07-29 17:03:35 +02:00
elasticsearchmachine
29a50ae5bd
[DOCS] Fixes bulleted list in ML aggregations (#75806) (#75809)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-07-28 11:49:32 -07:00
Lisa Cawley
6d821421bb
[DOCS] Fixes nesting of datafeed config in APIs (#75502) (#75545) 2021-07-20 12:02:44 -07:00
István Zoltán Szabó
c9299e1f65
[DOCS] Adds peak_model_bytes and assignment_memory_basis to GET model snapshot API docs (#75413) (#75425) 2021-07-18 07:43:18 +02:00
Lisa Cawley
dbd97cd138
[DOCS] Anomaly detection: Visualize delayed data (#75098) (#75317) 2021-07-13 21:18:55 -04:00
Benjamin Trent
d251874910
[7.x] [ML] Add datafeed_config field to anomaly detection job configs (#75262)
This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed.

The datafeed config can now be supplied in the `datafeed_config` field of the job config when creating a job, and it is returned in that field when getting jobs.
2021-07-12 14:57:38 -04:00
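As a rough illustration of the commit above, a job creation body with an embedded datafeed might be assembled like this. The index pattern, bucket span, detector and time field are placeholder assumptions; only the `datafeed_config` field name comes from the commit title.

```python
import json


def build_job_with_datafeed(indices, time_field="@timestamp"):
    """Build a job creation body that carries its datafeed inline.

    All values are illustrative placeholders, not recommended settings.
    """
    return {
        "analysis_config": {
            "bucket_span": "15m",
            "detectors": [{"function": "count"}],
        },
        "data_description": {"time_field": time_field},
        # The datafeed definition travels with the job instead of
        # being created through a separate request.
        "datafeed_config": {
            "indices": list(indices),
            "query": {"match_all": {}},
        },
    }


body = build_job_with_datafeed(["my-logs-*"])  # hypothetical index pattern
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of the create job request; sending it is left out so that the sketch stays independent of any client library.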
Lisa Cawley
e99d91df2d
[DOCS] Add memory limit details in update job API (#74517) (#74570)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2021-06-24 09:08:09 -07:00
David Roberts
59c55d1c63
[ML] Closing an anomaly detection job now automatically stops its datafeed if necessary (#74416)
Previously it was a requirement of the close job API that, if the
job had an associated datafeed, that datafeed had to be stopped
before the job could be closed. Experience has shown that this
is just a pedantic nuisance. If a user closes the job without
first stopping the datafeed then it's just a mistake, and they
then have to make two further calls, to stop the datafeed and
then attempt to close the job again.

This PR changes the behaviour so that if you ask to close a job
whose datafeed is running then the datafeed gets stopped first
as part of the same call. Datafeeds are stopped with the same
level of force as the job close request specified.

Backport of #74257
2021-06-22 17:08:36 +01:00
István Zoltán Szabó
da5201c11b
[DOCS] Clarifies terminology in Performing population analysis page. (#74237) (#74276) 2021-06-18 15:38:40 +02:00
Dimitris Athanasiou
92f7c6250a
[7.x][ML] Reset anomaly detection job API (#73908) (#74093)
Adds a new API that allows a user to reset
an anomaly detection job.

To use the API do:

```
POST _ml/anomaly_detectors/<job_id>/_reset
```

The API removes all data associated with the job.
In particular, it deletes model state, results and stats.

However, job notifications and user annotations are not removed.

Also, the API can be called asynchronously by setting the parameter
`wait_for_completion` to `false` (defaults to `true`). When run
that way the API returns the task id for further monitoring.

In order to prevent the job from opening while it is resetting,
a new job field has been added called `blocked`. It is an object
that contains a `reason` and the `task_id`. `reason` can take
a value from ["delete", "reset", "revert"] as all these
operations should block the job from opening. The `task_id` is also
included in order to allow tracking the task if necessary.

Finally, this commit also sets the `blocked` field when
the revert snapshot API is called as a job should not be opened
while it is reverted to a different model snapshot.

Backport of #73908
2021-06-15 10:05:40 +03:00
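The request shape and the `blocked` check described in the reset job commit above can be sketched as two small helpers. Both helper names are hypothetical, the path assumes the `/_reset` suffix, and no HTTP client is involved, so this shows only how a caller might build the request and interpret the field.

```python
from typing import Dict, Optional, Tuple

# Reasons the commit message lists as blocking a job from opening.
BLOCKING_REASONS = {"delete", "reset", "revert"}


def reset_request(job_id: str, wait_for_completion: bool = True) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, path, query params) for a reset job call."""
    path = f"_ml/anomaly_detectors/{job_id}/_reset"
    params = {"wait_for_completion": str(wait_for_completion).lower()}
    return "POST", path, params


def blocking_reason(job_config: dict) -> Optional[str]:
    """Return the reason a job is blocked from opening, or None."""
    blocked = job_config.get("blocked") or {}
    reason = blocked.get("reason")
    return reason if reason in BLOCKING_REASONS else None


# Asynchronous reset of a hypothetical job id.
method, path, params = reset_request("my-job", wait_for_completion=False)
print(method, path, params)

# A job config as it might look while the reset task is running;
# the task id value is made up for the example.
job = {"job_id": "my-job", "blocked": {"reason": "reset", "task_id": "node:123"}}
print(blocking_reason(job))
```

A caller that wants to open the job could poll the job config and wait until `blocking_reason` returns `None`.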
ymao1
b2feedf4ee
[Docs] Update cross-document links to Kibana Alerting docs (#74034) (#74091)
* Updating cross-document links

* PR fixes
2021-06-14 12:50:31 -04:00
Benjamin Trent
43cd27d339
[ML] adding running_state to datafeed stats object (#73926) (#74002)
It is useful to know the following information when reading datafeed stats:

 - Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time?
 - Has the datafeed processed all past data available at the time of starting?

This object is only available if the datafeed task has been created.

It has the form:

```
"running_state": {
  "is_real_time": <boolean>,
  "look_back_finished": <boolean>
}
```
2021-06-10 11:35:27 -04:00
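A consumer of the datafeed stats might interpret the `running_state` object from the commit above along these lines. The field names are taken from the commit message and have not been checked against any released version; the function name and the returned strings are made up for the example.

```python
def describe_running_state(datafeed_stats: dict) -> str:
    """Summarise a datafeed stats entry using its running_state object."""
    state = datafeed_stats.get("running_state")
    if state is None:
        # The object is only present once the datafeed task exists.
        return "no datafeed task"
    if not state["look_back_finished"]:
        return "still processing historical data"
    if state["is_real_time"]:
        return "lookback finished, now running in real time"
    return "lookback finished, bounded by an end time"


print(describe_running_state({"datafeed_id": "feed-1"}))
print(describe_running_state({
    "datafeed_id": "feed-1",
    "running_state": {"is_real_time": True, "look_back_finished": False},
}))
print(describe_running_state({
    "datafeed_id": "feed-1",
    "running_state": {"is_real_time": True, "look_back_finished": True},
}))
```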
István Zoltán Szabó
eb35869886
[DOCS] Updates datafeed related runtime field examples (#73725) (#73886) 2021-06-08 11:39:36 +02:00
Lisa Cawley
59c37a1cda
[DOCS] Adds defaults to get ML results APIs (#73540) (#73735) 2021-06-03 12:03:30 -07:00
István Zoltán Szabó
c5c8ef208c
[DOCS] Removes Kibana charts-related advice about agg interval and bucket span. (#73673) (#73677) 2021-06-02 17:16:43 +02:00
David Roberts
8cf1fdcd05
[ML] Make ml_standard tokenizer the default for new categorization jobs (#73605)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.

If no categorization_analyzer is specified but categorization_filters
are specified then the categorization filters are converted to char
filters that are applied after first_non_blank_line.

Backport of #72805
2021-06-02 07:04:16 +01:00
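Opting back into the ml_classic tokenizer, as the commit above describes, amounts to supplying an explicit `categorization_analyzer`. The sketch below builds such an `analysis_config` as a Python dict. The field name `message`, the bucket span and the detector are placeholder assumptions, and the sketch does not try to reproduce any token filters that the built-in default analyzer might add.

```python
import json


def classic_analysis_config(field_name: str, keep_first_line_filter: bool = True) -> dict:
    """Build an analysis_config that explicitly selects ml_classic."""
    analyzer = {"tokenizer": "ml_classic"}
    if keep_first_line_filter:
        # Leaving this out is how the commit says to opt out of the
        # first_non_blank_line char filter.
        analyzer["char_filter"] = ["first_non_blank_line"]
    return {
        "bucket_span": "15m",
        "categorization_field_name": field_name,
        "categorization_analyzer": analyzer,
        "detectors": [{"function": "count", "by_field_name": "mlcategory"}],
    }


with_filter = classic_analysis_config("message")
without_filter = classic_analysis_config("message", keep_first_line_filter=False)
print(json.dumps(with_filter["categorization_analyzer"]))
print(json.dumps(without_filter["categorization_analyzer"]))
```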
István Zoltán Szabó
ca98fbe744
[DOCS] Revises required privileges info in Anomaly Detection API docs (#72483) (#72608) 2021-05-03 11:21:38 +02:00
Benjamin Trent
6ca6dd06f0
[ML] increase the default value of xpack.ml.max_open_jobs from 20 to 512 for autoscaling improvements (#72487) (#72549)
This commit increases the xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.

If a node does not have a view into its native memory, we ignore it for assignment.

This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by xpack.ml.max_open_jobs, scaling decisions are hampered as well.
2021-04-30 09:56:36 -04:00
István Zoltán Szabó
422679f205
[DOCS] Adds anomaly detection rule advanced settings to docs (#72072) (#72202)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-26 10:11:02 +02:00
István Zoltán Szabó
36440f1dfd
[DOCS] Alters examples in anomaly detection page to use runtime mappings (#71745) (#71821) 2021-04-19 14:16:29 +02:00
Benjamin Trent
a41e0e2625
[ML] adding ability to update runtime_mappings via datafeed config update API (#71707) (#71748)
Adds runtime_mappings as an updatable field via datafeed config update.

closes: #71702
2021-04-15 11:05:52 -04:00
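An update body that changes only `runtime_mappings`, as enabled by the commit above, could be put together as follows. The helper names, the datafeed id and the runtime field are hypothetical; the sketch only assembles the request and does not send it.

```python
from typing import Tuple


def runtime_mappings_update(field: str, field_type: str, script_source: str) -> dict:
    """Build a datafeed update body that sets a single runtime field."""
    return {
        "runtime_mappings": {
            field: {
                "type": field_type,
                "script": {"source": script_source},
            }
        }
    }


def update_request(datafeed_id: str, body: dict) -> Tuple[str, str, dict]:
    """Return (method, path, body) for a datafeed update call."""
    return "POST", f"_ml/datafeeds/{datafeed_id}/_update", body


# Hypothetical runtime field that sums two numeric document fields.
body = runtime_mappings_update(
    "total_error_count",
    "long",
    "emit(doc['error_count'].value + doc['aborted_count'].value)",
)
request = update_request("my-datafeed", body)
print(request[0], request[1])
```

As the first commit in this list notes, datafeed updates can only be made while the datafeed is stopped, so a caller would stop the datafeed before issuing this request.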