* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated idem
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
In this PR we introduce a new query parameter behind the failure store feature flag. The query param, `faliure_store` allows the multi-syntax supporting APIs to choose the failure store indices as well. If an API should not support failure store, the `allowFailureStore` flag should be `false`.
**Problem:**
For historical reasons, source files for the Elasticsearch Guide's security, watcher, and Logstash API docs are housed in the `x-pack/docs` directory. This can confuse new contributors who expect Elasticsearch Guide docs to be located in `docs/reference`.
**Solution:**
- Move the security, watcher, and Logstash API doc source files to the `docs/reference` directory
- Update doc snippet tests to use security
Rel: https://github.com/elastic/platform-docs-team/issues/208
Update the ml and transform reference documentation to provide information regarding the new versioning schemes independent from the product versions.
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
This PR adds a new field, `_meta`, to the data frame
analytics configuration.
The `_meta` field stores an arbitrary key-value map.
Keys are strings. Values are arbitrary objects
(possibly also maps).
The `_meta` field can be updated using the data frame
analytics `_update` endpoint.
The companion PR to elastic/ml-cpp#2440 adds processing of multimodal_distribution field in the anomaly score explanation. I added a changelog entry in the ml-cpp PR hence I mark this PR as a non-issue.
These docs previously implied that you could update datafeed
properties while the datafeed was running, but then would have
to stop and restart it for the changes to take effect.
In fact datafeed updates can only be made while the datafeed is
stopped (and this has been the case for many years, if not forever).
Currently there is no way to remove user-added annotations when a job is deleted or reset.
This change adds an option - delete_user_annotations - to both the delete and reset job APIs.
The default value is false, to keep the behaviour of these calls as it is currently.
This PR surfaces new information about the impact of the factors on the initial anomaly score in the anomaly record:
- single bucket impact is determined by the deviation between actual and typical in the current bucket
- multi-bucket impact is determined by the deviation between actual and typical in the past 12 buckets
- anomaly characteristics are statistical properties of the current anomaly compared to the historical observations
- high variance penalty is the reduction of anomaly score in the buckets with large confidence intervals.
- incomplete bucket penalty is the reduction of anomaly score in the buckets with fewer samples than historically expected.
Additionally, we compute lower- and upper-confidence bounds and the typical value for the anomaly records. This improves the explainability of the cases where the model plot is not activated with only a slight overhead in performance (1-2%).
n larger clusters with complicated datafeed requirements, being able to preview only a specific window of time is important. Previously, datafeed previews would always start at 0 (or from the beginning of the data). This causes issues if the index pattern contains indices on slower hardware, but when the datafeed is actually started, the "start" time is set to more recent data (and thus on faster hardware).
Additionally, when _preview is unbounded (as before), it attempts to only preview indices that are NOT frozen or cold. This is done through a query against the _tier field. Meaning, it only effects newer indices that actually have that field set.
This commit adds `search_interval` to the datafeed stats API
`running_state` object. When the datafeed is running, it reports
the last search interval that was searched. It is useful to
understand the point in time where the datafeed is currently
searching.
Closes#82405
For new jobs, when the analysis config field model_prune_window is not set, use a default value of 30 days or 20 times the bucket span, whichever is greater.
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Previously the ML model snapshot upgrade endpoint did not
provide a way to reliably monitor progress. This could lead
to the upgrade assistant UI thinking that a model snapshot
upgrade had finished when it actually hadn't.
This change adds a new "stats" API that allows external
interested parties to find out the status of each model
snapshot upgrade and which node (if any) each is running on.
Fixes#81519
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
The char filter replaces the previous default of `first_non_blank_line`.
`first_non_blank_line` worked well to figure out what line had characters at all, but log lines
like the following were handled poorly:
```
--------------------------------------------------------------------------------
Alias 'foo' already exists and this prevents setting up ILM for logs
--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, the first line was used:
```
--------------------------------------------------------------------------------
```
This has no valid tokens for our standard tokenizer. Consequently, no tokens were found by `ml_standard` tokenizer.
The new filter, `first_line_with_letters`, returns the first line with any letter character (e.g. `Character#isLetter` returns true).
Given the previously poorly handled log, when combining with our `ml_standard` tokenizer, we get the following, more appropriate, tokens:
```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
In #75617 a new setting, system_annotations_retention_days, was
added to control how long system annotations are retained for.
We now feel that this setting is redundant and that system
annotations should be retained for the same period as results.
This is intuitive and defensible, as system annotations can be
considered a type of result.
Followup to #75617
Previously attempting to delete a job that had a datafeed
would return an exception. However, this was unnecessarily
pedantic - the user would always want to delete both job
and datafeed together, and would react by deleting the
datafeed and then subsequently deleting the job again.
This change makes the delete job API automatically delete
a datafeed associated with the job. The same level of
force is used for this delete datafeed request as was used
on the delete job request. This means that it's possible
to force-delete an open job with a started datafeed (since
force-delete datafeed will automatically stop a started
datafeed). It's still not possible to delete an opened job
without using force.
Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.
Relates to ml-cpp/#1962