This commit adds the ability to specify exclusion patterns in Auto-Follow patterns.
This allows excluding indices that match any of the inclusion patterns and also match
some of the exclusion patterns giving more fine grained control in scenarios where this is important.
Related #67686
Backport of #72935
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.
The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.
It is still possible to config the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.
To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
If no categorization_analyzer is specified but categorization_filters
are specified then the categorization filters are converted to char
filters applied that are applied after first_non_blank_line.
Backport of #72805
Changes:
* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
In #71701 we added a new REST API that provides statistics
about the searchable snapshots cache on Frozen Tier.
This commit adds the necessary plumbing to expose this API
in the High Level REST Client. It also exposes the documentation
of the Mount Snapshot API that was created in #68949 but not
made accessible.
Backport of #71858
* [ML][HLRC] adds put and delete trained model alias APIs to rest high-level client (#69214)
adds put (and reassign) and delete trained model alias APIs to the rest high-level client.
This adds some serialization objects and request wrappers.
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.
Closes#65056
Backport of #69183
The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.
Backport of #68099.
* [ML] move find file structure finder in Rest high Level client to its new endpoint and plugin (#67290)
Find file structure finder is now its own plugin, and separated from the ml plugin.
This commit updates the rest high level client to reflect this.
Additionally, this adjusts the internal and client object names from `FileStructure` to the more general `TextStructure`
This PR deprecates the usage of the id field in the payload for the
InvalidateApiKey API. The ids field introduced in #63224 is now the recommended
way for performing (bulk) API key invalidation.
This PR also includes the test fix from #66696
* [ML] add new snapshot upgrader API for upgrading older snapshots (#64665)
This new API provides a way for users to upgrade their own anomaly job
model snapshots.
To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- load the snapshot to the process
- write the snapshot again from the native task (now updated via the
native process)
relates #64154
* [ML] adding for_export flag for ml plugin GET resource APIs (#63092)
This adds the new `for_export` flag to the following APIs:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>
The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.
The following fields are not returned in the objects:
- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
closes https://github.com/elastic/elasticsearch/issues/63055
* [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)
When exporting and cloning ml configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin. Especially as the number of these fields change
from version to version.
This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.
APIs supporting this flag:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>
The following fields are not returned in the objects:
- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
relates to #63055
This adds a new flag `exclude_generated` for GET transform API.
This flag is useful for when a transform needs to be cloned within a cluster or exported/imported between clusters.
It removes certain fields that are not able to be set via the PUT api (e.g. version, create_time).
relates https://github.com/elastic/elasticsearch/issues/63055
* Adding authentication information to access token create APIs (#62490)
* Adding authentication information to access token create APIs
Adding authentication object to following APIs:
/_security/oauth2/token
/_security/delegate_pki
/_security/saml/authenticate
/_security/oidc/authenticate
Resolves: #59685
(cherry picked from commit 51dbd9e584)
* Addressing PR commends, fixing tests
* Returning tokenGroups attribute as SID string instead of byte array (AD metadata)
Addressing PR comments
* Returning tokenGroups attribute as SID string instead of byte array (AD metadata)
Update version check
* Returning tokenGroups attribute as SID string instead of byte array (AD metadata)
Update version check
* Addressing more PR comments
* Adding more to integration tests + some small fixes
* Nit fixes and formatting following #62490 comments (#63797)
* Nit fixes and formatting following #62490 comments
Resolves: #63792
* Nit fixes and formatting following #62490 comments
Resolves: #63792
* Nit fixes and formatting following #62490 comments
Fixing username
* Nit fixes and formatting following #62490 comments
Fixing formatting
* Fixing merge conflicts
* Fixing merge conflicts
Getting the API key document form the security index is the most time consuing part
of the API Key authentication flow (>60% if index is local and >90% if index is remote).
This traffic is now avoided by caching added with this PR.
Additionally, we add a cache invalidator registry so that clearing of different caches will
be managed in a single place (requires follow-up PRs).
* [ML] renames */inference* apis to */trained_models* (#63097)
This commit renames all `inference` CRUD APIs to `trained_models`.
This aligns with internal terminology, documentation, and use-cases.
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
* [ML] adding docs + hlrc for data frame analysis feature_processors (#61149)
Adds HLRC and some docs for the new feature_processors field in Data frame analytics.
Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>