This introduces a new `text-structure` plugin. This is the new home of the find file structure API.
The old REST URL is still available but is deprecated.
The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.
Changes to the high-level REST client and docs will be in separate commit.
related to: https://github.com/elastic/elasticsearch/issues/67001
There is little evidence of this endpoint being used
and there is quite a lot of code complexity associated
with the various formats that can be used to upload
data and the different errors that can occur when direct
data upload is open to end users.
In a future release we can make this endpoint internal
so that only datafeeds can use it, and remove all the
options and formats that are not used by datafeeds.
End users will have to store their input data for
anomaly detection in Elasticsearch indices (which we
believe all do today) and use a datafeed to feed it
to anomaly detection jobs.
This commit is fixing a potential bug if we support anomaly detection
results index rollover in the future.
In particular, we determine the current `data_counts` by sorting on the
latest record time. However, this is not correct if the job reverts
to an older model snapshot. To fix this we add `log_time` to `data_counts`
(similarly to `model_size_stats`) and sort on `log_time` to figure
out the current counts for the job.
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.
The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats. An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.
Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory. To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x. However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.
Relates #63163
This change adds an extra piece of information,
limits.total_ml_memory, to the ML info response.
This returns the total amount of memory that ML
is permitted to use for native processes across
all ML nodes in the cluster. Some of this may
already be in use; the value returned is total,
not available ML memory.
This new API provides a way for users to upgrade their own anomaly job
model snapshots.
To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- load the snapshot to the process
- write the snapshot again from the native task (now updated via the
native process)
relates #64154
When exporting and cloning ml configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin. Especially as the number of these fields change
from version to version.
This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.
APIs supporting this flag:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>
The following fields are not returned in the objects:
- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
relates to #63055
The original comment mentioned issue #48583, but issue #48941
is specifically open for this mute. However, this is
inappropriate, as the underlying reason the test cannot be
unmuted is the same as for all the other tests skipped with the
comment "Kibana sample data": issues #51572, #51576 and #51678.
Closes#48941
This adds the new `for_export` flag to the following APIs:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>
The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.
The following fields are not returned in the objects:
- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
closes https://github.com/elastic/elasticsearch/issues/63055
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing. The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects. However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects. As a first
step it makes sense to move the returned mappings closer
to the standard format.
This is a small building block towards fixing #55616
Changes:
* Moves `Retrieve selected fields` to its own page and adds a title abbreviation.
* Adds existing script and stored fields content to `Retrieve selected fields`
* Adds a xref for `Retrieve selected fields` to `Search your data`
* Adds related redirects and updates existing xrefs
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.
There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.
The changes so far cover REST APIs, results object
formats, HLRC and docs.
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").
The default value is `20mb`.
There is a HARD limit at `500mb` which will throw an error if used.
If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
related native change: https://github.com/elastic/ml-cpp/pull/1238
closes: https://github.com/elastic/elasticsearch/issues/56420
Throttling nightly cleanup as much as we do has been over cautious.
Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.
Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.
This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.
- The default for `model_snapshot_retention_days` for new jobs is now
10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
that defaults to 1 for new jobs and `model_snapshot_retention_days`
for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
and `model_snapshot_retention_days` all but the first model snapshot
for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
selected model snapshots to be retained indefinitely
Closes#52150