This changes the default value of both the
`data_streams.auto_sharding.increase_shards.load_metric` and
`data_streams.auto_sharding.decrease_shards.load_metric` cluster
settings from `PEAK` to `ALL_TIME`. These values have already been applied via config for several weeks.
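The previous behaviour can still be pinned explicitly. A sketch, assuming these remain dynamically updatable cluster settings (the exact update mechanism depends on your deployment):
```
PUT _cluster/settings
{
  "persistent": {
    "data_streams.auto_sharding.increase_shards.load_metric": "PEAK",
    "data_streams.auto_sharding.decrease_shards.load_metric": "PEAK"
  }
}
```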
The approach taken to updating the tests was to swap the values given for the all-time and peak loads in all the stats objects provided as input to the tests, and to swap the enum values in the couple of places they appear.
During reindexing we retrieve the index mode from the template settings. However, we do not fully resolve the settings as we do when validating a template or when creating a data stream. This results in the error reported in #125607.
I do not see a reason not to fix this as suggested in #125607 (comment).
Fixes: #125607
* Initial testHealthIndicator that fails
* Refactor: FileSettingsHealthInfo record
* Propagate file settings health indicator to health node
* ensureStableCluster
* Try to induce a failure from returning node-local info
* Remove redundant node from client() call
* Use local node ID in UpdateHealthInfoCacheAction.Request
* Move logger to top
* Test node-local health on master and health nodes
* Fix calculate to use the given info
* mutateFileSettingsHealthInfo
* Test status from local current info
* FileSettingsHealthTracker
* Spruce up HealthInfoTests
* spotless
* randomNonNegativeLong
* Rename variable
* Address Niels' comments
* Test one- and two-node clusters
* [CI] Auto commit changes from spotless
* Ensure there's a master node
* setBootstrapMasterNodeIndex
---------
Co-authored-by: Niels Bauman <33722607+nielsbauman@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This PR adds the API capability to ensure that the API tests that
check for the default failures retention are only executed when the
version supports it. This was missed in the original PR
(https://github.com/elastic/elasticsearch/pull/127573).
We introduce a new global retention setting `data_streams.lifecycle.retention.failures_default` which is used by the data stream lifecycle management as the default retention when the failure store lifecycle of the data stream does not specify one.
Elasticsearch ships with a default value of 30 days. The value can be changed via the settings API to any time value higher than 10 seconds, or to -1 to indicate that no default retention should apply.
The failures default retention can be set to a value higher than the max retention, but then the max retention will be effective. The reason for this choice is to ensure that no deployments will be broken if the user has already set up a max retention of less than 30 days.
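For example, a sketch of lowering the default via the cluster settings API (the setting name comes from this PR; the `7d` value is illustrative):
```
PUT _cluster/settings
{
  "persistent": {
    "data_streams.lifecycle.retention.failures_default": "7d"
  }
}
```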
This PR adds to the indexing write load the time taken to flush write indexing buffers on the indexing threads (flushing is done there to push back on indexing).
This changes the semantics of InternalIndexingStats#recentIndexMetric and InternalIndexingStats#peakIndexMetric to more accurately account for load on the indexing thread. Addresses ES-11356.
The failure store is a set of data stream indices that are used to store certain types of ingestion failures. Until now they shared the configuration of the backing indices. We understand that the two data sets have different lifecycle needs.
We believe that failures typically need to be retained for much less time than the data. Considering this, the lifecycle needs of the failures are also more limited and fit better with the simplicity of the data stream lifecycle feature.
This allows the user to only set the desired retention, and we will perform the rollover and other maintenance tasks without the user having to think about them. Furthermore, having only one lifecycle management feature allows us to ensure that this data is managed by default.
This PR introduces the following:
Configuration
We extend the failure store configuration to also allow lifecycle configuration. This reflects only the user's configuration, as shown below:
```
PUT _data_stream/*/options
{
  "failure_store": {
    "lifecycle": {
      "data_retention": "5d"
    }
  }
}
```
```
GET _data_stream/*/options
{
  "data_streams": [
    {
      "name": "my-ds",
      "options": {
        "failure_store": {
          "lifecycle": {
            "data_retention": "5d"
          }
        }
      }
    }
  ]
}
```
To retrieve the effective configuration, you need to use the GET data streams API; see #126668.
Functionality
The data stream lifecycle (DLM) will manage the failure indices regardless of whether the failure store is enabled or not. This ensures that if the failure store gets disabled, we will not have stagnant data.
The data stream options APIs reflect only the user's configuration.
The GET data stream API should be used to check the current state of the effective failure store configuration.
Telemetry
We extend the data stream failure store telemetry to also include the lifecycle telemetry.
```
{
  "data_streams": {
    "available": true,
    "enabled": true,
    "data_streams": 10,
    "indices_count": 50,
    "failure_store": {
      "explicitly_enabled_count": 1,
      "effectively_enabled_count": 15,
      "failure_indices_count": 30,
      "lifecycle": {
        "explicitly_enabled_count": 5,
        "effectively_enabled_count": 20,
        "data_retention": {
          "configured_data_streams": 5,
          "minimum_millis": X,
          "maximum_millis": Y,
          "average_millis": Z
        },
        "effective_retention": {
          "retained_data_streams": 20,
          "minimum_millis": X,
          "maximum_millis": Y,
          "average_millis": Z
        },
        "global_retention": {
          "max": {
            "defined": false
          },
          "default": {
            "defined": true, <------ this is the default value applicable for the failure store
            "millis": X
          }
        }
      }
    }
  }
}
```
Implementation details
We ensure that a partially reset failure store still results in a valid failure store configuration.
We ensure that when a node communicates with a node on a previous version, it does not send an invalid failure store configuration of `enabled: null`.
We replace usages of the time-sensitive
`DataStream#getDefaultBackingIndexName` with retrieval of the name
via an API call. The problem with using the time-sensitive method is
that we can have test failures around midnight.
Relates #123376
This method would default to starting a new node when the cluster was
empty. This is pretty trappy as `getClient()` (or things like
`getMaster()` that depend on `getClient()`) don't look at all like
something that would start a new node.
In any case, the intention of tests is much clearer when they explicitly
define a cluster configuration.
To retrieve the effective configuration, you need to use the `GET` data
streams API. For example, if a data stream has empty data stream
options, it might still have the failure store enabled from a cluster
setting. The failure store is managed by default with a lifecycle with
infinite (for now) retention, so the response will look like this:
```
GET _data_stream/*
{
"data_streams": [
{
"name": "my-data-stream",
"timestamp_field": {
"name": "@timestamp"
},
.....
"failure_store": {
"enabled": true,
"lifecycle": {
"enabled": true
},
"rollover_on_write": false,
"indices": [
{
"index_name": ".fs-my-data-stream-2099.03.08-000003",
"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
"managed_by": "Data stream lifecycle"
}
]
}
},...
]
```
In case there is a failure index managed by ILM, the failure index info
will be displayed as follows.
```
{
"index_name": ".fs-my-data-stream-2099.03.08-000002",
"index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
"prefer_ilm": true,
"ilm_policy": "my-lifecycle-policy",
"managed_by": "Index Lifecycle Management"
}
```
The class `DataStreamLifecycle` currently captures the lifecycle
configuration that manages all data stream indices, but soon
enough it will be split into two variants, the data and the failures
lifecycle.
Some pre-work has been done already but as we are progressing in our
POC, we see that it will be really useful if the `DataStreamLifecycle`
is "aware" of the target index component. This will allow us to
correctly apply global retention or to throw an error if a downsampling
configuration is provided to a failure lifecycle.
In this PR, we perform a small refactoring to reduce the noise in
https://github.com/elastic/elasticsearch/pull/125658. Here we introduce
the following:
- A factory method that creates a data lifecycle; for now it's trivial, but it will become more useful soon.
- We rename the "empty" builder to explicitly mention the index component it refers to.
Adds a node feature that is conditionally added to the cluster state if the failure store
feature flag is enabled. Requires all nodes in the cluster to have the node feature
present in order to redirect failed documents to the failure store from the ingest node
or from shard level bulk failures.
If updating the `index.time_series.end_time` fails for one data stream,
then UpdateTimeSeriesRangeService should continue updating this setting for other data streams.
The following error was observed in the wild:
```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```
This resulted in the `index.time_series.end_time` index setting not being updated for any data stream. That in turn caused data loss, as metrics couldn't be indexed because no suitable backing index could be resolved:
```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
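A minimal sketch of the shape of the fix (hypothetical helper names, not the actual `UpdateTimeSeriesRangeService` code): a failure for one data stream is logged and skipped so the remaining data streams still get their `index.time_series.end_time` updated.
```java
import java.util.List;
import java.util.function.Consumer;

public final class ContinueOnFailureSketch {

    /** Updates every data stream; one bad data stream no longer aborts the whole batch. */
    static void updateAll(List<String> dataStreams, Consumer<String> updateEndTime) {
        for (String dataStream : dataStreams) {
            try {
                updateEndTime.accept(dataStream); // e.g. bump index.time_series.end_time
            } catch (Exception e) {
                // previously an exception here failed the whole batch; now we log and move on
                System.err.println("failed to update end time for [" + dataStream + "]: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        updateAll(List.of("good-ds", "broken-ds", "another-ds"), name -> {
            if (name.startsWith("broken")) {
                throw new IllegalArgumentException("[index.time_series.end_time] requires [index.mode=time_series]");
            }
        });
    }
}
```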
I noticed that we tend to create the flag instance and call this method
everywhere. This doesn't compile the same way as a real boolean constant
unless you're running with `-XX:+TrustFinalNonStaticFields`.
For most of the code spots changed here that's irrelevant, but the
usage in the mapper parsing code is a little hot and potentially gets
a small speedup from this.
Also we're simply wasting some bytes for the static footprint of ES by
using the `FeatureFlag` indirection instead of just a boolean.
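Roughly, the pattern being replaced and its replacement (illustrative names only; the real `FeatureFlag` plumbing differs): hoisting the check into a `static final boolean` lets the JIT constant-fold it instead of re-reading a field and calling a method at every use site.
```java
/** Illustrative sketch of swapping a FeatureFlag indirection for a boolean constant. */
public final class ExampleFeature {

    // Before (sketch): a flag object whose isEnabled() result the JIT does not treat as a
    // constant unless -XX:+TrustFinalNonStaticFields is in effect.
    // private static final FeatureFlag EXAMPLE_FLAG = new FeatureFlag("example");

    // After (sketch): evaluated once at class initialisation; static final fields are trusted
    // by the JIT, so checks against this compile like a real boolean constant.
    public static final boolean EXAMPLE_FEATURE_ENABLED =
        Boolean.parseBoolean(System.getProperty("es.example_feature_flag_enabled", "false"));

    private ExampleFeature() {}
}
```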
In this PR we add support for the failure store for system data streams.
Specifically:
- We pass the system descriptor so the failure index can be created based on that.
- We extend the tests to ensure it works
- We remove a guard we had but I wasn't able to test it because it only gets triggered if the data stream gets created right after a failure in the ingest pipeline, and I didn't see how to add one (yet).
- We extend the system data stream migration to ensure this is also working.
This commit adds support for system data streams reindexing. The system data stream migration extends the existing system indices migration task and uses the data stream reindex API.
The system index migration task starts a reindex data stream task and tracks its status every second. Only one system index or system data stream is migrated at a time. If a data stream migration fails, the entire system index migration task will also fail.
Port of #123926
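A rough sketch of the polling loop described above (hypothetical helpers, not the actual system index migration task code): start the data stream reindex, check its status once a second, and fail the whole migration if the data stream migration fails.
```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class DataStreamMigrationSketch {

    enum Status { RUNNING, COMPLETED, FAILED }

    /** Migrates a single system data stream; only one migration runs at a time. */
    static void migrate(String dataStream, Runnable startReindex, Supplier<Status> pollStatus) throws InterruptedException {
        startReindex.run();
        while (true) {
            Status status = pollStatus.get();
            if (status == Status.COMPLETED) {
                return; // move on to the next system index or system data stream
            }
            if (status == Status.FAILED) {
                throw new IllegalStateException("migration of [" + dataStream + "] failed; failing the migration task");
            }
            TimeUnit.SECONDS.sleep(1); // status is tracked every second
        }
    }
}
```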
When creating an empty lifecycle we used to use the default
constructor. This change is not just an efficiency improvement; it will
also allow us to separate the default data and failures lifecycles in
the future.
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
This test had a copy paste mistake. When the cluster has only one data
node the replicas cannot be assigned so we end up with a force merge
error. In the case of the failure store this was not asserted correctly.
On the other hand, this test only checked for the existence of an error
and it was not ensuring that the current error is not the rollover error
that should have recovered. We make this test a bit more explicit.
Fixes: https://github.com/elastic/elasticsearch/issues/126252
**Issue** The data stream lifecycle does not correctly register rollover
errors for the failure store.
**Observed behaviour** When the data stream lifecycle encounters a rollover
error it records it, unless it sees that the current write index of this
data stream doesn't match the source index of the request. However, the
write index check does not use the failure write index but the write
backing index, so the failure gets ignored.
**Desired behaviour** When the data stream lifecycle encounters a rollover
error, it will check the relevant write index before it determines
whether the error should be recorded or not.
In this PR we introduce the data stream API in the `es-rest-api` using
feature flag support. This enables us to use `yamlRestTests`
instead of `javaRestTests`.
Now that all actions that DLM depends on are project-aware, we can make DLM itself project-aware.
There still exists only one instance of `DataStreamLifecycleService`, it just loops over all the projects - which matches the approach we've taken for similar scenarios thus far.
This tracks the highest value seen for the recent write load metric
any time the stats for a shard are computed, exposes this value
alongside the recent value, and persists it in index metadata
as well.
The new test in `IndexShardTests` is designed to more thoroughly test
the recent write load metric previously added, as well as to test the
peak metric being added here.
ES-10037 #comment Added peak load metric in https://github.com/elastic/elasticsearch/pull/125521
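A minimal sketch of the tracking idea (hypothetical class, not the actual `IndexingStats` code): each time shard stats are computed, the freshly sampled recent write load is folded into a running maximum that is exposed and persisted alongside it.
```java
/** Minimal sketch: remembers the highest recent-write-load sample seen so far. */
public final class PeakWriteLoadSketch {

    private double peak;

    /** Called with the recent write load each time shard stats are computed. */
    public synchronized double updateAndGetPeak(double recentWriteLoad) {
        if (recentWriteLoad > peak) {
            peak = recentWriteLoad;
        }
        return peak;
    }
}
```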
* Specify index component when retrieving lifecycle
* Add getters for the failure lifecycle
* Conceptually introduce the failure store lifecycle (even though for now it's the same)
These tests had the potential to fail when two consecutive GET data
streams requests would hit two different nodes, where one node already
had the cluster state that contained the new backing index and the other
node didn't yet.
Caused by #122852
Fixes #124882
Fixes #124885
These tests had the potential to fail when two consecutive GET data
streams requests would hit two different nodes, where one node already
had the cluster state that contained the new backing index and the other
node didn't yet.
Caused by #122852
Fixes #124846
Fixes #124950
Fixes #124999
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
This PR updates the different methods in TestProjectResolvers so that
their names are more accurate and their behaviours are more as expected.
For example, in MP-1749, we differentiate between single-project and
single-project-only resolvers. The latter should not support multi-project.
This is part of the work to make DLM project-aware.
These two features were pretty tightly coupled, so I saved some effort
by combining them in one PR.
This uses the recently-added `ExponentiallyWeightedMovingRate` class
to calculate a write load which favours more recent load, and includes
this alongside the existing unweighted all-time write load in
`IndexingStats.Stats`.
As of this change, the new load metric is not used anywhere, although
it can be retrieved with the index stats or node stats APIs.
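As a rough illustration only (names and normalisation are assumptions, not the real `ExponentiallyWeightedMovingRate` class), a decaying-rate calculation of this flavour could look like:
```java
/**
 * Sketch of an exponentially weighted moving rate: recorded work decays with a fixed
 * half-life, so recent indexing activity dominates the reported write load.
 */
public final class EwmRateSketch {

    private final double halfLifeNanos;
    private double weightedTotal; // decayed sum of recorded work, e.g. indexing time in nanos
    private long lastUpdateNanos;

    public EwmRateSketch(double halfLifeNanos, long nowNanos) {
        this.halfLifeNanos = halfLifeNanos;
        this.lastUpdateNanos = nowNanos;
    }

    /** Record an amount of work observed at {@code nowNanos}. */
    public void add(double amount, long nowNanos) {
        decayTo(nowNanos);
        weightedTotal += amount;
    }

    /** Write load estimate that favours recent activity over older activity. */
    public double rate(long nowNanos) {
        decayTo(nowNanos);
        return weightedTotal / halfLifeNanos; // normalising by the half-life is an arbitrary choice here
    }

    private void decayTo(long nowNanos) {
        double elapsed = nowNanos - lastUpdateNanos;
        weightedTotal *= Math.pow(0.5, elapsed / halfLifeNanos);
        lastUpdateNanos = nowNanos;
    }
}
```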
This makes the use of usesDefaultDistribution in our test setup explicit by requiring a reason why it's needed.
This is helpful as part of revisiting the need for all those usages in our code base.
This field is only used (by security) for requests; having it in responses is redundant.
Also, we have a couple of responses that are singletons/quasi-enums where setting the value
needlessly might introduce some strange contention even though it's a plain store.
This isn't just a cosmetic change. It makes it clear at compile time that each response instance
is exclusively defined by the bytes that it is read from. This makes it easier to reason about the
validity of suggested optimizations like https://github.com/elastic/elasticsearch/pull/120010
This PR adds a new MetadataDeleteDataStreamService that allows us to delete system data streams prior to a restore operation. This fixes a bug where system data streams were previously un-restorable.
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805