mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 23:57:20 -04:00
In this PR we introduce the APIs that expose the global retention configuration and allow users to take advantage of it. These APIs are protected by dedicated, newly introduced privileges: `manage_data_stream_global_retention` or higher, which allows all operations on the global retention configuration, and `monitor_data_stream_retention` or higher, which allows the retrieval of the global retention configuration. This PR is the final PR that makes the global retention available for our users.
84 lines
5 KiB
Text
[role="xpack"]
[[data-stream-lifecycle]]
== Data stream lifecycle

preview::[]

A data stream lifecycle is the built-in mechanism data streams use to manage their lifecycle. It enables you to
automate the management of your data streams according to your retention requirements. For example, you could configure
the lifecycle to:

* Ensure that data indexed in the data stream is kept for at least the retention period you define.
* Ensure that data older than the retention period is automatically deleted by {es} at a later time.

To achieve that, it supports:

* Automatic <<index-rollover,rollover>>, which chunks your incoming data into smaller pieces to facilitate better performance
and backwards-incompatible mapping changes.
* Configurable retention, which allows you to configure the time period for which your data is guaranteed to be stored.
{es} is allowed to delete data older than this time period at a later time. Retention can be configured on the data stream level
or on a global level. Read more about the different options in this <<tutorial-manage-data-stream-retention,tutorial>>.

A data stream lifecycle also supports downsampling the data stream backing indices.
See <<data-streams-put-lifecycle-downsampling-example, the downsampling example>> for
more details.

[discrete]
[[data-streams-lifecycle-how-it-works]]
=== How does it work?

At intervals configured by <<data-streams-lifecycle-poll-interval,`data_streams.lifecycle.poll_interval`>>, {es} goes over
each data stream and performs the following steps:

1. Checks if the data stream has a data stream lifecycle configured, skipping any indices not part of a managed data stream.
2. Rolls over the write index of the data stream, if it fulfills the conditions defined by
<<cluster-lifecycle-default-rollover,`cluster.lifecycle.default.rollover`>>.
3. After an index is no longer the write index (i.e. the data stream has been rolled over),
automatically tail merges the index. Data stream lifecycle executes a merge operation that only targets
the long tail of small segments instead of the whole shard. As the segments are organised
into tiers of exponential sizes, merging the long tail of small segments is only a
fraction of the cost of force merging to a single segment. The small segments usually
hold the most recent data, so tail merging focuses the merging resources on the higher-value
data that is most likely to keep being queried.
4. If <<data-streams-put-lifecycle-downsampling-example, downsampling>> is configured, executes
all the configured downsampling rounds.
5. Applies retention to the remaining backing indices. This means deleting the backing indices whose
`generation_time` is longer than the effective retention period (read more about the
<<effective-retention-calculation, effective retention calculation>>). The `generation_time` is only applicable to rolled
over backing indices and it is either the time since the backing index got rolled over, or the time since the date optionally
configured in the <<index-data-stream-lifecycle-origination-date,`index.lifecycle.origination_date`>> setting.

IMPORTANT: We use the `generation_time` instead of the creation time because this ensures that all data in the backing
index has passed the retention period. As a result, the retention period is not the exact time data gets deleted, but
the minimum time data will be stored.

NOTE: Steps `2-4` apply only to backing indices that are not already managed by {ilm-init}, meaning that these indices either do
not have an {ilm-init} policy defined, or if they do, they have <<index-lifecycle-prefer-ilm,`index.lifecycle.prefer_ilm`>>
set to `false`.

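The retention check in step 5 and the {ilm-init} condition in the note above can be sketched as follows. This is a minimal illustration in Python, not the {es} implementation: the function names and the `my-ilm-policy` name are hypothetical, and the `prefer_ilm=True` default is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def generation_time(now: datetime, rolled_over_at: datetime,
                    origination_date: Optional[datetime] = None) -> timedelta:
    """Elapsed time since rollover, or since the origination date if one is configured."""
    reference = origination_date if origination_date is not None else rolled_over_at
    return now - reference


def is_past_retention(now: datetime, rolled_over_at: Optional[datetime],
                      effective_retention: timedelta,
                      origination_date: Optional[datetime] = None) -> bool:
    """Return True if the retention step may delete this backing index."""
    if rolled_over_at is None:
        # generation_time only applies to rolled-over indices, so the
        # write index is never eligible.
        return False
    return generation_time(now, rolled_over_at, origination_date) > effective_retention


def steps_2_to_4_apply(ilm_policy: Optional[str], prefer_ilm: bool = True) -> bool:
    """Return True if the index is not managed by ILM: no policy, or prefer_ilm is false."""
    return ilm_policy is None or not prefer_ilm


utc = timezone.utc
now = datetime(2024, 1, 31, tzinfo=utc)
retention = timedelta(days=7)

# Rolled over 11 days ago: past the 7-day retention.
print(is_past_retention(now, datetime(2024, 1, 20, tzinfo=utc), retention))  # True
# Rolled over 3 days ago: kept, regardless of when the index was created.
print(is_past_retention(now, datetime(2024, 1, 28, tzinfo=utc), retention))  # False
# Write index (never rolled over): never deleted by this check.
print(is_past_retention(now, None, retention))  # False
# An ILM policy exists, but prefer_ilm is false: steps 2-4 still apply.
print(steps_2_to_4_apply("my-ilm-policy", prefer_ilm=False))  # True
```

Because the comparison uses the time since rollover rather than the creation time, an index created long ago but rolled over recently is still kept, which is why the retention period acts as a minimum rather than an exact deletion time.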
[discrete]
[[data-stream-lifecycle-configuration]]
=== Configuring data stream lifecycle

Since the lifecycle is configured on the data stream level, the process to configure a lifecycle on a new data stream
differs from the process for an existing one.

The following tutorials cover each case:

* To create a new data stream with a lifecycle, you need to add the data stream lifecycle as part of the index template
that matches the name of your data stream (see <<tutorial-manage-new-data-stream>>). When a write operation
with the name of your data stream reaches {es}, the data stream will be created with the respective data stream lifecycle.
* To update the lifecycle of an existing data stream, you need to use the <<data-stream-lifecycle-api, data stream lifecycle APIs>>
to edit the lifecycle on the data stream itself (see <<tutorial-manage-existing-data-stream>>).
* To migrate an existing {ilm-init} managed data stream to data stream lifecycle, follow <<tutorial-migrate-data-stream-from-ilm-to-dsl>>.

NOTE: Updating the data stream lifecycle of an existing data stream is different from updating the settings or the mapping,
because it is applied on the data stream level and not on the individual backing indices.

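As a rough sketch of the first two cases, the request bodies might be assembled as shown below. The helper names, the `my-data-stream` name and the `7d` retention value are hypothetical, and the shapes assume the index template and data stream lifecycle APIs covered in the tutorials, which remain the authoritative reference for the request format.

```python
import json
from typing import Tuple


def index_template_body(index_pattern: str, data_retention: str) -> dict:
    """Body for an index template that creates new data streams with a lifecycle."""
    return {
        "index_patterns": [index_pattern],
        "data_stream": {},
        "template": {"lifecycle": {"data_retention": data_retention}},
    }


def lifecycle_update_request(data_stream: str, data_retention: str) -> Tuple[str, str, dict]:
    """Method, path and body for updating the lifecycle of an existing data stream."""
    path = f"/_data_stream/{data_stream}/_lifecycle"
    return "PUT", path, {"data_retention": data_retention}


# New data stream: the lifecycle lives in the matching index template.
print(json.dumps(index_template_body("my-data-stream*", "7d"), indent=2))

# Existing data stream: the lifecycle is set on the data stream itself.
method, path, body = lifecycle_update_request("my-data-stream", "7d")
print(method, path, json.dumps(body))
```

The second request targets the data stream rather than a backing index, which reflects the point in the note above: the lifecycle is applied on the data stream level.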
include::tutorial-manage-new-data-stream.asciidoc[]

include::tutorial-manage-existing-data-stream.asciidoc[]

include::tutorial-manage-data-stream-retention.asciidoc[]

include::tutorial-migrate-data-stream-from-ilm-to-dsl.asciidoc[]