mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
82 lines
5 KiB
Text
82 lines
5 KiB
Text
[role="xpack"]
|
|
[[data-stream-lifecycle]]
|
|
== Data stream lifecycle
|
|
|
|
A data stream lifecycle is the built-in mechanism data streams use to manage their lifecycle. It enables you to easily
|
|
automate the management of your data streams according to your retention requirements. For example, you could configure
|
|
the lifecycle to:
|
|
|
|
* Ensure that data indexed in the data stream will be kept at least for the retention time you defined.
|
|
* Ensure that data older than the retention period will be deleted automatically by {es} at a later time.
|
|
|
|
To achieve that, it supports:
|
|
|
|
* Automatic <<index-rollover,rollover>>, which chunks your incoming data in smaller pieces to facilitate better performance
|
|
and backwards incompatible mapping changes.
|
|
* Configurable retention, which allows you to configure the time period for which your data is guaranteed to be stored.
|
|
{es} is allowed at a later time to delete data older than this time period. Retention can be configured on the data stream level
|
|
or on a global level. Read more about the different options in this <<tutorial-manage-data-stream-retention,tutorial>>.
|
|
|
|
A data stream lifecycle also supports downsampling the data stream backing indices.
|
|
See <<data-streams-put-lifecycle-downsampling-example, the downsampling example>> for
|
|
more details.
|
|
|
|
[discrete]
|
|
[[data-streams-lifecycle-how-it-works]]
|
|
=== How does it work?
|
|
|
|
In intervals configured by <<data-streams-lifecycle-poll-interval,`data_streams.lifecycle.poll_interval`>>, {es} goes over
|
|
each data stream and performs the following steps:
|
|
|
|
1. Checks if the data stream has a data stream lifecycle configured, skipping any indices not part of a managed data stream.
|
|
2. Rolls over the write index of the data stream, if it fulfills the conditions defined by
|
|
<<cluster-lifecycle-default-rollover,`cluster.lifecycle.default.rollover`>>.
|
|
3. After an index is not the write index anymore (i.e. the data stream has been rolled over),
|
|
automatically tail merges the index. Data stream lifecycle executes a merge operation that only targets
|
|
the long tail of small segments instead of the whole shard. As the segments are organised
|
|
into tiers of exponential sizes, merging the long tail of small segments is only a
|
|
fraction of the cost of force merging to a single segment. The small segments would usually
|
|
hold the most recent data so tail merging will focus the merging resources on the higher-value
|
|
data that is most likely to keep being queried.
|
|
4. If <<data-streams-put-lifecycle-downsampling-example, downsampling>> is configured it will execute
|
|
all the configured downsampling rounds.
|
|
5. Applies retention to the remaining backing indices. This means deleting the backing indices whose
|
|
`generation_time` is longer than the effective retention period (read more about the
|
|
<<effective-retention-calculation, effective retention calculation>>). The `generation_time` is only applicable to rolled
|
|
over backing indices and it is either the time since the backing index got rolled over, or the time optionally configured
|
|
in the <<index-data-stream-lifecycle-origination-date,`index.lifecycle.origination_date`>> setting.
|
|
|
|
IMPORTANT: We use the `generation_time` instead of the creation time because this ensures that all data in the backing
|
|
index have passed the retention period. As a result, the retention period is not the exact time data gets deleted, but
|
|
the minimum time data will be stored.
|
|
|
|
NOTE: Steps `2-4` apply only to backing indices that are not already managed by {ilm-init}, meaning that these indices either do
|
|
not have an {ilm-init} policy defined, or if they do, they have <<index-lifecycle-prefer-ilm,`index.lifecycle.prefer_ilm`>>
|
|
set to `false`.
|
|
|
|
[discrete]
|
|
[[data-stream-lifecycle-configuration]]
|
|
=== Configuring data stream lifecycle
|
|
|
|
Since the lifecycle is configured on the data stream level, the process to configure a lifecycle on a new data stream and
|
|
on an existing one differ.
|
|
|
|
In the following sections, we will go through the following tutorials:
|
|
|
|
* To create a new data stream with a lifecycle, you need to add the data stream lifecycle as part of the index template
|
|
that matches the name of your data stream (see <<tutorial-manage-new-data-stream>>). When a write operation
|
|
with the name of your data stream reaches {es} then the data stream will be created with the respective data stream lifecycle.
|
|
* To update the lifecycle of an existing data stream you need to use the <<data-stream-lifecycle-api, data stream lifecycle APIs>>
|
|
to edit the lifecycle on the data stream itself (see <<tutorial-manage-existing-data-stream>>).
|
|
* Migrate an existing {ilm-init} managed data stream to Data stream lifecycle using <<tutorial-migrate-data-stream-from-ilm-to-dsl>>.
|
|
|
|
NOTE: Updating the data stream lifecycle of an existing data stream is different from updating the settings or the mapping,
|
|
because it is applied on the data stream level and not on the individual backing indices.
|
|
|
|
include::tutorial-manage-new-data-stream.asciidoc[]
|
|
|
|
include::tutorial-manage-existing-data-stream.asciidoc[]
|
|
|
|
include::tutorial-manage-data-stream-retention.asciidoc[]
|
|
|
|
include::tutorial-migrate-data-stream-from-ilm-to-dsl.asciidoc[]
|