[[index-rollover]]
=== Rollover

When indexing time series data like logs or metrics, you can't write to a single index indefinitely.
To meet your indexing and search performance requirements and manage resource usage,
you write to an index until some threshold is met and
then create a new index and start writing to it instead.
Using rolling indices enables you to:

* Optimize the active index for high ingest rates on high-performance _hot_ nodes.
* Optimize for search performance on _warm_ nodes.
* Shift older, less frequently accessed data to less expensive _cold_ nodes.
* Delete data according to your retention policies by removing entire indices.

We recommend using <<indices-create-data-stream, data streams>> to manage time series
data. Data streams automatically track the write index while keeping configuration to a minimum.

Each data stream requires an <<index-templates,index template>> that contains:

* A name or wildcard (`*`) pattern for the data stream.
* The data stream's timestamp field. This field must be mapped as a
<<date,`date`>> or <<date_nanos,`date_nanos`>> field data type and must be
included in every document indexed to the data stream.
* The mappings and settings applied to each backing index when it's created.
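
As a minimal sketch, an index template that satisfies these requirements might look like the following. The template and pattern names (`my-data-stream-template`, `my-data-stream*`) are placeholders; the empty `data_stream` object marks matching names as data streams, and `@timestamp` is mapped as a `date` field:

[source,console]
----
// Hypothetical template and pattern names, shown for illustration
PUT _index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 3
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}
----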

Data streams are designed for append-only data, where the data stream name
can be used as the target of operations (read, write, rollover, shrink, etc.).
If your use case requires data to be updated in place, you can instead manage
your time series data using <<aliases,index aliases>>. However, there are a few
more configuration steps and concepts:

* An _index template_ that specifies the settings for each new index in the series.
You optimize this configuration for ingestion, typically using as many shards as you have hot nodes.
* An _index alias_ that references the entire set of indices.
* A single index designated as the _write index_.
This is the active index that handles all write requests.
On each rollover, the new index becomes the write index.
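
A sketch of this alias-based setup (the index and alias names are hypothetical): bootstrap the first index with an alias marked as the write index, then roll it over manually or via {ilm-init}:

[source,console]
----
// Bootstrap the first index with a hypothetical alias marked as the write index
PUT my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true
    }
  }
}

// Roll over manually once the shard-size condition is met
POST my-alias/_rollover
{
  "conditions": {
    "max_primary_shard_size": "50gb"
  }
}
----

If the condition is met, rollover creates `my-index-000002` (the trailing `-000001` counter is incremented automatically) and makes it the new write index for `my-alias`.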

[discrete]
[[ilm-automatic-rollover]]
=== Automatic rollover

{ilm-init} and the data stream lifecycle (in preview:[]) enable you to automatically roll over to a new index based
on conditions like the index size, document count, or age. When a rollover is triggered, a new
index is created, the write alias is updated to point to the new index, and all
subsequent updates are written to the new index.
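
For instance, a minimal {ilm-init} policy that rolls over when a primary shard size, document count, or index age threshold is reached might look like this sketch (the policy name and threshold values are illustrative):

[source,console]
----
// Hypothetical policy name; thresholds are illustrative
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_docs": 100000000,
            "max_age": "30d"
          }
        }
      }
    }
  }
}
----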

TIP: Rolling over to a new index based on size, document count, or age is preferable
to time-based rollovers. Rolling over at an arbitrary time often results in
many small indices, which can have a negative impact on performance and
resource usage.

IMPORTANT: Empty indices will not be rolled over, even if they have an associated `max_age` that
would otherwise result in a rollover occurring. A policy can override this behavior, and explicitly
opt in to rolling over empty indices, by adding a `"min_docs": 0` condition. This can also be
disabled on a cluster-wide basis by setting `indices.lifecycle.rollover.only_if_has_documents` to
`false`.
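
For example, to allow empty indices to roll over cluster-wide, you could disable the setting through the cluster settings API (a sketch; choose `persistent` or `transient` to suit your needs):

[source,console]
----
// Allow empty indices to roll over cluster-wide
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.rollover.only_if_has_documents": false
  }
}
----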

IMPORTANT: The rollover action always implicitly rolls over a data stream or alias if one or more shards contain
200,000,000 or more documents. Normally a shard will reach 50GB long before it reaches 200M documents,
but this isn't the case for space-efficient data sets. Search performance will very likely suffer
if a shard contains more than 200M documents. This is the reason for the built-in limit.