elasticsearch/docs/reference/data-management.asciidoc

[role="xpack"]
[[data-management]]
= Data management

[partintro]
--
The data you store in {es} generally falls into one of two categories:

* Content: a collection of items you want to search, such as a catalog of products
* Time series data: a stream of continuously-generated timestamped data, such as log entries

Content might be frequently updated,
but the value of the content remains relatively constant over time.
You want to be able to retrieve items quickly regardless of how old they are.

Time series data keeps accumulating over time, so you need strategies for
balancing the value of the data against the cost of storing it.
As it ages, it tends to become less important and less-frequently accessed,
so you can move it to less expensive, less performant hardware.
For your oldest data, what matters is that you have access to the data.
It's ok if queries take longer to complete.

To help you manage your data, {es} offers you:

* <<index-lifecycle-management, {ilm-cap}>> ({ilm-init}) to manage both indices and data streams and it is fully customisable, and
* <<data-stream-lifecycle, Data stream lifecycle>> which is the built-in lifecycle of data streams and addresses the most
common lifecycle management needs.

preview::["The built-in data stream lifecycle is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but this feature is not subject to the support SLA of official GA features."]

**{ilm-init}** can be used to manage both indices and data streams and it allows you to:

* Define the retention period of your data. The retention period is the minimum time your data will be stored in {es}.
Data older than this period can be deleted by {es}.
* Define <<data-tiers, multiple tiers>> of data nodes with different performance characteristics.
* Automatically transition indices through the data tiers according to your performance needs and retention policies.
* Leverage <<searchable-snapshots, searchable snapshots>> stored in a remote repository to provide resiliency
for your older indices while reducing operating costs and maintaining search performance.
* Perform <<async-search-intro, asynchronous searches>> of data stored on less-performant hardware.

**Data stream lifecycle** is less feature rich but is focused on simplicity, so it allows you to easily:

* Define the retention period of your data. The retention period is the minimum time your data will be stored in {es}.
Data older than this period can be deleted by {es} at a later time.
* Improve the performance of your data stream by performing background operations that will optimise the way your data
stream is stored.
--

include::ilm/index.asciidoc[]

include::datatiers.asciidoc[]