elasticsearch/docs/reference/datatiers.asciidoc
Stef Nestor 4c5a3fb4da
[+Doc] Troubleshooting / Hot Spotting (#95429)
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-04-26 12:29:47 -06:00


[role="xpack"]
[[data-tiers]]
== Data tiers
A _data tier_ is a collection of nodes with the same data role that
typically share the same hardware profile:
* <<content-tier, Content tier>> nodes handle the indexing and query load for content such as a product catalog.
* <<hot-tier, Hot tier>> nodes handle the indexing load for time series data such as logs or metrics
and hold your most recent, most-frequently-accessed data.
* <<warm-tier, Warm tier>> nodes hold time series data that is accessed less frequently
and rarely needs to be updated.
* <<cold-tier,Cold tier>> nodes hold time series data that is accessed
infrequently and not normally updated. To save space, you can keep
<<fully-mounted,fully mounted indices>> of
<<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. Fully mounted
indices don't need replicas, which reduces required disk space by
approximately 50% compared to regular indices.
* <<frozen-tier, Frozen tier>> nodes hold time series data that is accessed
rarely and never updated. The frozen tier stores <<partially-mounted,partially
mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
This extends the storage capacity even further, by up to 20 times compared to
the warm tier.

IMPORTANT: {es} generally expects nodes within a data tier to share the same
hardware profile. Variations not following this recommendation should be
carefully architected to avoid <<hotspotting,hot spotting>>.

When you index documents directly to a specific index, they remain on content tier nodes indefinitely.

When you index documents to a data stream, they initially reside on hot tier nodes.
You can configure <<index-lifecycle-management, {ilm}>> ({ilm-init}) policies
to automatically transition your time series data through the hot, warm, and cold tiers
according to your performance, resiliency, and data retention requirements.
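
As a rough sketch of such a policy, the request below defines hot, warm, cold, and delete phases.
The policy name `my-timeseries-policy` and all of the ages, sizes, and priorities are illustrative
assumptions, not recommendations; adjust them to your own retention requirements.

[source,console]
----
PUT _ilm/policy/my-timeseries-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
----

No phase in this sketch names a target tier explicitly. It assumes the default behavior in which
{ilm-init} moves each index to the tier that matches the phase it enters.
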
[discrete]
[[content-tier]]
=== Content tier
// tag::content-tier[]
Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
Unlike time series data, the value of the content remains relatively constant over time,
so it doesn't make sense to move it to a tier with different performance characteristics as it ages.
Content data typically has long data retention requirements, and you want to be able to retrieve
items quickly regardless of how old they are.
Content tier nodes are usually optimized for query performance: they prioritize processing power over IO throughput
so they can process complex searches and aggregations and return results quickly.
While they are also responsible for indexing, content data is generally ingested at a lower rate
than time series data such as logs and metrics. For resiliency, indices in this
tier should be configured to use one or more replicas.
The content tier is required. System indices and other indices that aren't part
of a data stream are automatically allocated to the content tier.
// end::content-tier[]
[discrete]
[[hot-tier]]
=== Hot tier
// tag::hot-tier[]
The hot tier is the {es} entry point for time series data and holds your most recent,
most frequently searched time series data.
Nodes in the hot tier need to be fast for both reads and writes,
which requires more hardware resources and faster storage (SSDs).
For resiliency, indices in the hot tier should be configured to use one or more replicas.
The hot tier is required. New indices that are part of a <<data-streams,
data stream>> are automatically allocated to the hot tier.
// end::hot-tier[]
[discrete]
[[warm-tier]]
=== Warm tier
// tag::warm-tier[]
Time series data can move to the warm tier once it is being queried less frequently
than the recently indexed data in the hot tier.
The warm tier typically holds data from recent weeks.
Updates are still allowed, but likely infrequent.
Nodes in the warm tier generally don't need to be as fast as those in the hot tier.
For resiliency, indices in the warm tier should be configured to use one or more replicas.
// end::warm-tier[]
[discrete]
[[cold-tier]]
=== Cold tier
// tag::cold-tier[]
When you no longer need to search time series data regularly, it can move from
the warm tier to the cold tier. While still searchable, this tier is typically
optimized for lower storage costs rather than search speed.
For better storage savings, you can keep <<fully-mounted,fully mounted indices>>
of <<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. Unlike regular
indices, these fully mounted indices don't require replicas for reliability. In
the event of a failure, they can recover data from the underlying snapshot
instead. This potentially halves the local storage needed for the data. A
snapshot repository is required to use fully mounted indices in the cold tier.
Fully mounted indices are read-only.
Alternatively, you can use the cold tier to store regular indices with replicas instead
of using {search-snaps}. This lets you store older data on less expensive hardware
but doesn't reduce required disk space compared to the warm tier.
// end::cold-tier[]
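
One way to keep fully mounted indices on the cold tier is the {ilm-init} `searchable_snapshot`
action in the cold phase. The following is a minimal sketch; the policy name `my-cold-policy`, the
repository name `my-repository`, and the 30-day `min_age` are placeholders, and the repository is
assumed to be registered already.

[source,console]
----
PUT _ilm/policy/my-cold-policy
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my-repository"
          }
        }
      }
    }
  }
}
----
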
[discrete]
[[frozen-tier]]
=== Frozen tier
// tag::frozen-tier[]
Once data is no longer queried, or is queried only rarely, it can move from
the cold tier to the frozen tier, where it stays for the rest of its life.
The frozen tier requires a snapshot repository.
The frozen tier uses <<partially-mounted,partially mounted indices>> to store
and load data from a snapshot repository. This reduces local storage and
operating costs while still letting you search frozen data. Because {es} must
sometimes fetch frozen data from the snapshot repository, searches on the frozen
tier are typically slower than on the cold tier.
// end::frozen-tier[]
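
To illustrate what a partially mounted index is, the sketch below mounts an index from an existing
snapshot with the `shared_cache` storage option of the mount snapshot API. The names
`my-repository`, `my-snapshot`, and `my-index-000001` are placeholders, and the request assumes the
cluster already has frozen tier nodes and a snapshot that contains the index. In practice, an
{ilm-init} policy would usually handle this step for you.

[source,console]
----
POST /_snapshot/my-repository/my-snapshot/_mount?storage=shared_cache&wait_for_completion=true
{
  "index": "my-index-000001"
}
----
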
[discrete]
[[configure-data-tiers-cloud]]
=== Configure data tiers on {ess} or {ece}
The default configuration for an {ecloud} deployment includes a shared tier for
hot and content data. This tier is required and can't be removed.
To add a warm, cold, or frozen tier when you create a deployment:
. On the **Create deployment** page, click **Advanced Settings**.
. Click **+ Add capacity** for any data tiers to add.
. Click **Create deployment** at the bottom of the page to save your changes.
[role="screenshot"]
image::images/data-tiers/ess-advanced-config-data-tiers.png[{ecloud}'s deployment Advanced configuration page,align=center]
To add a data tier to an existing deployment:
. Log in to the {ess-console}[{ecloud} console].
. On the **Deployments** page, select your deployment.
. In your deployment menu, select **Edit**.
. Click **+ Add capacity** for any data tiers to add.
. Click **Save** at the bottom of the page to save your changes.
To remove a data tier, refer to {cloud}/ec-disable-data-tier.html[Disable a data
tier].
[discrete]
[[configure-data-tiers-on-premise]]
=== Configure data tiers for self-managed deployments
For self-managed deployments, each node's <<data-node,data role>> is configured
in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
might be assigned to both the hot and content tiers:
[source,yaml]
----
node.roles: ["data_hot", "data_content"]
----
NOTE: We recommend you use <<data-frozen-node,dedicated nodes>> in the frozen
tier.
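
After restarting nodes with new roles, you may want to confirm which roles each node actually
holds. One possible check, shown here as a sketch, is the cat nodes API restricted to the node name
and role columns. Roles are reported as single-letter abbreviations.

[source,console]
----
GET _cat/nodes?v=true&h=name,node.role
----
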
[discrete]
[[data-tier-allocation]]
=== Data tier index allocation
When you create an index, by default {es} sets
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_content` to automatically allocate the index shards to the content tier.
When {es} creates an index as part of a <<data-streams, data stream>>,
by default {es} sets
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_hot` to automatically allocate the index shards to the hot tier.
You can explicitly set `index.routing.allocation.include._tier_preference`
to opt out of the default tier-based allocation.
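
For example, the following sketch updates the tier preference of an existing index so that {es}
prefers warm tier nodes and falls back to hot tier nodes if no warm nodes are available. The index
name `my-index-000001` is a placeholder.

[source,console]
----
PUT my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
----

The setting takes a comma-separated list of tiers in order of preference.
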
[discrete]
[[data-tier-migration]]
=== Automatic data tier migration
{ilm-init} automatically transitions managed
indices through the available data tiers using the <<ilm-migrate, migrate>> action.
By default, this action is automatically injected in every phase.
You can explicitly specify the migrate action with `"enabled": false` to disable automatic migration,
for example, if you're using the <<ilm-allocate, allocate action>> to manually
specify allocation rules.
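
A minimal sketch of that combination follows. It disables the migrate action in the warm phase and
uses the allocate action instead. The policy name `my-policy` and the custom node attribute
`rack_id` with its values are hypothetical; substitute the attributes defined on your own nodes.

[source,console]
----
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          },
          "allocate": {
            "include": {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
----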