mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00
This guidance does not apply any longer. The overhead per shard has been significantly reduced in recent versions and removed rule of thumb will be too pessimistic in many if not most cases and might be too optimistic in other specific ones. => Replace guidance with rule of thumb per field count on data nodes and rule of thumb by index count (which is far more relevant nowadays than shards) for master nodes. relates #77466 Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
413 lines
15 KiB
Text
[[size-your-shards]]
== Size your shards

Each index in {es} is divided into one or more shards, each of which may be
replicated across multiple nodes to protect against hardware failures. If you
are using <<data-streams>> then each data stream is backed by a sequence of
indices. There is a limit to the amount of data you can store on a single node
so you can increase the capacity of your cluster by adding nodes and increasing
the number of indices and shards to match. However, each index and shard has
some overhead and if you divide your data across too many shards then the
overhead can become overwhelming. A cluster with too many indices or shards is
said to suffer from _oversharding_. An oversharded cluster will be less
efficient at responding to searches and in extreme cases it may even become
unstable.

[discrete]
[[create-a-sharding-strategy]]
=== Create a sharding strategy

The best way to prevent oversharding and other shard-related issues is to
create a sharding strategy. A sharding strategy helps you determine and
maintain the optimal number of shards for your cluster while limiting the size
of those shards.

Unfortunately, there is no one-size-fits-all sharding strategy. A strategy that
works in one environment may not scale in another. A good sharding strategy
must account for your infrastructure, use case, and performance expectations.

The best way to create a sharding strategy is to benchmark your production data
on production hardware using the same queries and indexing loads you'd see in
production. For our recommended methodology, watch the
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[quantitative
cluster sizing video]. As you test different shard configurations, use {kib}'s
{kibana-ref}/elasticsearch-metrics.html[{es} monitoring tools] to track your
cluster's stability and performance.

The following sections provide some reminders and guidelines you should
consider when designing your sharding strategy. If your cluster is already
oversharded, see <<reduce-cluster-shard-count>>.

[discrete]
[[shard-sizing-considerations]]
=== Sizing considerations

Keep the following things in mind when building your sharding strategy.

[discrete]
[[single-thread-per-shard]]
==== Searches run on a single thread per shard

Most searches hit multiple shards. Each shard runs the search on a single
CPU thread. While a shard can run multiple concurrent searches, searches across a
large number of shards can deplete a node's <<modules-threadpool,search
thread pool>>. This can result in low throughput and slow search speeds.
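
As a rough illustration of how shard count translates into thread pool
pressure, the following Python sketch estimates how many shard-level search
tasks compete for each search thread on one node. It assumes the `search` pool
is sized as `int((allocated processors * 3) / 2) + 1`; treat that formula as
an assumption and check the thread pool documentation for your version. The
function names and the workload numbers are hypothetical.

```python
def default_search_pool_size(allocated_processors: int) -> int:
    """Assumed default size of the fixed `search` thread pool."""
    return (allocated_processors * 3) // 2 + 1


def shard_tasks_per_thread(concurrent_searches: int,
                           shards_per_search_on_node: int,
                           allocated_processors: int) -> float:
    """Shard-level search tasks competing for each search thread on a node.

    Simplified model: every search runs one single-threaded task per shard
    it touches on this node, and all searches arrive at the same time.
    """
    tasks = concurrent_searches * shards_per_search_on_node
    return tasks / default_search_pool_size(allocated_processors)


# Hypothetical node: 8 allocated processors, 10 concurrent searches,
# each touching 26 shards held by this node.
pool_size = default_search_pool_size(8)
tasks_per_thread = shard_tasks_per_thread(10, 26, 8)
print(pool_size, tasks_per_thread)
```

A ratio far above 1 suggests that shard-level tasks will queue rather than run
immediately, which is one way the low throughput described above can arise.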

[discrete]
[[each-shard-has-overhead]]
==== Each index, shard and field has overhead

Every index and every shard requires some memory and CPU resources. In most
cases, a small set of large shards uses fewer resources than many small shards.

Segments play a big role in a shard's resource usage. Most shards contain
several segments, which store its index data. {es} keeps segment metadata in
JVM heap memory so it can be quickly retrieved for searches. As a shard grows,
its segments are <<index-modules-merge,merged>> into fewer, larger segments.
This decreases the number of segments, which means less metadata is kept in
heap memory.

Every mapped field also carries some overhead in terms of memory usage and disk
space. By default {es} will automatically create a mapping for every field in
every document it indexes, but you can switch off this behavior to
<<explicit-mapping,take control of your mappings>>.

[discrete]
[[shard-auto-balance]]
==== {es} automatically balances shards within a data tier

A cluster's nodes are grouped into <<data-tiers,data tiers>>. Within each tier,
{es} attempts to spread an index's shards across as many nodes as possible. When
you add a new node or a node fails, {es} automatically rebalances the index's
shards across the tier's remaining nodes.

[discrete]
[[shard-size-best-practices]]
=== Best practices

Where applicable, use the following best practices as starting points for your
sharding strategy.

[discrete]
[[delete-indices-not-documents]]
==== Delete indices, not documents

Deleted documents aren't immediately removed from {es}'s file system.
Instead, {es} marks the document as deleted on each related shard. The marked
document will continue to use resources until it's removed during a periodic
<<index-modules-merge,segment merge>>.

When possible, delete entire indices instead. {es} can immediately remove
deleted indices directly from the file system and free up resources.

[discrete]
[[use-ds-ilm-for-time-series]]
==== Use data streams and {ilm-init} for time series data

<<data-streams,Data streams>> let you store time series data across multiple,
time-based backing indices. You can use <<index-lifecycle-management,{ilm}
({ilm-init})>> to automatically manage these backing indices.

One advantage of this setup is
<<getting-started-index-lifecycle-management,automatic rollover>>, which creates
a new write index when the current one meets a defined `max_primary_shard_size`,
`max_age`, `max_docs`, or `max_size` threshold. When an index is no longer
needed, you can use {ilm-init} to automatically delete it and free up resources.

{ilm-init} also makes it easy to change your sharding strategy over time:

* *Want to decrease the shard count for new indices?* +
Change the <<index-number-of-shards,`index.number_of_shards`>> setting in the
data stream's <<data-streams-change-mappings-and-settings,matching index
template>>.

* *Want larger shards or fewer backing indices?* +
Increase your {ilm-init} policy's <<ilm-rollover,rollover threshold>>.

* *Need indices that span shorter intervals?* +
Offset the increased shard count by deleting older indices sooner. You can do
this by lowering the `min_age` threshold for your policy's
<<ilm-index-lifecycle,delete phase>>.

Every new backing index is an opportunity to further tune your strategy.

[discrete]
[[shard-size-recommendation]]
==== Aim for shard sizes between 10GB and 50GB

Larger shards take longer to recover after a failure. When a node fails, {es}
rebalances the node's shards across the data tier's remaining nodes. This
recovery process typically involves copying the shard contents across the
network, so a 100GB shard will take twice as long to recover as a 50GB shard.
In contrast, small shards carry proportionally more overhead and are less
efficient to search. Searching fifty 1GB shards will take substantially more
resources than searching a single 50GB shard containing the same data.
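
The proportionality between shard size and recovery time can be sketched with
a little arithmetic. The Python function below is a simplified model that
assumes recovery is limited only by a constant network throughput. The
function name and the 40 MB/s figure are hypothetical and are used only for
illustration.

```python
def recovery_seconds(shard_size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimated time to copy one shard across the network.

    Simplified model: constant throughput, decimal units (1GB = 1000MB),
    and no allowance for throttling, retries, or concurrent recoveries.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return shard_size_gb * 1000 / throughput_mb_per_s


# Hypothetical sustained throughput of 40 MB/s.
large = recovery_seconds(100, 40)
small = recovery_seconds(50, 40)
print(large, small, large / small)
```

Under these assumptions the 100GB shard takes exactly twice as long as the
50GB shard, whatever throughput you plug in.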

There are no hard limits on shard size, but experience shows that shards
between 10GB and 50GB typically work well for logs and time series data. You
may be able to use larger shards depending on your network and use case.
Smaller shards may be appropriate for
{enterprise-search-ref}/index.html[Enterprise Search] and similar use cases.

If you use {ilm-init}, set the <<ilm-rollover,rollover action>>'s
`max_primary_shard_size` threshold to `50gb` to avoid shards larger than 50GB.

To see the current size of your shards, use the <<cat-shards,cat shards API>>.

[source,console]
----
GET _cat/shards?v=true&h=index,prirep,shard,store&s=prirep,store&bytes=gb
----
// TEST[setup:my_index]

The `store` column shows the size of each shard. A `p` in the `prirep` column
marks a primary shard and an `r` marks a replica.

[source,txt]
----
index                                 prirep shard store
.ds-my-data-stream-2099.05.06-000001  p      0      50gb
...
----
// TESTRESPONSE[non_json]
// TESTRESPONSE[s/\.ds-my-data-stream-2099\.05\.06-000001/my-index-000001/]
// TESTRESPONSE[s/50gb/.*/]
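
If you want to check many shards against the 10GB to 50GB guideline, you can
post-process the text response. The following Python sketch groups primary
shards into size bands. It assumes the four columns requested above and sizes
expressed in gigabytes, with or without a `gb` suffix. The function name and
the sample rows are hypothetical.

```python
def classify_primary_shards(cat_output, low_gb=10.0, high_gb=50.0):
    """Group primary shards from cat shards text output by size band."""
    lines = [line for line in cat_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    report = {"small": [], "ok": [], "large": []}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # Skip replicas and rows without a store value,
        # for example unassigned shards.
        if row.get("prirep") != "p" or "store" not in row:
            continue
        size_text = row["store"].lower()
        if size_text.endswith("gb"):
            size_text = size_text[:-2]
        size_gb = float(size_text)
        if size_gb < low_gb:
            band = "small"
        elif size_gb > high_gb:
            band = "large"
        else:
            band = "ok"
        report[band].append((row["index"], int(row["shard"])))
    return report


# Hypothetical output in the shape of the response above.
SAMPLE_OUTPUT = """
index        prirep shard store
my-index-a   p      0     50gb
my-index-b   p      0     3gb
my-index-b   r      0     3gb
my-index-c   p      0     120gb
"""

report = classify_primary_shards(SAMPLE_OUTPUT)
print(report)
```

Shards in the `small` band are candidates for the techniques in
<<reduce-cluster-shard-count>>, while shards in the `large` band may call for
a lower rollover threshold.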

[discrete]
[[shard-count-recommendation]]
==== Aim for 3000 indices or fewer per GB of heap memory on each master node

The number of indices a master node can manage is proportional to its heap
size. The exact amount of heap memory needed for each index depends on various
factors such as the size of the mapping and the number of shards per index.

As a general rule of thumb, you should aim for 3000 indices or fewer per GB of
heap on master nodes. For example, if your cluster contains 12000 indices then
each dedicated master node should have at least 4GB of heap. For non-dedicated
master nodes, the same rule holds and should be added to the heap requirements
of the other roles of each node.
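
The rule of thumb above is simple enough to express as a helper. This Python
sketch reproduces the 12000-index example. The function and constant names are
hypothetical, and the result is only the master-role share of the heap.

```python
INDICES_PER_GB_OF_MASTER_HEAP = 3000  # rule of thumb from the text above


def min_master_heap_gb(index_count: int) -> float:
    """Minimum heap, in GB, that the rule of thumb suggests for the master role."""
    return index_count / INDICES_PER_GB_OF_MASTER_HEAP


def max_indices_for_master_heap(heap_gb: float) -> int:
    """Largest index count the rule of thumb allows for a given master heap."""
    return int(heap_gb * INDICES_PER_GB_OF_MASTER_HEAP)


print(min_master_heap_gb(12000))       # the example from the text
print(max_indices_for_master_heap(2))  # a hypothetical 2GB master heap
```

For a non-dedicated master node, add the result to whatever heap the node's
other roles require.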

To check the configured size of each node's heap, use the <<cat-nodes,cat nodes
API>>.

[source,console]
----
GET _cat/nodes?v=true&h=heap.max
----
// TEST[setup:my_index]

You can use the <<cat-shards,cat shards API>> to check the number of shards per
node.

[source,console]
----
GET _cat/shards?v=true
----
// TEST[setup:my_index]

[discrete]
[[field-count-recommendation]]
==== Allow 1kB of heap per field per index on data nodes, plus overheads

The exact resource usage of each mapped field depends on its type, but a rule
of thumb is to allow for approximately 1kB of heap overhead per mapped field
per index held by each data node. You must also allow enough heap for {es}'s
baseline usage as well as your workload such as indexing, searches and
aggregations. 0.5GB of extra heap will suffice for many reasonable workloads.
You may need even less if your workload is very light, while heavy workloads
may require more.

For example, if a data node holds shards from 1000 indices, each containing
4000 mapped fields, then you should allow approximately 1000 × 4000 × 1kB = 4GB
of heap for the fields and another 0.5GB of heap for its workload and other
overheads. This node therefore needs a heap size of at least 4.5GB.
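
The same estimate can be written as a small Python helper, which makes it easy
to try other index and field counts. The sketch uses decimal units (1GB =
1,000,000kB) to match the arithmetic above. The function name and parameter
defaults are hypothetical.

```python
KB_PER_GB = 1_000_000  # decimal units, matching the worked example


def min_data_node_heap_gb(index_count: int,
                          fields_per_index: int,
                          workload_gb: float = 0.5,
                          kb_per_field: float = 1.0) -> float:
    """Heap estimate for one data node under the per-field rule of thumb.

    index_count      -- indices with at least one shard on this node
    fields_per_index -- mapped fields in each of those indices
    workload_gb      -- allowance for baseline usage and workload
    """
    field_heap_gb = index_count * fields_per_index * kb_per_field / KB_PER_GB
    return field_heap_gb + workload_gb


print(min_data_node_heap_gb(1000, 4000))  # the example from the text
```

Because field types differ in cost, treat the result as a starting point for
benchmarking rather than a precise requirement.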

[discrete]
[[avoid-node-hotspots]]
==== Avoid node hotspots

If too many shards are allocated to a specific node, the node can become a
hotspot. For example, if a single node contains too many shards for an index
with a high indexing volume, the node is likely to have issues.

To prevent hotspots, use the
<<total-shards-per-node,`index.routing.allocation.total_shards_per_node`>> index
setting to explicitly limit the number of shards on a single node. You can
configure `index.routing.allocation.total_shards_per_node` using the
<<indices-update-settings,update index settings API>>.

[source,console]
--------------------------------------------------
PUT my-index-000001/_settings
{
  "index" : {
    "routing.allocation.total_shards_per_node" : 5
  }
}
--------------------------------------------------
// TEST[setup:my_index]
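
When choosing a value for this setting, note that a limit that is too low
leaves some shard copies with nowhere to go. The following Python sketch
computes the smallest per-node limit that still lets every copy of an index be
allocated, assuming all data nodes in the tier are eligible to hold the index.
The function names and the example numbers are hypothetical.

```python
import math


def min_total_shards_per_node(primaries: int, replicas: int, data_nodes: int) -> int:
    """Smallest per-node limit that can still hold every shard copy."""
    copies = primaries * (1 + replicas)
    return math.ceil(copies / data_nodes)


def unassignable_copies(primaries: int, replicas: int,
                        data_nodes: int, limit: int) -> int:
    """Shard copies that cannot be placed under the given per-node limit."""
    copies = primaries * (1 + replicas)
    return max(0, copies - data_nodes * limit)


# Hypothetical index: 5 primaries, 1 replica each, 3 eligible data nodes.
print(min_total_shards_per_node(5, 1, 3))
print(unassignable_copies(5, 1, 3, limit=3))
```

Leaving some headroom above the computed minimum lets {es} reallocate shards
when a node fails.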

[discrete]
[[avoid-unnecessary-fields]]
==== Avoid unnecessary mapped fields

By default {es} <<dynamic-mapping,automatically creates a mapping>> for every
field in every document it indexes. Every mapped field corresponds to some data
structures on disk which are needed for efficient search, retrieval, and
aggregations on this field. Details about each mapped field are also held in
memory. In many cases this overhead is unnecessary because a field is not used
in any searches or aggregations. Use <<explicit-mapping>> instead of dynamic
mapping to avoid creating fields that are never used. If a collection of fields
is typically used together, consider using <<copy-to>> to consolidate them at
index time. If a field is only rarely used, it may be better to make it a
<<runtime,runtime field>> instead.

You can get information about which fields are being used with the
<<field-usage-stats>> API, and you can analyze the disk usage of mapped fields
using the <<indices-disk-usage>> API. Note, however, that unnecessary mapped
fields carry memory overhead in addition to their disk usage.

[discrete]
[[reduce-cluster-shard-count]]
=== Reduce a cluster's shard count

If your cluster is already oversharded, you can use one or more of the following
methods to reduce its shard count.

[discrete]
[[create-indices-that-cover-longer-time-periods]]
==== Create indices that cover longer time periods

If you use {ilm-init} and your retention policy allows it, avoid using a
`max_age` threshold for the rollover action. Instead, use
`max_primary_shard_size` to avoid creating empty indices or many small shards.

If your retention policy requires a `max_age` threshold, increase it to create
indices that cover longer time intervals. For example, instead of creating daily
indices, you can create indices on a weekly or monthly basis.
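
To see how much the rollover interval affects index count, the Python sketch
below counts the indices needed to cover a retention period. It is a
simplified model that assumes exactly one index per interval and treats a
month as 30 days. The function name and the 90-day retention period are
hypothetical.

```python
import math


def retained_indices(retention_days: int, rollover_interval_days: int) -> int:
    """Indices needed to cover the retention period at a fixed rollover interval."""
    return math.ceil(retention_days / rollover_interval_days)


# Hypothetical 90-day retention period.
for label, interval in (("daily", 1), ("weekly", 7), ("monthly", 30)):
    print(label, retained_indices(90, interval))
```

Multiply the result by the number of primary and replica shards per index to
estimate the effect on the cluster's shard count.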

[discrete]
[[delete-empty-indices]]
==== Delete empty or unneeded indices

If you're using {ilm-init} and roll over indices based on a `max_age` threshold,
you can inadvertently create indices with no documents. These empty indices
provide no benefit but still consume resources.

You can find these empty indices using the <<cat-count,cat count API>>.

[source,console]
----
GET _cat/count/my-index-000001?v=true
----
// TEST[setup:my_index]
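
Checking indices one at a time becomes tedious on a large cluster. As an
alternative to the request above, the following Python sketch filters a list
of records shaped like the JSON output of the cat indices API
(`GET _cat/indices?format=json`), where each record has `index` and
`docs.count` keys with string values. Using that API here is an assumption of
this sketch, and the function name and sample records are hypothetical.

```python
def empty_indices(records):
    """Return the names of indices whose document count is zero.

    Records without a document count (for example closed indices)
    are skipped rather than treated as empty.
    """
    names = []
    for record in records:
        count = record.get("docs.count")
        if count is not None and int(count) == 0:
            names.append(record["index"])
    return sorted(names)


# Hypothetical records in the shape of the cat indices JSON output.
SAMPLE_RECORDS = [
    {"index": "my-index-000001", "docs.count": "0"},
    {"index": "my-index-000002", "docs.count": "1520"},
    {"index": "my-index-000003", "docs.count": None},
    {"index": "my-index-000004", "docs.count": "0"},
]

print(empty_indices(SAMPLE_RECORDS))
```

Review the resulting list before deleting anything, because an index that is
still being written to can be empty for a short time after a rollover.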

Once you have a list of empty indices, you can delete them using the
<<indices-delete-index,delete index API>>. You can also delete any other
unneeded indices.

[source,console]
----
DELETE my-index-000001
----
// TEST[setup:my_index]

[discrete]
[[force-merge-during-off-peak-hours]]
==== Force merge during off-peak hours

If you no longer write to an index, you can use the <<indices-forcemerge,force
merge API>> to <<index-modules-merge,merge>> smaller segments into larger ones.
This can reduce shard overhead and improve search speeds. However, force merges
are resource-intensive. If possible, run the force merge during off-peak hours.

[source,console]
----
POST my-index-000001/_forcemerge
----
// TEST[setup:my_index]

[discrete]
[[shrink-existing-index-to-fewer-shards]]
==== Shrink an existing index to fewer shards

If you no longer write to an index, you can use the
<<indices-shrink-index,shrink index API>> to reduce its shard count.

{ilm-init} also has a <<ilm-shrink,shrink action>> for indices in the
warm phase.

[discrete]
[[combine-smaller-indices]]
==== Combine smaller indices

You can also use the <<docs-reindex,reindex API>> to combine indices
with similar mappings into a single large index. For time series data, you could
reindex indices for short time periods into a new index covering a
longer period. For example, you could reindex daily indices from October with a
shared index pattern, such as `my-index-2099.10.11`, into a monthly
`my-index-2099.10` index. After the reindex, delete the smaller indices.

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-index-2099.10.*"
  },
  "dest": {
    "index": "my-index-2099.10"
  }
}
----

[discrete]
[[troubleshoot-shard-related-errors]]
=== Troubleshoot shard-related errors

Here’s how to resolve common shard-related errors.

[discrete]
==== this action would add [x] total shards, but this cluster currently has [y]/[z] maximum shards open;

The <<cluster-max-shards-per-node,`cluster.max_shards_per_node`>> cluster
setting limits the maximum number of open shards for a cluster. This error
indicates an action would exceed this limit.
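
The check behind this error can be approximated in a few lines of Python. The
sketch assumes the cluster-wide limit is `cluster.max_shards_per_node`
multiplied by the number of non-frozen data nodes, and that an action is
rejected only when the resulting total is strictly greater than that limit.
Treat both as assumptions to verify against the setting's documentation for
your version. The function names and example numbers are hypothetical.

```python
def cluster_shard_limit(max_shards_per_node: int, data_nodes: int) -> int:
    """Cluster-wide open shard limit, the [z] value in the error message."""
    return max_shards_per_node * data_nodes


def would_exceed_limit(current_open_shards: int, shards_to_add: int,
                       max_shards_per_node: int, data_nodes: int) -> bool:
    """True if adding the shards would push the cluster past its limit."""
    limit = cluster_shard_limit(max_shards_per_node, data_nodes)
    return current_open_shards + shards_to_add > limit


# Hypothetical cluster: 3 data nodes and a per-node setting of 1000.
print(cluster_shard_limit(1000, 3))
print(would_exceed_limit(2998, 4, 1000, 3))
print(would_exceed_limit(2990, 10, 1000, 3))
```

In the error message, `[x]` corresponds to `shards_to_add` and `[y]` to
`current_open_shards`.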

If you're confident your changes won't destabilize the cluster, you can
temporarily increase the limit using the <<cluster-update-settings,cluster
update settings API>> and retry the action.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node": 1200
  }
}
----

This increase should only be temporary. As a long-term solution, we recommend
you add nodes to the oversharded data tier or
<<reduce-cluster-shard-count,reduce your cluster's shard count>>. To get a
cluster's current shard count after making changes, use the
<<cluster-stats,cluster stats API>>.

[source,console]
----
GET _cluster/stats?filter_path=indices.shards.total
----

When a long-term solution is in place, we recommend you reset the
`cluster.max_shards_per_node` limit.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node": null
  }
}
----