[[hotspotting]]
=== Hot spotting
++++
<titleabbrev>Hot spotting</titleabbrev>
++++
:keywords: hot-spotting, hotspot, hot-spot, hot spot, hotspots, hotspotting

Computer link:{wikipedia}/Hot_spot_(computer_programming)[hot spotting]
may occur in {es} when resource utilization is unevenly distributed across
<<modules-node,nodes>>. Temporary spikes are not usually considered problematic, but
ongoing, significantly uneven utilization may lead to cluster bottlenecks
and should be reviewed.

****
If you're using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to https://www.elastic.co/guide/en/cloud/current/ec-autoops.html[Monitor with AutoOps].
****

See link:https://www.youtube.com/watch?v=Q5ODJ5nIKAM[this video] for a walkthrough of troubleshooting a hot spotting issue.

[discrete]
[[detect]]
==== Detect hot spotting

Hot spotting most commonly surfaces as significantly elevated
resource utilization (of `disk.percent`, `heap.percent`, or `cpu`) among a
subset of nodes as reported via <<cat-nodes,cat nodes>>. Individual spikes aren't
necessarily problematic, but if utilization repeatedly spikes or consistently remains
high over time (for example, longer than 30 seconds), the resource may be experiencing problematic
hot spotting.

For example, let's showcase two separate plausible issues using cat nodes:

[source,console]
----
GET _cat/nodes?v&s=master,name&h=name,master,node.role,heap.percent,disk.used_percent,cpu
----

Suppose this same output was pulled twice, five minutes apart:

[source,console-result]
----
name   master node.role heap.percent disk.used_percent cpu
node_1 *      hirstm    24           20                95
node_2 -      hirstm    23           18                18
node_3 -      hirstmv   25           90                10
----
// TEST[skip:illustrative response only]

Here we see two significantly elevated utilizations: the master node is at
`cpu: 95` and a hot node is at `disk.used_percent: 90%`. This would indicate
hot spotting was occurring on these two nodes, though not necessarily from the same
root cause.

[discrete]
[[causes]]
==== Causes

Historically, clusters experience hot spotting mainly as an effect of hardware,
shard distributions, and/or task load. We'll review these sequentially, in order
of their potential scope of impact.

[discrete]
[[causes-hardware]]
==== Hardware

Here are some common improper hardware setups which may contribute to hot
spotting; a quick way to cross-check several of them is sketched after this list:

* Resources are allocated non-uniformly. For example, if one hot node is
given half the CPU of its peers. {es} expects all nodes on a
<<data-tiers,data tier>> to share the same hardware profiles or
specifications.

* Resources are consumed by another service on the host, including other
{es} nodes. Refer to our <<dedicated-host,dedicated host>> recommendation.

* Resources experience different network or disk throughputs. For example, if one
node's I/O is lower than its peers. Refer to
<<tune-for-indexing-speed,Use faster hardware>> for more information.

* A JVM that has been configured with a heap larger than 31GB. Refer to <<set-jvm-heap-size>>
for more information.

* Problematic resources uniquely report <<setup-configuration-memory,memory swapping>>.

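As a starting point for comparing hardware uniformity across nodes, you can pull
the relevant <<cat-nodes,cat nodes>> columns side by side. This is only a sketch;
adjust the columns to whichever resources you suspect:

[source,console]
----
GET _cat/nodes?v&h=name,node.role,version,jdk,heap.max,ram.max,disk.total
----

Similarly, to check whether any node reports memory swapping, you can inspect the
operating system section of <<cluster-nodes-stats,the nodes stats API>>:

[source,console]
----
GET _nodes/stats/os?human&filter_path=nodes.*.name,nodes.*.os.swap
----
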
[discrete]
[[causes-shards]]
==== Shard distributions

{es} indices are divided into one or more link:{wikipedia}/Shard_(database_architecture)[shards]
which can sometimes be poorly distributed. {es} accounts for this by <<modules-cluster,balancing shard counts>>
across data nodes. As link:{blog-ref}whats-new-elasticsearch-kibana-cloud-8-6-0[introduced in version 8.6],
{es} by default also enables <<modules-cluster,desired balancing>> to account for ingest load.
A node may still experience hot spotting either due to write-heavy indices or due to the
overall number of shards it's hosting.

[discrete]
[[causes-shards-nodes]]
===== Node level

You can check for shard balancing via <<cat-allocation,cat allocation>>, though as of version
8.6, <<modules-cluster,desired balancing>> may no longer expect to fully
balance shard counts. Note that both methods may temporarily show problematic imbalance during
<<cluster-fault-detection,cluster stability issues>>.

For example, let's showcase two separate plausible issues using cat allocation:

[source,console]
----
GET _cat/allocation?v&s=node&h=node,shards,disk.percent,disk.indices,disk.used
----

Which could return:

[source,console-result]
----
node   shards disk.percent disk.indices disk.used
node_1 446    19           154.8gb      173.1gb
node_2 31     52           44.6gb       372.7gb
node_3 445    43           271.5gb      289.4gb
----
// TEST[skip:illustrative response only]

Here we see two significantly unique situations. `node_2` has recently
restarted, so it has a much lower number of shards than all other nodes. This
also relates to `disk.indices` being much smaller than `disk.used` while shards
are recovering, as seen via <<cat-recovery,cat recovery>>. While `node_2`'s shard
count is low, it may become a write hot spot due to ongoing <<ilm-rollover,ILM
rollovers>>. This is a common root cause of write hot spots covered in the next
section.

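To confirm ongoing recoveries like `node_2`'s, you can restrict <<cat-recovery,cat recovery>>
to active recoveries only. A minimal sketch; the column selection is just one reasonable choice:

[source,console]
----
GET _cat/recovery?v=true&active_only=true&h=index,shard,source_node,target_node,stage,bytes_percent
----
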
The second situation is that `node_3` has a higher `disk.percent` than `node_1`,
even though they hold roughly the same number of shards. This occurs when either
shards are not evenly sized (refer to <<shard-size-recommendation>>) or when
there are a lot of empty indices.

Cluster rebalancing based on desired balance does much of the heavy lifting
of keeping nodes from hot spotting. It can be limited either by nodes hitting
<<disk-based-shard-allocation,watermarks>> (refer to <<fix-watermark-errors,fixing disk watermark errors>>) or by a
write-heavy index's total shard count being much lower than the number of written-to nodes.

You can confirm hot spotted nodes via <<cluster-nodes-stats,the nodes stats API>>,
potentially polling twice over time and checking only the stats differences
between the two polls, rather than polling once, which gives you cumulative stats for the node's
full <<cluster-nodes-usage,node uptime>>. For example, to check all nodes'
indexing stats:

[source,console]
----
GET _nodes/stats?human&filter_path=nodes.*.name,nodes.*.indices.indexing
----

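As a rough sketch of that two-poll comparison, assuming `curl` and
link:https://stedolan.github.io/jq[jq] are available and `$ES_URL` points at the cluster,
you could diff each node's `index_total` counter over a 30 second window:

[source,sh]
----
# poll node indexing totals twice, 30 seconds apart
curl -s "$ES_URL/_nodes/stats?filter_path=nodes.*.name,nodes.*.indices.indexing.index_total" > poll1.json
sleep 30
curl -s "$ES_URL/_nodes/stats?filter_path=nodes.*.name,nodes.*.indices.indexing.index_total" > poll2.json

# print the number of documents each node indexed during the interval
jq -rn --slurpfile a poll1.json --slurpfile b poll2.json '
  $a[0].nodes as $first
  | $b[0].nodes
  | to_entries[]
  | "\(.value.name): \(.value.indices.indexing.index_total - $first[.key].indices.indexing.index_total)"'
----
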
[discrete]
[[causes-shards-index]]
===== Index level

Hot spotted nodes frequently surface via <<cat-thread-pool,cat thread pool>>'s
`write` and `search` queue backups. For example:

[source,console]
----
GET _cat/thread_pool/write,search?v=true&s=n,nn&h=n,nn,q,a,r,c
----

Which could return:

[source,console-result]
----
n      nn     q   a r c
search node_1 3   1 0 1287
search node_2 0   2 0 1159
search node_3 0   1 0 1302
write  node_1 100 3 0 4259
write  node_2 0   4 0 980
write  node_3 1   5 0 8714
----
// TEST[skip:illustrative response only]

Here you can see two significantly unique situations. Firstly, `node_1` has a
severely backed up write queue compared to other nodes. Secondly, `node_3` shows
a historical count of completed writes that is double any other node's. These are both
probably due to either poorly distributed write-heavy indices, or to multiple
write-heavy indices allocated to the same node. Since primary and replica writes
involve roughly the same amount of cluster work, we usually recommend setting
<<total-shards-per-node,`index.routing.allocation.total_shards_per_node`>> to
force the index's shards to spread out, after aligning index shard counts with the total node count.

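For example, a minimal sketch of applying that setting; the index name
`my-write-heavy-index` and the limit of `2` shards per node are illustrative only and
should be derived from your own shard count and node count:

[source,console]
----
PUT my-write-heavy-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
----
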
We normally recommend that heavy-write indices have sufficient primary
`number_of_shards` and replica `number_of_replicas` to spread evenly across the
indexing nodes. Alternatively, you can <<cluster-reroute,reroute>> shards to
quieter nodes to relieve the nodes with write hot spotting.

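If you do reroute manually, a single move could look like the following sketch; the
index name, shard number, and node names here are purely hypothetical:

[source,console]
----
POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-write-heavy-index",
        "shard": 0,
        "from_node": "node_1",
        "to_node": "node_2"
      }
    }
  ]
}
----
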
If it's not obvious which indices are problematic, you can introspect further via
<<indices-stats,the index stats API>> by running:

[source,console]
----
GET _stats?level=shards&human&expand_wildcards=all&filter_path=indices.*.total.indexing.index_total
----

For more advanced analysis, you can poll for shard-level stats,
which lets you compare joint index-level and node-level stats. This analysis
doesn't account for node restarts and/or shard rerouting, but serves as an
overview:

[source,console]
----
GET _stats/indexing,search?level=shards&human&expand_wildcards=all
----

You can, for example, use the link:https://stedolan.github.io/jq[third-party JQ tool]
to process the output saved as `indices_stats.json`:

[source,sh]
----
cat indices_stats.json | jq -rc ['.indices|to_entries[]|.key as $i|.value.shards|to_entries[]|.key as $s|.value[]|{node:.routing.node[:4], index:$i, shard:$s, primary:.routing.primary, size:.store.size, total_indexing:.indexing.index_total, time_indexing:.indexing.index_time_in_millis, total_query:.search.query_total, time_query:.search.query_time_in_millis } | .+{ avg_indexing: (if .total_indexing>0 then (.time_indexing/.total_indexing|round) else 0 end), avg_search: (if .total_query>0 then (.time_query/.total_query|round) else 0 end) }'] > shard_stats.json

# show top written-to shard simplified stats which contain their index and node references
cat shard_stats.json | jq -rc 'sort_by(-.avg_indexing)[]' | head
----

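Similarly, using the same `shard_stats.json` output, you can list the shards with the
highest average search time:

[source,sh]
----
# show top searched shard simplified stats which contain their index and node references
cat shard_stats.json | jq -rc 'sort_by(-.avg_search)[]' | head
----
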
[discrete]
[[causes-tasks]]
==== Task loads

Shard distribution problems will most likely surface as task load, as seen
above in the <<cat-thread-pool,cat thread pool>> example. It is also
possible for task load to hot spot a node, either because
individual tasks are expensive or because the overall volume of tasks is too high.

For example, if <<cat-thread-pool,cat thread pool>> reported a high
queue on the `warmer` <<modules-threadpool,thread pool>>, you would
look up the affected node's <<cluster-nodes-hot-threads,hot threads>>.
Let's say it reported `warmer` threads at `100% cpu` related to
`GlobalOrdinalsBuilder`. This would let you know to inspect
<<eager-global-ordinals,field data's global ordinals>>.

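For reference, a minimal way to pull hot threads for a single node looks like the
following; the node name `node_1` is illustrative:

[source,console]
----
GET _nodes/node_1/hot_threads
----
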
Alternatively, let's say <<cat-nodes,cat nodes>> shows a hot spotted master node
and <<cat-thread-pool,cat thread pool>> shows general queuing across nodes.
This would suggest the master node is overwhelmed. To resolve
this, first ensure a <<high-availability-cluster-small-clusters,highly available hardware>>
setup and then look for ephemeral causes. In this example,
<<cluster-nodes-hot-threads,the nodes hot threads API>> reports multiple threads in
`other`, which indicates they're waiting on or blocked by either garbage collection
or I/O.

For either of these example situations, a good way to confirm the problematic tasks
is to look at the longest running non-continuous (designated `[c]`) tasks via
<<cat-tasks,cat task management>>. This can be supplemented by checking the longest
running cluster sync tasks via <<cat-pending-tasks,cat pending tasks>>. Using
a third example:

[source,console]
----
GET _cat/tasks?v&s=time:desc&h=type,action,running_time,node,cancellable
----

This could return:

[source,console-result]
----
type   action                running_time node   cancellable
direct indices:data/read/eql 10m          node_1 true
...
----
// TEST[skip:illustrative response only]

This surfaces a problematic <<eql-search-api,EQL query>>. We can gain
further insight into it via <<tasks,the task management API>>:

[source,console]
----
GET _tasks?human&detailed
----

Its response contains a `description` that reports this query:

[source,eql]
----
indices[winlogbeat-*,logs-window*], sequence by winlog.computer_name with maxspan=1m\n\n[authentication where host.os.type == "windows" and event.action:"logged-in" and\n event.outcome == "success" and process.name == "svchost.exe" ] by winlog.event_data.TargetLogonId
----

This lets you know which indices to check (`winlogbeat-*,logs-window*`), as well
as the <<eql-search-api,EQL search>> request body. Most likely this is
link:{security-guide}/es-overview.html[SIEM related].
You can combine this with <<enable-audit-logging,audit logging>> as needed to
trace the request source.
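
Since the example task above reports `cancellable: true`, you could also stop it via
<<tasks,the task management API>>'s cancel endpoint once its source is understood. A
minimal sketch; replace `<task_id>` with the `node_id:task_number` identifier reported
by the task APIs above:

[source,console]
----
POST _tasks/<task_id>/_cancel
----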