mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
[[high-cpu-usage]]
=== High CPU usage

{es} uses <<modules-threadpool,thread pools>> to manage CPU resources for
concurrent operations. High CPU usage typically means one or more thread pools
are running low.

If a thread pool is depleted, {es} will <<rejected-requests,reject requests>>
related to the thread pool. For example, if the `search` thread pool is
depleted, {es} will reject search requests until more threads are available.

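As a quick way to see whether a pool is close to depletion, the
<<cat-thread-pool,cat thread pool API>> can list active, queued, and rejected
tasks per node. The following is a minimal sketch: the choice of the `search`
and `write` pools and of the columns shown is an assumption made for
illustration, not a prescribed set.

[source,console]
----
GET _cat/thread_pool/search,write?v=true&h=node_name,name,active,queue,rejected,completed
----

A `rejected` count that keeps growing, or a `queue` that stays full, suggests
that the pool cannot keep up with incoming requests.
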
You might experience high CPU usage if a <<data-tiers,data tier>>, and therefore the nodes assigned to that tier, is experiencing more traffic than other tiers. This imbalance in resource utilization is also known as <<hotspotting,hot spotting>>.

****
If you're using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to https://www.elastic.co/guide/en/cloud/current/ec-autoops.html[Monitor with AutoOps].
****

[discrete]
[[diagnose-high-cpu-usage]]
==== Diagnose high CPU usage

**Check CPU usage**

You can check the CPU usage per node using the <<cat-nodes,cat nodes API>>:

// tag::cpu-usage-cat-nodes[]
[source,console]
----
GET _cat/nodes?v=true&s=cpu:desc
----

The response's `cpu` column contains the current CPU usage as a percentage.
The `name` column contains the node's name. Elevated but transient CPU usage is
normal. However, if CPU usage is elevated for an extended duration, it should be
investigated.

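To relate CPU usage to node roles, and therefore to data tiers, the same API
can be limited to a few columns with the `h` parameter. This is a sketch only:
the selection of the `node.role`, `load_1m`, and `load_5m` columns is an
assumption about what is useful here, not a required set.

[source,console]
----
GET _cat/nodes?v=true&h=name,node.role,cpu,load_1m,load_5m&s=cpu:desc
----

If the nodes at the top of the list all share the same data role, the load may
be concentrated on a single tier.
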
To track CPU usage over time, we recommend enabling monitoring:

include::{es-ref-dir}/tab-widgets/cpu-usage-widget.asciidoc[]

**Check hot threads**

If a node has high CPU usage, use the <<cluster-nodes-hot-threads,nodes hot
threads API>> to check for resource-intensive threads running on the node.

[source,console]
----
GET _nodes/hot_threads
----
// TEST[s/\/my-node,my-other-node//]

This API returns a plain-text breakdown of any hot threads. High CPU usage
frequently correlates with <<task-queue-backlog,a long-running task, or a
backlog of tasks>>.

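When only one node is affected, the request can be narrowed to that node and
tuned with query parameters. The example below is a sketch: `my-node` is a
hypothetical node name, and the values for `threads`, `interval`, and `type`
are illustrative assumptions rather than recommended settings.

[source,console]
----
GET _nodes/my-node/hot_threads?threads=5&interval=1s&type=cpu
----

Here, `threads` sets how many hot threads to report, `interval` sets the
sampling interval, and `type=cpu` samples by CPU time rather than by wait or
block time.
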
[discrete]
[[reduce-cpu-usage]]
==== Reduce CPU usage

The following tips outline the most common causes of high CPU usage and their
solutions.

**Scale your cluster**

Heavy indexing and search loads can deplete smaller thread pools. To better
handle heavy workloads, add more nodes to your cluster or upgrade your existing
nodes to increase capacity.

**Spread out bulk requests**

While more efficient than individual requests, large <<docs-bulk,bulk indexing>>
or <<search-multi-search,multi-search>> requests still require CPU resources. If
possible, submit smaller requests and allow more time between them.

**Cancel long-running searches**

Long-running searches can block threads in the `search` thread pool. To check
for these searches, use the <<tasks,task management API>>.

[source,console]
----
GET _tasks?actions=*search&detailed
----

The response's `description` contains the search request and its queries.
`running_time_in_nanos` shows how long the search has been running.

[source,console-result]
----
{
  "nodes" : {
    "oTUltX4IQMOUUVeiohTt8A" : {
      "name" : "my-node",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "tasks" : {
        "oTUltX4IQMOUUVeiohTt8A:464" : {
          "node" : "oTUltX4IQMOUUVeiohTt8A",
          "id" : 464,
          "type" : "transport",
          "action" : "indices:data/read/search",
          "description" : "indices[my-index], search_type[QUERY_THEN_FETCH], source[{\"query\":...}]",
          "start_time_in_millis" : 4081771730000,
          "running_time_in_nanos" : 13991383,
          "cancellable" : true
        }
      }
    }
  }
}
----
// TESTRESPONSE[skip: no way to get tasks]

To cancel a search and free up resources, pass its task ID to the task
management API's `_cancel` endpoint.

[source,console]
----
POST _tasks/oTUltX4IQMOUUVeiohTt8A:464/_cancel
----

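The `_cancel` endpoint also accepts the `nodes` and `actions` filters, which
can cancel several matching tasks at once. The following sketch reuses the node
ID from the example response above. It would cancel every cancellable search
task on that node, so treat it as an illustration to adapt rather than a
request to run as-is.

[source,console]
----
POST _tasks/_cancel?nodes=oTUltX4IQMOUUVeiohTt8A&actions=*search
----
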
For additional tips on how to track and avoid resource-intensive searches, see
<<avoid-expensive-searches,Avoid expensive searches>>.