[[task-queue-backlog]]
=== Backlogged task queue

*******************************
*Product:* Elasticsearch +
*Deployment type:* Elastic Cloud Enterprise, Elastic Cloud Hosted, Elastic Cloud on Kubernetes, Elastic Self-Managed +
*Versions:* All
*******************************

A backlogged task queue can prevent tasks from completing and lead to an
unhealthy cluster state. Contributing factors include resource constraints,
a large number of tasks triggered at once, and long-running tasks.

[discrete]
[[diagnose-task-queue-backlog]]
==== Diagnose a backlogged task queue

To identify the cause of the backlog, try these diagnostic actions.

* <<diagnose-task-queue-thread-pool>>
* <<diagnose-task-queue-hot-thread>>
* <<diagnose-task-queue-long-running-node-tasks>>
* <<diagnose-task-queue-long-running-cluster-tasks>>

[discrete]
[[diagnose-task-queue-thread-pool]]
===== Check the thread pool status

A <<high-cpu-usage,depleted thread pool>> can result in
<<rejected-requests,rejected requests>>.

Use the <<cat-thread-pool,cat thread pool API>> to monitor
active threads, queued tasks, rejections, and completed tasks:

[source,console]
----
GET /_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
----

* Look for high `active` and `queue` metrics, which indicate potential bottlenecks
and opportunities to <<reduce-cpu-usage,reduce CPU usage>>.
* Determine whether thread pool issues are specific to a <<data-tiers,data tier>>.
* Check whether a specific node's thread pool is depleting faster than others. This
might indicate <<resolve-task-queue-backlog-hotspotting, hot spotting>>.
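
The checks above can be sketched in a few lines of Python. This is only an illustration: it assumes the cat thread pool API was called with `format=json`, and the sample rows, the `flag_busy_pools` helper, and the queue threshold of 100 are hypothetical rather than recommended values.

```python
import json

# Hypothetical sample of the rows returned by
# GET /_cat/thread_pool?format=json&h=name,node_name,active,queue,rejected
# The cat APIs return every value as a string.
SAMPLE_ROWS = json.loads("""
[
  {"name": "search", "node_name": "node-1", "active": "13", "queue": "250", "rejected": "4"},
  {"name": "search", "node_name": "node-2", "active": "2", "queue": "0", "rejected": "0"},
  {"name": "write", "node_name": "node-1", "active": "1", "queue": "0", "rejected": "0"}
]
""")


def flag_busy_pools(rows, queue_threshold=100):
    """Return (node, pool, reason) tuples for pools that look backlogged.

    queue_threshold is an arbitrary illustrative cutoff, not an Elasticsearch default.
    """
    flagged = []
    for row in rows:
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if rejected > 0:
            flagged.append((row["node_name"], row["name"], "rejections"))
        elif queue >= queue_threshold:
            flagged.append((row["node_name"], row["name"], "deep queue"))
    return flagged


busy = flag_busy_pools(SAMPLE_ROWS)
for node, pool, reason in busy:
    print(f"{node}: {pool} thread pool shows {reason}")
```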

[discrete]
[[diagnose-task-queue-hot-thread]]
===== Inspect hot threads on each node

If a particular thread pool queue is backed up, periodically poll the
<<cluster-nodes-hot-threads,nodes hot threads API>> to gauge the thread's
progression and ensure it has sufficient resources:

[source,console]
----
GET /_nodes/hot_threads
----

Although the hot threads API response does not list the specific tasks running on a thread,
it provides a summary of the thread's activities. You can correlate a hot threads response
with a <<tasks,task management API response>> to identify any overlap with specific tasks. For
example, if the hot threads response indicates the thread is `performing a search query`, you can
<<diagnose-task-queue-long-running-node-tasks,check for long-running search tasks>> using the task management API.

[discrete]
[[diagnose-task-queue-long-running-node-tasks]]
===== Identify long-running node tasks

Long-running tasks can also cause a backlog. Use the <<tasks,task
management API>> to check for excessive `running_time_in_nanos` values:

[source,console]
----
GET /_tasks?pretty=true&human=true&detailed=true
----

You can filter on a specific `action`, such as <<docs-bulk,bulk indexing>> or search-related tasks.
These tend to be long-running.

* Filter on <<docs-bulk,bulk index>> actions:
+
[source,console]
----
GET /_tasks?human&detailed&actions=indices:data/write/bulk
----

* Filter on search actions:
+
[source,console]
----
GET /_tasks?human&detailed&actions=indices:data/read/search
----

Long-running tasks might need to be <<resolve-task-queue-backlog-stuck-tasks,canceled>>.
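
As a minimal sketch of what "excessive" might look like in practice, the following Python snippet walks a task management API response and lists tasks whose `running_time_in_nanos` exceeds a cutoff. The sample response is a hypothetical, trimmed-down payload, and the `long_running_tasks` helper and its 30-second threshold are assumptions chosen for illustration.

```python
# Hypothetical, trimmed-down response from GET /_tasks?detailed
SAMPLE_RESPONSE = {
    "nodes": {
        "nodeA": {
            "name": "node-1",
            "tasks": {
                "nodeA:101": {
                    "node": "nodeA",
                    "id": 101,
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 95_000_000_000,
                    "cancellable": True,
                },
                "nodeA:102": {
                    "node": "nodeA",
                    "id": 102,
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 4_000_000,
                    "cancellable": False,
                },
            },
        }
    }
}


def long_running_tasks(response, threshold_seconds=30.0):
    """Return (task_id, action, seconds) for tasks over the threshold, slowest first.

    threshold_seconds is an illustrative cutoff; pick one that fits your workload.
    """
    found = []
    for node in response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9
            if seconds > threshold_seconds:
                found.append((task_id, task["action"], seconds))
    return sorted(found, key=lambda item: item[2], reverse=True)


slow = long_running_tasks(SAMPLE_RESPONSE)
for task_id, action, seconds in slow:
    print(f"{task_id} {action} has been running for {seconds:.0f}s")
```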

[discrete]
[[diagnose-task-queue-long-running-cluster-tasks]]
===== Look for long-running cluster tasks

Use the <<cluster-pending,cluster pending tasks API>> to identify delays
in cluster state synchronization:

[source,console]
----
GET /_cluster/pending_tasks
----

Tasks with a high `time_in_queue_millis` value are likely contributing to the backlog and might
need to be <<resolve-task-queue-backlog-stuck-tasks,canceled>>.
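
To make the longest-waiting tasks easy to spot, a response from the cluster pending tasks API can be sorted by queue time. The snippet below is a sketch under assumptions: the sample payload and its `source` strings are hypothetical, and `slowest_pending` is an illustrative helper, not part of any client library.

```python
# Hypothetical response from GET /_cluster/pending_tasks
SAMPLE_PENDING = {
    "tasks": [
        {
            "insert_order": 101,
            "priority": "URGENT",
            "source": "create-index [example-1], cause [api]",
            "executing": True,
            "time_in_queue_millis": 86,
            "time_in_queue": "86ms",
        },
        {
            "insert_order": 46,
            "priority": "HIGH",
            "source": "put-mapping [example-2]",
            "executing": False,
            "time_in_queue_millis": 42_000,
            "time_in_queue": "42s",
        },
        {
            "insert_order": 45,
            "priority": "NORMAL",
            "source": "update-settings [example-3]",
            "executing": False,
            "time_in_queue_millis": 1_500,
            "time_in_queue": "1.5s",
        },
    ]
}


def slowest_pending(response, limit=5):
    """Return up to `limit` pending tasks, longest time in queue first."""
    tasks = response.get("tasks", [])
    ordered = sorted(tasks, key=lambda t: t["time_in_queue_millis"], reverse=True)
    return ordered[:limit]


slowest = slowest_pending(SAMPLE_PENDING, limit=2)
for task in slowest:
    print(f'{task["time_in_queue"]:>6}  {task["priority"]:<7} {task["source"]}')
```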

[discrete]
[[resolve-task-queue-backlog]]
==== Recommendations

After identifying problematic threads and tasks, resolve the issue by increasing resources or canceling tasks.

[discrete]
[[resolve-task-queue-backlog-resources]]
===== Increase available resources

If tasks are progressing slowly, try <<reduce-cpu-usage,reducing CPU usage>>.

In some cases, you might need to increase the thread pool size. For example, the `force_merge` thread pool defaults to a single thread.
Increasing the size to 2 might help reduce a backlog of force merge requests.

[discrete]
[[resolve-task-queue-backlog-stuck-tasks]]
===== Cancel stuck tasks

If an active task's <<diagnose-task-queue-hot-thread,hot thread>> shows no progress, consider <<task-cancellation,canceling the task>>.
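
Once you have decided which tasks to cancel, the cancel requests follow the `POST /_tasks/<task_id>/_cancel` pattern of the task management API. The Python sketch below only builds the request lines for review and does not send anything. The sample tasks, the `cancel_requests` helper, and the 60-second cutoff are hypothetical, and a long running time alone does not prove that a task is stuck.

```python
# Hypothetical tasks, keyed by task ID, as found under
# nodes.<node_id>.tasks in a GET /_tasks?detailed response.
SAMPLE_TASKS = {
    "nodeA:101": {
        "action": "indices:data/read/search",
        "running_time_in_nanos": 95_000_000_000,
        "cancellable": True,
    },
    "nodeA:102": {
        "action": "indices:data/write/bulk",
        "running_time_in_nanos": 120_000_000_000,
        "cancellable": False,
    },
    "nodeB:7": {
        "action": "indices:data/read/search",
        "running_time_in_nanos": 2_000_000,
        "cancellable": True,
    },
}


def cancel_requests(tasks, min_seconds=60.0):
    """Build console request lines for cancellable tasks older than min_seconds.

    min_seconds is an illustrative cutoff; confirm each task is really stuck first.
    """
    requests = []
    for task_id, task in tasks.items():
        seconds = task["running_time_in_nanos"] / 1e9
        if task.get("cancellable") and seconds >= min_seconds:
            requests.append(f"POST /_tasks/{task_id}/_cancel")
    return requests


to_cancel = cancel_requests(SAMPLE_TASKS)
for line in to_cancel:
    print(line)
```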

[discrete]
[[resolve-task-queue-backlog-hotspotting]]
===== Address hot spotting

If a specific node's thread pool is depleting faster than others, try addressing
uneven node resource utilization, also known as hot spotting.
For details on actions you can take, such as rebalancing shards, see <<hotspotting>>.
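
One rough way to check whether a single node stands out is to total the queued tasks per node and compare each total to the median across nodes. The Python sketch below assumes rows from the cat thread pool API requested with `format=json`. The sample rows, the `hot_nodes` helper, and the factor of 3 are illustrative assumptions, not an official heuristic.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows from GET /_cat/thread_pool?format=json&h=node_name,name,queue
SAMPLE_ROWS = [
    {"node_name": "node-1", "name": "search", "queue": "250"},
    {"node_name": "node-1", "name": "write", "queue": "0"},
    {"node_name": "node-2", "name": "search", "queue": "5"},
    {"node_name": "node-2", "name": "write", "queue": "0"},
    {"node_name": "node-3", "name": "search", "queue": "10"},
    {"node_name": "node-3", "name": "write", "queue": "0"},
]


def hot_nodes(rows, factor=3):
    """Return nodes whose total queue depth exceeds factor x the median node total."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["node_name"]] += int(row["queue"])
    if not totals:
        return []
    typical = median(totals.values())
    return sorted(
        node for node, total in totals.items()
        if total > 0 and total > factor * typical
    )


hot = hot_nodes(SAMPLE_ROWS)
print("Possible hot spots:", hot)
```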

[discrete]
==== Resources

Related symptoms:

* <<high-cpu-usage>>
* <<rejected-requests>>
* <<hotspotting>>

// TODO add link to standard Additional resources when that topic exists