[[task-queue-backlog]]
=== Backlogged task queue

*******************************
*Product:* Elasticsearch +
*Deployment type:* Elastic Cloud Enterprise, Elastic Cloud Hosted, Elastic Cloud on Kubernetes, Elastic Self-Managed +
*Versions:* All
*******************************

A backlogged task queue can prevent tasks from completing and lead to an unhealthy cluster state. Contributing factors include resource constraints, a large number of tasks triggered at once, and long-running tasks.

[discrete]
[[diagnose-task-queue-backlog]]
==== Diagnose a backlogged task queue

To identify the cause of the backlog, try these diagnostic actions.

* <>
* <>
* <>
* <>

[discrete]
[[diagnose-task-queue-thread-pool]]
===== Check the thread pool status

A <> can result in <>.

Use the <> to monitor active threads, queued tasks, rejections, and completed tasks:

[source,console]
----
GET /_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
----

* Look for high `active` and `queue` metrics, which indicate potential bottlenecks and opportunities to <>.
* Determine whether thread pool issues are specific to a <>.
* Check whether a specific node's thread pool is depleting faster than others. This might indicate <>.

[discrete]
[[diagnose-task-queue-hot-thread]]
===== Inspect hot threads on each node

If a particular thread pool queue is backed up, periodically poll the <> to gauge the thread's progression and ensure it has sufficient resources:

[source,console]
----
GET /_nodes/hot_threads
----

Although the hot threads API response does not list the specific tasks running on a thread, it provides a summary of the thread's activities. You can correlate a hot threads response with a <> to identify any overlap with specific tasks. For example, if the hot threads response indicates the thread is `performing a search query`, you can <> using the task management API.

[discrete]
[[diagnose-task-queue-long-running-node-tasks]]
===== Identify long-running node tasks

Long-running tasks can also cause a backlog. Use the <> to check for excessive `running_time_in_nanos` values:

[source,console]
----
GET /_tasks?pretty=true&human=true&detailed=true
----

You can filter on a specific `action`, such as <> or search-related tasks. These tend to be long-running.

* Filter on <> actions:
+
[source,console]
----
GET /_tasks?human&detailed&actions=indices:data/write/bulk
----

* Filter on search actions:
+
[source,console]
----
GET /_tasks?human&detailed&actions=indices:data/read/search
----

Long-running tasks might need to be <>.

[discrete]
[[diagnose-task-queue-long-running-cluster-tasks]]
===== Look for long-running cluster tasks

Use the <> to identify delays in cluster state synchronization:

[source,console]
----
GET /_cluster/pending_tasks
----

Tasks with a high `timeInQueue` value are likely contributing to the backlog and might need to be <>.

[discrete]
[[resolve-task-queue-backlog]]
==== Recommendations

After identifying problematic threads and tasks, resolve the issue by increasing resources or canceling tasks.

[discrete]
[[resolve-task-queue-backlog-resources]]
===== Increase available resources

If tasks are progressing slowly, try <>.

In some cases, you might need to increase the thread pool size. For example, the `force_merge` thread pool defaults to a single thread. Increasing the size to 2 might help reduce a backlog of force merge requests. A configuration sketch follows the task cancellation guidance below.

[discrete]
[[resolve-task-queue-backlog-stuck-tasks]]
===== Cancel stuck tasks

If an active task's <> shows no progress, consider <>.
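As a sketch, a stuck task can be canceled with the task management API's `_cancel` endpoint. The task ID shown here (`oTUltX4IQMOUUVeiohTt8A:12345`) is a placeholder; substitute the `node_id:task_number` value reported for the stuck task in the `GET /_tasks` output:

[source,console]
----
POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
----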
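For the `force_merge` sizing suggestion under <<resolve-task-queue-backlog-resources,Increase available resources>>, the following `elasticsearch.yml` sketch raises the pool to two threads. Thread pool sizes are static node settings, so this assumes you can restart each affected node; validate the value against your hardware and workload before applying it:

[source,yaml]
----
# elasticsearch.yml (static setting; takes effect after a node restart)
thread_pool.force_merge.size: 2
----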
[discrete]
[[resolve-task-queue-backlog-hotspotting]]
===== Address hot spotting

If a specific node's thread pool is depleting faster than others, try addressing uneven node resource utilization, also known as hot spotting. For details on actions you can take, such as rebalancing shards, see <>.

[discrete]
==== Resources

Related symptoms:

* <>
* <>
* <>

// TODO add link to standard Additional resources when that topic exists