mirror of
https://github.com/elastic/logstash.git
synced 2025-04-24 22:57:16 -04:00
(cherry picked from commit eff9b540df
)
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
1186 lines
39 KiB
Text
[discrete]
|
|
[[monitoring]]
|
|
== APIs for monitoring {ls}
|
|
|
|
{ls} provides monitoring APIs for retrieving runtime metrics
|
|
about {ls}:
|
|
|
|
* <<node-info-api>>
|
|
* <<plugins-api>>
|
|
* <<node-stats-api>>
|
|
* <<hot-threads-api>>
|
|
|
|
|
|
You can use the root resource to retrieve general information about the Logstash instance, including
|
|
the host and version.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json,subs="attributes"]
|
|
--------------------------------------------------
|
|
{
|
|
"host": "skywalker",
|
|
"version": "{logstash_version}",
|
|
"http_address": "127.0.0.1:9600"
|
|
}
|
|
--------------------------------------------------
|
|
|
|
NOTE: By default, the monitoring API attempts to bind to `tcp:9600`. If this port is already in use by another Logstash
|
|
instance, you need to launch Logstash with the `--api.http.port` flag specified to bind to a different port. See
|
|
<<command-line-flags>> for more information.
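
The request above can also be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names `api_url` and `fetch_root` are hypothetical, and the host and port assume the defaults described in this section.

```python
import json
import urllib.request


def api_url(path="/", host="localhost", port=9600, pretty=False):
    """Build a monitoring API URL (assumes plain HTTP on the default port)."""
    query = "?pretty" if pretty else ""
    return f"http://{host}:{port}{path}{query}"


def fetch_root(host="localhost", port=9600):
    """Return the root resource as a dict; needs a running Logstash instance."""
    with urllib.request.urlopen(api_url("/", host, port), timeout=5) as resp:
        return json.load(resp)
```

If Logstash was started on a different port, pass that port to `fetch_root` instead of relying on the default.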
|
|
|
|
[discrete]
|
|
[[monitoring-api-security]]
|
|
==== Securing the Logstash API
|
|
|
|
The {ls} Monitoring APIs are not secured by default, but you can configure {ls} to secure them in one of several ways to meet your organization's needs.
|
|
|
|
You can enable SSL for the Logstash API by setting `api.ssl.enabled: true` in the `logstash.yml`, and providing the relevant keystore settings `api.ssl.keystore.path` and `api.ssl.keystore.password`:
|
|
|
|
[source]
|
|
--------------------------------------------------
|
|
api.ssl.enabled: true
|
|
api.ssl.keystore.path: /path/to/keystore.jks
|
|
api.ssl.keystore.password: "s3cUr3p4$$w0rd"
|
|
--------------------------------------------------
|
|
|
|
The keystore must be in either jks or p12 format, and must contain both a certificate and a private key.
|
|
Connecting clients receive this certificate, allowing them to authenticate the Logstash endpoint.
|
|
|
|
You can also require HTTP Basic authentication by setting `api.auth.type: basic` in the `logstash.yml`, and providing the relevant credentials `api.auth.basic.username` and `api.auth.basic.password`:
|
|
|
|
[source]
|
|
--------------------------------------------------
|
|
api.auth.type: basic
|
|
api.auth.basic.username: "logstash"
|
|
api.auth.basic.password: "s3cUreP4$$w0rD"
|
|
--------------------------------------------------
|
|
|
|
NOTE: To avoid storing passwords in plain text, use keystore or environment variable replacements for password-type fields.
For example, the value `"${HTTP_PASS}"` resolves to the `HTTP_PASS` variable stored in the <<keystore,secure keystore>> if present, or otherwise to the same variable from the <<environment-variables,environment>>.
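
A client then has to present the configured credentials and trust the certificate in the keystore. The sketch below shows one way this might look in Python with the standard library. The function names, the `LOGSTASH_API_PASS` environment variable, and the CA file path are illustrative assumptions, not part of Logstash.

```python
import base64
import os
import ssl
import urllib.request


def basic_auth_header(username, password):
    """Return the HTTP Basic Authorization header value for the credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def secured_request(url, username, password):
    """Build a request carrying Basic credentials (nothing is sent yet)."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def fetch_secured_root():
    """Example call; the environment variable and CA path are placeholders."""
    password = os.environ.get("LOGSTASH_API_PASS", "")
    context = ssl.create_default_context(cafile="/path/to/ca.pem")
    request = secured_request("https://localhost:9600/?pretty", "logstash", password)
    with urllib.request.urlopen(request, context=context, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Reading the password from the environment keeps it out of the script, in the same spirit as the variable replacements recommended for `logstash.yml`.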
|
|
|
|
[discrete]
|
|
[[monitoring-common-options]]
|
|
==== Common options
|
|
|
|
The following options can be applied to all of the Logstash monitoring APIs.
|
|
|
|
[discrete]
|
|
===== Pretty results
|
|
|
|
When you append `?pretty=true` to any request, the returned JSON is pretty-printed.
Use this option for debugging only.
|
|
|
|
[discrete]
|
|
===== Human-readable output
|
|
|
|
NOTE: For Logstash {logstash_version}, the `human` option is supported for the <<hot-threads-api>>
|
|
only. When you specify `human=true`, the results are returned in plain text instead of
|
|
JSON format. The default is false.
|
|
|
|
Statistics are returned in a format suitable for humans
(for example, `"exists_time": "1h"` or `"size": "1kb"`) and for computers
(for example, `"exists_time_in_millis": 3600000` or `"size_in_bytes": 1024`).
The human-readable values can be turned off by adding `?human=false`
to the query string. This makes sense when the stats results are
consumed by a monitoring tool rather than read by a person.
The default for the `human` flag is `false`.
|
|
|
|
|
|
[[node-info-api]]
|
|
=== Node Info API
|
|
|
|
The node info API retrieves information about the node.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/<types>'
|
|
--------------------------------------------------
|
|
|
|
Where `<types>` is optional and specifies the types of node info you want to return.
|
|
|
|
You can limit the info that's returned by combining any of the following types in a comma-separated list:
|
|
|
|
[horizontal]
|
|
<<node-pipeline-info,`pipelines`>>::
|
|
Gets pipeline-specific information and settings for each pipeline.
|
|
<<node-os-info,`os`>>::
|
|
Gets node-level info about the OS.
|
|
<<node-jvm-info,`jvm`>>::
|
|
Gets node-level JVM info, including info about threads.
|
|
|
|
See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
|
|
Logstash monitoring APIs.
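
The comma-separated `<types>` list can be assembled programmatically. This is a small sketch, assuming only the three types listed above are accepted; `node_info_path` is a hypothetical helper name.

```python
VALID_NODE_INFO_TYPES = ("pipelines", "os", "jvm")


def node_info_path(*types):
    """Return the `_node` path for the requested info types (all if none given)."""
    unknown = [t for t in types if t not in VALID_NODE_INFO_TYPES]
    if unknown:
        raise ValueError(f"unsupported node info types: {unknown}")
    return "/_node/" + ",".join(types) if types else "/_node"
```

For example, `node_info_path("os", "jvm")` yields `/_node/os,jvm`, which can be appended to the API address.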
|
|
|
|
[discrete]
|
|
[[node-pipeline-info]]
|
|
===== Pipeline info
|
|
|
|
The following request returns a JSON document that shows pipeline info, such as the number of workers,
|
|
batch size, and batch delay:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/pipelines?pretty'
|
|
--------------------------------------------------
|
|
|
|
If you want to view additional information about a pipeline, such as stats for each configured input, filter,
|
|
or output stage, see the <<pipeline-stats>> section under the <<node-stats-api>>.
|
|
|
|
Example response:
|
|
|
|
[source,json,subs="attributes"]
|
|
--------------------------------------------------
|
|
{
|
|
"pipelines" : {
|
|
"test" : {
|
|
"workers" : 1,
|
|
"batch_size" : 1,
|
|
"batch_delay" : 5,
|
|
"config_reload_automatic" : false,
|
|
"config_reload_interval" : 3
|
|
},
|
|
"test2" : {
|
|
"workers" : 8,
|
|
"batch_size" : 125,
|
|
"batch_delay" : 5,
|
|
"config_reload_automatic" : false,
|
|
"config_reload_interval" : 3
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
You can see the info for a specific pipeline by including the pipeline ID. In
|
|
the following example, the ID of the pipeline is `test`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/pipelines/test?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
----------
|
|
{
|
|
"pipelines" : {
|
|
"test" : {
|
|
"workers" : 1,
|
|
"batch_size" : 1,
|
|
"batch_delay" : 5,
|
|
"config_reload_automatic" : false,
|
|
"config_reload_interval" : 3
|
|
}
|
|
}
|
|
}
|
|
----------
|
|
|
|
If you specify an invalid pipeline ID, the request returns a 404 Not Found error.
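
A client may want to treat that 404 as "no such pipeline" rather than as a failure. The following sketch assumes the response shape shown above; the `pipeline_info` name and the injectable `opener` parameter are illustrative choices that make the function testable without a running instance.

```python
import json
import urllib.error
import urllib.request


def pipeline_info(pipeline_id, base_url="http://localhost:9600",
                  opener=urllib.request.urlopen):
    """Return info for one pipeline, or None when the API answers 404."""
    url = f"{base_url}/_node/pipelines/{pipeline_id}"
    try:
        with opener(url) as resp:
            return json.load(resp)["pipelines"][pipeline_id]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # unknown pipeline ID
        raise  # any other HTTP error is left to the caller
```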
|
|
|
|
[discrete]
|
|
[[node-os-info]]
|
|
==== OS info
|
|
|
|
The following request returns a JSON document that shows the OS name, architecture, version, and
|
|
available processors:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/os?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"os": {
|
|
"name": "Mac OS X",
|
|
"arch": "x86_64",
|
|
"version": "10.12.4",
|
|
"available_processors": 8
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
[[node-jvm-info]]
|
|
==== JVM info
|
|
|
|
The following request returns a JSON document that shows node-level JVM stats, such as the JVM process ID, version,
|
|
VM info, memory usage, and info about garbage collectors:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/jvm?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"jvm": {
|
|
"pid": 59616,
|
|
"version": "1.8.0_65",
|
|
"vm_name": "Java HotSpot(TM) 64-Bit Server VM",
|
|
"vm_version": "1.8.0_65",
|
|
"vm_vendor": "Oracle Corporation",
|
|
"start_time_in_millis": 1484251185878,
|
|
"mem": {
|
|
"heap_init_in_bytes": 268435456,
|
|
"heap_max_in_bytes": 1037959168,
|
|
"non_heap_init_in_bytes": 2555904,
|
|
"non_heap_max_in_bytes": 0
|
|
},
|
|
"gc_collectors": [
|
|
"ParNew",
|
|
"ConcurrentMarkSweep"
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
|
|
[[plugins-api]]
|
|
=== Plugins info API
|
|
|
|
The plugins info API gets information about all Logstash plugins that are currently installed.
|
|
This API returns essentially the same information as running the `bin/logstash-plugin list --verbose` command.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/plugins?pretty'
|
|
--------------------------------------------------
|
|
|
|
See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
|
|
Logstash monitoring APIs.
|
|
|
|
The output is a JSON document.
|
|
|
|
Example response:
|
|
|
|
["source","js",subs="attributes"]
|
|
--------------------------------------------------
|
|
{
|
|
"total": 93,
|
|
"plugins": [
|
|
{
|
|
"name": "logstash-codec-cef",
|
|
"version": "4.1.2"
|
|
},
|
|
{
|
|
"name": "logstash-codec-collectd",
|
|
"version": "3.0.3"
|
|
},
|
|
{
|
|
"name": "logstash-codec-dots",
|
|
"version": "3.0.2"
|
|
},
|
|
{
|
|
"name": "logstash-codec-edn",
|
|
"version": "3.0.2"
|
|
},
|
|
.
|
|
.
|
|
.
|
|
]
}
|
|
--------------------------------------------------
|
|
|
|
|
|
[[node-stats-api]]
|
|
=== Node Stats API
|
|
|
|
The node stats API retrieves runtime stats about Logstash.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/<types>'
|
|
--------------------------------------------------
|
|
|
|
Where `<types>` is optional and specifies the types of stats you want to return.
|
|
|
|
By default, all stats are returned. You can limit the info that's returned by combining any of the following types in a comma-separated list:
|
|
|
|
[horizontal]
|
|
<<jvm-stats,`jvm`>>::
|
|
Gets JVM stats, including stats about threads, memory usage, garbage collectors,
|
|
and uptime.
|
|
<<process-stats,`process`>>::
|
|
Gets process stats, including stats about file descriptors, memory consumption, and CPU usage.
|
|
<<event-stats,`events`>>::
|
|
Gets event-related statistics for the Logstash instance (regardless of how many
|
|
pipelines were created and destroyed).
|
|
<<flow-stats,`flow`>>::
|
|
Gets flow-related statistics for the Logstash instance (regardless of how many
|
|
pipelines were created and destroyed).
|
|
<<pipeline-stats,`pipelines`>>::
|
|
Gets runtime stats about each Logstash pipeline.
|
|
<<reload-stats,`reloads`>>::
|
|
Gets runtime stats about config reload successes and failures.
|
|
<<os-stats,`os`>>::
|
|
Gets runtime stats about cgroups when Logstash is running in a container.
|
|
<<geoip-database-stats,`geoip_download_manager`>>::
|
|
Gets stats for databases used with the <<plugins-filters-geoip, Geoip filter plugin>>.
|
|
|
|
See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
|
|
Logstash monitoring APIs.
|
|
|
|
[discrete]
|
|
[[jvm-stats]]
|
|
==== JVM stats
|
|
|
|
The following request returns a JSON document containing JVM stats:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/jvm?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"jvm" : {
|
|
"threads" : {
|
|
"count" : 49,
|
|
"peak_count" : 50
|
|
},
|
|
"mem" : {
|
|
"heap_used_percent" : 14,
|
|
"heap_committed_in_bytes" : 309866496,
|
|
"heap_max_in_bytes" : 1037959168,
|
|
"heap_used_in_bytes" : 151686096,
|
|
"non_heap_used_in_bytes" : 122486176,
|
|
"non_heap_committed_in_bytes" : 133222400,
|
|
"pools" : {
|
|
"survivor" : {
|
|
"peak_used_in_bytes" : 8912896,
|
|
"used_in_bytes" : 288776,
|
|
"peak_max_in_bytes" : 35782656,
|
|
"max_in_bytes" : 35782656,
|
|
"committed_in_bytes" : 8912896
|
|
},
|
|
"old" : {
|
|
"peak_used_in_bytes" : 148656848,
|
|
"used_in_bytes" : 148656848,
|
|
"peak_max_in_bytes" : 715849728,
|
|
"max_in_bytes" : 715849728,
|
|
"committed_in_bytes" : 229322752
|
|
},
|
|
"young" : {
|
|
"peak_used_in_bytes" : 71630848,
|
|
"used_in_bytes" : 2740472,
|
|
"peak_max_in_bytes" : 286326784,
|
|
"max_in_bytes" : 286326784,
|
|
"committed_in_bytes" : 71630848
|
|
}
|
|
}
|
|
},
|
|
"gc" : {
|
|
"collectors" : {
|
|
"old" : {
|
|
"collection_time_in_millis" : 607,
|
|
"collection_count" : 12
|
|
},
|
|
"young" : {
|
|
"collection_time_in_millis" : 4904,
|
|
"collection_count" : 1033
|
|
}
|
|
}
|
|
},
|
|
"uptime_in_millis" : 1809643
|
|
}
|
|
}
|
|
--------------------------------------------------
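
The `heap_used_percent` value can be cross-checked against the byte counters in the same response. This sketch assumes the percentage is truncated to a whole number, which matches the example above but is not confirmed by this document.

```python
def heap_used_percent(jvm_stats):
    """Recompute heap usage (whole percent) from the byte counters in `mem`."""
    mem = jvm_stats["jvm"]["mem"]
    # Assumption: the reported value is truncated, not rounded.
    return int(100 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"])
```

With the example response, 151686096 of 1037959168 bytes are in use, which gives 14.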
|
|
|
|
[discrete]
|
|
[[process-stats]]
|
|
==== Process stats
|
|
|
|
The following request returns a JSON document containing process stats:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/process?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"process" : {
|
|
"open_file_descriptors" : 184,
|
|
"peak_open_file_descriptors" : 185,
|
|
"max_file_descriptors" : 10240,
|
|
"mem" : {
|
|
"total_virtual_in_bytes" : 5486125056
|
|
},
|
|
"cpu" : {
|
|
"total_in_millis" : 657136,
|
|
"percent" : 2,
|
|
"load_average" : {
|
|
"1m" : 2.38134765625
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
[[event-stats]]
|
|
==== Event stats
|
|
|
|
The following request returns a JSON document containing event-related statistics
|
|
for the Logstash instance:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/events?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"events" : {
|
|
"in" : 293658,
|
|
"filtered" : 293658,
|
|
"out" : 293658,
|
|
"duration_in_millis" : 2324391,
|
|
"queue_push_duration_in_millis" : 343816
|
|
}
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
[[flow-stats]]
|
|
==== Flow stats
|
|
|
|
The following request returns a JSON document containing flow-rates
|
|
for the Logstash instance:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/flow?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"flow" : {
|
|
"input_throughput" : {
|
|
"current": 189.720,
|
|
"lifetime": 201.841
|
|
},
|
|
"filter_throughput" : {
|
|
"current": 187.810,
|
|
"lifetime": 201.799
|
|
},
|
|
"output_throughput" : {
|
|
"current": 191.087,
|
|
"lifetime": 201.761
|
|
},
|
|
"queue_backpressure" : {
|
|
"current": 0.277,
|
|
"lifetime": 0.031
|
|
},
|
|
"worker_concurrency" : {
|
|
"current": 1.973,
|
|
"lifetime": 1.721
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
NOTE: When the rate for a given flow metric window is infinite, it is presented as a string (either `"Infinity"` or `"-Infinity"`).
|
|
This occurs when the numerator metric has changed during the window without a change in the rate's denominator metric.
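
Consumers of this API therefore need to accept both numbers and the two infinity strings. A minimal Python sketch follows; `finite_rate` is a hypothetical helper, and returning a default for infinite rates is a design choice rather than anything Logstash prescribes.

```python
import math


def finite_rate(value, default=None):
    """Convert a flow rate to float, replacing infinite rates with `default`."""
    rate = float(value)  # float() also parses "Infinity" and "-Infinity"
    return rate if math.isfinite(rate) else default
```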
|
|
|
|
Flow rates provide visibility into how a Logstash instance or an individual pipeline is _currently_ performing relative to _itself_ over time.
|
|
This allows us to attach _meaning_ to the cumulative-value metrics that are also presented by this API, and to determine whether an instance or pipeline is behaving better or worse than it has in the past.
|
|
|
|
The following flow rates are available for the Logstash process as a whole and for each of its pipelines individually.
|
|
In addition, pipelines may have <<pipeline-flow-rates, additional flow rates>> depending on their configuration.
|
|
|
|
|
|
[%autowidth.stretch, cols="1m,4"]
|===
|Flow Rate | Definition

| input_throughput |
This metric is expressed in events-per-second, and is the rate of events being pushed into the pipeline(s) queue(s) relative to wall-clock time (`events.in` / second).
It includes events that are blocked by the queue and have not yet been accepted.

| filter_throughput |
This metric is expressed in events-per-second, and is the rate of events flowing through the filter phase of the pipeline(s) relative to wall-clock time (`events.filtered` / second).

| output_throughput |
This metric is expressed in events-per-second, and is the rate of events flowing through the output phase of the pipeline(s) relative to wall-clock time (`events.out` / second).

| worker_concurrency |
This is a unitless metric representing the cumulative time spent by all workers relative to wall-clock time (`duration_in_millis` / millisecond).

A _pipeline_ is considered "saturated" when its `worker_concurrency` flow metric approaches its available `pipeline.workers`, because it indicates that all of its available workers are being kept busy.
Tuning a saturated pipeline to have more workers can often work to increase that pipeline's throughput and decrease back-pressure to its queue, unless the pipeline is experiencing back-pressure from its outputs.

A _process_ is also considered "saturated" when its top-level `worker_concurrency` flow metric approaches the _cumulative_ `pipeline.workers` across _all_ pipelines, and similarly can be addressed by tuning the <<pipeline-stats,individual pipelines>> that are saturated.

| queue_backpressure |
This is a unitless metric representing the cumulative time spent by all inputs blocked pushing events into their pipeline's queue, relative to wall-clock time (`queue_push_duration_in_millis` / millisecond).
It is typically most useful when looking at the stats for an <<pipeline-stats,individual pipeline>>.

While a "zero" value indicates no back-pressure to the queue, the magnitude of this metric is highly dependent on the _shape_ of the pipelines and their inputs.
It cannot be used to compare one pipeline to another or even one process to _itself_ if the quantity or shape of its pipelines changes.
A pipeline with only one single-threaded input may contribute up to 1.00, whereas a pipeline whose inputs have hundreds of inbound connections may contribute much higher numbers to this combined value.

Additionally, some amount of back-pressure is both _normal_ and _expected_ for pipelines that are _pulling_ data, as this back-pressure allows them to slow down and pull data at a rate their downstream pipeline can tolerate.
|===
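
To make the definitions above concrete, the following Python sketch derives two of these rates from a pair of `events` snapshots and applies the saturation rule. All function names, the 0.9 threshold, and the choice to report "no change" as a zero rate are assumptions for illustration; Logstash computes these rates itself.

```python
import math


def flow_rate(numerator_delta, denominator_delta):
    """Divide two metric deltas, using +/-inf when only the numerator moved."""
    if denominator_delta == 0:
        if numerator_delta == 0:
            return 0.0  # assumption: report no change as a zero rate
        return math.copysign(math.inf, numerator_delta)
    return numerator_delta / denominator_delta


def pipeline_rates(before, after, elapsed_seconds):
    """Derive two flow rates from `events` snapshots taken elapsed_seconds apart."""
    return {
        "input_throughput": flow_rate(after["in"] - before["in"], elapsed_seconds),
        "worker_concurrency": flow_rate(
            after["duration_in_millis"] - before["duration_in_millis"],
            elapsed_seconds * 1000,
        ),
    }


def is_saturated(worker_concurrency, pipeline_workers, threshold=0.9):
    """True when concurrency is within `threshold` of the available workers."""
    return worker_concurrency >= threshold * pipeline_workers
```

For instance, a `worker_concurrency` of 8.0 on a pipeline with 8 workers counts as saturated under this threshold, while 5.903 does not.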
|
|
|
|
Each flow stat includes rates for one or more recent windows of time:
|
|
|
|
// Templates for short-hand notes in the table below
|
|
:flow-stable: pass:quotes[*Stable*]
|
|
:flow-preview: pass:quotes[_Technology Preview_]
|
|
|
|
[%autowidth.stretch, cols="1m,2,4"]
|===
| Flow Window | Availability | Definition

| current | {flow-stable} | the most recent ~10s
| lifetime | {flow-stable} | the lifetime of the relevant pipeline or process
| last_1_minute | {flow-preview} | the most recent ~1 minute
| last_5_minutes | {flow-preview} | the most recent ~5 minutes
| last_15_minutes | {flow-preview} | the most recent ~15 minutes
| last_1_hour | {flow-preview} | the most recent ~1 hour
| last_24_hours | {flow-preview} | the most recent ~24 hours

|===
|
|
|
|
NOTE: The flow rate windows marked as "Technology Preview" are subject to change without notice.
|
|
Future releases of {ls} may include more, fewer, or different windows for each rate in response to community feedback.
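
Because the preview windows may be absent or renamed, client code is safer when it falls back to a stable window. A minimal sketch, with `pick_window` and its default preference order being hypothetical choices:

```python
def pick_window(rate, preferred=("last_1_minute", "current", "lifetime")):
    """Return (window, value) for the first preferred window present in `rate`."""
    for window in preferred:
        if window in rate:
            return window, rate[window]
    raise KeyError(f"none of {preferred} found in flow rate")
```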
|
|
|
|
[discrete]
|
|
[[pipeline-stats]]
|
|
==== Pipeline stats
|
|
|
|
The following request returns a JSON document containing pipeline stats,
|
|
including:
|
|
|
|
* the number of events that were input, filtered, or output by each pipeline
|
|
* the current and lifetime <<flow-stats,_flow_ rates>> for each pipeline
|
|
* stats for each configured filter or output stage
|
|
* info about config reload successes and failures
|
|
(when <<reloading-config,config reload>> is enabled)
|
|
* info about the persistent queue (when <<persistent-queues,persistent queues>> are enabled)
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"pipelines" : {
|
|
"test" : {
|
|
"events" : {
|
|
"duration_in_millis" : 365495,
|
|
"in" : 216610,
|
|
"filtered" : 216485,
|
|
"out" : 216485,
|
|
"queue_push_duration_in_millis" : 342466
|
|
},
|
|
"flow" : {
|
|
"input_throughput" : {
|
|
"current" : 603.1,
|
|
"lifetime" : 575.4
|
|
},
|
|
"filter_throughput" : {
|
|
"current" : 604.2,
|
|
"lifetime" : 575.1
|
|
},
|
|
"output_throughput" : {
|
|
"current" : 604.8,
|
|
"lifetime" : 575.1
|
|
},
|
|
"queue_backpressure" : {
|
|
"current" : 0.214,
|
|
"lifetime" : 0.937
|
|
},
|
|
"worker_concurrency" : {
|
|
"current" : 0.941,
|
|
"lifetime" : 0.9709
|
|
},
|
|
"worker_utilization" : {
|
|
"current" : 93.092,
|
|
"lifetime" : 92.187
|
|
}
|
|
},
|
|
"plugins" : {
|
|
"inputs" : [ {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
|
|
"events" : {
|
|
"out" : 216485,
|
|
"queue_push_duration_in_millis" : 342466
|
|
},
|
|
"flow" : {
|
|
"throughput" : {
|
|
"current" : 603.1,
|
|
"lifetime" : 590.7
|
|
}
|
|
},
|
|
"name" : "beats"
|
|
} ],
|
|
"filters" : [ {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
|
|
"events" : {
|
|
"duration_in_millis" : 55969,
|
|
"in" : 216485,
|
|
"out" : 216485
|
|
},
|
|
"failures" : 216485,
|
|
"patterns_per_field" : {
|
|
"message" : 1
|
|
},
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 16.71,
|
|
"lifetime" : 15.27
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 0.2829,
|
|
"lifetime" : 0.2585
|
|
}
|
|
},
|
|
"name" : "grok"
|
|
}, {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
|
|
"events" : {
|
|
"duration_in_millis" : 3326,
|
|
"in" : 216485,
|
|
"out" : 216485
|
|
},
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 1.042,
|
|
"lifetime" : 0.9076
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 0.01763,
|
|
"lifetime" : 0.01536
|
|
}
|
|
},
|
|
"name" : "geoip"
|
|
} ],
|
|
"outputs" : [ {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
|
|
"events" : {
|
|
"duration_in_millis" : 278557,
|
|
"in" : 216485,
|
|
"out" : 216485
|
|
},
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 75.34,
|
|
"lifetime" : 76.01
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 1.276,
|
|
"lifetime" : 1.287
|
|
}
|
|
},
|
|
"name" : "elasticsearch"
|
|
} ]
|
|
},
|
|
"reloads" : {
|
|
"last_error" : null,
|
|
"successes" : 0,
|
|
"last_success_timestamp" : null,
|
|
"last_failure_timestamp" : null,
|
|
"failures" : 0
|
|
},
|
|
"queue" : {
|
|
"type" : "memory"
|
|
}
|
|
},
|
|
"test2" : {
|
|
"events" : {
|
|
"duration_in_millis" : 2222229,
|
|
"in" : 87247,
|
|
"filtered" : 87247,
|
|
"out" : 87247,
|
|
"queue_push_duration_in_millis" : 1532
|
|
},
|
|
"flow" : {
|
|
"input_throughput" : {
|
|
"current" : 301.7,
|
|
"lifetime" : 231.8
|
|
},
|
|
"filter_throughput" : {
|
|
"current" : 207.2,
|
|
"lifetime" : 231.8
|
|
},
|
|
"output_throughput" : {
|
|
"current" : 207.2,
|
|
"lifetime" : 231.8
|
|
},
|
|
"queue_backpressure" : {
|
|
"current" : 0.735,
|
|
"lifetime" : 0.0006894
|
|
},
|
|
"worker_concurrency" : {
|
|
"current" : 8.0,
|
|
"lifetime" : 5.903
|
|
},
|
|
"worker_utilization" : {
|
|
"current" : 100,
|
|
"lifetime" : 75.8
|
|
}
|
|
},
|
|
"plugins" : {
|
|
"inputs" : [ {
|
|
"id" : "d7ea8941c0fc48ac58f89c84a9da482107472b82-1",
|
|
"events" : {
|
|
"out" : 87247,
|
|
"queue_push_duration_in_millis" : 1532
|
|
},
|
|
"flow" : {
|
|
"throughput" : {
|
|
"current" : 301.7,
|
|
"lifetime" : 238.1
|
|
}
|
|
},
|
|
"name" : "twitter"
|
|
} ],
|
|
"filters" : [ ],
|
|
"outputs" : [ {
|
|
"id" : "d7ea8941c0fc48ac58f89c84a9da482107472b82-2",
|
|
"events" : {
|
|
"duration_in_millis" : 2222229,
|
|
"in" : 87247,
|
|
"out" : 87247
|
|
},
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 100,
|
|
"lifetime" : 75.8
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 33.6,
|
|
"lifetime" : 25.47
|
|
}
|
|
},
|
|
"name" : "elasticsearch"
|
|
} ]
|
|
},
|
|
"reloads" : {
|
|
"last_error" : null,
|
|
"successes" : 0,
|
|
"last_success_timestamp" : null,
|
|
"last_failure_timestamp" : null,
|
|
"failures" : 0
|
|
},
|
|
"queue" : {
|
|
"type" : "memory"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
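
As one illustration of working with this response, the sketch below reports, per pipeline, how many events have been accepted but not yet output. Treating `in` minus `out` as the in-flight count is a simplifying assumption, and `in_flight_events` is a hypothetical helper name.

```python
def in_flight_events(stats):
    """Map each pipeline ID to its `events.in` minus `events.out`."""
    return {
        pipeline_id: pipeline["events"]["in"] - pipeline["events"]["out"]
        for pipeline_id, pipeline in stats["pipelines"].items()
    }
```

Applied to the example response above, this gives 125 for `test` and 0 for `test2`.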
|
|
|
|
You can see the stats for a specific pipeline by including the pipeline ID. In
|
|
the following example, the ID of the pipeline is `test`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'localhost:9600/_node/stats/pipelines/test?pretty'
|
|
--------------------------------------------------
|
|
|
|
Example response:
|
|
|
|
[source,json]
|
|
--------------------------------------------------
|
|
{
|
|
"pipelines" : {
|
|
"test" : {
|
|
"events" : {
|
|
"duration_in_millis" : 365495,
|
|
"in" : 216485,
|
|
"filtered" : 216485,
|
|
"out" : 216485,
|
|
"queue_push_duration_in_millis" : 2283
|
|
},
|
|
"flow" : {
|
|
"input_throughput" : {
|
|
"current" : 871.3,
|
|
"lifetime" : 575.1
|
|
},
|
|
"filter_throughput" : {
|
|
"current" : 874.8,
|
|
"lifetime" : 575.1
|
|
},
|
|
"output_throughput" : {
|
|
"current" : 874.8,
|
|
"lifetime" : 575.1
|
|
},
|
|
"queue_backpressure" : {
|
|
"current" : 0,
|
|
"lifetime" : 0.006246
|
|
},
|
|
"worker_concurrency" : {
|
|
"current" : 1.471,
|
|
"lifetime" : 0.9709
|
|
},
|
|
"worker_utilization" : {
|
|
"current" : 74.54,
|
|
"lifetime" : 46.10
|
|
},
|
|
"queue_persisted_growth_bytes" : {
|
|
"current" : 8731,
|
|
"lifetime" : 0.0106
|
|
},
|
|
"queue_persisted_growth_events" : {
|
|
"current" : 0.0,
|
|
"lifetime" : 0.0
|
|
}
|
|
},
|
|
"plugins" : {
|
|
"inputs" : [ {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
|
|
"events" : {
|
|
"out" : 216485,
|
|
"queue_push_duration_in_millis" : 2283
|
|
},
|
|
"flow" : {
|
|
"throughput" : {
|
|
"current" : 871.3,
|
|
"lifetime" : 590.7
|
|
}
|
|
},
|
|
"name" : "beats"
|
|
} ],
|
|
"filters" : [ {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
|
|
"events" : {
|
|
"duration_in_millis" : 55969,
|
|
"in" : 216485,
|
|
"out" : 216485
|
|
},
|
|
"failures" : 216485,
|
|
"patterns_per_field" : {
|
|
"message" : 1
|
|
},
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 10.53,
|
|
"lifetime" : 7.636
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 0.3565,
|
|
"lifetime" : 0.2585
|
|
}
|
|
},
|
|
"name" : "grok"
|
|
}, {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
|
|
"events" : {
|
|
"duration_in_millis" : 3326,
|
|
"in" : 216485,
|
|
"out" : 216485
|
|
},
|
|
"name" : "geoip",
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 1.743,
|
|
"lifetime" : 0.4538
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 0.0590,
|
|
"lifetime" : 0.01536
|
|
}
|
|
}
|
|
} ],
|
|
"outputs" : [ {
|
|
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
|
|
"events" : {
|
|
"duration_in_millis" : 278557,
|
|
"in" : 216485,
|
|
"out" : 216485
|
|
},
|
|
"flow" : {
|
|
"worker_utilization" : {
|
|
"current" : 62.27,
|
|
"lifetime" : 38.01
|
|
},
|
|
"worker_millis_per_event" : {
|
|
"current" : 2.109,
|
|
"lifetime" : 1.287
|
|
}
|
|
},
|
|
"name" : "elasticsearch"
|
|
} ]
|
|
},
|
|
"reloads" : {
|
|
"last_error" : null,
|
|
"successes" : 0,
|
|
"last_success_timestamp" : null,
|
|
"last_failure_timestamp" : null,
|
|
"failures" : 0
|
|
},
|
|
"queue": {
|
|
"type" : "persisted",
|
|
"capacity": {
|
|
"max_unread_events": 0,
|
|
"page_capacity_in_bytes": 67108864,
|
|
"max_queue_size_in_bytes": 1073741824,
|
|
"queue_size_in_bytes": 3885
|
|
},
|
|
"data": {
|
|
"path": "/pipeline/queue/path",
|
|
"free_space_in_bytes": 936886480896,
|
|
"storage_type": "apfs"
|
|
},
|
|
"events": 0,
|
|
"events_count": 0,
|
|
"queue_size_in_bytes": 3885,
|
|
"max_queue_size_in_bytes": 1073741824
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
[[pipeline-flow-rates]]
|
|
===== Pipeline flow rates
|
|
|
|
Each pipeline's entry in the API response includes a number of pipeline-scoped <<flow-stats,_flow_ rates>> such as `input_throughput`, `worker_concurrency`, and `queue_backpressure` to provide visibility into the flow of events through the pipeline.
|
|
|
|
When configured with a <<persistent-queues,persistent queue>>, the pipeline's `flow` will include additional rates to provide visibility into the health of the pipeline's persistent queue:
|
|
|
|
[%autowidth.stretch, cols="1m,4"]
|
|
|===
|
|
|Flow Rate | Definition
|
|
|
|
| queue_persisted_growth_events |
|
|
This metric is expressed in events-per-second, and is the rate of change of the number of unacknowleged events in the queue, relative to wall-clock time (`queue.events_count` / second).
|
|
A positive number indicates that the queue's event-count is growing, and a negative number indicates that the queue is shrinking.

| queue_persisted_growth_bytes |
This metric is expressed in bytes-per-second, and is the rate of change of the size of the persistent queue on disk, relative to wall-clock time (`queue.queue_size_in_bytes` / second).
A positive number indicates that the queue size-on-disk is growing, and a negative number indicates that the queue is shrinking.

NOTE: The size of a PQ on disk includes both unacknowledged events and previously-acknowledged events from pages that contain one or more unprocessed events.
This means it grows gradually as individual events are added, but shrinks in large chunks each time a whole page of processed events is reclaimed (read more: <<garbage-collection, PQ disk garbage collection>>).

| worker_utilization |
This is a unitless metric that indicates the percentage of available worker time being used by this individual plugin (`duration` / (`uptime` * `pipeline.workers`)).
It is useful for identifying which plugins in a pipeline are using the available worker resources.

A _pipeline_ is considered "saturated" when `worker_utilization` approaches 100, because it indicates that all of its workers are being kept busy.
This is typically an indication of either downstream back-pressure or insufficient resources allocated to the pipeline.
Adding workers to a saturated pipeline can often increase that pipeline's throughput and decrease back-pressure to its queue, unless the pipeline is experiencing back-pressure from its outputs.

A _pipeline_ is considered "starved" when `worker_utilization` approaches 0, because it indicates that none of its workers are being kept busy.
This is typically an indication that the inputs are not receiving or retrieving enough volume to keep the pipeline workers busy.
Giving a starved pipeline fewer workers can help it consume less memory and CPU, freeing up resources for other pipelines.
|===
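
The `worker_utilization` formula above can be sketched in a few lines of Python. This is an illustration only, not how {ls} computes the metric internally: the function names and the `high` and `low` thresholds are hypothetical, chosen to show what "approaches 100" and "approaches 0" might look like in an alerting rule.

```python
def worker_utilization(duration_ms, uptime_ms, workers):
    """Percentage of available worker time used: duration / (uptime * workers)."""
    if uptime_ms <= 0 or workers <= 0:
        raise ValueError("uptime_ms and workers must be positive")
    return 100.0 * duration_ms / (uptime_ms * workers)


def classify(utilization, high=90.0, low=10.0):
    """Label a pipeline; the thresholds are assumptions, not Logstash defaults."""
    if utilization >= high:
        return "saturated"
    if utilization <= low:
        return "starved"
    return "normal"


# Four workers over a 60-second window, 54 seconds of total worker time spent.
utilization = worker_utilization(54_000, 60_000, 4)
print(utilization, classify(utilization))
```

Because the denominator scales with `pipeline.workers`, the same amount of work yields a lower utilization when more workers are configured.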

[discrete]
[[plugin-flow-rates]]
===== Plugin flow rates

Several additional plugin-level flow rates are available, and can be helpful for identifying problems with individual plugins:

[%autowidth.stretch, cols="2m,1,4"]
|===
| Flow Rate | Plugin Types | Definition

| throughput | Inputs |
This metric is expressed in events-per-second, and is the rate of events this input plugin is pushing into the pipeline's queue relative to wall-clock time (`events.in` / `second`).
It includes events that are blocked by the queue and have not yet been accepted.

| worker_utilization | Filters, Outputs |
This is a unitless metric that indicates the percentage of available worker time being used by this individual plugin (`duration` / (`uptime` * `pipeline.workers`)).
It is useful for identifying which plugins in a pipeline are using the available worker resources.

| worker_millis_per_event | Filters, Outputs |
This metric is expressed in worker-millis-spent-per-event (`duration_in_millis` / `events.in`), with higher values indicating more resources spent per event.
It is especially useful for identifying issues with plugins that operate on a small subset of events.
An `"Infinity"` value for a given flow window indicates that worker millis have been spent without any events completing processing; this can indicate a plugin that is either stuck or handling only empty batches.
|===
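
As a rough sketch of the `worker_millis_per_event` definition above, including its `"Infinity"` case, consider the following Python function. The function name is hypothetical, and returning `None` when no time and no events were recorded is an assumption made for this example, not documented {ls} behavior.

```python
import math


def worker_millis_per_event(duration_in_millis, events_in):
    """duration_in_millis / events.in for one flow window."""
    if events_in > 0:
        return duration_in_millis / events_in
    if duration_in_millis > 0:
        # Worker time was spent, but no events completed processing.
        return math.inf
    # Assumption: nothing happened in this window, so there is no value to report.
    return None


print(worker_millis_per_event(1500, 300))
print(worker_millis_per_event(250, 0))
```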

[discrete]
[[reload-stats]]
==== Reload stats

The following request returns a JSON document that shows info about config reload successes and failures.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/reloads?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "reloads": {
    "successes": 0,
    "failures": 0
  }
}
--------------------------------------------------
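
One way to use this response is to poll it periodically and compare consecutive documents. The sketch below assumes that `successes` and `failures` are cumulative counters for the running process; the `reload_delta` helper is hypothetical and not part of {ls}.

```python
def reload_delta(previous, current):
    """Reloads recorded between two polls of /_node/stats/reloads.

    Assumes both arguments are parsed responses shaped like the example above.
    """
    prev, cur = previous["reloads"], current["reloads"]
    return {key: cur[key] - prev[key] for key in ("successes", "failures")}


earlier = {"reloads": {"successes": 0, "failures": 0}}
later = {"reloads": {"successes": 2, "failures": 1}}
print(reload_delta(earlier, later))
```

A negative delta would suggest the counters were reset between polls, for example by a restart, which this sketch does not handle.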

[discrete]
[[os-stats]]
==== OS stats

When Logstash is running in a container, the following request returns a JSON document that
contains cgroup information to give you a more accurate view of CPU load, including whether
the container is being throttled.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/os?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "os" : {
    "cgroup" : {
      "cpuacct" : {
        "control_group" : "/elastic1",
        "usage_nanos" : 378477588075
      },
      "cpu" : {
        "control_group" : "/elastic1",
        "cfs_period_micros" : 1000000,
        "cfs_quota_micros" : 800000,
        "stat" : {
          "number_of_elapsed_periods" : 4157,
          "number_of_times_throttled" : 460,
          "time_throttled_nanos" : 581617440755
        }
      }
    }
  }
}
--------------------------------------------------
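
To judge whether the container is being throttled, you can derive two ratios from the `cpu` object. The following Python sketch uses the values from the example response; the helper names are hypothetical, and treating a negative quota as "no limit" follows the usual Linux cgroup convention rather than anything specific to {ls}.

```python
# Values copied from the "cpu" object in the example response above.
SAMPLE_CPU = {
    "cfs_period_micros": 1000000,
    "cfs_quota_micros": 800000,
    "stat": {
        "number_of_elapsed_periods": 4157,
        "number_of_times_throttled": 460,
        "time_throttled_nanos": 581617440755,
    },
}


def cpu_limit(cpu):
    """CPUs' worth of time allowed per period, or None when no quota is set."""
    quota = cpu["cfs_quota_micros"]
    if quota < 0:  # cgroup convention: a negative quota means unlimited
        return None
    return quota / cpu["cfs_period_micros"]


def throttled_ratio(cpu):
    """Fraction of elapsed scheduler periods in which the container was throttled."""
    stat = cpu["stat"]
    periods = stat["number_of_elapsed_periods"]
    if periods == 0:
        return 0.0
    return stat["number_of_times_throttled"] / periods


print(cpu_limit(SAMPLE_CPU), throttled_ratio(SAMPLE_CPU))
```

For the example response, the container is limited to 0.8 of a CPU and was throttled in roughly 11% of the elapsed periods.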

[discrete]
[[geoip-database-stats]]
==== Geoip database stats

You can monitor stats for the geoip databases used with the <<plugins-filters-geoip, Geoip filter plugin>>.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/geoip_download_manager?pretty'
--------------------------------------------------

For more info, see <<plugins-filters-geoip-metrics,Database Metrics>> in the Geoip filter plugin docs.

[[hot-threads-api]]
=== Hot Threads API

The hot threads API gets the current hot threads for Logstash. A hot thread is a
Java thread that has high CPU usage and executes for a longer than normal period
of time.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/hot_threads?pretty'
--------------------------------------------------

The output is a JSON document that contains a breakdown of the top hot threads for
Logstash.

Example response:

[source,json,subs="attributes"]
--------------------------------------------------
{
  "hot_threads" : {
    "time" : "2017-06-06T18:25:28-07:00",
    "busiest_threads" : 3,
    "threads" : [ {
      "name" : "Ruby-0-Thread-7",
      "percent_of_cpu_time" : 0.0,
      "state" : "timed_waiting",
      "path" : "/path/to/logstash-{logstash_version}/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:187",
      "traces" : [ "java.lang.Object.wait(Native Method)", "org.jruby.RubyThread.sleep(RubyThread.java:1002)", "org.jruby.RubyKernel.sleep(RubyKernel.java:803)" ]
    }, {
      "name" : "[test2]>worker3",
      "percent_of_cpu_time" : 0.85,
      "state" : "waiting",
      "traces" : [ "sun.misc.Unsafe.park(Native Method)", "java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)", "java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)" ]
    }, {
      "name" : "[test2]>worker2",
      "percent_of_cpu_time" : 0.85,
      "state" : "runnable",
      "traces" : [ "org.jruby.RubyClass.allocate(RubyClass.java:225)", "org.jruby.RubyClass.newInstance(RubyClass.java:856)", "org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen)" ]
    } ]
  }
}
--------------------------------------------------
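
A response like this one can be post-processed to rank threads by CPU time or to filter them by state. The Python sketch below works on a trimmed copy of the example response (the `path` and `traces` fields are omitted); the `busiest` helper is hypothetical and not part of {ls}.

```python
import json

# Trimmed copy of the example response above; "path" and "traces" are omitted.
SAMPLE = """
{
  "hot_threads": {
    "time": "2017-06-06T18:25:28-07:00",
    "busiest_threads": 3,
    "threads": [
      {"name": "Ruby-0-Thread-7", "percent_of_cpu_time": 0.0, "state": "timed_waiting"},
      {"name": "[test2]>worker3", "percent_of_cpu_time": 0.85, "state": "waiting"},
      {"name": "[test2]>worker2", "percent_of_cpu_time": 0.85, "state": "runnable"}
    ]
  }
}
"""


def busiest(response, state=None):
    """Threads sorted by CPU time, highest first, optionally limited to one state."""
    threads = response["hot_threads"]["threads"]
    if state is not None:
        threads = [t for t in threads if t["state"] == state]
    return sorted(threads, key=lambda t: t["percent_of_cpu_time"], reverse=True)


data = json.loads(SAMPLE)
for thread in busiest(data):
    print(thread["percent_of_cpu_time"], thread["state"], thread["name"])
```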

The parameters allowed are:

[horizontal]
`threads`:: The number of hot threads to return. The default is 10.
`stacktrace_size`:: The depth of the stack trace to report for each thread. The default is 50.
`human`:: If true, returns plain text instead of JSON format. The default is false.
`ignore_idle_threads`:: If true, does not return idle threads. The default is true.

See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
Logstash monitoring APIs.
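
When calling the endpoint from a script, the parameters listed above are passed as a query string. The following Python sketch builds such a URL with the standard library; the `hot_threads_url` helper and its default base address are assumptions for illustration, and the block only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Parameters listed above, plus the `pretty` option used elsewhere in this document.
ALLOWED = {"threads", "stacktrace_size", "human", "ignore_idle_threads", "pretty"}


def hot_threads_url(base="http://localhost:9600", **params):
    """Build a hot threads request URL; booleans are rendered as true/false."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in sorted(params.items())}
    )
    url = f"{base}/_node/hot_threads"
    return f"{url}?{query}" if query else url


print(hot_threads_url(threads=3, stacktrace_size=10, ignore_idle_threads=False))
```

Rejecting unknown names early is a design choice of this sketch, so that a misspelled parameter fails in the script instead of being sent to the API.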

You can use the `?human` parameter to return the document in a human-readable format.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/hot_threads?human=true'
--------------------------------------------------

Example of a human-readable response:

[source%nowrap,text,subs="attributes"]
--------------------------------------------------
::: {}
Hot threads at 2017-06-06T18:31:17-07:00, busiestThreads=3:
================================================================================
0.0 % of cpu usage, state: timed_waiting, thread name: 'Ruby-0-Thread-7'
/path/to/logstash-{logstash_version}/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:187
java.lang.Object.wait(Native Method)
org.jruby.RubyThread.sleep(RubyThread.java:1002)
org.jruby.RubyKernel.sleep(RubyKernel.java:803)
--------------------------------------------------------------------------------
0.0 % of cpu usage, state: waiting, thread name: 'defaultEventExecutorGroup-5-4'
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
--------------------------------------------------------------------------------
0.05 % of cpu usage, state: timed_waiting, thread name: '[test]-pipeline-manager'
java.lang.Object.wait(Native Method)
java.lang.Thread.join(Thread.java:1253)
org.jruby.internal.runtime.NativeThread.join(NativeThread.java:75)
--------------------------------------------------