[discrete]
[[monitoring]]
== APIs for monitoring {ls}

{ls} provides monitoring APIs for retrieving runtime information about {ls}:

* <<node-info-api>>
* <<plugins-api>>
* <<node-stats-api>>
* <<hot-threads-api>>
* <<logstash-health-report-api>>

You can use the root resource to retrieve general information about the Logstash instance, including
the host and version.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/?pretty'
--------------------------------------------------

Example response:

[source,json,subs="attributes"]
--------------------------------------------------
{
  "host": "skywalker",
  "version": "{logstash_version}",
  "http_address": "127.0.0.1:9600"
}
--------------------------------------------------

NOTE: By default, the monitoring API attempts to bind to `tcp:9600`. If this port is already in use by another Logstash
instance, you need to launch Logstash with the `--api.http.port` flag specified to bind to a different port. See
<<command-line-flags>> for more information.

[discrete]
[[monitoring-api-security]]
==== Securing the Logstash API

The {ls} monitoring APIs are not secured by default, but you can configure {ls} to secure them in one of several ways to meet your organization's needs.

You can enable SSL for the Logstash API by setting `api.ssl.enabled: true` in `logstash.yml`, and providing the relevant keystore settings `api.ssl.keystore.path` and `api.ssl.keystore.password`:

[source]
--------------------------------------------------
api.ssl.enabled: true
api.ssl.keystore.path: /path/to/keystore.jks
api.ssl.keystore.password: "s3cUr3p4$$w0rd"
--------------------------------------------------

The keystore must be in either jks or p12 format, and must contain both a certificate and a private key.
Connecting clients receive this certificate, allowing them to authenticate the Logstash endpoint.

You can also require HTTP Basic authentication by setting `api.auth.type: basic` in `logstash.yml`, and providing the relevant credentials `api.auth.basic.username` and `api.auth.basic.password`:

[source]
--------------------------------------------------
api.auth.type: basic
api.auth.basic.username: "logstash"
api.auth.basic.password: "s3cUreP4$$w0rD"
--------------------------------------------------

NOTE: Usage of the keystore or environment variable replacements is encouraged for password-type fields to avoid storing them in plain text.
For example, specifying the value `"${HTTP_PASS}"` resolves to the value stored in the <<keystore,secure keystore's>> `HTTP_PASS` variable if present, or to the same variable from the <<environment-variables,environment>>.
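
When basic auth is enabled, a client must send an `Authorization: Basic …` header whose value is the base64-encoded `username:password` pair. A minimal sketch of building such a request in Python (the credentials are the placeholder values from the example above, not real ones):

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder credentials matching the logstash.yml example above.
req = basic_auth_request("http://localhost:9600/?pretty", "logstash", "s3cUreP4$$w0rD")
```

Passing `req` to `urllib.request.urlopen` would then issue the authenticated call against a running instance.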

[discrete]
[[monitoring-common-options]]
==== Common options

The following options can be applied to all of the Logstash monitoring APIs.

[discrete]
===== Pretty results

When you append `?pretty=true` to any request, the returned JSON is pretty formatted (use it for debugging only!).

[discrete]
===== Human-readable output

NOTE: For Logstash {logstash_version}, the `human` option is supported for the <<hot-threads-api>>
only. When you specify `human=true`, the results are returned in plain text instead of
JSON format. The default is `false`.

Statistics are returned in a format suitable for humans
(eg `"exists_time": "1h"` or `"size": "1kb"`) and for computers
(eg `"exists_time_in_millis": 3600000` or `"size_in_bytes": 1024`).
The human-readable values can be turned off by adding `?human=false`
to the query string. This makes sense when the stats results are
being consumed by a monitoring tool, rather than intended for human
consumption.

[[node-info-api]]
=== Node Info API

The node info API retrieves information about the node.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/<types>'
--------------------------------------------------

Where `<types>` is optional and specifies the types of node info you want to return.

You can limit the info that's returned by combining any of the following types in a comma-separated list:

[horizontal]
<<node-pipeline-info,`pipelines`>>::
Gets pipeline-specific information and settings for each pipeline.
<<node-os-info,`os`>>::
Gets node-level info about the OS.
<<node-jvm-info,`jvm`>>::
Gets node-level JVM info, including info about threads.

See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
Logstash monitoring APIs.
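
The comma-separated `<types>` path segment is easy to assemble programmatically. A small illustrative helper (the function name is ours, not part of any Logstash client library):

```python
def node_info_url(types=(), host="localhost", port=9600, pretty=True):
    """Assemble a Node Info API URL; `types` may combine pipelines, os, and jvm."""
    path = "/_node" + ("/" + ",".join(types) if types else "")
    query = "?pretty" if pretty else ""
    return f"http://{host}:{port}{path}{query}"


# e.g. request only the OS and JVM sections:
url = node_info_url(["os", "jvm"])
```

An empty `types` list yields the bare `/_node` endpoint, which returns all sections.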

[discrete]
[[node-pipeline-info]]
==== Pipeline info

The following request returns a JSON document that shows pipeline info, such as the number of workers,
batch size, and batch delay:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/pipelines?pretty'
--------------------------------------------------

If you want to view additional information about a pipeline, such as stats for each configured input, filter,
or output stage, see the <<pipeline-stats>> section under the <<node-stats-api>>.

Example response:

[source,json,subs="attributes"]
--------------------------------------------------
{
  "pipelines" : {
    "test" : {
      "workers" : 1,
      "batch_size" : 1,
      "batch_delay" : 5,
      "config_reload_automatic" : false,
      "config_reload_interval" : 3
    },
    "test2" : {
      "workers" : 8,
      "batch_size" : 125,
      "batch_delay" : 5,
      "config_reload_automatic" : false,
      "config_reload_interval" : 3
    }
  }
}
--------------------------------------------------

You can see the info for a specific pipeline by including the pipeline ID. In
the following example, the ID of the pipeline is `test`:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/pipelines/test?pretty'
--------------------------------------------------

Example response:

[source,json]
----------
{
  "pipelines" : {
    "test" : {
      "workers" : 1,
      "batch_size" : 1,
      "batch_delay" : 5,
      "config_reload_automatic" : false,
      "config_reload_interval" : 3
    }
  }
}
----------

If you specify an invalid pipeline ID, the request returns a 404 Not Found error.

[discrete]
[[node-os-info]]
==== OS info

The following request returns a JSON document that shows the OS name, architecture, version, and
available processors:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/os?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "os": {
    "name": "Mac OS X",
    "arch": "x86_64",
    "version": "10.12.4",
    "available_processors": 8
  }
}
--------------------------------------------------

[discrete]
[[node-jvm-info]]
==== JVM info

The following request returns a JSON document that shows node-level JVM stats, such as the JVM process id, version,
VM info, memory usage, and info about garbage collectors:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/jvm?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "jvm": {
    "pid": 59616,
    "version": "1.8.0_65",
    "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
    "vm_version": "1.8.0_65",
    "vm_vendor": "Oracle Corporation",
    "start_time_in_millis": 1484251185878,
    "mem": {
      "heap_init_in_bytes": 268435456,
      "heap_max_in_bytes": 1037959168,
      "non_heap_init_in_bytes": 2555904,
      "non_heap_max_in_bytes": 0
    },
    "gc_collectors": [
      "ParNew",
      "ConcurrentMarkSweep"
    ]
  }
}
--------------------------------------------------

[[plugins-api]]
=== Plugins info API

The plugins info API gets information about all Logstash plugins that are currently installed.
This API returns the same output as the `bin/logstash-plugin list --verbose` command.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/plugins?pretty'
--------------------------------------------------

See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
Logstash monitoring APIs.

The output is a JSON document.

Example response:

[source,json,subs="attributes"]
--------------------------------------------------
{
  "total": 93,
  "plugins": [
    {
      "name": "logstash-codec-cef",
      "version": "4.1.2"
    },
    {
      "name": "logstash-codec-collectd",
      "version": "3.0.3"
    },
    {
      "name": "logstash-codec-dots",
      "version": "3.0.2"
    },
    {
      "name": "logstash-codec-edn",
      "version": "3.0.2"
    },
    .
    .
    .
  ]
}
--------------------------------------------------
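
Plugin names follow the `logstash-<type>-<name>` convention, so a client can tally the installed plugins by type directly from this response. A sketch, using the first few entries of the sample above:

```python
from collections import Counter


def plugins_by_type(plugins_response):
    """Tally installed plugins by type, using the logstash-<type>-<name> naming convention."""
    return Counter(p["name"].split("-")[1] for p in plugins_response["plugins"])


# The first few entries from the sample response above:
sample = {
    "total": 93,
    "plugins": [
        {"name": "logstash-codec-cef", "version": "4.1.2"},
        {"name": "logstash-codec-collectd", "version": "3.0.3"},
        {"name": "logstash-codec-dots", "version": "3.0.2"},
        {"name": "logstash-codec-edn", "version": "3.0.2"},
    ],
}
counts = plugins_by_type(sample)  # all four sample entries are codecs
```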

[[node-stats-api]]
=== Node Stats API

The node stats API retrieves runtime stats about Logstash.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/<types>'
--------------------------------------------------

Where `<types>` is optional and specifies the types of stats you want to return.

By default, all stats are returned. You can limit the info that's returned by combining any of the following types in a comma-separated list:

[horizontal]
<<jvm-stats,`jvm`>>::
Gets JVM stats, including stats about threads, memory usage, garbage collectors,
and uptime.
<<process-stats,`process`>>::
Gets process stats, including stats about file descriptors, memory consumption, and CPU usage.
<<event-stats,`events`>>::
Gets event-related statistics for the Logstash instance (regardless of how many
pipelines were created and destroyed).
<<flow-stats,`flow`>>::
Gets flow-related statistics for the Logstash instance (regardless of how many
pipelines were created and destroyed).
<<pipeline-stats,`pipelines`>>::
Gets runtime stats about each Logstash pipeline.
<<reload-stats,`reloads`>>::
Gets runtime stats about config reload successes and failures.
<<os-stats,`os`>>::
Gets runtime stats about cgroups when Logstash is running in a container.
<<geoip-database-stats,`geoip_download_manager`>>::
Gets stats for databases used with the <<plugins-filters-geoip, Geoip filter plugin>>.

See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
Logstash monitoring APIs.

[discrete]
[[jvm-stats]]
==== JVM stats

The following request returns a JSON document containing JVM stats:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/jvm?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "jvm" : {
    "threads" : {
      "count" : 49,
      "peak_count" : 50
    },
    "mem" : {
      "heap_used_percent" : 14,
      "heap_committed_in_bytes" : 309866496,
      "heap_max_in_bytes" : 1037959168,
      "heap_used_in_bytes" : 151686096,
      "non_heap_used_in_bytes" : 122486176,
      "non_heap_committed_in_bytes" : 133222400,
      "pools" : {
        "survivor" : {
          "peak_used_in_bytes" : 8912896,
          "used_in_bytes" : 288776,
          "peak_max_in_bytes" : 35782656,
          "max_in_bytes" : 35782656,
          "committed_in_bytes" : 8912896
        },
        "old" : {
          "peak_used_in_bytes" : 148656848,
          "used_in_bytes" : 148656848,
          "peak_max_in_bytes" : 715849728,
          "max_in_bytes" : 715849728,
          "committed_in_bytes" : 229322752
        },
        "young" : {
          "peak_used_in_bytes" : 71630848,
          "used_in_bytes" : 2740472,
          "peak_max_in_bytes" : 286326784,
          "max_in_bytes" : 286326784,
          "committed_in_bytes" : 71630848
        }
      }
    },
    "gc" : {
      "collectors" : {
        "old" : {
          "collection_time_in_millis" : 607,
          "collection_count" : 12
        },
        "young" : {
          "collection_time_in_millis" : 4904,
          "collection_count" : 1033
        }
      }
    },
    "uptime_in_millis" : 1809643
  }
}
--------------------------------------------------
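
The `heap_used_percent` field relates `heap_used_in_bytes` to `heap_max_in_bytes`. A quick sanity check against the sample numbers above (a sketch; that Logstash truncates rather than rounds the percentage is our assumption):

```python
def heap_used_percent(heap_used_in_bytes, heap_max_in_bytes):
    """Integer percentage of the max heap currently in use (assumes truncation)."""
    return heap_used_in_bytes * 100 // heap_max_in_bytes


# Values from the sample response above: 151686096 / 1037959168 -> 14%,
# matching the reported "heap_used_percent" : 14.
pct = heap_used_percent(151686096, 1037959168)
```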

[discrete]
[[process-stats]]
==== Process stats

The following request returns a JSON document containing process stats:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/process?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "process" : {
    "open_file_descriptors" : 184,
    "peak_open_file_descriptors" : 185,
    "max_file_descriptors" : 10240,
    "mem" : {
      "total_virtual_in_bytes" : 5486125056
    },
    "cpu" : {
      "total_in_millis" : 657136,
      "percent" : 2,
      "load_average" : {
        "1m" : 2.38134765625
      }
    }
  }
}
--------------------------------------------------

[discrete]
[[event-stats]]
==== Event stats

The following request returns a JSON document containing event-related statistics
for the Logstash instance:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/events?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "events" : {
    "in" : 293658,
    "filtered" : 293658,
    "out" : 293658,
    "duration_in_millis" : 2324391,
    "queue_push_duration_in_millis" : 343816
  }
}
--------------------------------------------------

[discrete]
[[flow-stats]]
==== Flow stats

The following request returns a JSON document containing flow-rates
for the Logstash instance:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/flow?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "flow" : {
    "input_throughput" : {
      "current": 189.720,
      "lifetime": 201.841
    },
    "filter_throughput" : {
      "current": 187.810,
      "lifetime": 201.799
    },
    "output_throughput" : {
      "current": 191.087,
      "lifetime": 201.761
    },
    "queue_backpressure" : {
      "current": 0.277,
      "lifetime": 0.031
    },
    "worker_concurrency" : {
      "current": 1.973,
      "lifetime": 1.721
    }
  }
}
--------------------------------------------------

NOTE: When the rate for a given flow metric window is infinite, it is presented as a string (either `"Infinity"` or `"-Infinity"`).
This occurs when the numerator metric has changed during the window without a change in the rate's denominator metric.
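Clients that consume these rates therefore need to accept either a JSON number or one of those two strings. A minimal Python sketch of such normalization (the sample document below is illustrative, not a live API response):

```python
import json
import math

def parse_flow_rate(value):
    """Normalize a flow-rate value from the stats API to a float.

    Rates are usually JSON numbers, but infinite rates arrive as the
    strings "Infinity" or "-Infinity"; Python's float() accepts both.
    """
    return float(value)

# A fragment shaped like the response above, with one infinite window:
doc = json.loads(
    '{"flow": {"input_throughput": {"current": "Infinity", "lifetime": 201.841}}}'
)
current = parse_flow_rate(doc["flow"]["input_throughput"]["current"])
lifetime = parse_flow_rate(doc["flow"]["input_throughput"]["lifetime"])
```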

Flow rates provide visibility into how a Logstash instance or an individual pipeline is _currently_ performing relative to _itself_ over time.
This allows us to attach _meaning_ to the cumulative-value metrics that are also presented by this API, and to determine whether an instance or pipeline is behaving better or worse than it has in the past.

The following flow rates are available for the Logstash process as a whole and for each of its pipelines individually.
In addition, pipelines may have <<pipeline-flow-rates, additional flow rates>> depending on their configuration.


[%autowidth.stretch, cols="1m,4"]
|===
|Flow Rate | Definition

| input_throughput |
This metric is expressed in events-per-second, and is the rate of events being pushed into the pipeline(s) queue(s) relative to wall-clock time (`events.in` / second).
It includes events that are blocked by the queue and have not yet been accepted.

| filter_throughput |
This metric is expressed in events-per-second, and is the rate of events flowing through the filter phase of the pipeline(s) relative to wall-clock time (`events.filtered` / second).

| output_throughput |
This metric is expressed in events-per-second, and is the rate of events flowing through the output phase of the pipeline(s) relative to wall-clock time (`events.out` / second).

| worker_concurrency |
This is a unitless metric representing the cumulative time spent by all workers relative to wall-clock time (`duration_in_millis` / millisecond).

A _pipeline_ is considered "saturated" when its `worker_concurrency` flow metric approaches its available `pipeline.workers`, because it indicates that all of its available workers are being kept busy.
Tuning a saturated pipeline to have more workers can often increase that pipeline's throughput and decrease back-pressure to its queue, unless the pipeline is experiencing back-pressure from its outputs.

A _process_ is also considered "saturated" when its top-level `worker_concurrency` flow metric approaches the _cumulative_ `pipeline.workers` across _all_ pipelines, and similarly can be addressed by tuning the <<pipeline-stats,individual pipelines>> that are saturated.

| queue_backpressure |
This is a unitless metric representing the cumulative time spent by all inputs blocked pushing events into their pipeline's queue, relative to wall-clock time (`queue_push_duration_in_millis` / millisecond).
It is typically most useful when looking at the stats for an <<pipeline-stats,individual pipeline>>.

While a "zero" value indicates no back-pressure to the queue, the magnitude of this metric is highly dependent on the _shape_ of the pipelines and their inputs.
It cannot be used to compare one pipeline to another or even one process to _itself_ if the quantity or shape of its pipelines changes.
A pipeline with only one single-threaded input may contribute up to 1.00, while a pipeline whose inputs have hundreds of inbound connections may contribute a much higher number to this combined value.

Additionally, some amount of back-pressure is both _normal_ and _expected_ for pipelines that are _pulling_ data, as this back-pressure allows them to slow down and pull data at a rate their downstream pipelines can tolerate.
|===

Each flow stat includes rates for one or more recent windows of time:

// Templates for short-hand notes in the table below
:flow-stable: pass:quotes[*Stable*]
:flow-preview: pass:quotes[_Technology Preview_]

[%autowidth.stretch, cols="1m,2,4"]
|===
| Flow Window | Availability | Definition

| current | {flow-stable} | the most recent ~10s
| lifetime | {flow-stable} | the lifetime of the relevant pipeline or process
| last_1_minute | {flow-preview} | the most recent ~1 minute
| last_5_minutes | {flow-preview} | the most recent ~5 minutes
| last_15_minutes | {flow-preview} | the most recent ~15 minutes
| last_1_hour | {flow-preview} | the most recent ~1 hour
| last_24_hours | {flow-preview} | the most recent ~24 hours

|===

NOTE: The flow rate windows marked as "Technology Preview" are subject to change without notice.
Future releases of {ls} may include more, fewer, or different windows for each rate in response to community feedback.
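The saturation check described above can be sketched in a few lines of Python (the sample numbers and the 0.9 threshold are illustrative assumptions, not values defined by the API):

```python
def saturation(worker_concurrency_current, pipeline_workers):
    """Return the fraction of configured workers that are busy.

    A value approaching 1.0 means the pipeline is saturated: all of its
    workers are being kept busy, so adding workers (or relieving output
    back-pressure) may improve throughput.
    """
    return worker_concurrency_current / pipeline_workers

# Hypothetical values read from /_node/stats/flow and the pipeline config:
ratio = saturation(worker_concurrency_current=7.9, pipeline_workers=8)
is_saturated = ratio >= 0.9  # illustrative threshold
```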

[discrete]
[[pipeline-stats]]
==== Pipeline stats

The following request returns a JSON document containing pipeline stats,
including:

* the number of events that were input, filtered, or output by each pipeline
* the current and lifetime <<flow-stats,_flow_ rates>> for each pipeline
* stats for each configured filter or output stage
* info about config reload successes and failures
(when <<reloading-config,config reload>> is enabled)
* info about the persistent queue (when <<persistent-queues,persistent queues>> are enabled)

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "pipelines" : {
    "test" : {
      "events" : {
        "duration_in_millis" : 365495,
        "in" : 216610,
        "filtered" : 216485,
        "out" : 216485,
        "queue_push_duration_in_millis" : 342466
      },
      "flow" : {
        "input_throughput" : {
          "current" : 603.1,
          "lifetime" : 575.4
        },
        "filter_throughput" : {
          "current" : 604.2,
          "lifetime" : 575.1
        },
        "output_throughput" : {
          "current" : 604.8,
          "lifetime" : 575.1
        },
        "queue_backpressure" : {
          "current" : 0.214,
          "lifetime" : 0.937
        },
        "worker_concurrency" : {
          "current" : 0.941,
          "lifetime" : 0.9709
        },
        "worker_utilization" : {
          "current" : 93.092,
          "lifetime" : 92.187
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
          "events" : {
            "out" : 216485,
            "queue_push_duration_in_millis" : 342466
          },
          "flow" : {
            "throughput" : {
              "current" : 603.1,
              "lifetime" : 590.7
            }
          },
          "name" : "beats"
        } ],
        "filters" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
          "events" : {
            "duration_in_millis" : 55969,
            "in" : 216485,
            "out" : 216485
          },
          "failures" : 216485,
          "patterns_per_field" : {
            "message" : 1
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 16.71,
              "lifetime" : 15.27
            },
            "worker_millis_per_event" : {
              "current" : 2829,
              "lifetime" : 0.2585
            }
          },
          "name" : "grok"
        }, {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
          "events" : {
            "duration_in_millis" : 3326,
            "in" : 216485,
            "out" : 216485
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 1.042,
              "lifetime" : 0.9076
            },
            "worker_millis_per_event" : {
              "current" : 0.01763,
              "lifetime" : 0.01536
            }
          },
          "name" : "geoip"
        } ],
        "outputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
          "events" : {
            "duration_in_millis" : 278557,
            "in" : 216485,
            "out" : 216485
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 75.34,
              "lifetime" : 76.01
            },
            "worker_millis_per_event" : {
              "current" : 1.276,
              "lifetime" : 1.287
            }
          },
          "name" : "elasticsearch"
        } ]
      },
      "reloads" : {
        "last_error" : null,
        "successes" : 0,
        "last_success_timestamp" : null,
        "last_failure_timestamp" : null,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory"
      }
    },
    "test2" : {
      "events" : {
        "duration_in_millis" : 2222229,
        "in" : 87247,
        "filtered" : 87247,
        "out" : 87247,
        "queue_push_duration_in_millis" : 1532
      },
      "flow" : {
        "input_throughput" : {
          "current" : 301.7,
          "lifetime" : 231.8
        },
        "filter_throughput" : {
          "current" : 207.2,
          "lifetime" : 231.8
        },
        "output_throughput" : {
          "current" : 207.2,
          "lifetime" : 231.8
        },
        "queue_backpressure" : {
          "current" : 0.735,
          "lifetime" : 0.0006894
        },
        "worker_concurrency" : {
          "current" : 8.0,
          "lifetime" : 5.903
        },
        "worker_utilization" : {
          "current" : 100,
          "lifetime" : 75.8
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "d7ea8941c0fc48ac58f89c84a9da482107472b82-1",
          "events" : {
            "out" : 87247,
            "queue_push_duration_in_millis" : 1532
          },
          "flow" : {
            "throughput" : {
              "current" : 301.7,
              "lifetime" : 238.1
            }
          },
          "name" : "twitter"
        } ],
        "filters" : [ ],
        "outputs" : [ {
          "id" : "d7ea8941c0fc48ac58f89c84a9da482107472b82-2",
          "events" : {
            "duration_in_millis" : 2222229,
            "in" : 87247,
            "out" : 87247
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 100,
              "lifetime" : 75.8
            },
            "worker_millis_per_event" : {
              "current" : 33.6,
              "lifetime" : 25.47
            }
          },
          "name" : "elasticsearch"
        } ]
      },
      "reloads" : {
        "last_error" : null,
        "successes" : 0,
        "last_success_timestamp" : null,
        "last_failure_timestamp" : null,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory"
      }
    }
  }
}
--------------------------------------------------

You can see the stats for a specific pipeline by including the pipeline ID. In
the following example, the ID of the pipeline is `test`:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/pipelines/test?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "pipelines" : {
    "test" : {
      "events" : {
        "duration_in_millis" : 365495,
        "in" : 216485,
        "filtered" : 216485,
        "out" : 216485,
        "queue_push_duration_in_millis" : 2283
      },
      "flow" : {
        "input_throughput" : {
          "current" : 871.3,
          "lifetime" : 575.1
        },
        "filter_throughput" : {
          "current" : 874.8,
          "lifetime" : 575.1
        },
        "output_throughput" : {
          "current" : 874.8,
          "lifetime" : 575.1
        },
        "queue_backpressure" : {
          "current" : 0,
          "lifetime" : 0.006246
        },
        "worker_concurrency" : {
          "current" : 1.471,
          "lifetime" : 0.9709
        },
        "worker_utilization" : {
          "current" : 74.54,
          "lifetime" : 46.10
        },
        "queue_persisted_growth_bytes" : {
          "current" : 8731,
          "lifetime" : 0.0106
        },
        "queue_persisted_growth_events" : {
          "current" : 0.0,
          "lifetime" : 0.0
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
          "events" : {
            "out" : 216485,
            "queue_push_duration_in_millis" : 2283
          },
          "flow" : {
            "throughput" : {
              "current" : 871.3,
              "lifetime" : 590.7
            }
          },
          "name" : "beats"
        } ],
        "filters" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
          "events" : {
            "duration_in_millis" : 55969,
            "in" : 216485,
            "out" : 216485
          },
          "failures" : 216485,
          "patterns_per_field" : {
            "message" : 1
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 10.53,
              "lifetime" : 7.636
            },
            "worker_millis_per_event" : {
              "current" : 0.3565,
              "lifetime" : 0.2585
            }
          },
          "name" : "grok"
        }, {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
          "events" : {
            "duration_in_millis" : 3326,
            "in" : 216485,
            "out" : 216485
          },
          "name" : "geoip",
          "flow" : {
            "worker_utilization" : {
              "current" : 1.743,
              "lifetime" : 0.4538
            },
            "worker_millis_per_event" : {
              "current" : 0.0590,
              "lifetime" : 0.01536
            }
          }
        } ],
        "outputs" : [ {
          "id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
          "events" : {
            "duration_in_millis" : 278557,
            "in" : 216485,
            "out" : 216485
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 62.27,
              "lifetime" : 38.01
            },
            "worker_millis_per_event" : {
              "current" : 2.109,
              "lifetime" : 1.287
            }
          },
          "name" : "elasticsearch"
        } ]
      },
      "reloads" : {
        "last_error" : null,
        "successes" : 0,
        "last_success_timestamp" : null,
        "last_failure_timestamp" : null,
        "failures" : 0
      },
      "queue": {
        "type" : "persisted",
        "capacity": {
          "max_unread_events": 0,
          "page_capacity_in_bytes": 67108864,
          "max_queue_size_in_bytes": 1073741824,
          "queue_size_in_bytes": 3885
        },
        "data": {
          "path": "/pipeline/queue/path",
          "free_space_in_bytes": 936886480896,
          "storage_type": "apfs"
        },
        "events": 0,
        "events_count": 0,
        "queue_size_in_bytes": 3885,
        "max_queue_size_in_bytes": 1073741824
      }
    }
  }
}
--------------------------------------------------

[discrete]
[[pipeline-flow-rates]]
===== Pipeline flow rates

Each pipeline's entry in the API response includes a number of pipeline-scoped <<flow-stats,_flow_ rates>> such as `input_throughput`, `worker_concurrency`, and `queue_backpressure` to provide visibility into the flow of events through the pipeline.

When configured with a <<persistent-queues,persistent queue>>, the pipeline's `flow` will include additional rates to provide visibility into the health of the pipeline's persistent queue:

[%autowidth.stretch, cols="1m,4"]
|===
|Flow Rate | Definition

| queue_persisted_growth_events |
This metric is expressed in events-per-second, and is the rate of change of the number of unacknowledged events in the queue, relative to wall-clock time (`queue.events_count` / second).
A positive number indicates that the queue's event-count is growing, and a negative number indicates that the queue is shrinking.

| queue_persisted_growth_bytes |
This metric is expressed in bytes-per-second, and is the rate of change of the size of the persistent queue on disk, relative to wall-clock time (`queue.queue_size_in_bytes` / second).
A positive number indicates that the queue's size-on-disk is growing, and a negative number indicates that the queue is shrinking.

NOTE: The size of a PQ on disk includes both unacknowledged events and previously-acknowledged events from pages that contain one or more unprocessed events.
This means it grows gradually as individual events are added, but shrinks in large chunks each time a whole page of processed events is reclaimed (read more: <<garbage-collection, PQ disk garbage collection>>).

| worker_utilization |
This is a unitless metric that indicates the percentage of available worker time being used by this individual pipeline (`duration` / (`uptime` * `pipeline.workers`)).
It is useful for identifying which pipelines are using the available worker resources.

A _pipeline_ is considered "saturated" when `worker_utilization` approaches 100, because it indicates that all of its workers are being kept busy.
This is typically an indication of either downstream back-pressure or insufficient resources allocated to the pipeline.
Tuning a saturated pipeline to have more workers can often increase that pipeline's throughput and decrease back-pressure to its queue, unless the pipeline is experiencing back-pressure from its outputs.

A _pipeline_ is considered "starved" when `worker_utilization` approaches 0, because it indicates that none of its workers are being kept busy.
This is typically an indication that the inputs are not receiving or retrieving enough volume to keep the pipeline workers busy.
Tuning a starved pipeline to have fewer workers can help it to consume less memory and CPU, freeing up resources for other pipelines.
|===
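The saturated/starved distinction above lends itself to a simple monitoring check. A Python sketch (the 90/10 thresholds are illustrative assumptions, not values defined by the API):

```python
def classify_pipeline(worker_utilization_current):
    """Bucket a pipeline by its worker_utilization flow rate (a percentage).

    Near 100 suggests saturation (all workers busy); near 0 suggests
    starvation (not enough inbound volume to keep workers busy).
    """
    if worker_utilization_current >= 90:
        return "saturated"
    if worker_utilization_current <= 10:
        return "starved"
    return "ok"

# Values taken from the example responses above:
busy = classify_pipeline(93.092)
idle = classify_pipeline(2.5)
```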

[discrete]
[[plugin-flow-rates]]
===== Plugin flow rates

Several additional plugin-level flow rates are available, and can be helpful for identifying problems with individual plugins:

[%autowidth.stretch, cols="2m,1,4"]
|===
| Flow Rate | Plugin Types | Definition

| throughput | Inputs |
This metric is expressed in events-per-second, and is the rate of events this input plugin is pushing into the pipeline's queue relative to wall-clock time (`events.in` / second).
It includes events that are blocked by the queue and have not yet been accepted.

| worker_utilization | Filters, Outputs |
This is a unitless metric that indicates the percentage of available worker time being used by this individual plugin (`duration` / (`uptime` * `pipeline.workers`)).
It is useful for identifying which plugins in a pipeline are using the available worker resources.

| worker_millis_per_event | Filters, Outputs |
This metric is expressed in worker-millis-spent-per-event (`duration_in_millis` / `events.in`), with higher scores indicating more resources spent per event.
It is especially useful for identifying issues with plugins that operate on a small subset of events.
An `"Infinity"` value for a given flow window indicates that worker millis have been spent without any events completing processing; this can indicate a plugin that is either stuck or handling only empty batches.

|===
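For example, a monitoring script could rank a pipeline's filters and outputs by their current `worker_utilization` to find where worker time is going. A minimal sketch (the plugin list mirrors the example response above; this script is not part of Logstash):

```python
# Filter/output entries shaped like the "plugins" section of the API response.
plugins = [
    {"name": "grok", "flow": {"worker_utilization": {"current": 16.71}}},
    {"name": "geoip", "flow": {"worker_utilization": {"current": 1.042}}},
    {"name": "elasticsearch", "flow": {"worker_utilization": {"current": 75.34}}},
]

# Sort by how much of the available worker time each plugin consumes.
ranked = sorted(
    plugins,
    key=lambda p: p["flow"]["worker_utilization"]["current"],
    reverse=True,
)
hottest = ranked[0]["name"]
```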

[discrete]
[[reload-stats]]
==== Reload stats

The following request returns a JSON document that shows info about config reload successes and failures.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/reloads?pretty'
--------------------------------------------------

Example response:

[source,js]
--------------------------------------------------
{
  "reloads": {
    "successes": 0,
    "failures": 0
  }
}
--------------------------------------------------

[discrete]
[[os-stats]]
==== OS stats

When Logstash is running in a container, the following request returns a JSON document that
contains cgroup information to give you a more accurate view of CPU load, including whether
the container is being throttled.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/os?pretty'
--------------------------------------------------

Example response:

[source,json]
--------------------------------------------------
{
  "os" : {
    "cgroup" : {
      "cpuacct" : {
        "control_group" : "/elastic1",
        "usage_nanos" : 378477588075
      },
      "cpu" : {
        "control_group" : "/elastic1",
        "cfs_period_micros" : 1000000,
        "cfs_quota_micros" : 800000,
        "stat" : {
          "number_of_elapsed_periods" : 4157,
          "number_of_times_throttled" : 460,
          "time_throttled_nanos" : 581617440755
        }
      }
    }
  }
}
--------------------------------------------------

[discrete]
[[geoip-database-stats]]
==== Geoip database stats

You can monitor stats for the geoip databases used with the <<plugins-filters-geoip, Geoip filter plugin>>.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/stats/geoip_download_manager?pretty'
--------------------------------------------------

For more info, see <<plugins-filters-geoip-metrics,Database Metrics>> in the Geoip filter plugin docs.

[[hot-threads-api]]
=== Hot Threads API

The hot threads API gets the current hot threads for Logstash. A hot thread is a
Java thread that has high CPU usage and executes for a longer than normal period
of time.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/hot_threads?pretty'
--------------------------------------------------

The output is a JSON document that contains a breakdown of the top hot threads for
Logstash.

Example response:

[source,json,subs="attributes"]
--------------------------------------------------
{
  "hot_threads" : {
    "time" : "2017-06-06T18:25:28-07:00",
    "busiest_threads" : 3,
    "threads" : [ {
      "name" : "Ruby-0-Thread-7",
      "percent_of_cpu_time" : 0.0,
      "state" : "timed_waiting",
      "path" : "/path/to/logstash-{logstash_version}/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:187",
      "traces" : [ "java.lang.Object.wait(Native Method)", "org.jruby.RubyThread.sleep(RubyThread.java:1002)", "org.jruby.RubyKernel.sleep(RubyKernel.java:803)" ]
    }, {
      "name" : "[test2]>worker3",
      "percent_of_cpu_time" : 0.85,
      "state" : "waiting",
      "traces" : [ "sun.misc.Unsafe.park(Native Method)", "java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)", "java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)" ]
    }, {
      "name" : "[test2]>worker2",
      "percent_of_cpu_time" : 0.85,
      "state" : "runnable",
      "traces" : [ "org.jruby.RubyClass.allocate(RubyClass.java:225)", "org.jruby.RubyClass.newInstance(RubyClass.java:856)", "org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen)" ]
    } ]
  }
}
--------------------------------------------------

The parameters allowed are:

[horizontal]
`threads`:: The number of hot threads to return. The default is 10.
`stacktrace_size`:: The depth of the stack trace to report for each thread. The default is 50.
`human`:: If true, returns plain text instead of JSON format. The default is false.
`ignore_idle_threads`:: If true, does not return idle threads. The default is true.
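These parameters are passed as query-string arguments. A small Python sketch (not part of Logstash) that builds a hot-threads request URL with non-default parameters, assuming the default API endpoint of `localhost:9600`:

```python
from urllib.parse import urlencode

def hot_threads_url(host="localhost:9600", **params):
    """Build a Hot Threads API URL, e.g. for urllib.request.urlopen()."""
    query = urlencode(params)
    url = f"http://{host}/_node/hot_threads"
    return f"{url}?{query}" if query else url

url = hot_threads_url(threads=3, ignore_idle_threads="true", human="true")
```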

See <<monitoring-common-options, Common Options>> for a list of options that can be applied to all
Logstash monitoring APIs.

You can use the `?human` parameter to return the document in a human-readable format.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_node/hot_threads?human=true'
--------------------------------------------------

Example of a human-readable response:

[source%nowrap,text,subs="attributes"]
--------------------------------------------------
::: {}
Hot threads at 2017-06-06T18:31:17-07:00, busiestThreads=3:
================================================================================
0.0 % of cpu usage, state: timed_waiting, thread name: 'Ruby-0-Thread-7'
/path/to/logstash-{logstash_version}/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:187
	java.lang.Object.wait(Native Method)
	org.jruby.RubyThread.sleep(RubyThread.java:1002)
	org.jruby.RubyKernel.sleep(RubyKernel.java:803)
--------------------------------------------------------------------------------
0.0 % of cpu usage, state: waiting, thread name: 'defaultEventExecutorGroup-5-4'
	sun.misc.Unsafe.park(Native Method)
	java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
--------------------------------------------------------------------------------
0.05 % of cpu usage, state: timed_waiting, thread name: '[test]-pipeline-manager'
	java.lang.Object.wait(Native Method)
	java.lang.Thread.join(Thread.java:1253)
	org.jruby.internal.runtime.NativeThread.join(NativeThread.java:75)

--------------------------------------------------


[[logstash-health-report-api]]
=== Health report API

An API that reports the health status of Logstash.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9600/_health_report?pretty'
--------------------------------------------------

==== Description

The health API returns a report with the health status of Logstash and the pipelines that are running inside of it.
The report contains a list of indicators that compose Logstash functionality.

Each indicator has a health status of: `green`, `unknown`, `yellow`, or `red`.
The indicator will provide an explanation and metadata describing the reason for its current health status.

The top-level status is controlled by the worst indicator status.

In the event that an indicator's status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue.
Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.

Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system.
The root cause and remediation steps are encapsulated in a `diagnosis`.
A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, and the URL for detailed troubleshooting help.

NOTE: The health indicators perform root cause analysis of non-green health statuses.
This can be computationally expensive when called frequently.
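The "worst indicator wins" aggregation can be sketched as follows (illustrative only; the actual aggregation happens inside Logstash, and the severity ordering assumed here is green < unknown < yellow < red, following the order the statuses are listed in above):

```python
# Severity ordering assumed for aggregation: later entries are worse.
SEVERITY = ["green", "unknown", "yellow", "red"]

def top_level_status(indicator_statuses):
    """Return the worst status among the given indicator statuses."""
    return max(indicator_statuses, key=SEVERITY.index)

status = top_level_status(["green", "yellow", "green"])
```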

==== Response body

`status`::
(Optional, string) Health status of {ls}, based on the aggregated status of all indicators. Statuses are:

`green`:::
{ls} is healthy.

`unknown`:::
The health of {ls} could not be determined.

`yellow`:::
The functionality of {ls} is in a degraded state and may need remediation to avoid the health becoming `red`.

`red`:::
{ls} is experiencing an outage or certain features are unavailable for use.

`indicators`::
(object) Information about the health of the {ls} indicators.
+
.Properties of `indicators`
[%collapsible%open]
====
`<indicator>`::
(object) Contains health results for an indicator.
+
.Properties of `<indicator>`
[%collapsible%open]
=======
`status`::
(string) Health status of the indicator. Statuses are:

`green`:::
The indicator is healthy.

`unknown`:::
The health of the indicator could not be determined.

`yellow`:::
The functionality of an indicator is in a degraded state and may need remediation to avoid the health becoming `red`.

`red`:::
The indicator is experiencing an outage or certain features are unavailable for use.

`symptom`::
(string) A message providing information about the current health status.

`details`::
(Optional, object) An object that contains additional information about the indicator that has led to the current health status result.
Each indicator has <<logstash-health-api-response-details, a unique set of details>>.

`impacts`::
(Optional, array) If a non-healthy status is returned, indicators may include a list of impacts that this health status will have on {ls}.
+
.Properties of `impacts`
[%collapsible%open]
========
`severity`::
(integer) How important this impact is to the functionality of {ls}.
A value of 1 is the highest severity, with larger values indicating lower severity.

`description`::
(string) A description of the impact on {ls}.

`impact_areas`::
(array of strings) The areas of {ls} functionality that this impact affects.
Possible values are:
+
--
* `pipeline_execution`
--

========

`diagnosis`::
(Optional, array) If a non-healthy status is returned, indicators may include a list of diagnoses that encapsulate the cause of the health issue and an action to take in order to remediate the problem.
+
.Properties of `diagnosis`
[%collapsible%open]
========
`cause`::
(string) A description of a root cause of this health problem.

`action`::
(string) A brief description of the steps that should be taken to remediate the problem.
A more detailed step-by-step guide to remediate the problem is provided by the `help_url` field.

`help_url`::
(string) A link to the troubleshooting guide that will help fix the health problem.
========
=======
====

[role="child_attributes"]
[[logstash-health-api-response-details]]
==== Indicator Details

Each health indicator in the health API returns a set of details that further explains the state of the system.
The details have contents and a structure that is unique to each indicator.

[[logstash-health-api-response-details-pipeline]]
===== Pipeline Indicator Details

`pipelines/indicators/<pipeline_id>/details`::
(object) Information about the specified pipeline.
+
.Properties of `pipelines/indicators/<pipeline_id>/details`
[%collapsible%open]
====
`status`::
(object) Details related to the pipeline's current status and run-state.
+
.Properties of `status`
[%collapsible%open]
========
`state`::
(string) The current state of the pipeline, including whether it is `loading`, `running`, `finished`, or `terminated`.
========
====