logstash/docs/static/troubleshooting.asciidoc
Karen Metts 2fb348e053
[DOC] Logging settings can affect performances (#12246) (#12247)
Logging settings can affect performances
Backports #12246 to 7.x branch

Co-authored-by: Luca Belluccini <luca.belluccini@elastic.co>
2020-09-15 16:51:29 -04:00

351 lines
12 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[troubleshooting]]
== Troubleshooting Common Problems
We are adding more troubleshooting tips, so please check back soon. If you
have something to add, please:
* create an issue at
https://github.com/elastic/logstash/issues, or
* create a pull request with your proposed changes at https://github.com/elastic/logstash.
Also check out the https://discuss.elastic.co/c/logstash[Logstash discussion
forum].
[float]
[[ts-install]]
== Installation and setup
[float]
[[ts-temp-dir]]
=== Inaccessible temp directory
Certain versions of the JRuby runtime and libraries
in certain plugins (the Netty network library in the TCP input, for example) copy
executable files to the temp directory. This situation causes subsequent failures when
`/tmp` is mounted `noexec`.
*Sample error*
[source,sh]
-----
[2018-03-25T12:23:01,149][ERROR][org.logstash.Logstash ]
java.lang.IllegalStateException: org.jruby.exceptions.RaiseException:
(LoadError) Could not load FFI Provider: (NotImplementedError) FFI not
available: java.lang.UnsatisfiedLinkError: /tmp/jffi5534463206038012403.so:
/tmp/jffi5534463206038012403.so: failed to map segment from shared object:
Operation not permitted
-----
*Possible solutions*
* Change setting to mount `/tmp` with `exec`.
* Specify an alternate directory using the `-Djava.io.tmpdir` setting in the `jvm.options` file.
[float]
[[ts-startup]]
== {ls} start up
[float]
[[ts-illegal-reflective-error]]
=== 'Illegal reflective access' errors
// https://github.com/elastic/logstash/issues/10496 and https://github.com/elastic/logstash/issues/10498
Running Logstash with Java 11 results in warnings similar to these:
[source,sh]
-----
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jruby.ext.openssl.SecurityHelper (file:/{...}/jruby{...}jopenssl.jar) to field java.security.MessageDigest.provider
WARNING: Please consider reporting this to the maintainers of org.jruby.ext.openssl.SecurityHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
-----
These errors appear related to https://github.com/jruby/jruby/issues/4834[a known issue with JRuby].
*Work around*
Try adding these values to the `jvm.options` file.
[source,sh]
-----
--add-opens=java.base/java.security=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.nio.channels=org.jruby.dist
--add-opens=java.base/sun.nio.ch=org.jruby.dist
--add-opens=java.management/sun.management=org.jruby.dist
-----
*Notes:*
* These settings allow Logstash to start without warnings in Java 11, but they
prevent Logstash from starting on Java 8.
* This workaround has been tested with simple pipelines. If you have experiences
to share, please comment in the
https://github.com/elastic/logstash/issues/10496[issue].
[float]
[[ts-ingest]]
== Data ingestion
[float]
[[ts-429]]
=== Error response code 429
A `429` message indicates that an application is busy handling other requests. For
example, Elasticsearch sends a `429` code to notify Logstash (or other indexers)
that the bulk failed because the ingest queue is full. Logstash will retry sending documents.
*Possible actions*
Check {es} to see if it needs attention.
* {ref}/cluster-stats.html[Cluster stats API]
* {ref}/monitor-elasticsearch-cluster.html[Monitor a cluster]
*Sample error*
-----
[2018-08-21T20:05:36,111][INFO ][logstash.outputs.elasticsearch] retrying
failed action with response code: 429
({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of
org.elasticsearch.transport.TransportService$7@85be457 on
EsThreadPoolExecutor[bulk, queue capacity = 200,
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@538c9d8a[Running,
pool size = 16, active threads = 16, queued tasks = 200, completed tasks =
685]]"})
-----
[float]
[[ts-performance]]
== General performance tuning
For general performance tuning tips and guidelines, see <<performance-tuning>>.
[float]
[[ts-pipeline]]
== Troubleshooting a pipeline
Pipelines, by definition, are unique. Here are some guidelines to help you get
started.
* Identify the offending pipeline.
* Start small. Create a minimum pipeline that manifests the problem.
For basic pipelines, this configuration could be enough to make the problem show itself.
[source,ruby]
-----
input {stdin{}} output {stdout{}}
-----
{ls} can separate logs by pipeline. This feature can help you identify the offending pipeline.
Set `pipeline.separate_logs: true` in your `logstash.yml` to enable the log per pipeline feature.
For more complex pipelines, the problem could be caused by a series of plugins in
a specific order. Troubleshooting these pipelines usually requires trial and error.
Start by systematically removing input and output plugins until you're left with
the minimum set that manifest the issue.
We want to expand this section to make it more helpful. If you have
troubleshooting tips to share, please:
* create an issue at https://github.com/elastic/logstash/issues, or
* create a pull request with your proposed changes at https://github.com/elastic/logstash.
[float]
[[ts-pipeline-logging-level-performance]]
=== Logging level can affect performances
*Symptoms*
Simple filters such as `mutate` or `json` filter can take several milliseconds per event to execute.
Inputs and outputs might be affected, too.
*Background*
The different plugins running on Logstash can be quite verbose if the logging level is set to `debug` or `trace`.
As the logging library used in Logstash is synchronous, heavy logging can affect performances.
*Solution*
Reset the logging level to `info`.
[float]
[[ts-kafka]]
== Common Kafka support issues and solutions
[float]
[[ts-kafka-timeout]]
=== Kafka session timeout issues (input side)
*Symptoms*
Throughput issues and duplicate event processing {ls} logs warnings:
-----
[2017-10-18T03:37:59,302][WARN][org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
Auto offset commit failed for group clap_tx1: Commit cannot be completed since
the group has already rebalanced and assigned the partitions to another member.
-----
The time between subsequent calls to `poll()` was longer than the
configured `session.timeout.ms`, which typically implies that the poll loop is
spending too much time processing messages. You can address this by
increasing the session timeout or by reducing the maximum size of batches
returned in `poll()` with `max.poll.records`.
-----
[INFO][org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Revoking
previously assigned partitions [] for group log-ronline-node09
`[2018-01-29T14:54:06,485][INFO]`[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
Setting newly assigned partitions [elk-pmbr-9] for group log-pmbr
-----
*Background*
Kafka tracks the individual consumers in a consumer group (for example, a number
of {ls} instances) and tries to give each consumer one or more specific
partitions of data in the topic theyre consuming. In order to achieve this,
Kafka tracks whether or not a consumer ({ls} Kafka input thread) is making
progress on their assigned partition, and reassigns partitions that have not
made progress in a set timeframe.
When {ls} requests more events from the Kafka Broker than it can process within
the timeout, it triggers reassignment of partitions. Reassignment of partitions
takes time, and can cause duplicate processing of events and significant
throughput problems.
*Possible solutions*
* Reduce the number of records per request that {ls} polls from the Kafka Broker in one request,
* Reduce the number of Kafka input threads, and/or
* Increase the relevant timeouts in the Kafka Consumer configuration.
*Details*
The `max_poll_records` option sets the number of records to be pulled in one request.
If it exceeds the default value of 500, try reducing it.
The `consumer_threads` option sets the number of input threads. If the value exceeds
the number of pipeline workers configured in the `logstash.yml` file, it should
certainly be reduced.
If the value is greater than 4, try reducing it to `4` or less if the client has
the time/resources for it. Try starting with a value of `1`, and then
incrementing from there to find the optimal performance.
The `session_timeout_ms` option sets the relevant timeout. Set it to a value
that ensures that the number of events in `max_poll_records` can be safely
processed within the time limit.
-----
EXAMPLE
Pipeline throughput is `10k/s` and `max_poll_records` is set to 1k =>. The value
must be at least 100ms if `consumer_threads` is set to `1`. If it is set to a
higher value `n`, then the minimum session timeout increases proportionally to
`n * 100ms`.
-----
In practice the value must be set much higher than the theoretical value because
the behavior of the outputs and filters in a pipeline follows a distribution.
The value should also be higher than the maximum time you expect your outputs to
stall. The default setting is `10s == 10000ms`. If you are experiencing
periodic problems with an output that can stall because of load or similar
effects (such as the Elasticsearch output), there is little downside to
increasing this value significantly to say `60s`.
From a performance perspective, decreasing the `max_poll_records` value is preferable
to increasing the timeout value. Increasing the timeout is your only option if the
clients issues are caused by periodically stalling outputs. Check logs for
evidence of stalling outputs, such as `ES output logging status 429`.
[float]
[[ts-kafka-many-offset-commits]]
=== Large number of offset commits (Kafka input side)
*Symptoms*
Logstashs Kafka Input is causing a much higher number of commits to
the offset topic than expected. Often the complaint also mentions redundant
offset commits where the same offset is committed repeatedly.
*Solution*
For Kafka Broker versions 0.10.2.1 to 1.0.x: The problem is caused by a bug in
Kafka. https://issues.apache.org/jira/browse/KAFKA-6362 The clients best option
is upgrading their Kafka Brokers to version 1.1 or newer.
For older versions of
Kafka or if the above does not fully resolve the issue: The problem can also be
caused by setting the value for `poll_timeout_ms` too low relative to the rate
at which the Kafka Brokers receive events themselves (or if Brokers periodically
idle between receiving bursts of events). Increasing the value set for
`poll_timeout_ms` proportionally decreases the number of offsets commits in
this scenario. For example, raising it by 10x will lead to 10x fewer offset commits.
[float]
[[ts-kafka-codec-errors-input]]
=== Codec Errors in Kafka Input (before Plugin Version 6.3.4 only)
*Symptoms*
Logstash Kafka input randomly logs errors from the configured codec and/or reads
events incorrectly (partial reads, mixing data between multiple events etc.).
-----
Log example: [2018-02-05T13:51:25,773][FATAL][logstash.runner ] An
unexpected error occurred! {:error=>#<TypeError: can't convert nil into String>,
:backtrace=>["org/jruby/RubyArray.java:1892:in `join'",
"org/jruby/RubyArray.java:1898:in `join'",
"/usr/share/logstash/logstash-core/lib/logstash/util/buftok.rb:87:in `extract'",
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-line-3.0.8/lib/logstash/codecs/line.rb:38:in
`decode'",
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.1.11/lib/logstash/inputs/kafka.rb:241:in
`thread_runner'",
"file:/usr/share/logstash/vendor/jruby/lib/jruby.jar!/jruby/java/java_ext/java.lang.rb:12:in
`each'",
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.1.11/lib/logstash/inputs/kafka.rb:240:in
`thread_runner'"]}
-----
*Background*
There was a bug in the way the Kafka Input plugin was handling codec instances
when running on multiple threads (`consumer_threads` set to > 1).
https://github.com/logstash-plugins/logstash-input-kafka/issues/210
*Solution*
* Upgrade Kafka Input plugin to v. 6.3.4 or later.
* If (and only if) upgrading is not possible, set `consumer_threads` to `1`.
[float]
[[ts-other]]
== Other issues
Coming soon