[[troubleshooting]] == Troubleshooting Common Problems We will be adding more tips and solutions, so please check back soon. Also check out the https://discuss.elastic.co/c/logstash[Logstash discussion forum]. [float] [[ts-install]] == Installation and setup [float] [[ts-temp-dir]] === Inaccessible temp directory Certain versions of the JRuby runtime and libraries in certain plugins (the Netty network library in the TCP input, for example) copy executable files to the temp directory. This situation causes subsequent failures when `/tmp` is mounted `noexec`. *Possible solutions* * Change setting to mount `/tmp` with `exec`. * Specify an alternate directory using the `-Djava.io.tmpdir` setting in the `jvm.options` file. [float] [[ts-ingest]] == Data ingestion [float] [[ts-429]] === Error response code 429 A `429` message indicates that an application is busy handling other requests. For example, Elasticsearch throws a `429` code to notify Logstash (or other indexers) that the bulk failed because the ingest queue is full. Any documents that weren't processed should be retried. TBD: Does Logstash retry? Should the user take any action? *Sample error* ----- [2018-08-21T20:05:36,111][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@85be457 on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@538c9d8a[Running, pool size = 16, active threads = 16, queued tasks = 200, completed tasks = 685]]"}) ----- [float] [[ts-performance]] == General performance tuning For general performance tuning tips and guidelines, see <>. [float] [[ts-kafka]] == Common Kafka support issues and solutions This section contains a list of common Kafka issues from the last few months. [float] [[ts-kafka-timeout]] === Kafka session timeout issues (input side) This is a common problem. *Symptoms* Throughput issues and duplicate event processing {ls} logs warnings: ----- [2017-10-18T03:37:59,302][WARN][org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Auto offset commit failed for group clap_tx1: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. ----- The time between subsequent calls to `poll()` was longer than the configured `session.timeout.ms`, which typically implies that the poll loop is spending too much time processing messages. You can address this by increasing the session timeout or by reducing the maximum size of batches returned in `poll()` with `max.poll.records`. ----- [INFO][org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Revoking previously assigned partitions [] for group log-ronline-node09 `[2018-01-29T14:54:06,485][INFO]`[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Setting newly assigned partitions [elk-pmbr-9] for group log-pmbr ----- *Background* Kafka tracks the individual consumers in a consumer group (for example, a number of {ls} instances) and tries to give each consumer one or more specific partitions of data in the topic they’re consuming. In order to achieve this, Kafka tracks whether or not a consumer ({ls} Kafka input thread) is making progress on their assigned partition, and reassigns partitions that have not made progress in a set timeframe. When {ls} requests more events from the Kafka Broker than it can process within the timeout, it triggers reassignment of partitions. Reassignment of partitions takes time, and can cause duplicate processing of events and significant throughput problems. *Possible solutions* * Reduce the number of records per request that {ls} polls from the Kafka Broker in one request, * Reduce the number of Kafka input threads, and/or * Increase the relevant timeouts in the Kafka Consumer configuration. *Details* The `max_poll_records` option sets the number of records to be pulled in one request. If it exceeds the default value of 500, try reducing it. The `consumer_threads` option sets the number of input threads. If the value exceeds the number of pipeline workers configured in the `logstash.yml` file, it should certainly be reduced. If the value is greater than 4, try reducing it to `4` or less if the client has the time/resources for it. Try starting with a value of `1`, and then incrementing from there to find the optimal performance. The `session_timeout_ms` option sets the relevant timeout. Set it to a value that ensures that the number of events in `max_poll_records` can be safely processed within the time limit. ----- EXAMPLE Pipeline throughput is `10k/s` and `max_poll_records` is set to 1k =>. The value must be at least 100ms if `consumer_threads` is set to `1`. If it is set to a higher value `n`, then the minimum session timeout increases proportionally to `n * 100ms`. ----- In practice the value must be set much higher than the theoretical value because the behavior of the outputs and filters in a pipeline follows a distribution. The value should also be higher than the maximum time you expect your outputs to stall. The default setting is `10s == 10000ms`. If you are experiencing periodic problems with an output that can stall because of load or similar effects (such as the Elasticsearch output), there is little downside to increasing this value significantly to say `60s`. From a performance perspective, decreasing the `max_poll_records` value is preferable to increasing the timeout value. Increasing the timeout is your only option if the client’s issues are caused by periodically stalling outputs. Check logs for evidence of stalling outputs, such as `ES output logging status 429`. [float] [[ts-kafka-many-offset-commits]] === Large number of offset commits (input side) *Symptoms* Logstash’s Kafka Input is causing a much higher number of commits to the offset topic than expected. Often the complaint also mentions redundant offset commits where the same offset is committed repeatedly. *Solution* For Kafka Broker versions 0.10.2.1 to 1.0.x: The problem is caused by a bug in Kafka. https://issues.apache.org/jira/browse/KAFKA-6362 The client’s best option is upgrading their Kafka Brokers to version 1.1 or newer. For older versions of Kafka or if the above does not fully resolve the issue: The problem can also be caused by setting the value for `poll_timeout_ms` too low relative to the rate at which the Kafka Brokers receive events themselves (or if Brokers periodically idle between receiving bursts of events). Increasing the value set for `poll_timeout_ms` proportionally decreases the number of offsets commits in this scenario. For example, raising it by 10x will lead to 10x fewer offset commits. [float] [[ts-kafka-codec-errors-input]] === Codec Errors in Kafka Input (before Plugin Version 6.3.4 only) *Symptoms* Logstash Kafka input randomly logs errors from the configured codec and/or reads events incorrectly (partial reads, mixing data between multiple events etc.). ----- Log example: [2018-02-05T13:51:25,773][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#, :backtrace=>["org/jruby/RubyArray.java:1892:in `join'", "org/jruby/RubyArray.java:1898:in `join'", "/usr/share/logstash/logstash-core/lib/logstash/util/buftok.rb:87:in `extract'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-line-3.0.8/lib/logstash/codecs/line.rb:38:in `decode'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.1.11/lib/logstash/inputs/kafka.rb:241:in `thread_runner'", "file:/usr/share/logstash/vendor/jruby/lib/jruby.jar!/jruby/java/java_ext/java.lang.rb:12:in `each'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.1.11/lib/logstash/inputs/kafka.rb:240:in `thread_runner'"]} ----- *Background* There was a bug in the way the Kafka Input plugin was handling codec instances when running on multiple threads (`consumer_threads` set to > 1). https://github.com/logstash-plugins/logstash-input-kafka/issues/210 *Solution* * Upgrade Kafka Input plugin to v. 6.3.4 or later. * If (and only if) upgrading is impossible, set `consumer_threads` to `1`. [float] [[ts-other]] == Other issues [float] [[ts-cli]] === Command line [float] [[ts-windows-cli]] ==== Shell commands on Windows OS Command line often show single quotes. On Windows systems, replace a single quote `'' with a double quote `"`. *Example* Instead of: ----- bin/logstash -e 'input { stdin { } } output { stdout {} }' ----- Use this format on Windows systems: ----- bin/logstash -e "input { stdin { } } output { stdout {} }" -----