[[dead-letter-queues]]
=== Dead Letter Queues

NOTE: The dead letter queue feature is currently supported for the
<<plugins-outputs-elasticsearch>> output only. Support for additional outputs
will be available in future releases of the Logstash plugins. Before configuring
Logstash to use this feature, refer to the output plugin documentation to
verify that the plugin supports the dead letter queue feature.

By default, when Logstash encounters an event that it cannot process because the
data contains a mapping error or some other issue, the Logstash pipeline
either hangs or drops the unsuccessful event. To protect against data loss in
this situation, you can <<configuring-dlq,configure Logstash>> to write
unsuccessful events to a dead letter queue instead of dropping them.

Each event written to the dead letter queue includes the original event, along
with metadata that describes the reason the event could not be processed,
information about the plugin that wrote the event, and the timestamp for when
the event entered the dead letter queue.

To process events in the dead letter queue, you create a Logstash pipeline
configuration that uses the
<<plugins-inputs-dead_letter_queue,`dead_letter_queue` input plugin>> to read
from the queue.

image::static/images/dead_letter_queue.png[Diagram showing a pipeline reading from the dead letter queue]

See <<processing-dlq-events>> for more information.

[[configuring-dlq]]
==== Configuring Logstash to Use Dead Letter Queues

Dead letter queues are disabled by default. To enable dead letter queues, set
the `dead_letter_queue.enable` option in the `logstash.yml`
<<logstash-settings-file,settings file>>:

[source,yaml]
-------------------------------------------------------------------------------
dead_letter_queue.enable: true
-------------------------------------------------------------------------------

Dead letter queues are stored as files in the local directory of the Logstash
instance. By default, the dead letter queue files are stored in
`path.data/dead_letter_queue`. Each pipeline has a separate queue. For example,
the dead letter queue for the `main` pipeline is stored in
`LOGSTASH_HOME/data/dead_letter_queue/main` by default. The queue files are
numbered sequentially: `1.log`, `2.log`, and so on.
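
For example, with the `main` pipeline and a second, hypothetical pipeline named
`backup` both writing to the dead letter queue, the layout under `path.data`
might look like this:

[source,txt]
-------------------------------------------------------------------------------
data/dead_letter_queue/
├── main/
│   ├── 1.log
│   └── 2.log
└── backup/
    └── 1.log
-------------------------------------------------------------------------------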

You can set `path.dead_letter_queue` in the `logstash.yml` file to
specify a different path for the files:

[source,yaml]
-------------------------------------------------------------------------------
path.dead_letter_queue: "path/to/data/dead_letter_queue"
-------------------------------------------------------------------------------

NOTE: You may not use the same `dead_letter_queue` path for two different
Logstash instances.

===== File Rotation

Dead letter queues have a built-in file rotation policy that manages the file
size of the queue. When the file size reaches a preconfigured threshold, a new
file is created automatically.

By default, the maximum size of each dead letter queue is set to 1024mb. To
change this setting, use the `dead_letter_queue.max_bytes` option, as shown in
the sketch below. Entries are dropped if they would increase the size of the
dead letter queue beyond this setting.
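
For example, to cap each dead letter queue at 2gb (an illustrative value), you
might add the following to the `logstash.yml` settings file:

[source,yaml]
-------------------------------------------------------------------------------
dead_letter_queue.max_bytes: 2gb
-------------------------------------------------------------------------------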

[[processing-dlq-events]]
==== Processing Events in the Dead Letter Queue

When you are ready to process events in the dead letter queue, you create a
pipeline that uses the
<<plugins-inputs-dead_letter_queue,`dead_letter_queue` input plugin>> to read
from the dead letter queue. The pipeline configuration that you use depends on
what you need to do. For example, if the dead letter queue contains events that
resulted from a mapping error in Elasticsearch, you can create a pipeline that
reads the "dead" events, removes the field that caused the mapping issue, and
re-indexes the clean events into Elasticsearch.

The following example shows a simple pipeline that reads events from the dead
letter queue and writes the events, including metadata, to standard output:

[source,yaml]
--------------------------------------------------------------------------------
input {
  dead_letter_queue {
    path => "/path/to/data/dead_letter_queue" <1>
    commit_offsets => true <2>
    pipeline_id => "main" <3>
  }
}

output {
  stdout {
    codec => rubydebug { metadata => true }
  }
}
--------------------------------------------------------------------------------

<1> The path to the top-level directory containing the dead letter queue. This
directory contains a separate folder for each pipeline that writes to the dead
letter queue. To find the path to this directory, look at the `logstash.yml`
<<logstash-settings-file,settings file>>. By default, Logstash creates the
`dead_letter_queue` directory under the location used for persistent
storage (`path.data`), for example, `LOGSTASH_HOME/data/dead_letter_queue`.
However, if `path.dead_letter_queue` is set, it uses that location instead.
<2> When `true`, saves the offset. When the pipeline restarts, it continues
reading from the position where it left off rather than reprocessing all the
items in the queue. You can set `commit_offsets` to `false` when you are
exploring events in the dead letter queue and want to iterate over the events
multiple times, as shown in the sketch after this list.
<3> The ID of the pipeline that's writing to the dead letter queue. The default
is `"main"`.

For another example, see <<dlq-example>>.

When the pipeline has finished processing all the events in the dead letter
queue, it will continue to run and process new events as they stream into the
queue. This means that you do not need to stop your production system to handle
events in the dead letter queue.

NOTE: Events emitted from the
<<plugins-inputs-dead_letter_queue,`dead_letter_queue` input plugin>>
will not be resubmitted to the dead letter queue if they cannot be processed
correctly.

[[dlq-timestamp]]
==== Reading From a Timestamp

When you read from the dead letter queue, you might not want to process all the
events in the queue, especially if the queue contains a lot of old events. You
can start processing events at a specific point in the queue by using the
`start_timestamp` option. This option configures the pipeline to start
processing events based on the timestamp of when they entered the queue:

[source,yaml]
--------------------------------------------------------------------------------
input {
  dead_letter_queue {
    path => "/path/to/data/dead_letter_queue"
    start_timestamp => "2017-06-06T23:40:37"
    pipeline_id => "main"
  }
}
--------------------------------------------------------------------------------

For this example, the pipeline starts reading all events that were delivered to
the dead letter queue on or after June 6, 2017, at 23:40:37.

[[dlq-example]]
==== Example: Processing Data That Has Mapping Errors

In this example, the user attempts to index a document that includes geo_ip data,
but the data cannot be processed because it contains a mapping error:

[source,json]
--------------------------------------------------------------------------------
{"geoip":{"location":"home"}}
--------------------------------------------------------------------------------

Indexing fails because the Logstash output plugin expects a `geo_point` object in
the `location` field, but the value is a string. The failed event is written to
the dead letter queue, along with metadata about the error that caused the
failure:

[source,json]
--------------------------------------------------------------------------------
{
  "@metadata" => {
    "dead_letter_queue" => {
      "entry_time" => #<Java::OrgLogstash::Timestamp:0x5b5dacd5>,
      "plugin_id" => "fb80f1925088497215b8d037e622dec5819b503e-4",
      "plugin_type" => "elasticsearch",
      "reason" => "Could not index event to Elasticsearch. status: 400, action: [\"index\", {:_id=>nil, :_index=>\"logstash-2017.06.22\", :_type=>\"logs\", :_routing=>nil}, 2017-06-22T01:29:29.804Z Suyogs-MacBook-Pro-2.local {\"geoip\":{\"location\":\"home\"}}], response: {\"index\"=>{\"_index\"=>\"logstash-2017.06.22\", \"_type\"=>\"logs\", \"_id\"=>\"AVzNayPze1iR9yDdI2MD\", \"status\"=>400, \"error\"=>{\"type\"=>\"mapper_parsing_exception\", \"reason\"=>\"failed to parse\", \"caused_by\"=>{\"type\"=>\"illegal_argument_exception\", \"reason\"=>\"illegal latitude value [266.30859375] for geoip.location\"}}}}"
    }
  },
  "@timestamp" => 2017-06-22T01:29:29.804Z,
  "@version" => "1",
  "geoip" => {
    "location" => "home"
  },
  "host" => "Suyogs-MacBook-Pro-2.local",
  "message" => "{\"geoip\":{\"location\":\"home\"}}"
}
--------------------------------------------------------------------------------

To process the failed event, you create the following pipeline that reads from
the dead letter queue and removes the mapping problem:

[source,json]
--------------------------------------------------------------------------------
input {
  dead_letter_queue {
    path => "/path/to/data/dead_letter_queue/" <1>
  }
}
filter {
  mutate {
    remove_field => "[geoip][location]" <2>
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ] <3>
  }
}
--------------------------------------------------------------------------------

<1> The <<plugins-inputs-dead_letter_queue,`dead_letter_queue` input>> reads from the dead letter queue.
<2> The `mutate` filter removes the problem field called `location`.
<3> The clean event is sent to Elasticsearch, where it can be indexed because
the mapping issue is resolved.
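
To run the cleanup pipeline, you might save the configuration to a file and
start Logstash with it (a sketch; `dlq-reprocess.conf` is a hypothetical file
name):

[source,shell]
--------------------------------------------------------------------------------
bin/logstash -f dlq-reprocess.conf
--------------------------------------------------------------------------------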