From f15e42f946f33222c89dd6420d7a40b381ceca23 Mon Sep 17 00:00:00 2001 From: Jordan Sissel Date: Sat, 31 Aug 2013 21:24:14 -0700 Subject: [PATCH] Update to reflect 1.2 --- docs/life-of-an-event.md | 62 +++++++++++++++++++++++++++++----------- 1 file changed, 45 insertions(+), 17 deletions(-) diff --git a/docs/life-of-an-event.md b/docs/life-of-an-event.md index 90865019c..f7dd64099 100644 --- a/docs/life-of-an-event.md +++ b/docs/life-of-an-event.md @@ -6,8 +6,11 @@ layout: content_right The logstash agent is an event pipeline. -The logstash agent is 3 parts: inputs -> filters -> outputs. Inputs generate -events, filters modify them, outputs ship them elsewhere. +## The Pipeline + +The logstash agent is a processing pipeline with 3 stages: inputs -> filters -> +outputs. Inputs generate events, filters modify them, outputs ship them +elsewhere. Internal to logstash, events are passed from each phase using internal queues. It is implemented with a 'SizedQueue' in Ruby. SizedQueue allows a bounded @@ -19,12 +22,19 @@ into the next phase - this helps reduce any data loss and in general avoids logstash trying to act as a data storage system. These internal queues are not for storing messages long-term. -Starting at outputs, here's what happens with a queue fills up. +## Fault Tolerance + +Starting at outputs, here's what happens when things break. + +An output can fail or have problems because of some downstream cause, such as +full disk, permissions problems, temporary network failures, or service +outages. Most outputs should keep retrying to ship any events that were +involved in the failure. If an output is failing, the output thread will wait until this output is healthy again and able to successfully send the message. Therefore, the output queue will stop being read from by this output and will eventually fill up with -events and cause write blocks. +events and block new events from being written to this queue. A full output queue means filters will block trying to write to the output queue. Because filters will be stuck, blocked writing to the output queue, they @@ -37,17 +47,19 @@ data from wherever that input is getting new events. In ideal circumstances, this will behave similarly to when the tcp window closes to 0, no new data is sent because the receiver hasn't finished -processing the current queue of data. +processing the current queue of data, but as soon as the downstream (output) +problem is resolved, messages will begin flowing again.. ## Thread Model The thread model in logstash is currently: - input threads | filter threads | output threads + input threads | filter worker threads | output worker -Filters are optional, so you will have this model if you have no filters defined: +Filters are optional, so you will have this model if you have no filters +defined: - input threads | output threads + input threads | output worker Each input runs in a thread by itself. This allows busier inputs to not be blocked by slower ones, etc. It also allows for easier containment of scope @@ -56,21 +68,30 @@ because each input has a thread. The filter thread model is a 'worker' model where each worker receives an event and applies all filters, in order, before emitting that to the output queue. This allows scalability across CPUs because many filters are CPU intensive -(permitting that we have thread safety). Currently, logstash forces the number -of filter worker threads to be 1, but this will be tunable in the future once -we analyze the thread safety of each filter. +(permitting that we have thread safety). -The output thread model one thread per output. Each output has its own queue -receiving events. This is implemented in logstash with LogStash::MultiQueue. +The default number of filter workers is 1, but you can increase this number +with the '-w' flag on the agent. + +The output worker model is currently a single thread. Outputs will receive +events in the order they are defined in the config file. + +Outputs may decide to buffer events temporarily before publishing them, +possibly in a separate thread. One example of this is the elasticsearch output +which will buffer events and flush them all at once, in a separate thread. This +mechanism (buffering many events + writing in a separate thread) can improve +performance so the logstash pipeline isn't stalled waiting for a response from +elasticsearch. ## Consequences and Expectations Small queue sizes mean that logstash simply blocks and stalls safely during -times of load or other temporary pipeline problems. The alternative is -unlimited queues which grow unbounded and eventually exceed memory causing a -crash which loses all of those messages. +times of load or other temporary pipeline problems. There are two alternatives +to this - unlimited queue length and dropping messages. Unlimited queues grow +grow unbounded and eventually exceed memory causing a crash which loses all of +those messages. Dropping messages is also an undesirable behavior in most cases. -At a minum, logstash will have probably 3 threads (2 if you have no filters). +At a minimum, logstash will have probably 3 threads (2 if you have no filters). One input, one filter worker, and one output thread each. If you see logstash using multiple CPUs, this is likely why. If you want to @@ -79,3 +100,10 @@ know more about what each thread is doing, you should read this: Threads in java have names, and you can use jstack and top to figure out who is using what resources. The URL above will help you learn how to do this. + +On Linux platforms, logstash will label all the threads it can with something +descriptive. Inputs will show up as "outputworker" (or something similar). Other threads +may be labeled as well, and are intended to help you identify their purpose +should you wonder why they are consuming resources! +