- add life of an event

This commit is contained in:
Jordan Sissel 2012-01-26 00:07:10 -08:00
parent 663af2aee1
commit 7cc8a2c00d
2 changed files with 26 additions and 22 deletions

View file

@ -10,6 +10,7 @@ layout: content_right
<li> <a href="flags"> command-line flags </a> </li> <li> <a href="flags"> command-line flags </a> </li>
<li> <a href="configuration"> configuration file overview </a> </li> <li> <a href="configuration"> configuration file overview </a> </li>
<li> <a href="extending"> writing your own plugins </a> </li> <li> <a href="extending"> writing your own plugins </a> </li>
<li> <a href="life-of-an-event"> the life of an event in logstash </a> </li>
</ul> </ul>
<h3> use cases and tutorials </h3> <h3> use cases and tutorials </h3>

View file

@ -1,27 +1,30 @@
--- ---
title: Queues and Threads - logstash internals title: the life of an event - logstash
layout: content_right layout: content_right
--- ---
# Queues and Threading (logstash internals) # the life of an event
The logstash agent is 3 parts: inputs -> filters -> outputs. The logstash agent is an event pipeline.
Each '->' is an internal messaging system. It is implemented with a The logstash agent is 3 parts: inputs -> filters -> outputs. Inputs generate
'SizedQueue' in Ruby. SizedQueue allows a bounded maximum of items in the queue events, filters modify them, outputs ship them elsewhere.
such that any writes to the queue will block if the queue is full at maximum
capacity.
Logstash sets the queue size to 20. This means only 20 events can be pending Internal to logstash, events are passed from each phase using internal queues.
It is implemented with a 'SizedQueue' in Ruby. SizedQueue allows a bounded
maximum of items in the queue such that any writes to the queue will block if
the queue is full at maximum capacity.
Logstash sets each queue size to 20. This means only 20 events can be pending
into the next phase - this helps reduce any data loss and in general avoids into the next phase - this helps reduce any data loss and in general avoids
logstash trying to act as a data storage system. These internal queues are not logstash trying to act as a data storage system. These internal queues are not
for storing messages long-term. for storing messages long-term.
In reverse, here's what happens with a queue fills. Starting at outputs, here's what happens with a queue fills up.
If an output is failing, the output thread will wait until this output is If an output is failing, the output thread will wait until this output is
healthy again and able to successfully send the message before moving on. healthy again and able to successfully send the message. Therefore, the output
Therefore, the output queue (there is only one) will stop being read from and queue will stop being read from by this output and will eventually fill up with
will eventually fill up with events and cause write blocks. events and cause write blocks.
A full output queue means filters will block trying to write to the output A full output queue means filters will block trying to write to the output
queue. Because filters will be stuck, blocked writing to the output queue, they queue. Because filters will be stuck, blocked writing to the output queue, they
@ -40,25 +43,25 @@ processing the current queue of data.
The thread model in logstash is currently: The thread model in logstash is currently:
N input threads | M filter threads | 1 output thread input threads | filter threads | output threads
Filters are optional, so you will have this model if you have no filters defined: Filters are optional, so you will have this model if you have no filters defined:
N input threads | 1 output thread input threads | output threads
Each input runs in a thread by itself. This allows busier inputs to not be Each input runs in a thread by itself. This allows busier inputs to not be
blocked by slower ones, etc. It also allows for easier containment of scope blocked by slower ones, etc. It also allows for easier containment of scope
because each input has a thread. because each input has a thread.
The filter thread model is a 'worker' one, where each worker receives an event The filter thread model is a 'worker' model where each worker receives an event
and applies all filters, in order, before emitting that to the output queue. and applies all filters, in order, before emitting that to the output queue.
This allows scalability across CPUs because many filters are CPU intensive This allows scalability across CPUs because many filters are CPU intensive
(permitting that we have thread safety). Currently logstash forces the number (permitting that we have thread safety). Currently, logstash forces the number
of filter worker threads to be 1, but this will be tunable in the future. of filter worker threads to be 1, but this will be tunable in the future once
we analyze the thread safety of each filter.
The output thread model is a single thread. It operates like the worker model The output thread model one thread per output. Each output has its own queue
above where one event is received and all outputs process it in order and receiving events. This is implemented in logstash with LogStash::MultiQueue.
serially.
## Consequences and Expectations ## Consequences and Expectations
@ -67,8 +70,8 @@ times of load or other temporary pipeline problems. The alternative is
unlimited queues which grow unbounded and eventually exceed memory causing a unlimited queues which grow unbounded and eventually exceed memory causing a
crash which loses all of those messages. crash which loses all of those messages.
Given the above, by default, logstash will have probably 3 threads at a minimum At a minum, logstash will have probably 3 threads (2 if you have no filters).
(2 if you have no filters). One input, one filter, and one output thread each. One input, one filter worker, and one output thread each.
If you see logstash using multiple CPUs, this is likely why. If you want to If you see logstash using multiple CPUs, this is likely why. If you want to
know more about what each thread is doing, you should read this: know more about what each thread is doing, you should read this: