adding converted static asciidocs

This commit is contained in:
Kurt Hurtado 2014-09-04 09:44:31 -07:00
parent 73e52b2465
commit a08f60c270
12 changed files with 1517 additions and 0 deletions


@ -0,0 +1,49 @@
= Command-line flags
== Agent
The logstash agent has the following flags (also try using the '--help' flag).
[source,js]
----------------------------------
-f, --config CONFIGFILE
Load the logstash config from a specific file, directory, or a wildcard. If given a directory or wildcard, config files will be read from the directory in alphabetical order.
-e CONFIGSTRING
Use the given string as the configuration data. Same syntax as the config file. If no input is specified, 'stdin { type => stdin }' is the default. If no output is specified, 'stdout { debug => true }' is the default.
-w, --filterworkers COUNT
Run COUNT filter workers (default: 1)
--watchdog-timeout TIMEOUT
Set watchdog timeout value in seconds. Default is 10.
-l, --log FILE
Log to a given path. Default is to log to stdout
--verbose
Increase verbosity to the first level, less verbose.
--debug
Increase verbosity to the last level, more verbose.
-v
*DEPRECATED: see --verbose/debug* Increase verbosity. There are multiple levels of verbosity available with
'-vv' currently being the highest
--pluginpath PLUGIN_PATH
A colon-delimited path in which to find other logstash plugins
----------------------------------
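For example, a typical agent invocation combining several of these flags might look like the following (the paths are illustrative):
[source,js]
----------------------------------
bin/logstash -f /etc/logstash/conf.d/ -w 2 -l /var/log/logstash.log
----------------------------------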
== Web
[source,js]
----------------------------------
-a, --address ADDRESS
Address on which to start webserver. Default is 0.0.0.0.
-p, --port PORT
Port on which to start webserver. Default is 9292.
----------------------------------


@ -0,0 +1,336 @@
= Logstash Config Language
== Basic Layout
The Logstash config language aims to be simple.
There are 3 main sections: inputs, filters, outputs. Each section has configurations for each plugin available in that section.
Example:
[source,js]
----------------------------------
# This is a comment. You should use comments to describe
# parts of your configuration.
input {
...
}
filter {
...
}
output {
...
}
----------------------------------
== Filters and Ordering
For a given event, filters are applied in the order of their appearance in the configuration file.
== Comments
Comments are the same as in perl, ruby, and python. A comment starts with a '#' character, and does not need to be at the beginning of a line. For example:
[source,js]
----------------------------------
# this is a comment
input { # comments can appear at the end of a line, too
# ...
}
----------------------------------
== Plugins
The input, filter and output sections all let you configure plugins. Plugin
configuration consists of the plugin name followed by a block of settings for
that plugin. For example, how about two file inputs:
[source,js]
----------------------------------
input {
file {
path => "/var/log/messages"
type => "syslog"
}
file {
path => "/var/log/apache/access.log"
type => "apache"
}
}
----------------------------------
The above configures two separate file inputs. Both set two configuration settings each: 'path' and 'type'. Each plugin has different settings for configuring it; see the documentation for your plugin to learn what settings are available and what they mean. For example, the file input documentation will explain the meanings of the path and type settings.
== Value Types
The documentation for a plugin may enforce a configuration field having a
certain type. Examples include boolean, string, array, number, hash,
etc.
=== Boolean
A boolean must be either `true` or `false`. Note the lack of quotes around `true` and `false`.
Examples:
[source,js]
----------------------------------
debug => true
----------------------------------
=== String
A string must be a single value.
Example:
[source,js]
----------------------------------
name => "Hello world"
----------------------------------
Single, unquoted words are valid as strings, too, but you should use quotes.
=== Number
Numbers must be valid numerics (floating point or integer are OK).
Example:
[source,js]
----------------------------------
port => 33
----------------------------------
=== Array
An array can be a single string value or multiple values. If you specify the same
field multiple times, it appends to the array.
Examples:
[source,js]
----------------------------------
path => [ "/var/log/messages", "/var/log/*.log" ]
path => "/data/mysql/mysql.log"
----------------------------------
The above makes 'path' a 3-element array including all 3 strings.
=== Hash
A hash uses basically the same syntax as Ruby hashes.
The key and value are simply pairs, such as:
[source,js]
----------------------------------
match => {
"field1" => "value1"
"field2" => "value2"
...
}
----------------------------------
== Field References
All events have properties. For example, an apache access log would have things
like status code (200, 404), request path ("/", "index.html"), HTTP verb (GET, POST),
client IP address, etc. Logstash calls these properties "fields."
In many cases, it is useful to be able to refer to a field by name. To do this,
you can use the Logstash field reference syntax.
By way of example, let us suppose we have this event:
[source,js]
----------------------------------
{
"agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
"ip": "192.168.24.44",
"request": "/index.html"
"response": {
"status": 200,
"bytes": 52353
},
"ua": {
"os": "Windows 7"
}
}
----------------------------------
- the syntax to access fields is `[fieldname]`.
- if you are only referring to a **top-level field**, you can omit the `[]` and
simply say `fieldname`.
- in the case of **nested fields**, like the "os" field above, you need
the full path to that field: `[ua][os]`.
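Putting these rules together, here is a small sketch (the field names come from the event above) that tags events from Windows clients, using a conditional as described further below:
[source,js]
----------------------------------
filter {
  if [ua][os] == "Windows 7" {
    mutate { add_tag => "windows-client" }
  }
}
----------------------------------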
=== sprintf format
This syntax is also used in what Logstash calls 'sprintf format'. This format
allows you to refer to field values from within other strings. For example, the
statsd output has an 'increment' setting, to allow you to keep a count of
apache logs by status code:
[source,js]
----------------------------------
output {
statsd {
increment => "apache.%{[response][status]}"
}
}
----------------------------------
You can also do time formatting in this sprintf format. Instead of specifying a field name, use the `+FORMAT` syntax where `FORMAT` is a http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html[time format].
For example, if you want to use the file output to write to logs based on the
hour and the 'type' field:
[source,js]
----------------------------------
output {
file {
path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
}
}
----------------------------------
== Conditionals
Sometimes you only want a filter or output to process an event under
certain conditions. For that, you'll want to use a conditional!
Conditionals in Logstash look and act the same way they do in programming
languages. You have `if`, `else if` and `else` statements. Conditionals may be
nested if you need that.
The syntax is as follows:
[source,js]
----------------------------------
if EXPRESSION {
...
} else if EXPRESSION {
...
} else {
...
}
----------------------------------
What's an expression? Comparison tests, boolean logic, etc!
The following comparison operators are supported:
* equality, etc: ==, !=, <, >, <=, >=
* regexp: =~, !~
* inclusion: in, not in
The following boolean operators are supported:
* and, or, nand, xor
The following unary operators are supported:
* !
Expressions may contain expressions. Expressions may be negated with `!`.
Expressions may be grouped with parentheses `(...)`. Expressions can be long
and complex.
For example, if we want to remove the field `secret` if the field
`action` has a value of `login`:
[source,js]
----------------------------------
filter {
if [action] == "login" {
mutate { remove => "secret" }
}
}
----------------------------------
The above uses the field reference syntax to get the value of the
`action` field. It is compared against the text `login` and, if equal,
allows the mutate filter to delete the field named `secret`.
How about a more complex example? Suppose we want to:
* alert nagios of any apache events with status 5xx
* record any 4xx status to elasticsearch
* record all status code hits via statsd
The following output configuration handles all three:
[source,js]
----------------------------------
output {
if [type] == "apache" {
if [status] =~ /^5\d\d/ {
nagios { ... }
} else if [status] =~ /^4\d\d/ {
elasticsearch { ... }
}
statsd { increment => "apache.%{status}" }
}
}
----------------------------------
You can also do multiple expressions in a single condition:
[source,js]
----------------------------------
output {
# Send production errors to pagerduty
if [loglevel] == "ERROR" and [deployment] == "production" {
pagerduty {
...
}
}
}
----------------------------------
Here are some examples for testing with the in conditional:
[source,js]
----------------------------------
filter {
if [foo] in [foobar] {
mutate { add_tag => "field in field" }
}
if [foo] in "foo" {
mutate { add_tag => "field in string" }
}
if "hello" in [greeting] {
mutate { add_tag => "string in field" }
}
if [foo] in ["hello", "world", "foo"] {
mutate { add_tag => "field in list" }
}
if [missing] in [alsomissing] {
mutate { add_tag => "shouldnotexist" }
}
if !("foo" in ["hello", "world"]) {
mutate { add_tag => "shouldexist" }
}
}
----------------------------------
Or, to test if grok was successful:
[source,js]
----------------------------------
output {
if "_grokparsefailure" not in [tags] {
elasticsearch { ... }
}
}
----------------------------------
== Further Reading
For more information, see link:index[the plugin docs index].


@ -0,0 +1,56 @@
= contrib plugins
== Why contrib?
As Logstash has grown, we've accumulated a massive repository of plugins. With well over 100 plugins, it became difficult for the project maintainers to support everything effectively.
In order to improve the quality of popular plugins, we've moved the less-commonly-used plugins to a separate repository we're calling "contrib". Concentrating common plugin usage into core solves a few problems, most notably user complaints about the size of Logstash releases, support/maintenance costs, etc.
It is our intent that this separation will improve life for users. If it doesn't, please file a bug so we can work to address it!
If a plugin is available in the 'contrib' package, the documentation for that plugin will note this boldly at the top of that plugin's documentation.
Contrib plugins reside in a https://github.com/elasticsearch/logstash-contrib[separate GitHub project].
== Packaging
At present, the contrib modules are available as a tarball.
== Automated Installation
The `bin/plugin` script will handle the installation for you:
[source,js]
----------------------------------
cd /path/to/logstash
bin/plugin install contrib
----------------------------------
== Manual Installation
The contrib plugins can be extracted on top of an existing Logstash installation.
For example, suppose you have extracted `logstash-%VERSION%.tar.gz` into `/path`:
[source,js]
----------------------------------
cd /path
tar zxf ~/logstash-%VERSION%.tar.gz
----------------------------------
This creates a `/path/logstash-%VERSION%` directory:
[source,js]
----------------------------------
$ ls
logstash-%VERSION%
----------------------------------
The method to install the contrib tarball is identical.
[source,js]
----------------------------------
cd /path
wget http://download.elasticsearch.org/logstash/logstash/logstash-contrib-%VERSION%.tar.gz
tar zxf logstash-contrib-%VERSION%.tar.gz
----------------------------------
This will install the contrib plugins in the same directory as the core
install. These plugins will be available to Logstash the next time it starts.


@ -0,0 +1,114 @@
[[contributing-to-logstash]]
== Extending logstash
You can add your own input, output, or filter plugins to logstash.
If you're looking to extend logstash today, please look at the existing plugins.
[float]
=== Good examples of plugins
* https://github.com/logstash/logstash/blob/master/lib/logstash/inputs/tcp.rb[inputs/tcp]
* https://github.com/logstash/logstash/blob/master/lib/logstash/filters/multiline.rb[filters/multiline]
* https://github.com/logstash/logstash/blob/master/lib/logstash/outputs/mongodb.rb[outputs/mongodb]
[float]
=== Common concepts
* The `config_name` sets the name used in the config file.
* The `milestone` sets the milestone number of the plugin. See the Plugin Milestones page for more info.
* The `config` lines define config options.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
[float]
==== Required modules
All plugins should require the Logstash module.
[source,js]
----------------------------------
require 'logstash/namespace'
----------------------------------
[float]
==== Plugin name
Every plugin must have a name set with the `config_name` method. If this
is not specified plugins will fail to load with an error.
[float]
==== Milestones
Every plugin needs a milestone set using `milestone`. See the Plugin Milestones page for more info.
[float]
==== Config lines
The `config` lines define configuration options and are constructed like
so:
[source,js]
----------------------------------
config :host, :validate => :string, :default => "0.0.0.0"
----------------------------------
The name of the option is specified, here `:host` and then the
attributes of the option. They can include `:validate`, `:default`,
`:required` (a Boolean `true` or `false`), and `:deprecated` (also a
Boolean).
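For instance, a couple of hypothetical options exercising these attributes might be declared as:
[source,js]
----------------------------------
# a required numeric option
config :port, :validate => :number, :required => true
# an option kept for compatibility but discouraged
config :username, :validate => :string, :deprecated => true
----------------------------------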
[float]
=== Inputs
All inputs require the LogStash::Inputs::Base class:
[source,js]
----------------------------------
require 'logstash/inputs/base'
----------------------------------
Inputs have two methods: `register` and `run`.
* Each input runs as its own thread.
* The `run` method is expected to run forever.
[float]
=== Filters
All filters require the LogStash::Filters::Base class:
[source,js]
----------------------------------
require 'logstash/filters/base'
----------------------------------
Filters have two methods: `register` and `filter`.
* The `filter` method gets an event.
* Call `event.cancel` to drop the event.
* To modify an event, simply make changes to the event you are given.
* The return value is ignored.
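As a sketch of these rules in action (the field check and tag are hypothetical), a `filter` method that either drops or annotates an event might look like:
[source,js]
----------------------------------
public
def filter(event)
  return unless filter?(event)
  if event["message"] =~ /DEBUG/
    event.cancel                # drop the event entirely
  else
    event["inspected"] = true   # modify the event in place
  end
  filter_matched(event)
end # def filter
----------------------------------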
[float]
=== Outputs
All outputs require the LogStash::Outputs::Base class:
[source,js]
----------------------------------
require 'logstash/outputs/base'
----------------------------------
Outputs have two methods: `register` and `receive`.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
* The `receive` method is called when an event gets pushed to your output.
[float]
=== Example: a new filter
Learn by example how to link:example-add-a-new-filter[add a new filter to logstash]


@ -0,0 +1,119 @@
= Add a new filter
== Adding a sample filter to Logstash
This document shows you how to add a new filter to Logstash.
For a general overview of how to add a new plugin, see the <<contributing-to-logstash,Extending Logstash>> overview.
== Write code.
Let's write a 'hello world' filter. This filter will replace the 'message' in the event with "Hello world!"
First, Logstash expects plugins in a certain directory structure: `logstash/TYPE/PLUGIN_NAME.rb`
Since we're creating a filter, let's mkdir this:
[source,js]
----------------------------------
mkdir -p logstash/filters/
cd logstash/filters
----------------------------------
Now add the code:
[source,js]
----------------------------------
# Call this file 'foo.rb' (in logstash/filters, as above)
require "logstash/filters/base"
require "logstash/namespace"

class LogStash::Filters::Foo < LogStash::Filters::Base

  # Setting the config_name here is required. This is how you
  # configure this filter from your Logstash config.
  #
  # filter {
  #   foo { ... }
  # }
  config_name "foo"

  # New plugins should start life at milestone 1.
  milestone 1

  # Replace the message with this value.
  config :message, :validate => :string

  public
  def register
    # nothing to do
  end # def register

  public
  def filter(event)
    # return nothing unless there's an actual filter event
    return unless filter?(event)
    if @message
      # Replace the event message with our message as configured in the
      # config file.
      event["message"] = @message
    end
    # filter_matched should go in the last line of our successful code
    filter_matched(event)
  end # def filter
end # class LogStash::Filters::Foo
----------------------------------
== Add it to your configuration
For this simple example, let's just use stdin input and stdout output.
The config file looks like this:
[source,js]
----------------------------------
input {
stdin { type => "foo" }
}
filter {
if [type] == "foo" {
foo {
message => "Hello world!"
}
}
}
output {
stdout { }
}
----------------------------------
Call this file 'example.conf'.
== Tell Logstash about it.
Depending on how you installed Logstash, you have a few ways of including this
plugin.
You can use the --pluginpath flag to tell the agent where the root of your
plugin tree is. In our case, it's the current directory.
[source,js]
----------------------------------
% bin/logstash --pluginpath your/plugin/root -f example.conf
----------------------------------
== Example running
In the example below, I typed in "the quick brown fox" after running the
command.
[source,js]
----------------------------------
% bin/logstash -f example.conf
the quick brown fox
2011-05-12T01:05:09.495000Z stdin://snack.home/: Hello world!
----------------------------------
The output is the standard Logstash stdout output, but in this case our "the quick brown fox" message was replaced with "Hello world!"
All done! :)


@ -0,0 +1,104 @@
= Extending Logstash
== Extending Logstash
You can add your own input, output, or filter plugins to Logstash.
If you're looking to extend Logstash today, please look at the existing plugins.
== Good examples of plugins
* https://github.com/logstash/logstash/blob/master/lib/logstash/inputs/tcp.rb[inputs/tcp]
* https://github.com/logstash/logstash/blob/master/lib/logstash/filters/multiline.rb[filters/multiline]
* https://github.com/logstash/logstash/blob/master/lib/logstash/outputs/mongodb.rb[outputs/mongodb]
== Common concepts
* The `config_name` sets the name used in the config file.
* The `milestone` sets the milestone number of the plugin. See the Plugin Milestones page for more info.
* The `config` lines define config options.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
=== Required modules
All plugins should require the Logstash module.
[source,js]
----------------------------------
require 'logstash/namespace'
----------------------------------
=== Plugin name
Every plugin must have a name set with the `config_name` method. If this
is not specified plugins will fail to load with an error.
=== Milestones
Every plugin needs a milestone set using `milestone`. See the Plugin Milestones page for more info.
=== Config lines
The `config` lines define configuration options and are constructed like
so:
[source,js]
----------------------------------
config :host, :validate => :string, :default => "0.0.0.0"
----------------------------------
The name of the option is specified, here `:host` and then the
attributes of the option. They can include `:validate`, `:default`,
`:required` (a Boolean `true` or `false`), and `:deprecated` (also a
Boolean).
== Inputs
All inputs require the LogStash::Inputs::Base class:
[source,js]
----------------------------------
require 'logstash/inputs/base'
----------------------------------
Inputs have two methods: `register` and `run`.
* Each input runs as its own thread.
* The `run` method is expected to run forever.
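As a sketch under these rules (the plugin name and behavior are hypothetical), a trivial input's `run` method pushes events onto the queue it is handed, forever:
[source,js]
----------------------------------
require 'logstash/inputs/base'
require 'logstash/namespace'

# A hypothetical input that emits a fixed message once per second.
class LogStash::Inputs::Ticker < LogStash::Inputs::Base
  config_name "ticker"
  milestone 1

  public
  def register
    # nothing to initialize
  end # def register

  public
  def run(queue)
    loop do
      queue << LogStash::Event.new("message" => "tick")
      sleep 1
    end
  end # def run
end # class LogStash::Inputs::Ticker
----------------------------------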
== Filters
All filters require the LogStash::Filters::Base class:
[source,js]
----------------------------------
require 'logstash/filters/base'
----------------------------------
Filters have two methods: `register` and `filter`.
* The `filter` method gets an event.
* Call `event.cancel` to drop the event.
* To modify an event, simply make changes to the event you are given.
* The return value is ignored.
== Outputs
All outputs require the LogStash::Outputs::Base class:
[source,js]
----------------------------------
require 'logstash/outputs/base'
----------------------------------
Outputs have two methods: `register` and `receive`.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
* The `receive` method is called when an event gets pushed to your output.
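A corresponding minimal `receive` method might look like the following sketch, where the "destination" is simply stdout:
[source,js]
----------------------------------
public
def receive(event)
  return unless output?(event)   # honor any type/tags restrictions
  puts event.to_json             # ship the event; stdout for this sketch
end # def receive
----------------------------------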
== Example: a new filter
Learn by example how to link:example-add-a-new-filter[add a new filter to Logstash]


@ -0,0 +1,472 @@
= Getting Started with Logstash
== Introduction
Logstash is a tool for receiving, processing and outputting logs. All kinds of logs. System logs, webserver logs, error logs, application logs and just about anything you can throw at it. Sounds great, eh?
Using Elasticsearch as a backend datastore, and kibana as a frontend reporting tool, Logstash acts as the workhorse, creating a powerful pipeline for storing, querying and analyzing your logs. With an arsenal of built-in inputs, filters, codecs and outputs, you can harness some powerful functionality with a small amount of effort. So, let's get started!
=== Prerequisite: Java
The only prerequisite required by Logstash is a Java runtime. You can check that you have it installed by running the command `java -version` in your shell. Here's something similar to what you might see:
[source,js]
----------------------------------
> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
----------------------------------
It is recommended to run a recent version of Java in order to ensure the greatest success in running Logstash.
It's fine to run an open-source version such as OpenJDK: +
http://openjdk.java.net/
Or you can use the official Oracle version: +
http://www.oracle.com/technetwork/java/index.html
Once you have verified the existence of Java on your system, we can move on!
== Up and Running!
=== Logstash in two commands
First, we're going to download the 'logstash' binary and run it with a very simple configuration.
[source,js]
----------------------------------
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz
----------------------------------
Now you should have the file named 'logstash-%VERSION%.tar.gz' on your local filesystem. Let's unpack it:
[source,js]
----------------------------------
tar zxvf logstash-%VERSION%.tar.gz
cd logstash-%VERSION%
----------------------------------
Now let's run it:
[source,js]
----------------------------------
bin/logstash -e 'input { stdin { } } output { stdout {} }'
----------------------------------
Now type something into your command prompt, and you will see it output by Logstash:
[source,js]
----------------------------------
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
----------------------------------
OK, that's interesting... We ran Logstash with an input called "stdin", and an output named "stdout", and Logstash basically echoed back whatever we typed in some sort of structured format. Note that specifying the *-e* command line flag allows Logstash to accept a configuration directly from the command line. This is especially useful for quickly testing configurations without having to edit a file between iterations.
Let's try a slightly fancier example. First, you should exit Logstash by issuing a 'CTRL-C' command in the shell in which it is running. Now run Logstash again with the following command:
[source,js]
----------------------------------
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
----------------------------------
And then try another test input, typing the text "goodnight moon":
[source,js]
----------------------------------
goodnight moon
{
"message" => "goodnight moon",
"@timestamp" => "2013-11-20T23:48:05.335Z",
"@version" => "1",
"host" => "my-laptop"
}
----------------------------------
So, by re-configuring the "stdout" output (adding a "codec"), we can change the output of Logstash. By adding inputs, outputs and filters to your configuration, it's possible to massage the log data in many ways, in order to maximize flexibility of the stored data when you are querying it.
== Storing logs with Elasticsearch
Now, you're probably saying, "that's all fine and dandy, but typing all my logs into Logstash isn't really an option, and merely seeing them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the messages we send into Logstash. If you don't have Elasticsearch already installed, you can http://www.elasticsearch.org/download/[download the RPM or DEB package], or install manually by downloading the current release tarball, by issuing the following four commands:
[source,js]
----------------------------------
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
tar zxvf elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
cd elasticsearch-%ELASTICSEARCH_VERSION%/
./bin/elasticsearch
----------------------------------
NOTE: This tutorial specifies running Logstash %VERSION% with Elasticsearch %ELASTICSEARCH_VERSION%. Each release of Logstash has a *recommended* version of Elasticsearch to pair with. Make sure the versions match based on the http://logstash.net/docs/latest[Logstash version] you're running!
More detailed information on installing and configuring Elasticsearch can be found on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html[The Elasticsearch reference pages]. However, for the purposes of Getting Started with Logstash, the default installation and configuration of Elasticsearch should be sufficient.
Now that we have Elasticsearch running on port 9200 (we do, right?), Logstash can be simply configured to use Elasticsearch as its backend. The defaults for both Logstash and Elasticsearch are fairly sane and well thought out, so we can omit the optional configurations within the elasticsearch output:
[source,js]
----------------------------------
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
----------------------------------
Type something, and Logstash will process it as before (this time you won't see any output, since we don't have the stdout output configured):
[source,js]
----------------------------------
you know, for logs
----------------------------------
You can confirm that ES actually received the data by making a curl request and inspecting the return:
[source,js]
----------------------------------
curl 'http://localhost:9200/_search?pretty'
----------------------------------
which should return something like this:
[source,js]
----------------------------------
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-2013.11.21",
"_type" : "logs",
"_id" : "2ijaoKqARqGvbMgP3BspJA",
"_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"}
} ]
}
}
----------------------------------
Congratulations! You've successfully stashed logs in Elasticsearch via Logstash.
=== Elasticsearch Plugins (an aside)
Another very useful tool for querying your Logstash data (and Elasticsearch in general) is the elasticsearch-kopf plugin. Here is more information on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html[Elasticsearch plugins]. To install elasticsearch-kopf, simply issue the following command in your Elasticsearch directory (the same one in which you ran Elasticsearch earlier):
[source,js]
----------------------------------
bin/plugin -install lmenezes/elasticsearch-kopf
----------------------------------
Now you can visit http://localhost:9200/_plugin/kopf[http://localhost:9200/_plugin/kopf] to browse your Elasticsearch data, settings and mappings!
=== Multiple Outputs
As a quick exercise in configuring multiple Logstash outputs, let's invoke Logstash again, using both the 'stdout' as well as the 'elasticsearch' output:
[source,js]
----------------------------------
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
----------------------------------
Typing a phrase will now echo it back to your terminal, as well as save it in Elasticsearch! (Feel free to verify this using curl or elasticsearch-kopf).
=== Default - Daily Indices
You might notice that Logstash was smart enough to create a new index in Elasticsearch... The default index name is in the form of 'logstash-YYYY.MM.DD', which essentially creates one index per day. At midnight (UTC), Logstash will automagically rotate the index to a fresh new one, with the new current day's timestamp. This allows you to keep windows of data, based on how far retroactively you'd like to query your log data. Of course, you can always archive (or re-index) your data to an alternate location, where you are able to query further into the past. If you'd like to simply delete old indices after a certain time period, you can use the https://github.com/elasticsearch/curator[Elasticsearch Curator tool].
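For reference, the default index name corresponds to an `index` setting on the elasticsearch output, built with the sprintf date syntax shown earlier; a sketch that states the default explicitly (and which you could change to alter the rotation scheme):
[source,js]
----------------------------------
output {
  elasticsearch {
    host => localhost
    # this is the default value, shown here explicitly
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
----------------------------------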
== Moving On
Now you're ready for more advanced configurations. At this point, it makes sense for a quick discussion of some of the core features of Logstash, and how they interact with the Logstash engine.
=== The Life of an Event
Inputs, Outputs, Codecs and Filters are at the heart of the Logstash configuration. By creating a pipeline of event processing, Logstash is able to extract the relevant data from your logs and make it available to elasticsearch, in order to efficiently query your data. To get you thinking about the various options available in Logstash, let's discuss some of the more common configurations currently in use. For more details, read about http://logstash.net/docs/latest/life-of-an-event[the Logstash event pipeline].
==== Inputs
Inputs are the mechanism for passing log data to Logstash. Some of the more useful, commonly-used ones are:
* *file*: reads from a file on the filesystem, much like the UNIX command "tail -0a"
* *syslog*: listens on the well-known port 514 for syslog messages and parses according to RFC3164 format
* *redis*: reads from a redis server, using both redis channels and also redis lists. Redis is often used as a "broker" in a centralized Logstash installation, which queues Logstash events from remote Logstash "shippers".
* *lumberjack*: processes events sent in the lumberjack protocol. Now called https://github.com/elasticsearch/logstash-forwarder[logstash-forwarder].
==== Filters
Filters are used as intermediary processing devices in the Logstash chain. They are often combined with conditionals in order to perform a certain action on an event, if it matches particular criteria. Some useful filters:
* *grok*: parses arbitrary text and structures it. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable. With 120 patterns shipped built-in to Logstash, it's more than likely you'll find one that meets your needs!
* *mutate*: The mutate filter allows you to do general mutations to fields. You can rename, remove, replace, and modify fields in your events.
* *drop*: drop an event completely, for example, 'debug' events.
* *clone*: make a copy of an event, possibly adding or removing fields.
* *geoip*: adds information about geographical location of IP addresses (and displays amazing charts in kibana)
==== Outputs
Outputs are the final phase of the Logstash pipeline. An event may pass through multiple outputs during processing, but once all outputs are complete, the event has finished its execution. Some commonly used outputs include:
* *elasticsearch*: If you're planning to save your data in an efficient, convenient and easily queryable format... Elasticsearch is the way to go. Period. Yes, we're biased :)
* *file*: writes event data to a file on disk.
* *graphite*: sends event data to graphite, a popular open source tool for storing and graphing metrics. http://graphite.wikidot.com/
* *statsd*: a service which "listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services". If you're already using statsd, this could be useful for you!
==== Codecs
Codecs are basically stream filters which can operate as part of an input, or an output. Codecs allow you to easily separate the transport of your messages from the serialization process. Popular codecs include 'json', 'msgpack' and 'plain' (text).
* *json*: encode / decode data in JSON format
* *multiline*: takes multiple-line text events and merges them into a single event, e.g. java exception and stacktrace messages
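A codec is configured inside the input or output block it applies to. For example, combining pieces shown earlier in this guide:
[source,js]
----------------------------------
input {
  stdin { codec => json }        # decode incoming JSON lines into events
}
output {
  stdout { codec => rubydebug }  # pretty-print events for inspection
}
----------------------------------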
For the complete list of (current) configurations, visit the Logstash "plugin configuration" section of the http://logstash.net/docs/latest/[Logstash documentation page].
== More fun with Logstash
=== Persistent Configuration files
Specifying configurations on the command line using '-e' is only so helpful, and more advanced setups will require more lengthy, long-lived configurations. First, let's create a simple configuration file, and invoke Logstash using it. Create a file named "logstash-simple.conf" and save it in the same directory as Logstash.
[source,js]
----------------------------------
input { stdin { } }
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----------------------------------
Then, run this command:
[source,js]
----------------------------------
bin/logstash -f logstash-simple.conf
----------------------------------
Et voilà! Logstash will read in the configuration file you just created and run as in the example we saw earlier. Note that we used the '-f' flag to read in the file, rather than the '-e' flag to read the configuration from the command line. This is a very simple case, of course, so let's move on to some more complex examples.
=== Filters
Filters are an in-line processing mechanism which provide the flexibility to slice and dice your data to fit your needs. Let's see one in action, namely the *grok filter*.
[source,js]
----------------------------------
input { stdin { } }
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----------------------------------
Run Logstash with this configuration:
[source,js]
----------------------------------
bin/logstash -f logstash-filter.conf
----------------------------------
Now paste this line into the terminal (so it will be processed by the stdin input):
[source,js]
----------------------------------
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
----------------------------------
You should see something returned to STDOUT which looks like this:
[source,js]
----------------------------------
{
"message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
"@timestamp" => "2013-12-11T08:01:45.000Z",
"@version" => "1",
"host" => "cadenza",
"clientip" => "127.0.0.1",
"ident" => "-",
"auth" => "-",
"timestamp" => "11/Dec/2013:00:01:45 -0800",
"verb" => "GET",
"request" => "/xampp/status.php",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "3891",
"referrer" => "\"http://cadenza/xampp/navi.php\"",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
----------------------------------
As you can see, Logstash (with help from the *grok* filter) was able to parse the log line (which happens to be in Apache "combined log" format) and break it up into many different discrete bits of information. This will be extremely useful later when we start querying and analyzing our log data... for example, we'll be able to run reports on HTTP response codes, IP addresses, referrers, etc. very easily. There are quite a few grok patterns included with Logstash out-of-the-box, so it's quite likely if you're attempting to parse a fairly common log format, someone has already done the work for you. For more details, see the list of https://github.com/logstash/logstash/blob/master/patterns/grok-patterns[logstash grok patterns] on github.
The other filter used in this example is the *date* filter. This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you're ingesting the log data). You'll notice that the @timestamp field in this example is set to December 11, 2013, even though Logstash is ingesting the event at some point afterwards. This is handy when backfilling logs, for example... the ability to tell Logstash "use this value as the timestamp for this event".
== Useful Examples
=== Apache logs (from files)
Now, let's configure something actually *useful*... apache2 access log files! We are going to read the input from a file on the localhost, and use a *conditional* to process the event according to our needs. First, create a file called something like 'logstash-apache.conf' with the following contents (you'll need to change the log's file path to suit your needs):
[source,js]
----------------------------------
input {
file {
path => "/tmp/access_log"
start_position => beginning
}
}
filter {
if [path] =~ "access" {
mutate { replace => { "type" => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => localhost
}
stdout { codec => rubydebug }
}
----------------------------------
Then, create the file you configured above (in this example, "/tmp/access_log") with the following log lines as contents (or use some from your own webserver):
[source,js]
----------------------------------
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"
98.83.179.51 - - [18/May/2011:19:35:08 -0700] "GET /css/main.css HTTP/1.1" 200 1837 "http://www.safesand.com/information.htm" "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
----------------------------------
Now run it with the -f flag as in the last example:
[source,js]
----------------------------------
bin/logstash -f logstash-apache.conf
----------------------------------
You should be able to see your apache log data in Elasticsearch now! You'll notice that Logstash opened the file you configured, and read through it, processing any events it encountered. Any additional lines logged to this file will also be captured, processed by Logstash as events and stored in Elasticsearch. As an added bonus, they will be stashed with the field "type" set to "apache_access" (this is done by the mutate filter in the configuration, which replaces the "type" field with "apache_access").
In this configuration, Logstash is only watching the apache access_log, but it's easy enough to watch both the access_log and the error_log (actually, any file matching '*log'), by changing one line in the above configuration, like this:
[source,js]
----------------------------------
input {
file {
path => "/tmp/*_log"
...
----------------------------------
Now, rerun Logstash, and you will see both the error and access logs processed via Logstash. However, if you inspect your data (using elasticsearch-kopf, perhaps), you will see that the access_log was broken up into discrete fields, but not the error_log. That's because we used a "grok" filter to match the standard combined apache log format and automatically split the data into separate fields. Wouldn't it be nice *if* we could control how a line was parsed, based on its format? Well, we can...
Also, you might have noticed that Logstash did not reprocess the events which were already seen in the access_log file. Logstash is able to save its position in files, only processing new lines as they are added to the file. Neat!
=== Conditionals
Now we can build on the previous example, where we introduced the concept of a *conditional*. A conditional should be familiar to most Logstash users, in the general sense. You may use 'if', 'else if' and 'else' statements, as in many other programming languages. Let's label each event according to which file it appeared in (access_log, error_log and other random files which end with "log").
[source,js]
----------------------------------
input {
file {
path => "/tmp/*_log"
}
}
filter {
if [path] =~ "access" {
mutate { replace => { type => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
} else if [path] =~ "error" {
mutate { replace => { type => "apache_error" } }
} else {
mutate { replace => { type => "random_logs" } }
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----------------------------------
You'll notice we've labeled all events using the "type" field, but we didn't actually parse the "error" or "random" files... There are so many types of error logs that it's better left as an exercise for you, depending on the logs you're seeing.
=== Syslog
OK, now we can move on to another incredibly useful example: *syslog*. Syslog is one of the most common use cases for Logstash, and one it handles exceedingly well (as long as the log lines conform roughly to RFC3164 :). Syslog is the de facto UNIX networked logging standard, sending messages from client machines to a local file, or to a centralized log server via rsyslog. For this example, you won't need a functioning syslog instance; we'll fake it from the command line, so you can get a feel for what happens.
First, let's make a simple configuration file for Logstash + syslog, called 'logstash-syslog.conf'.
[source,js]
----------------------------------
input {
tcp {
port => 5000
type => syslog
}
udp {
port => 5000
type => syslog
}
}
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----------------------------------
Run it as normal:
[source,js]
----------------------------------
bin/logstash -f logstash-syslog.conf
----------------------------------
Normally, a client machine would connect to the Logstash instance on port 5000 and send its message. In this simplified case, we're simply going to telnet to Logstash and enter a log line (similar to how we entered log lines into STDIN earlier). First, open another shell window to interact with the Logstash syslog input and type the following command:
[source,js]
----------------------------------
telnet localhost 5000
----------------------------------
You can copy and paste the following lines as samples (feel free to try some of your own, but keep in mind they might not parse if the grok filter is not correct for your data):
[source,js]
----------------------------------
Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]
Dec 23 14:42:56 louis named[16000]: client 199.48.164.7#64817: query (cache) 'amsterdamboothuren.com/MX/IN' denied
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
----------------------------------
Now you should see the output of Logstash in your original shell as it processes and parses messages!
[source,js]
----------------------------------
{
"message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"@timestamp" => "2013-12-23T22:30:01.000Z",
"@version" => "1",
"type" => "syslog",
"host" => "0:0:0:0:0:0:0:1:52617",
"syslog_timestamp" => "Dec 23 14:30:01",
"syslog_hostname" => "louis",
"syslog_program" => "CRON",
"syslog_pid" => "619",
"syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"received_at" => "2013-12-23 22:49:22 UTC",
"received_from" => "0:0:0:0:0:0:0:1:52617",
"syslog_severity_code" => 5,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "notice"
}
----------------------------------
Congratulations! You're well on your way to being a real Logstash power user. You should be comfortable configuring, running and sending events to Logstash, but there's much more to explore.


@ -0,0 +1,16 @@
[[howtos-and-tutorials]]
== Logstash HOWTOs and Tutorials
Pretty self-explanatory, really
=== Downloads and Releases
* http://elasticsearch.org/#[Getting Started with Logstash]
* http://elasticsearch.org/#[Configuration file overview]
* http://elasticsearch.org/#[Command-line flags]
* http://elasticsearch.org/#[The life of an event in logstash]
* http://elasticsearch.org/#[Using conditional logic]
* http://elasticsearch.org/#[Glossary]
* http://elasticsearch.org/#[referring to fields `[like][this]`]
* http://elasticsearch.org/#[using the `%{fieldname}` syntax]
* http://elasticsearch.org/#[Metrics from Logs]
* http://elasticsearch.org/#[Using RabbitMQ]
* http://elasticsearch.org/#[Contributing to Logstash]


@ -0,0 +1,65 @@
= the life of an event
The Logstash agent is an event pipeline.
== The Pipeline
The Logstash agent is a processing pipeline with 3 stages: inputs -> filters -> outputs. Inputs generate events, filters modify them, outputs ship them elsewhere.
Internal to Logstash, events are passed from each phase using internal queues. It is implemented with a 'SizedQueue' in Ruby. SizedQueue allows a bounded maximum of items in the queue such that any writes to the queue will block if the queue is full at maximum capacity.
Logstash sets each queue size to 20. This means only 20 events can be pending for the next phase; this helps reduce data loss and in general keeps Logstash from trying to act as a data storage system. These internal queues are not for storing messages long-term.
== Fault Tolerance
Starting at outputs, here's what happens when things break.
An output can fail or have problems because of some downstream cause, such as full disk, permissions problems, temporary network failures, or service outages. Most outputs should keep retrying to ship any events that were involved in the failure.
If an output is failing, the output thread will wait until this output is healthy again and able to successfully send the message. While it waits, the output stops reading from the output queue, which will eventually fill up with events and block new events from being written to it.
A full output queue means filters will block trying to write to the output queue. Because filters will be stuck, blocked writing to the output queue, they will stop reading from the filter queue which will eventually cause the filter queue (input -> filter) to fill up.
A full filter queue will cause inputs to block when writing to the filters. This will cause each input to block, causing each input to stop processing new data from wherever that input is getting new events.
In ideal circumstances, this behaves like TCP flow control when the window closes to zero: no new data is sent because the receiver hasn't finished processing the current queue of data, but as soon as the downstream (output) problem is resolved, messages will begin flowing again.
== Thread Model
The thread model in Logstash is currently:
[source,js]
----------------------------------
input threads | filter worker threads | output worker
----------------------------------
Filters are optional, so you will have this model if you have no filters defined:
[source,js]
----------------------------------
input threads | output worker
----------------------------------
Each input runs in a thread by itself. This allows busier inputs to not be blocked by slower ones, etc. It also allows for easier containment of scope because each input has a thread.
The filter thread model is a 'worker' model where each worker receives an event and applies all filters, in order, before emitting that to the output queue. This allows scalability across CPUs because many filters are CPU intensive (permitting that we have thread safety).
The default number of filter workers is 1, but you can increase this number with the '-w' flag on the agent.
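For example, to run four filter workers against a config file (the file name is illustrative):
[source,js]
----------------------------------
bin/logstash -f logstash.conf -w 4
----------------------------------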
The output worker model is currently a single thread. Outputs will receive events in the order they are defined in the config file.
Outputs may decide to buffer events temporarily before publishing them, possibly in a separate thread. One example of this is the elasticsearch output
which will buffer events and flush them all at once, in a separate thread. This mechanism (buffering many events + writing in a separate thread) can improve performance so the Logstash pipeline isn't stalled waiting for a response from elasticsearch.
== Consequences and Expectations
Small queue sizes mean that Logstash simply blocks and stalls safely during times of load or other temporary pipeline problems. There are two alternatives to this: unlimited queue length and dropping messages. Unlimited queues grow unbounded and eventually exceed memory, causing a crash which loses all of those messages. Dropping messages is also an undesirable behavior in most cases.
At a minimum, Logstash will probably have 3 threads (2 if you have no filters): one input, one filter worker, and one output thread.
If you see Logstash using multiple CPUs, this is likely why. If you want to know more about what each thread is doing, you should read this: http://www.semicomplete.com/blog/geekery/debugging-java-performance.html.
Threads in java have names, and you can use jstack and top to figure out who is using what resources. The URL above will help you learn how to do this.
On Linux platforms, Logstash will label all the threads it can with something descriptive. Inputs will show up as "<inputname" and filter workers as "|worker" and outputs as ">outputworker" (or something similar). Other threads may be labeled as well, and are intended to help you identify their purpose should you wonder why they are consuming resources!


@ -0,0 +1,30 @@
[[logstash-docs-home]]
== Logstash Documentation
Pretty self-explanatory, really
=== Downloads and Releases
* http://www.elasticsearch.org/overview/logstash/download/[Download logstash 1.4.2]
* http://www.elasticsearch.org/blog/apt-and-yum-repositories/[package repositories]
* http://www.elasticsearch.org/blog/logstash-1-4-2/[release notes]
* https://github.com/elasticsearch/logstash/blob/master/CHANGELOG[view changelog]
* https://github.com/elasticsearch/puppet-logstash[Puppet Module]
=== Plugins
* http://elasticsearch.org/#[contrib plugins]
* http://elasticsearch.org/#[writing your own plugins]
* http://elasticsearch.org/#[Inputs] / http://elasticsearch.org/#[Filters] / http://elasticsearch.org/#[Outputs]
* http://elasticsearch.org/#[Codecs]
* http://elasticsearch.org/#[(more)]
=== HOWTOs, References, Information
* http://elasticsearch.org/#[Getting Started with Logstash]
* http://elasticsearch.org/#[Configuration file overview]
* http://elasticsearch.org/#[Command-line flags]
* http://elasticsearch.org/#[The life of an event in logstash]
* http://elasticsearch.org/#[Using conditional logic]
* http://elasticsearch.org/#[Glossary]
* http://elasticsearch.org/#[(more)]
=== About / Videos / Blogs
* http://elasticsearch.org/#[Videos]
* http://elasticsearch.org/#[Blogs]


@ -0,0 +1,132 @@
== Glossary
apache ::
A very common open source web server application, which produces logs easily consumed by Logstash (Apache Common/Combined Log Format).
agent ::
An invocation of Logstash with a particular configuration, allowing it to operate as a "shipper", a "collector", or a combination of functionalities.
broker ::
An intermediary used in a multi-tiered Logstash deployment which allows a queueing mechanism to be used. Examples of brokers are Redis, RabbitMQ, and Apache Kafka. This pattern is a common method of building fault-tolerance into a Logstash architecture.
buffer::
Within Logstash, a temporary storage area where events can queue up, waiting to be processed. The default queue size is 20 events, but it is not recommended to increase this, as Logstash is not designed to operate as a queueing mechanism.
centralized::
A configuration of Logstash in which the Logstash agent, input and output sources live on multiple machines, and the pipeline passes through these tiers.
codec::
A Logstash plugin which works within an input or output plugin, and usually aims to serialize or deserialize data flowing through the Logstash pipeline. A common example is the JSON codec, which allows Logstash inputs to receive data which arrives in JSON format, or output event data in JSON format.
collector::
An instance of Logstash which receives external events from another instance of Logstash, or perhaps some other client, either remote or local.
conditional::
In a computer programming context, a control flow which executes certain actions based on true/false values of a statement (called the condition). Often expressed in the form of "if ... then ... (elseif ...) else". Logstash has built-in conditionals to allow users control of the plugin pipeline.
elasticsearch::
An open-source, Lucene-based, RESTful search and analytics engine written in Java, with supported clients in various languages such as Perl, Python, Ruby, Java, etc.
event::
In Logstash parlance, a single unit of information, containing a timestamp plus additional data. An event arrives via an input, and is subsequently parsed, timestamped, and passed through the Logstash pipeline.
field::
A data point (often a key-value pair) within a full Logstash event, e.g. "timestamp", "message", "hostname", "ipaddress". Also used to describe a key-value pair in Elasticsearch.
file::
A resource storing binary data (which might be text, image, application, etc.) on a physical storage media. In the Logstash context, a common input source which monitors a growing collection of text-based log lines.
filter::
An intermediary processing mechanism in the Logstash pipeline. Typically, filters act upon event data after it has been ingested via inputs, by mutating, enriching, and/or modifying the data according to configuration rules. The second phase of the typical Logstash pipeline (inputs->filters->outputs).
fluentd::
Like Logstash, another open-source tool for collecting logs and events, with plugins to extend functionality.
ganglia::
A scalable, distributed monitoring system suitable for large clusters. Logstash features both an input and an output to enable reading from, and writing to Ganglia.
graphite::
A highly-scalable realtime graphing application, which presents graphs through web interfaces. Logstash provides an output which ships event data to Graphite for visualization.
heka::
An open-source event processing system developed by Mozilla and often compared to Logstash.
index::
An index can be seen as a named collection of documents in Elasticsearch which are available for searching and querying. It is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
indexer::
Refers to a Logstash instance which is tasked with interfacing with an Elasticsearch cluster in order to index event data.
input::
The means for ingesting data into Logstash. Inputs allow users to pull data from files, network sockets, other applications, etc. The initial phase of the typical Logstash pipeline (inputs->filters->outputs).
jar / jarfile::
A packaging method for Java libraries. Since Logstash runs on the JRuby runtime environment, it is possible to use these Java libraries to provide extra functionality to Logstash.
java::
An object-oriented programming language popular for its flexibility, extendability and portability.
jruby::
JRuby is a 100% Java implementation of the Ruby programming language, which allows Ruby to run in the JVM. Logstash typically runs in JRuby, which provides it with a fast, extensible runtime environment.
kibana::
A visual tool for viewing time-based data which has been stored in Elasticsearch. Kibana features a powerful set of functionality based on panels which query Elasticsearch in different ways.
log::
A snippet of textual information emitted by a device, ostensibly with some pertinent information about the status of said device.
log4j::
A very common Java-based logging utility.
Logstash::
An application which offers a powerful data processing pipeline, allowing users to consume information from various sources, enrich the data, and output it to any number of other sources.
lumberjack::
A protocol for shipping logs from one location to another, in a secure and optimized manner. Also the (deprecated) name of a software application, now known as Logstash Forwarder (LSF).
output::
The means for passing event data out of Logstash into other applications, network endpoints, files, etc. The last phase of the typical Logstash pipeline (inputs->filters->outputs).
pipeline::
A term used to describe the flow of events through the Logstash workflow. The pipeline typically consists of a series of inputs, filters, and outputs.
plugin::
A generic term referring to an input, codec, filter, or output which extends basic Logstash functionality.
redis::
An open-source key-value store and cache which is often used in conjunction with Logstash as a message broker.
ruby::
A popular, open-source, object-oriented programming language in which Logstash is implemented.
shell::
A command-line interface to an operating system.
shipper::
An instance of Logstash which sends events to another instance of Logstash, or some other application.
statsd::
A network daemon for aggregating statistics, such as counters and timers, and shipping over UDP to backend services, such as Graphite or Datadog. Logstash provides an output to statsd.
stdin::
An I/O stream providing input to a software application. In Logstash, an input which receives data from this stream.
stdout::
An I/O stream producing output from a software application. In Logstash, an output which writes data to this stream.
syslog::
A popular method for logging messages from a computer. The standard is somewhat loose, but Logstash has tools (input, grok patterns) to make this simpler.
standalone::
A configuration of Logstash in which the Logstash agent, input and output sources typically live on the same host machine.
thread::
Parallel sequences of execution within a process which allow a computer to perform several tasks simultaneously, in a multi-processor environment. Logstash takes advantage of this functionality by letting you specify the number of filter worker threads with the "-w" flag.
type::
In Elasticsearch, a type can be compared to a table in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed. To index documents, it is required to specify both an index and a type.
worker::
The filter thread model used by Logstash, where each worker receives an event and applies all filters, in order, before emitting the event to the output queue. This allows scalability across CPUs because many filters are CPU intensive (permitting that we have thread safety).


@ -0,0 +1,24 @@
= Plugin Milestones
== Why Milestones?
Plugins (inputs/outputs/filters/codecs) have a milestone label in logstash. This is to provide an indicator to the end-user as to the kinds of changes a given plugin could have between logstash releases.
The desire here is to allow plugin developers to quickly iterate on possible new plugins while conveying to the end-user a set of expectations about that plugin.
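In plugin code, the milestone is declared with the `milestone` method inside the plugin class, as the plugin examples elsewhere in these docs show:
[source,js]
----------------------------------
# inside a plugin class definition
milestone 1
----------------------------------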
== Milestone 1
Plugins at this milestone need your feedback to improve! Plugins at this milestone may change between releases as the community figures out the best way for the plugin to behave and be configured.
== Milestone 2
Plugins at this milestone are more likely to have backwards-compatibility to previous releases than do Milestone 1 plugins. This milestone also indicates a greater level of in-the-wild usage by the community than the previous milestone.
== Milestone 3
Plugins at this milestone have strong promises towards backwards-compatibility. This is enforced with automated tests to ensure behavior and configuration are consistent across releases.
== Milestone 0
This milestone appears at the bottom of the page because it is very infrequently used.
This milestone marker generally indicates that a plugin has no active code maintainer and no community support in terms of getting help.