Cleanup docs directory

Remove old, unused markdown docs
Bring dir structure to mirror logstash-docs repo
This commit is contained in:
Suyog Rao 2015-12-16 14:54:34 -08:00
parent 0084c00b38
commit d2d0bd765c
61 changed files with 347 additions and 2578 deletions

.gitignore vendored

@ -26,3 +26,6 @@ spec/reports
rspec.xml
.install-done
.vendor
integration_run
.mvn/


@ -1,322 +0,0 @@
---
title: Configuration Language - Logstash
layout: content_right
---
# Logstash Config Language
The Logstash config language aims to be simple.
There are 3 main sections: inputs, filters, outputs. Each section has
configurations for each plugin available in that section.
Example:
# This is a comment. You should use comments to describe
# parts of your configuration.
input {
...
}
filter {
...
}
output {
...
}
## Filters and Ordering
For a given event, filters are applied in the order of appearance in the
configuration file.
## Comments
Comments are the same as in ruby, perl, and python. A comment starts with a '#' character.
Example:
# this is a comment
input { # comments can appear at the end of a line, too
# ...
}
## Plugins
The input, filter and output sections all let you configure plugins. Plugin
configuration consists of the plugin name followed by a block of settings for
that plugin. For example, how about two file inputs:
input {
file {
path => "/var/log/messages"
type => "syslog"
}
file {
path => "/var/log/apache/access.log"
type => "apache"
}
}
The above configures two separate file inputs. Each sets two
configuration settings: 'path' and 'type'. Each plugin has different
settings for configuring it; seek the documentation for your plugin to
learn what settings are available and what they mean. For example, the
[file input][fileinput] documentation will explain the meanings of the
path and type settings.
[fileinput]: inputs/file
## Value Types
The documentation for a plugin may enforce a configuration field having a
certain type. Examples include boolean, string, array, number, hash,
etc.
### <a name="boolean"></a>Boolean
A boolean must be either `true` or `false`. Note the lack of quotes around
`true` and `false`.
Examples:
debug => true
### <a name="string"></a>String
A string must be a single value.
Example:
name => "Hello world"
Single, unquoted words are valid as strings, too, but you should use quotes.
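For example, both of the following are accepted, though the quoted form is preferred:

    type => apache
    type => "apache"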
### <a name="number"></a>Number
Numbers must be valid numerics (floating point or integer are OK).
Example:
port => 33
### <a name="array"></a>Array
An array can be a single string value or multiple values. If you specify the same
field multiple times, the values are appended to the array.
Examples:
path => [ "/var/log/messages", "/var/log/*.log" ]
path => "/data/mysql/mysql.log"
The above makes 'path' a 3-element array including all 3 strings.
### <a name="hash"></a>Hash
A hash uses basically the same syntax as Ruby hashes.
The key and value are simply pairs, such as:
match => {
"field1" => "value1"
"field2" => "value2"
...
}
## <a name="eventdependent"></a>Event Dependent Configuration
The logstash agent is a processing pipeline with 3 stages: inputs -> filters ->
outputs. Inputs generate events, filters modify them, outputs ship them
elsewhere.
All events have properties. For example, an apache access log would have things
like status code (200, 404), request path ("/", "index.html"), HTTP verb
(GET, POST), client IP address, etc. Logstash calls these properties "fields."
Some of the configuration options in Logstash require the existence of fields in
order to function. Because inputs generate events, there are no fields to
evaluate within the input block--they do not exist yet!
Because of their dependency on events and fields, the following configuration
options will only work within filter and output blocks.
**IMPORTANT: Field references, sprintf format, and conditionals, described below,
will not work in an input block.**
### <a name="fieldreferences"></a>Field References
In many cases, it is useful to be able to refer to a field by name. To do this,
you can use the Logstash field reference syntax.
By way of example, let us suppose we have this event:
{
"agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
"ip": "192.168.24.44",
"request": "/index.html"
"response": {
"status": 200,
"bytes": 52353
},
"ua": {
"os": "Windows 7"
}
}
- the syntax to access fields is `[fieldname]`.
- if you are only referring to a **top-level field**, you can omit the `[]` and
simply say `fieldname`.
- in the case of **nested fields**, like the "os" field above, you need
the full path to that field: `[ua][os]`.
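For example, using the conditional syntax described below, you could tag events whose nested "os" field matches a value:

    filter {
      if [ua][os] == "Windows 7" {
        mutate { add_tag => "windows" }
      }
    }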
### <a name="sprintf"></a>sprintf format
This syntax is also used in what Logstash calls 'sprintf format'. This format
allows you to refer to field values from within other strings. For example, the
statsd output has an 'increment' setting, to allow you to keep a count of
apache logs by status code:
output {
statsd {
increment => "apache.%{[response][status]}"
}
}
You can also do time formatting in this sprintf format. Instead of specifying a
field name, use the `+FORMAT` syntax where `FORMAT` is a
[time format](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html).
For example, if you want to use the file output to write to logs based on the
hour and the 'type' field:
output {
file {
path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
}
}
### <a name="conditionals"></a>Conditionals
Sometimes you only want a filter or output to process an event under
certain conditions. For that, you'll want to use a conditional!
Conditionals in Logstash look and act the same way they do in programming
languages. You have `if`, `else if` and `else` statements. Conditionals may be
nested if you need that.
The syntax is as follows:
if EXPRESSION {
...
} else if EXPRESSION {
...
} else {
...
}
What's an expression? Comparison tests, boolean logic, etc!
The following comparison operators are supported:
* equality, etc: ==, !=, <, >, <=, >=
* regexp: =~, !~
* inclusion: in, not in
The following boolean operators are supported:
* and, or, nand, xor
The following unary operators are supported:
* !
Expressions may contain expressions. Expressions may be negated with `!`.
Expressions may be grouped with parentheses `(...)`. Expressions can be long
and complex.
For example, if we want to remove the field `secret` if the field
`action` has a value of `login`:
filter {
if [action] == "login" {
mutate { remove => "secret" }
}
}
The above uses the field reference syntax to get the value of the
`action` field. It is compared against the text `login` and, if equal,
allows the mutate filter to delete the field named `secret`.
How about a more complex example? Suppose we want to:
* alert nagios of any apache events with status 5xx
* record any 4xx status to elasticsearch
* record all status code hits via statsd
Here is how to express all of that in a single output block:
output {
if [type] == "apache" {
if [status] =~ /^5\d\d/ {
nagios { ... }
} else if [status] =~ /^4\d\d/ {
elasticsearch { ... }
}
statsd { increment => "apache.%{status}" }
}
}
You can also do multiple expressions in a single condition:
output {
# Send production errors to pagerduty
if [loglevel] == "ERROR" and [deployment] == "production" {
pagerduty {
...
}
}
}
You can test whether a field was present, regardless of its value:
if [exception_message] {
# If the event has an exception_message field, set the level
mutate { add_field => { "level" => "ERROR" } }
}
Here are some examples for testing with the in conditional:
filter {
if [foo] in [foobar] {
mutate { add_tag => "field in field" }
}
if [foo] in "foo" {
mutate { add_tag => "field in string" }
}
if "hello" in [greeting] {
mutate { add_tag => "string in field" }
}
if [foo] in ["hello", "world", "foo"] {
mutate { add_tag => "field in list" }
}
if [missing] in [alsomissing] {
mutate { add_tag => "shouldnotexist" }
}
if !("foo" in ["hello", "world"]) {
mutate { add_tag => "shouldexist" }
}
}
Or, to test if grok was successful:
output {
if "_grokparsefailure" not in [tags] {
elasticsearch { ... }
}
}
## Further Reading
For more information, see [the plugin docs index](index)


@ -1,59 +0,0 @@
---
title: Logstash Contrib plugins
layout: content_right
---
# contrib plugins
As logstash has grown, we've accumulated a massive repository of plugins. With
well over 100 plugins, it became difficult for the project maintainers to
support everything effectively.
In order to improve the quality of popular plugins, we've moved the
less-commonly-used plugins to a separate repository we're calling "contrib".
Concentrating common plugin usage into core solves a few problems, most notably
user complaints about the size of logstash releases, support/maintenance costs,
etc.
It is our intent that this separation will improve life for users. If it
doesn't, please file a bug so we can work to address it!
If a plugin is available in the 'contrib' package, the documentation for that
plugin will note this boldly at the top of that plugin's documentation.
Contrib plugins reside in a [separate github project](https://github.com/elasticsearch/logstash-contrib).
# Packaging
At present, the contrib modules are available as a tarball.
# Automated Installation
The `bin/plugin` script will handle the installation for you:
cd /path/to/logstash
bin/plugin install contrib
# Manual Installation
The contrib plugins can be extracted on top of an existing Logstash installation.
For example, if I've extracted `logstash-%VERSION%.tar.gz` into `/path`, e.g.
cd /path
tar zxf ~/logstash-%VERSION%.tar.gz
It will have a `/path/logstash-%VERSION%` directory, e.g.
$ ls
logstash-%VERSION%
The method to install the contrib tarball is identical.
cd /path
wget http://download.elasticsearch.org/logstash/logstash/logstash-contrib-%VERSION%.tar.gz
tar zxf logstash-contrib-%VERSION%.tar.gz
This will install the contrib plugins in the same directory as the core
install. These plugins will be available to logstash the next time it starts.


@ -1,250 +0,0 @@
require "rubygems"
require "erb"
require "optparse"
require "kramdown" # markdown parser
$: << Dir.pwd
$: << File.join(File.dirname(__FILE__), "..", "lib")
require "logstash/config/mixin"
require "logstash/inputs/base"
require "logstash/codecs/base"
require "logstash/filters/base"
require "logstash/outputs/base"
require "logstash/version"
class LogStashConfigDocGenerator
COMMENT_RE = /^ *#(?: (.*)| *$)/
def initialize
@rules = {
COMMENT_RE => lambda { |m| add_comment(m[1]) },
/^ *class.*< *LogStash::(Outputs|Filters|Inputs|Codecs)::(Base|Threadable)/ => \
lambda { |m| set_class_description },
/^ *config +[^=].*/ => lambda { |m| add_config(m[0]) },
/^ *milestone .*/ => lambda { |m| set_milestone(m[0]) },
/^ *config_name .*/ => lambda { |m| set_config_name(m[0]) },
/^ *flag[( ].*/ => lambda { |m| add_flag(m[0]) },
/^ *(class|def|module) / => lambda { |m| clear_comments },
}
if File.exists?("build/contrib_plugins")
@contrib_list = File.read("build/contrib_plugins").split("\n")
else
@contrib_list = []
end
end
def parse(string)
clear_comments
buffer = ""
string.split(/\r\n|\n/).each do |line|
# Join long lines
if line =~ COMMENT_RE
# nothing
else
# Join extended lines
if line =~ /(, *$)|(\\$)|(\[ *$)/
buffer += line.gsub(/\\$/, "")
next
end
end
line = buffer + line
buffer = ""
@rules.each do |re, action|
m = re.match(line)
if m
action.call(m)
end
end # RULES.each
end # string.split("\n").each
end # def parse
def set_class_description
@class_description = @comments.join("\n")
clear_comments
end # def set_class_description
def add_comment(comment)
return if comment == "encoding: utf-8"
@comments << comment
end # def add_comment
def add_config(code)
# I just care about the 'config :name' part
code = code.sub(/,.*/, "")
# call the code, which calls 'config' in this class.
# This will let us align comments with config options.
name, opts = eval(code)
# TODO(sissel): This hack is only required until regexp configs
# are gone from logstash.
name = name.to_s unless name.is_a?(Regexp)
description = Kramdown::Document.new(@comments.join("\n")).to_html
@attributes[name][:description] = description
clear_comments
end # def add_config
def add_flag(code)
# call the code, which calls 'config' in this class.
# This will let us align comments with config options.
#p :code => code
fixed_code = code.gsub(/ do .*/, "")
#p :fixedcode => fixed_code
name, description = eval(fixed_code)
@flags[name] = description
clear_comments
end # def add_flag
def set_config_name(code)
name = eval(code)
@name = name
end # def set_config_name
def set_milestone(code)
@milestone = eval(code)
end
# pretend to be the config DSL and just get the name
def config(name, opts={})
return name, opts
end # def config
# Pretend to support the flag DSL
def flag(*args, &block)
name = args.first
description = args.last
return name, description
end # def flag
# pretend to be the config dsl's 'config_name' method
def config_name(name)
return name
end # def config_name
# pretend to be the config dsl's 'milestone' method
def milestone(m)
return m
end # def milestone
def clear_comments
@comments.clear
end # def clear_comments
def generate(file, settings)
@class_description = ""
@milestone = ""
@comments = []
@attributes = Hash.new { |h,k| h[k] = {} }
@flags = {}
# local scoping for the monkeypatch below
attributes = @attributes
# Monkeypatch the 'config' method to capture
# Note, this monkeypatch requires us do the config processing
# one at a time.
#LogStash::Config::Mixin::DSL.instance_eval do
#define_method(:config) do |name, opts={}|
#p name => opts
#attributes[name].merge!(opts)
#end
#end
# Loading the file will trigger the config dsl which should
# collect all the config settings.
load file
# parse base first
parse(File.new(File.join(File.dirname(file), "base.rb"), "r").read)
# Now parse the real library
code = File.new(file).read
# inputs either inherit from Base or Threadable.
if code =~ /\< LogStash::Inputs::Threadable/
parse(File.new(File.join(File.dirname(file), "threadable.rb"), "r").read)
end
if code =~ /include LogStash::PluginMixins/
mixin = code.gsub(/.*include LogStash::PluginMixins::(\w+)\s.*/m, '\1')
mixin.gsub!(/(.)([A-Z])/, '\1_\2')
mixin.downcase!
parse(File.new(File.join(File.dirname(file), "..", "plugin_mixins", "#{mixin}.rb")).read)
end
parse(code)
puts "Generating docs for #{file}"
if @name.nil?
$stderr.puts "Missing 'config_name' setting in #{file}?"
return nil
end
klass = LogStash::Config::Registry.registry[@name]
if klass.ancestors.include?(LogStash::Inputs::Base)
section = "input"
elsif klass.ancestors.include?(LogStash::Filters::Base)
section = "filter"
elsif klass.ancestors.include?(LogStash::Outputs::Base)
section = "output"
elsif klass.ancestors.include?(LogStash::Codecs::Base)
section = "codec"
end
template_file = File.join(File.dirname(__FILE__), "plugin-doc.html.erb")
template = ERB.new(File.new(template_file).read, nil, "-")
is_contrib_plugin = @contrib_list.include?(file)
# descriptions are assumed to be markdown
description = Kramdown::Document.new(@class_description).to_html
klass.get_config.each do |name, settings|
@attributes[name].merge!(settings)
end
sorted_attributes = @attributes.sort { |a,b| a.first.to_s <=> b.first.to_s }
klassname = LogStash::Config::Registry.registry[@name].to_s
name = @name
synopsis_file = File.join(File.dirname(__FILE__), "plugin-synopsis.html.erb")
synopsis = ERB.new(File.new(synopsis_file).read, nil, "-").result(binding)
if settings[:output]
dir = File.join(settings[:output], section + "s")
path = File.join(dir, "#{name}.html")
Dir.mkdir(settings[:output]) if !File.directory?(settings[:output])
Dir.mkdir(dir) if !File.directory?(dir)
File.open(path, "w") do |out|
html = template.result(binding)
html.gsub!("%VERSION%", LOGSTASH_VERSION)
html.gsub!("%PLUGIN%", @name)
out.puts(html)
end
else
puts template.result(binding)
end
end # def generate
end # class LogStashConfigDocGenerator
if __FILE__ == $0
opts = OptionParser.new
settings = {}
opts.on("-o DIR", "--output DIR",
"Directory to output to; optional. If not specified,"\
"we write to stdout.") do |val|
settings[:output] = val
end
args = opts.parse(ARGV)
args.each do |arg|
gen = LogStashConfigDocGenerator.new
gen.generate(arg, settings)
end
end


@ -1,108 +0,0 @@
---
title: How to extend - logstash
layout: content_right
---
# Add a new filter
This document shows you how to add a new filter to logstash.
For a general overview of how to add a new plugin, see [the extending
logstash](.) overview.
## Write code.
Let's write a 'hello world' filter. This filter will replace the 'message' in
the event with "Hello world!"
First, logstash expects plugins in a certain directory structure: `logstash/TYPE/PLUGIN_NAME.rb`
Since we're creating a filter, let's mkdir this:
mkdir -p logstash/filters/
cd logstash/filters
Now add the code:
# Call this file 'foo.rb' (in logstash/filters, as above)
require "logstash/filters/base"
require "logstash/namespace"
class LogStash::Filters::Foo < LogStash::Filters::Base
# Setting the config_name here is required. This is how you
# configure this filter from your logstash config.
#
# filter {
# foo { ... }
# }
config_name "foo"
# New plugins should start life at milestone 1.
milestone 1
# Replace the message with this value.
config :message, :validate => :string
public
def register
# nothing to do
end # def register
public
def filter(event)
# return nothing unless there's an actual filter event
return unless filter?(event)
if @message
# Replace the event message with our message as configured in the
# config file.
event["message"] = @message
end
# filter_matched should go in the last line of our successful code
filter_matched(event)
end # def filter
end # class LogStash::Filters::Foo
## Add it to your configuration
For this simple example, let's just use stdin input and stdout output.
The config file looks like this:
input {
stdin { type => "foo" }
}
filter {
if [type] == "foo" {
foo {
message => "Hello world!"
}
}
}
output {
stdout { }
}
Call this file 'example.conf'.
## Tell logstash about it.
Depending on how you installed logstash, you have a few ways of including this
plugin.
You can use the agent's --pluginpath flag to specify the root of your
plugin tree. In our case, it's the current directory.
% bin/logstash --pluginpath your/plugin/root -f example.conf
## Example running
In the example below, I typed in "the quick brown fox" after starting
logstash.
% bin/logstash -f example.conf
the quick brown fox
2011-05-12T01:05:09.495000Z stdin://snack.home/: Hello world!
The output is the standard logstash stdout output, but in this case our "the
quick brown fox" message was replaced with "Hello world!"
All done! :)


@ -1,91 +0,0 @@
---
title: How to extend - logstash
layout: content_right
---
# Extending logstash
You can add your own input, output, or filter plugins to logstash.
If you're looking to extend logstash today, please look at the existing plugins.
## Good examples of plugins
* [inputs/tcp](https://github.com/logstash/logstash/blob/master/lib/logstash/inputs/tcp.rb)
* [filters/multiline](https://github.com/logstash/logstash/blob/master/lib/logstash/filters/multiline.rb)
* [outputs/mongodb](https://github.com/logstash/logstash/blob/master/lib/logstash/outputs/mongodb.rb)
## Common concepts
* The `config_name` sets the name used in the config file.
* The `milestone` sets the milestone number of the plugin. See <../plugin-milestones> for more info.
* The `config` lines define config options.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
### Required modules
All plugins should require the Logstash module.
require 'logstash/namespace'
### Plugin name
Every plugin must have a name set with the `config_name` method. If this
is not specified, the plugin will fail to load with an error.
### Milestones
Every plugin needs a milestone set using `milestone`. See
<../plugin-milestones> for more info.
### Config lines
The `config` lines define configuration options and are constructed like
so:
config :host, :validate => :string, :default => "0.0.0.0"
The name of the option is specified, here `:host` and then the
attributes of the option. They can include `:validate`, `:default`,
`:required` (a Boolean `true` or `false`), `:deprecated` (also a
Boolean), and `:obsolete` (a String value).
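For example, a plugin might declare options like these (the option names here are illustrative, not from a real plugin):

    config :host, :validate => :string, :default => "0.0.0.0"
    config :port, :validate => :number, :required => true
    config :format, :validate => :string, :deprecated => true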
## Inputs
All inputs require the LogStash::Inputs::Base class:
require 'logstash/inputs/base'
Inputs have two methods: `register` and `run`.
* Each input runs as its own thread.
* The `run` method is expected to run forever.
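A minimal sketch of an input, assuming a hypothetical 'example' plugin that emits a fixed message on an interval and following the same conventions as the filter example above, might look like this:

    require "logstash/inputs/base"
    require "logstash/namespace"

    class LogStash::Inputs::Example < LogStash::Inputs::Base
      # how the plugin is referenced from the config file
      config_name "example"
      milestone 1

      # Hypothetical options: the message to emit and how often (seconds).
      config :message, :validate => :string, :default => "Hello world!"
      config :interval, :validate => :number, :default => 1

      public
      def register
        # nothing to initialize for this example
      end # def register

      public
      def run(queue)
        # run is expected to loop forever, pushing events onto the queue
        loop do
          event = LogStash::Event.new("message" => @message)
          decorate(event) # apply any add_field/add_tag settings, as core inputs do
          queue << event
          sleep(@interval)
        end
      end # def run
    end # class LogStash::Inputs::Example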
## Filters
All filters require the LogStash::Filters::Base class:
require 'logstash/filters/base'
Filters have two methods: `register` and `filter`.
* The `filter` method gets an event.
* Call `event.cancel` to drop the event.
* To modify an event, simply make changes to the event you are given.
* The return value is ignored.
## Outputs
All outputs require the LogStash::Outputs::Base class:
require 'logstash/outputs/base'
Outputs have two methods: `register` and `receive`.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
* The `receive` method is called when an event gets pushed to your output.
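A minimal sketch of an output, assuming a hypothetical 'example' plugin that writes each event's message to stdout, might look like this:

    require "logstash/outputs/base"
    require "logstash/namespace"

    class LogStash::Outputs::Example < LogStash::Outputs::Base
      config_name "example"
      milestone 1

      # Illustrative option: a prefix for each printed line.
      config :prefix, :validate => :string, :default => ""

      public
      def register
        # nothing to initialize for this example
      end # def register

      public
      def receive(event)
        # called once for each event pushed to this output
        puts "#{@prefix}#{event["message"]}"
      end # def receive
    end # class LogStash::Outputs::Example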
## Example: a new filter
Learn by example how to [add a new filter to logstash](example-add-a-new-filter)


@ -1,45 +0,0 @@
---
title: Command-line flags - logstash
layout: content_right
---
# Command-line flags
## Agent
The logstash agent has the following flags (also try using the '--help' flag)
<dl>
<dt> -f, --config CONFIGFILE </dt>
<dd> Load the logstash config from a specific file, directory, or a
wildcard. If given a directory or wildcard, config files will be read
from the directory in alphabetical order. </dd>
<dt> -e CONFIGSTRING </dt>
<dd> Use the given string as the configuration data. Same syntax as the
config file. If no input is specified, 'stdin { type => stdin }' is
default. If no output is specified, 'stdout { debug => true }' is
default. </dd>
<dt> -w, --filterworkers COUNT </dt>
<dd> Run COUNT filter workers (default: 1) </dd>
<dt> -l, --log FILE </dt>
<dd> Log to a given path. Default is to log to stdout </dd>
<dt> --verbose </dt>
<dd> Increase verbosity to the first level, less verbose.</dd>
<dt> --debug </dt>
<dd> Increase verbosity to the last level, more verbose.</dd>
<dt> -v </dt>
<dd> *DEPRECATED: see --verbose/debug* Increase verbosity. There are multiple levels of verbosity available with
'-vv' currently being the highest </dd>
<dt> --pluginpath PLUGIN_PATH </dt>
<dd> A colon-delimited path in which to find additional logstash plugins </dd>
</dl>
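For example, a typical agent invocation combining several of these flags (the paths are illustrative) might look like:

    % bin/logstash agent -f /etc/logstash/conf.d/ -w 2 -l /var/log/logstash.log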
## Web
<dl>
<dt> -a, --address ADDRESS </dt>
<dd>Address on which to start webserver. Default is 0.0.0.0.</dd>
<dt> -p, --port PORT</dt>
<dd>Port on which to start webserver. Default is 9292.</dd>
</dl>


@ -1,28 +0,0 @@
#!/usr/bin/env ruby
require "erb"
if ARGV.size != 1
$stderr.puts "No path given to search for plugin docs"
$stderr.puts "Usage: #{$0} plugin_doc_dir"
exit 1
end
def plugins(glob)
files = Dir.glob(glob)
names = files.collect { |f| File.basename(f).gsub(".html", "") }
return names.sort
end # def plugins
basedir = ARGV[0]
docs = {
"inputs" => plugins(File.join(basedir, "inputs/*.html")),
"codecs" => plugins(File.join(basedir, "codecs/*.html")),
"filters" => plugins(File.join(basedir, "filters/*.html")),
"outputs" => plugins(File.join(basedir, "outputs/*.html")),
}
template_path = File.join(File.dirname(__FILE__), "index.html.erb")
template = File.new(template_path).read
erb = ERB.new(template, nil, "-")
puts erb.result(binding)


@ -1,46 +0,0 @@
---
title: Learn - logstash
layout: content_right
---
# What is Logstash?
Logstash is a tool for managing your logs.
It helps you take logs and other event data from your systems and move it into
a central place. Logstash is open source and completely free. You can find
support on the discussion forum and on IRC.
For an overview of Logstash and why you would use it, you should watch the
presentation I gave at CarolinaCon 2011:
[video here](http://carolinacon.blip.tv/file/5105901/). This presentation covers
Logstash, how you can use it, some alternatives, logging best practices,
parsing tools, etc. Video also below:
<!--
<embed src="http://blip.tv/play/gvE9grjcdQI" type="application/x-shockwave-flash" width="480" height="296" allowscriptaccess="always" allowfullscreen="true"></embed>
The slides are available online here: [slides](http://goo.gl/68c62). The slides
include speaker notes (click 'actions' then 'speaker notes').
-->
<iframe width="480" height="296" src="http://www.youtube.com/embed/RuUFnog29M4" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
The slides are available online here: [slides](http://semicomplete.com/presentations/logstash-puppetconf-2012/).
## Getting Help
There's [documentation](.) here on this site. If that isn't sufficient, you can
use the discussion [forum](https://discuss.elastic.co/c/logstash). Further, there is also
an IRC channel - #logstash on irc.freenode.org.
If you find a bug or have a feature request, file it
on [github](https://github.com/elasticsearch/logstash/issues). (Honestly though, if you prefer email or irc
for such things, that works for me, too.)
## Download It
[Download logstash-%VERSION%](https://download.elastic.co/logstash/logstash/logstash-%VERSION%.tar.gz)
## What's next?
Try this [guide](tutorials/getting-started-with-logstash) for a simple
real-world example getting started using Logstash.


@ -1,109 +0,0 @@
---
title: the life of an event - logstash
layout: content_right
---
# the life of an event
The logstash agent is an event pipeline.
## The Pipeline
The logstash agent is a processing pipeline with 3 stages: inputs -> filters ->
outputs. Inputs generate events, filters modify them, outputs ship them
elsewhere.
Internal to logstash, events are passed from each phase using internal queues.
Each queue is implemented with a 'SizedQueue' in Ruby. A SizedQueue bounds the
maximum number of items in the queue, so any write to the queue blocks while the
queue is at full capacity.
Logstash sets each queue size to 20. This means only 20 events can be pending
for the next phase - this helps reduce any data loss and in general avoids
logstash trying to act as a data storage system. These internal queues are not
for storing messages long-term.
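As a rough illustration of that blocking behavior (plain Ruby, not Logstash code), a SizedQueue producer stalls whenever the queue is at capacity until the consumer catches up:

    require "thread"

    queue = SizedQueue.new(20)          # bounded, like the pipeline queues
    producer = Thread.new do
      100.times { |i| queue.push(i) }   # blocks whenever 20 items are pending
    end
    consumer = Thread.new do
      100.times { queue.pop }           # popping frees space and unblocks the producer
    end
    [producer, consumer].each(&:join)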
## Fault Tolerance
Starting at outputs, here's what happens when things break.
An output can fail or have problems because of some downstream cause, such as
full disk, permissions problems, temporary network failures, or service
outages. Most outputs should keep retrying to ship any events that were
involved in the failure.
If an output is failing, the output thread will wait until this output is
healthy again and able to successfully send the message. While it waits, the
output stops reading from the output queue, which eventually fills up with
events and blocks new events from being written to it.
A full output queue means filters will block trying to write to the output
queue. Because the filters are stuck writing to the output queue, they stop
reading from the filter queue, which eventually causes the filter queue
(input -> filter) to fill up.
A full filter queue causes inputs to block when writing to the filters, so each
input stops processing new data from wherever it is getting new events.
In ideal circumstances, this behaves like a TCP window closing to 0: no new data
is sent because the receiver hasn't finished processing the current queue of
data, but as soon as the downstream (output) problem is resolved, messages will
begin flowing again.
## Thread Model
The thread model in logstash is currently:
input threads | filter worker threads | output worker
Filters are optional, so you will have this model if you have no filters
defined:
input threads | output worker
Each input runs in a thread by itself. This allows busier inputs to not be
blocked by slower ones, etc. It also allows for easier containment of scope
because each input has a thread.
The filter thread model is a 'worker' model where each worker receives an event
and applies all filters, in order, before emitting it to the output queue.
This allows scaling across CPUs because many filters are CPU intensive
(provided the filters are thread safe).
The default number of filter workers is 1, but you can increase this number
with the '-w' flag on the agent.
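For example, to run four filter workers:

    % bin/logstash agent -f logstash.conf -w 4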
The output worker model is currently a single thread. Outputs will receive
events in the order they are defined in the config file.
Outputs may decide to buffer events temporarily before publishing them,
possibly in a separate thread. One example of this is the elasticsearch output
which will buffer events and flush them all at once, in a separate thread. This
mechanism (buffering many events + writing in a separate thread) can improve
performance so the logstash pipeline isn't stalled waiting for a response from
elasticsearch.
## Consequences and Expectations
Small queue sizes mean that logstash simply blocks and stalls safely during
times of load or other temporary pipeline problems. There are two alternatives
to this - unlimited queue length and dropping messages. Unlimited queues grow
unbounded and eventually exceed available memory, causing a crash which loses all
of those messages. Dropping messages is also an undesirable behavior in most cases.
At a minimum, logstash will probably have 3 threads (2 if you have no filters):
one input, one filter worker, and one output thread.
If you see logstash using multiple CPUs, this is likely why. If you want to
know more about what each thread is doing, you should read this:
<http://www.semicomplete.com/blog/geekery/debugging-java-performance.html>.
Threads in java have names, and you can use jstack and top to figure out who is
using what resources. The URL above will help you learn how to do this.
On Linux platforms, logstash will label all the threads it can with something
descriptive. Inputs will show up as "<inputname" and filter workers as
"|worker" and outputs as ">outputworker" (or something similar). Other threads
may be labeled as well, and are intended to help you identify their purpose
should you wonder why they are consuming resources!


@ -1,60 +0,0 @@
---
title: Logging tools comparisons - logstash
layout: content_right
---
# Logging tools comparison
The information below is provided as "best effort" and is not strictly intended
as a complete source of truth. If the information below is unclear or incorrect, please
email the logstash-users list (or send a pull request with the fix) :)
Where feasible, this document will also provide information on how you can use
logstash with these other projects.
# logstash
Primary goal: Make log/event data and analytics accessible.
Overview: Where your logs come from, how you store them, or what you do with
them is up to you. Logstash exists to help make such actions easier and faster.
It provides you a simple event pipeline for taking events and logs from any
input, manipulating them with filters, and sending them to any output. Inputs
can be files, network, message brokers, etc. Filters are date and string
parsers, grep-like, etc. Outputs are data stores (elasticsearch, mongodb, etc),
message systems (rabbitmq, stomp, etc), network (tcp, syslog), etc.
It also provides a web interface for doing search and analytics on your
logs.
# graylog2
[http://graylog2.org/](http://graylog2.org)
_Overview to be written_
You can use graylog2 with logstash by using the 'gelf' output to send logstash
events to a graylog2 server. This gives you logstash's excellent input and
filter features while still being able to use the graylog2 web interface.
# whoops
[whoops site](http://www.whoopsapp.com/)
_Overview to be written_
A logstash output to whoops is coming soon - <https://logstash.jira.com/browse/LOGSTASH-133>
# flume
[flume site](https://github.com/cloudera/flume/wiki)
Flume is primarily a transport system aimed at reliably copying logs from
application servers to HDFS.
You can use it with logstash by having a syslog sink configured to shoot logs
at a logstash syslog input.
# scribe
_Overview to be written_


@ -1,41 +0,0 @@
---
title: Plugin Milestones - logstash
layout: content_right
---
# Plugin Milestones
Plugins (inputs/outputs/filters/codecs) have a milestone label in logstash.
This is to provide an indicator to the end-user as to the kinds of changes
a given plugin could have between logstash releases.
The desire here is to allow plugin developers to quickly iterate on possible
new plugins while conveying to the end-user a set of expectations about that
plugin.
## Milestone 1
Plugins at this milestone need your feedback to improve! Plugins at this
milestone may change between releases as the community figures out the best way
for the plugin to behave and be configured.
## Milestone 2
Plugins at this milestone are more likely to maintain backwards compatibility with
previous releases than Milestone 1 plugins. This milestone also indicates
a greater level of in-the-wild usage by the community than the previous
milestone.
## Milestone 3
Plugins at this milestone have strong promises towards backwards-compatibility.
This is enforced with automated tests to ensure behavior and configuration are
consistent across releases.
## Milestone 0
This milestone appears at the bottom of the page because it is very
infrequently used.
This milestone marker is used to generally indicate that a plugin has no
active code maintainer nor does it have support from the community in terms
of getting help.


@ -1,64 +0,0 @@
---
title: release notes for %VERSION%
layout: content_right
---
# %VERSION% - Release Notes
This document is targeted at existing users of Logstash who are upgrading from
an older version to version %VERSION%. This document is intended to supplement
the [changelog
file](https://github.com/elasticsearch/logstash/blob/v%VERSION%/CHANGELOG) by
providing more details on certain changes.
### tarball
With Logstash 1.4.0, we stopped shipping the jar file and started shipping a
tarball instead.
Past releases have been a single jar file which included all Ruby and Java
library dependencies to eliminate deployment pains. We still ship all
the dependencies for you! The jar file served us well, but over time we found
Java's default heap size, garbage collector, and other settings weren't well
suited to Logstash.
In order to provide better Java defaults, we've changed to releasing a tarball
(.tar.gz) that includes all the same dependencies. What does this mean to you?
Instead of running `java -jar logstash.jar ...` you run `bin/logstash ...` (for
Windows users, `bin/logstash.bat`)
One pleasant side effect of using a tarball is that the Logstash code itself is
much more accessible and able to satisfy any curiosity you may have.
The new way to do things is:
* Download logstash tarball
* Unpack it (`tar -zxf logstash-%VERSION%.tar.gz`)
* `cd logstash-%VERSION%`
* Run it: `bin/logstash ...`
The old way to run logstash, `java -jar logstash.jar`, is now replaced with
`bin/logstash`. The command line arguments are exactly the same after that.
For example:
# Old way:
`% java -jar logstash-1.3.3-flatjar.jar agent -f logstash.conf`
# New way:
`% bin/logstash agent -f logstash.conf`
### plugins
Logstash has grown brilliantly over the past few years with great contributions
from the community. With 165 plugins, it became hard for us (the Logstash
engineering team) to reliably support all the wonderful technologies in each
contributed plugin. We combed through all the plugins and picked the ones we
felt strongly we could support, and those now ship by default with Logstash.
All the other plugins are now available in a contrib package. All plugins
continue to be open source and free, of course! Installing plugins is very easy:
....
% cd /path/to/logstash-%VERSION%/
% bin/plugin install [PLUGIN_NAME]
....


@ -1,35 +0,0 @@
---
title: repositories - logstash
layout: content_right
---
# Logstash repositories
We also have Logstash available as APT and YUM repositories.
Our public signing key can be found on the [Elasticsearch packages apt GPG signing key page](https://packages.elasticsearch.org/GPG-KEY-elasticsearch)
## Apt based distributions
Add the key:
wget -O - https://packages.elasticsearch.org/GPG-KEY-elasticsearch | apt-key add -
Add the repo to /etc/apt/sources.list
deb http://packages.elasticsearch.org/logstash/1.4/debian stable main
## YUM based distributions
Add the key:
rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch
Add the repo configuration to a file (for example, `logstash.repo`) in the /etc/yum.repos.d/ directory
[logstash-1.4]
name=logstash repository for 1.4.x packages
baseurl=https://packages.elasticsearch.org/logstash/1.4/centos
gpgcheck=1
gpgkey=https://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
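With the repository definitions above in place, Logstash can then be installed with the distribution's package manager, for example:

    # Apt based distributions
    apt-get update && apt-get install logstash

    # YUM based distributions
    yum install logstash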


@ -1,14 +1,14 @@
[[advanced-pipeline]]
=== Setting Up an Advanced Logstash Pipeline
A Logstash pipeline in most use cases has one or more input, filter, and output plugins. The scenarios in this section
build Logstash configuration files to specify these plugins and discuss what each plugin is doing.
The Logstash configuration file defines your _Logstash pipeline_. When you start a Logstash instance, use the
`-f <path/to/file>` option to specify the configuration file that defines that instance's pipeline.
A Logstash pipeline has two required elements, `input` and `output`, and one optional element, `filter`. The input
plugins consume data from a source, the filter plugins modify the data as you specify, and the output plugins write
the data to a destination.
image::static/images/basic_logstash_pipeline.png[]
@ -24,13 +24,13 @@ input {
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}
--------------------------------------------------------------------------------
This skeleton is non-functional, because the input and output sections don't have any valid options defined. The
examples in this tutorial build configuration files to address specific use cases.
Paste the skeleton into a file named `first-pipeline.conf` in your home Logstash directory.
@ -38,17 +38,17 @@ Paste the skeleton into a file named `first-pipeline.conf` in your home Logstash
[[parsing-into-es]]
==== Parsing Apache Logs into Elasticsearch
This example creates a Logstash pipeline that takes Apache web logs as input, parses those logs to create specific,
named fields from the logs, and writes the parsed data to an Elasticsearch cluster.
You can download the sample data set used in this example
https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz[here]. Unpack this file.
[float]
[[configuring-file-input]]
==== Configuring Logstash for File Input
To start your Logstash pipeline, configure the Logstash instance to read from a file using the
{logstash}plugins-inputs-file.html[file] input plugin.
Edit the `first-pipeline.conf` file to add the following text:
@ -63,8 +63,8 @@ input {
}
--------------------------------------------------------------------------------
<1> The default behavior of the file input plugin is to monitor a file for new information, in a manner similar to the
UNIX `tail -f` command. To change this default behavior and process the entire file, we need to specify the position
where Logstash starts processing the file.
Replace `/path/to/` with the actual path to the location of `logstash-tutorial.log` in your file system.
@ -73,22 +73,22 @@ Replace `/path/to/` with the actual path to the location of `logstash-tutorial.l
[[configuring-grok-filter]]
===== Parsing Web Logs with the Grok Filter Plugin
The {logstash}plugins-filters-grok.html[`grok`] filter plugin is one of several plugins that are available by default in
Logstash. For details on how to manage Logstash plugins, see the <<working-with-plugins,reference documentation>> for
the plugin manager.
Because the `grok` filter plugin looks for patterns in the incoming log data, configuration requires you to make
decisions about how to identify the patterns that are of interest to your use case. A representative line from the web
server log sample looks like this:
[source,shell]
--------------------------------------------------------------------------------
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png
HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel
Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
--------------------------------------------------------------------------------
The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. In this tutorial, use
the `%{COMBINEDAPACHELOG}` grok pattern, which structures lines from the Apache log using the following schema:
[horizontal]
@ -123,7 +123,7 @@ After processing, the sample line has the following JSON representation:
{
"clientip" : "83.149.9.216",
"ident" : ,
"auth" : ,
"auth" : ,
"timestamp" : "04/Jan/2015:05:13:42 +0000",
"verb" : "GET",
"request" : "/presentations/logstash-monitorama-2013/images/kibana-search.png",
@ -139,7 +139,7 @@ After processing, the sample line has the following JSON representation:
[[indexing-parsed-data-into-elasticsearch]]
===== Indexing Parsed Data into Elasticsearch
Now that the web logs are broken down into specific fields, the Logstash pipeline can index the data into an
Elasticsearch cluster. Edit the `first-pipeline.conf` file to add the following text after the `input` section:
[source,json]
@ -152,17 +152,17 @@ output {
With this configuration, Logstash uses http protocol to connect to Elasticsearch. The above example assumes Logstash
and Elasticsearch to be running on the same instance. You can specify a remote Elasticsearch instance using `hosts`
configuration like `hosts => "es-machine:9092"`.
[float]
[[configuring-geoip-plugin]]
===== Enhancing Your Data with the Geoip Filter Plugin
In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing
data. As an example, the {logstash}plugins-filters-geoip.html[`geoip`] plugin looks up IP addresses, derives geographic
location information from the addresses, and adds that location information to the logs.
Configure your Logstash instance to use the `geoip` filter plugin by adding the following lines to the `filter` section
of the `first-pipeline.conf` file:
[source,json]
@ -172,7 +172,7 @@ geoip {
}
--------------------------------------------------------------------------------
The `geoip` plugin configuration requires data that is already defined as separate fields. Make sure that the `geoip`
section is after the `grok` section of the configuration file.
Specify the name of the field that contains the IP address to look up. In this tutorial, the field name is `clientip`.
@ -335,11 +335,11 @@ Only one of the log entries comes from Buffalo, so the query produces a single r
[[multiple-input-output-plugins]]
==== Multiple Input and Output Plugins
The information you need to manage often comes from several disparate sources, and use cases can require multiple
destinations for your data. Your Logstash pipeline can use multiple input and output plugins to handle these
requirements.
This example creates a Logstash pipeline that takes input from a Twitter feed and the Filebeat client, then
sends the information to an Elasticsearch cluster as well as writing the information directly to a file.
[float]
@ -354,7 +354,7 @@ To add a Twitter feed, you need several pieces of information:
* An _oauth token_, which identifies the Twitter account using this app.
* An _oauth token secret_, which serves as the password of the Twitter account.
Visit https://dev.twitter.com/apps to set up a Twitter account and generate your consumer key and secret, as well as
your OAuth token and secret.
Use this information to add the following lines to the `input` section of the `first-pipeline.conf` file:
@ -366,7 +366,7 @@ twitter {
consumer_secret =>
keywords =>
oauth_token =>
oauth_token_secret =>
}
--------------------------------------------------------------------------------
@ -374,19 +374,19 @@ twitter {
[[configuring-lsf]]
==== The Filebeat Client
The https://github.com/elastic/beats/tree/master/filebeat[filebeat] client is a lightweight, resource-friendly tool that
collects logs from files on the server and forwards these logs to your Logstash instance for processing. The
Filebeat client uses the secure Beats protocol to communicate with your Logstash instance. The
lumberjack protocol is designed for reliability and low latency. Filebeat uses the computing resources of
the machine hosting the source data, and the {logstash}plugins-inputs-beats.html[Beats input] plugin minimizes the
resource demands on the Logstash instance.
NOTE: In a typical use case, Filebeat runs on a separate machine from the machine running your
Logstash instance. For the purposes of this tutorial, Logstash and Filebeat are running on the
same machine.
Default Logstash configuration includes the {logstash}plugins-inputs-beats.html[Beats input plugin], which is
designed to be resource-friendly. To install Filebeat on your data source machine, download the
appropriate package from the Filebeat https://www.elastic.co/downloads/beats/filebeat[product page].
Create a configuration file for Filebeat similar to the following example:
@ -414,9 +414,9 @@ output:
<2> Path to the SSL certificate for the Logstash instance.
--------------------------------------------------------------------------------
Save this configuration file as `filebeat.yml`.
Configure your Logstash instance to use the Filebeat input plugin by adding the following lines to the `input` section
of the `first-pipeline.conf` file:
[source,json]
@ -436,10 +436,10 @@ beats {
[[logstash-file-output]]
==== Writing Logstash Data to a File
You can configure your Logstash pipeline to write data directly to a file with the
{logstash}plugins-outputs-file.html[`file`] output plugin.
Configure your Logstash instance to use the `file` output plugin by adding the following lines to the `output` section
of the `first-pipeline.conf` file:
[source,json]
@ -453,7 +453,7 @@ file {
[[multiple-es-nodes]]
==== Writing to multiple Elasticsearch nodes
Writing to multiple Elasticsearch nodes lightens the resource demands on a given Elasticsearch node, as well as
providing redundant points of entry into the cluster when a particular node is unavailable.
To configure your Logstash instance to write to multiple Elasticsearch nodes, edit the output section of the `first-pipeline.conf` file to read:
@ -467,7 +467,7 @@ output {
}
--------------------------------------------------------------------------------
Use the IP addresses of three non-master nodes in your Elasticsearch cluster in the host line. When the `hosts`
parameter lists multiple IP addresses, Logstash load-balances requests across the list of addresses. Also note that
default port for Elasticsearch is `9200` and can be omitted in the configuration above.
@ -504,7 +504,7 @@ output {
}
--------------------------------------------------------------------------------
Logstash is consuming data from the Twitter feed you configured, receiving data from Filebeat, and
indexing this information to three nodes in an Elasticsearch cluster as well as writing to a file.
At the data source machine, run Filebeat with the following command:
@ -514,7 +514,7 @@ At the data source machine, run Filebeat with the following command:
sudo ./filebeat -e -c filebeat.yml -d "publish"
--------------------------------------------------------------------------------
Filebeat will attempt to connect on port 5403. Until Logstash starts with an active Beats plugin, there
won't be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.
To verify your configuration, run the following command:
@ -558,17 +558,17 @@ Shutting down a running Logstash instance involves the following steps:
The following conditions affect the shutdown process:
* An input plugin receiving data at a slow pace.
* A slow filter, like a Ruby filter executing `sleep(10000)` or an Elasticsearch filter that is executing a very heavy
query.
* A disconnected output plugin that is waiting to reconnect to flush in-flight events.
These situations make the duration and success of the shutdown process unpredictable.
Logstash has a stall detection mechanism that analyzes the behavior of the pipeline and plugins during shutdown.
This mechanism produces periodic information about the count of inflight events in internal queues and a list of busy
worker threads.
To enable Logstash to forcibly terminate in the case of a stalled shutdown, use the `--allow-unsafe-shutdown` flag when
you start Logstash.
[[shutdown-stall-example]]
@ -587,22 +587,22 @@ Logstash startup completed
Received shutdown signal, but pipeline is still waiting for in-flight events
to be processed. Sending another ^C will force quit Logstash, but this may cause
data loss. {:level=>:warn}
{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"input_to_filter"=>20, "total"=>20},
{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"input_to_filter"=>20, "total"=>20},
"STALLING_THREADS"=>
{["LogStash::Filters::Ruby", {"code"=>"sleep 10000"}]=>[{"thread_id"=>15,
{["LogStash::Filters::Ruby", {"code"=>"sleep 10000"}]=>[{"thread_id"=>15,
"name"=>"|filterworker.0", "current_call"=>"
(ruby filter code):1:in `sleep'"}]}}
The shutdown process appears to be stalled due to busy or blocked plugins. Check
the logs for more information.
{:level=>:error}
{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"input_to_filter"=>20, "total"=>20},
{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"input_to_filter"=>20, "total"=>20},
"STALLING_THREADS"=>
{["LogStash::Filters::Ruby", {"code"=>"sleep 10000"}]=>[{"thread_id"=>15,
{["LogStash::Filters::Ruby", {"code"=>"sleep 10000"}]=>[{"thread_id"=>15,
"name"=>"|filterworker.0", "current_call"=>"
(ruby filter code):1:in `sleep'"}]}}
{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"input_to_filter"=>20, "total"=>20},
{:level=>:warn, "INFLIGHT_EVENT_COUNT"=>{"input_to_filter"=>20, "total"=>20},
"STALLING_THREADS"=>
{["LogStash::Filters::Ruby", {"code"=>"sleep 10000"}]=>[{"thread_id"=>15,
{["LogStash::Filters::Ruby", {"code"=>"sleep 10000"}]=>[{"thread_id"=>15,
"name"=>"|filterworker.0", "current_call"=>"
(ruby filter code):1:in `sleep'"}]}}
Forcefully quitting logstash.. {:level=>:fatal}


@ -1,21 +1,21 @@
[[breaking-changes]]
== Breaking changes
Version 2.0 of Logstash has some changes that are incompatible with previous versions of Logstash. This section discusses
what you need to be aware of when migrating to this version.
[float]
== Elasticsearch Output Default
Starting with the 2.0 release of Logstash, the default Logstash output for Elasticsearch is HTTP. To use the `node` or
`transport` protocols, download the https://www.elastic.co/guide/en/logstash/2.0/plugins-outputs-elasticsearch_java.html[Elasticsearch Java plugin]. The
Logstash HTTP output to Elasticsearch now supports sniffing.
NOTE: The `elasticsearch_java` plugin has two versions specific to the version of the underlying Elasticsearch cluster.
Be sure to specify the correct value for the `--version` option during installation:
* For Elasticsearch versions before 2.0, use the command
`bin/plugin install --version 1.5.x logstash-output-elasticsearch_java`
* For Elasticsearch versions 2.0 and after, use the command
`bin/plugin install --version 2.0.0 logstash-output-elasticsearch_java`
[float]
@ -23,7 +23,7 @@ Be sure to specify the correct value for the `--version` option during installat
The Elasticsearch output plugin configuration has the following changes:
* The `host` configuration option is now `hosts`, allowing you to specify multiple hosts and associated ports in the
`myhost:9200` format (see the example after this list)
* New options: `bind_host`, `bind_port`, `cluster`, `embedded`, `embedded_http_port`, `port`, `sniffing_delay`
* The `max_inflight_requests` option, which was deprecated in the 1.5 release, is now removed
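A minimal sketch of the new `hosts` syntax in an Elasticsearch output block (the host names here are placeholders):

----------------------------------
output {
  elasticsearch {
    hosts => ["es-node-1:9200", "es-node-2:9200"]
  }
}
----------------------------------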
@ -42,22 +42,22 @@ Configuration files with these settings present are invalid and prevent Logstash
=== Kafka Output Configuration Changes
The 2.0 release of Logstash includes a new version of the Kafka output plugin with significant configuration changes.
Please compare the documentation pages for the
https://www.elastic.co/guide/en/logstash/1.5/plugins-outputs-kafka.html[Logstash 1.5] and
https://www.elastic.co/guide/en/logstash/2.0/plugins-outputs-kafka.html[Logstash 2.0] versions of the Kafka output plugin
and update your configuration files accordingly.
[float]
=== Metrics Filter Changes
Prior implementations of the metrics filter plugin used dotted field names. Beginning with version 2.0, Elasticsearch
does not allow field names to contain dots, so this plugin now uses sub-fields instead of dots. Please note
that these changes make version 3.0.0 of the metrics filter plugin incompatible with previous releases.
[float]
=== Filter Worker Default Change
Starting with the 2.0 release of Logstash, the default value of the `filter_workers` configuration option for filter
plugins is half of the available CPU cores, instead of 1. This change increases parallelism in filter execution for
resource-intensive filtering operations. You can continue to use the `-w` flag to manually set the value for this option,
as in previous releases.
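For example, a sketch of pinning the filter worker count back to the previous default of one (the configuration path is only a placeholder):

[source,shell]
----------------------------------
bin/logstash -w 1 -f /etc/logstash/conf.d/pipeline.conf
----------------------------------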

View file

@ -37,14 +37,14 @@ Logstash has the following flags. You can use the `--help` flag to display this
and NAME is the name of the plugin.
-t, --configtest
Checks the configuration and then exits. Note that grok patterns are not checked for
correctness with this flag.
Logstash can read multiple config files from a directory. If you combine this
flag with `--debug`, Logstash will log the combined config file, annotating the
individual config blocks with the source file it came from.
-h, --help
Print help
-v
*DEPRECATED: see --verbose/debug* Increase verbosity. There are multiple levels

View file

@ -27,10 +27,10 @@ for plugin shutdown: `stop`, `stop?`, and `close`.
* Call the `stop` method from outside the plugin thread. This method signals the plugin to stop.
* The `stop?` method returns `true` when the `stop` method has already been called for that plugin.
* The `close` method performs final bookkeeping and cleanup after the plugin's `run` method and the plugin's thread both
exit. The `close` method is a new name for the method known as `teardown` in previous versions of Logstash.
The `shutdown`, `finished`, `finished?`, `running?`, and `terminating?` methods are redundant and no longer present in the
Plugin Base class.
Sample code for the new plugin shutdown APIs is https://github.com/logstash-plugins/logstash-input-example/blob/master/lib/logstash/inputs/example.rb[available].

View file

@ -1,20 +1,20 @@
[[deploying-and-scaling]]
=== Deploying and Scaling Logstash
As your use case for Logstash evolves, the preferred architecture at a given scale will change. This section discusses
a range of Logstash architectures in increasing order of complexity, starting from a minimal installation and adding
elements to the system. The example deployments in this section write to an Elasticsearch cluster, but Logstash can
write to a large variety of {logstash}output-plugins.html[endpoints].
[float]
[[deploying-minimal-install]]
==== The Minimal Installation
The minimal Logstash installation has one Logstash instance and one Elasticsearch instance. These instances are
directly connected. Logstash uses an {logstash}input-plugins.html[_input plugin_] to ingest data and an
Elasticsearch {logstash}output-plugins.html[_output plugin_] to index the data in Elasticsearch, following the Logstash
{logstash}pipeline.html[_processing pipeline_]. A Logstash instance has a fixed pipeline constructed at startup,
based on the instance's configuration file. You must specify an input plugin. Output defaults to `stdout`, and the
filtering section of the pipeline, which is discussed in the next section, is optional.
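A minimal sketch of such a pipeline configuration, assuming a local Elasticsearch node (the host is a placeholder):

----------------------------------
input { stdin { } }
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
----------------------------------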
image::static/images/deploy_1.png[]
@ -23,17 +23,17 @@ image::static/images/deploy_1.png[]
[[deploying-filter-threads]]
==== Using Filters
Log data is typically unstructured, often contains extraneous information that isn't relevant to your use case, and
sometimes is missing relevant information that can be derived from the log contents. You can use a
{logstash}filter-plugins.html[filter plugin] to parse the log into fields, remove unnecessary information, and derive
additional information from the existing fields. For example, filters can derive geolocation information from an IP
address and add that information to the logs, or parse and structure arbitrary text with the
{logstash}plugins-filters-grok.html[grok] filter.
Adding a filter plugin can significantly affect performance, depending on the amount of computation the filter plugin
performs, as well as on the volume of the logs being processed. The `grok` filter's regular expression computation is
particularly resource-intensive. One way to address this increased demand for computing resources is to use
parallel processing on multicore machines. Use the `-w` switch to set the number of execution threads for Logstash
filtering tasks. For example, the `bin/logstash -w 8` command uses eight different threads for filter processing.
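A sketch of a grok filter that parses Apache access logs using the stock `COMBINEDAPACHELOG` pattern:

----------------------------------
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
----------------------------------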
image::static/images/deploy_2.png[]
@ -43,9 +43,9 @@ image::static/images/deploy_2.png[]
==== Using Filebeat
https://www.elastic.co/guide/en/beats/filebeat/current/index.html[Filebeat] is a lightweight, resource-friendly tool
written in Go that collects logs from files on the server and forwards these logs to other machines for processing.
Filebeat uses the https://www.elastic.co/guide/en/beats/libbeat/current/index.html[Beats] protocol to communicate with a
centralized Logstash instance. Configure the Logstash instances that receive Beats data to use the
{logstash}plugins-inputs-beats.html[Beats input plugin].
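A sketch of the receiving side, assuming the conventional Beats port of 5044:

----------------------------------
input {
  beats {
    port => 5044
  }
}
----------------------------------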
Filebeat uses the computing resources of the machine hosting the source data, and the Beats input plugin minimizes the
@ -57,33 +57,33 @@ image::static/images/deploy_3.png[]
[[deploying-larger-cluster]]
==== Scaling to a Larger Elasticsearch Cluster
Typically, Logstash does not communicate with a single Elasticsearch node, but with a cluster that comprises several
nodes. By default, Logstash uses the HTTP protocol to move data into the cluster.
You can use the Elasticsearch HTTP REST APIs to index data into the Elasticsearch cluster. These APIs represent the
indexed data in JSON. Using the REST APIs does not require the Java client classes or any additional JAR
files and has no performance disadvantages compared to the transport or node protocols. You can secure communications
that use the HTTP REST APIs with the {shield}[Shield] plugin, which supports SSL and HTTP basic authentication.
When you use the HTTP protocol, you can configure the Logstash Elasticsearch output plugin to automatically
load-balance indexing requests across a
specified set of hosts in the Elasticsearch cluster. Specifying multiple Elasticsearch nodes also provides high availability for the Elasticsearch cluster by routing traffic to active Elasticsearch nodes.
You can also use the Elasticsearch Java APIs to serialize the data into a binary representation, using
the transport protocol. The transport protocol can sniff the endpoint of the request and select an
arbitrary client or data node in the Elasticsearch cluster.
Using the HTTP or transport protocols keeps your Logstash instances separate from the Elasticsearch cluster. The node
protocol, by contrast, has the machine running the Logstash instance join the Elasticsearch cluster, running an
Elasticsearch instance. The data that needs indexing propagates from this node to the rest of the cluster. Since the
machine is part of the cluster, the cluster topology is available, making the node protocol a good fit for use cases
that use a relatively small number of persistent connections.
You can also use a third-party hardware or software load balancer to handle connections between Logstash and
external applications.
NOTE: Make sure that your Logstash configuration does not connect directly to Elasticsearch dedicated
{ref}modules-node.html[master nodes], which perform dedicated cluster management. Connect Logstash to client or data
nodes to protect the stability of your Elasticsearch cluster.
image::static/images/deploy_4.png[]
@ -92,19 +92,19 @@ image::static/images/deploy_4.png[]
[[deploying-message-queueing]]
==== Managing Throughput Spikes with Message Queueing
When the data coming into a Logstash pipeline exceeds the Elasticsearch cluster's ability to ingest the data, you can
use a message queue as a buffer. By default, Logstash throttles incoming events when
indexer consumption rates fall below incoming data rates. Since this throttling can lead to events being buffered at
the data source, preventing backpressure with message queues becomes an important part of managing your deployment.
Adding a message queue to your Logstash deployment also provides a level of protection from data loss. When a Logstash
instance that has consumed data from the message queue fails, the data can be replayed from the message queue to an
active Logstash instance.
Several third-party message queues exist, such as Redis, Kafka, or RabbitMQ. Logstash provides input and output plugins
to integrate with several of these third-party message queues. When your Logstash deployment has a message queue
configured, Logstash functionally exists in two phases: shipping instances, which handle data ingestion and storage in
the message queue, and indexing instances, which retrieve the data from the message queue, apply any configured
filtering, and write the filtered data to an Elasticsearch index.
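A sketch of the two phases, assuming a Redis broker (the host names, file path, and list key are placeholders):

----------------------------------
# Shipping instance: read source data and hand it to the broker
input  { file { path => "/var/log/app/*.log" } }
output { redis { host => "broker.example.com" data_type => "list" key => "logstash" } }

# Indexing instance: consume from the broker and index (add any filters here)
input  { redis { host => "broker.example.com" data_type => "list" key => "logstash" } }
output { elasticsearch { hosts => ["es-node-1:9200"] } }
----------------------------------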
image::static/images/deploy_5.png[]
@ -113,20 +113,20 @@ image::static/images/deploy_5.png[]
[[deploying-logstash-ha]]
==== Multiple Connections for Logstash High Availability
To make your Logstash deployment more resilient to individual instance failures, you can set up a load balancer between
your data source machines and the Logstash cluster. The load balancer handles the individual connections to the
Logstash instances to ensure continuity of data ingestion and processing even when an individual instance is unavailable.
image::static/images/deploy_6.png[]
The architecture in the previous diagram is unable to process input from a specific type, such as an RSS feed or a
file, if the Logstash instance dedicated to that input type becomes unavailable. For more robust input processing,
configure each Logstash instance for multiple inputs, as in the following diagram:
image::static/images/deploy_7.png[]
This architecture parallelizes the Logstash workload based on the inputs you configure. With more inputs, you can add
more Logstash instances to scale horizontally. Separate parallel pipelines also increase the reliability of your stack
by eliminating single points of failure.
[float]
@ -140,7 +140,7 @@ A mature Logstash deployment typically has the following pipeline:
* The _filter_ tier applies parsing and other processing to the data consumed from the message queue.
* The _indexing_ tier moves the processed data into Elasticsearch.
Any of these layers can be scaled by adding computing resources. Examine the performance of these components regularly
as your use case evolves and add resources as needed. When Logstash routinely throttles incoming events, consider
adding storage for your message queue. Alternatively, increase the Elasticsearch cluster's rate of data consumption by
adding more Logstash indexing instances.

View file

@ -1,15 +1,15 @@
[[getting-started-with-logstash]]
== Getting Started with Logstash
This section guides you through the process of installing Logstash and verifying that everything is running properly.
Later sections deal with increasingly complex configurations to address selected use cases.
[float]
[[installing-logstash]]
=== Install Logstash
NOTE: Logstash requires Java 7 or later. Use the
http://www.oracle.com/technetwork/java/javase/downloads/index.html[official Oracle distribution] or an open-source
distribution such as http://openjdk.java.net/[OpenJDK].
To check your Java version, run the following command:
@ -28,8 +28,8 @@ Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
[[installing-binary]]
==== Installing from a downloaded binary
Download the https://www.elastic.co/downloads/logstash[Logstash installation file] that matches your host environment.
Unpack the file. On supported Linux operating systems, you can <<package-repositories,use a package manager>> to
install Logstash.
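For example, a sketch of unpacking the `.tar.gz` archive on Linux or OS X:

[source,shell]
----------------------------------
tar -xzf logstash-{logstash_version}.tar.gz
----------------------------------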
[[first-event]]
@ -41,17 +41,17 @@ To test your Logstash installation, run the most basic Logstash pipeline:
cd logstash-{logstash_version}
bin/logstash -e 'input { stdin { } } output { stdout {} }'
The `-e` flag enables you to specify a configuration directly from the command line. Specifying configurations at the
command line lets you quickly test configurations without having to edit a file between iterations.
This pipeline takes input from the standard input, `stdin`, and moves that input to the standard output, `stdout`, in a
structured format. Type hello world at the command prompt to see Logstash respond:
[source,shell]
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
Logstash adds timestamp and IP address information to the message. Exit Logstash by issuing a *CTRL-D* command in the
shell where Logstash is running.
The <<advanced-pipeline,Advanced Tutorial>> expands the capabilities of your Logstash instance to cover broader
use cases.

View file

(nine binary PNG image files; no text diff to display)

View file

@ -27,13 +27,13 @@ Collect more, so you can know more. Logstash welcomes data of all shapes and siz
Where it all started.
* Handle all types of logging data
** Easily ingest a multitude of web logs like <<parsing-into-es,Apache>>, and application
logs like <<plugins-inputs-log4j,log4j>> for Java
** Capture many other log formats like <<plugins-inputs-syslog,syslog>>,
<<plugins-inputs-eventlog,Windows event logs>>, networking and firewall logs, and more
* Enjoy complementary secure log forwarding capabilities with https://github.com/elastic/beats/tree/master/filebeat[Filebeat]
* Collect metrics from <<plugins-inputs-ganglia,Ganglia>>, <<plugins-codecs-collectd,collectd>>,
<<plugins-codecs-netflow,NetFlow>>, <<plugins-inputs-jmx,JMX>>, and many other infrastructure
and application platforms over <<plugins-inputs-tcp,TCP>> and <<plugins-inputs-udp,UDP>>
[float]
@ -41,12 +41,12 @@ and application platforms over <<plugins-inputs-tcp,TCP>> and <<plugins-inputs-u
Unlock the World Wide Web.
* Transform <<plugins-inputs-http,HTTP requests>> into events
(https://www.elastic.co/blog/introducing-logstash-input-http-plugin[blog])
** Consume from web service firehoses like <<plugins-inputs-twitter,Twitter>> for social sentiment analysis
** Webhook support for GitHub, HipChat, JIRA, and countless other applications
** Enables many https://www.elastic.co/guide/en/watcher/current/logstash-integration.html[Watcher] alerting use cases
* Create events by polling <<plugins-inputs-http_poller,HTTP endpoints>> on demand
(https://www.elastic.co/blog/introducing-logstash-http-poller[blog])
** Universally capture health, performance, metrics, and other types of data from web application interfaces
** Perfect for scenarios where the control of polling is preferred over receiving
@ -56,10 +56,10 @@ Unlock the World Wide Web.
Discover more value from the data you already own.
* Better understand your data from any relational database or NoSQL store with a
<<plugins-inputs-jdbc,JDBC>> interface (https://www.elastic.co/blog/logstash-jdbc-input-plugin[blog])
* Unify diverse data streams from messaging queues like Apache <<plugins-outputs-kafka,Kafka>>
(https://www.elastic.co/blog/logstash-kafka-intro[blog]), <<plugins-outputs-rabbitmq,RabbitMQ>>,
<<plugins-outputs-sqs,Amazon SQS>>, and <<plugins-outputs-zeromq,ZeroMQ>>
[float]
@ -67,41 +67,41 @@ Discover more value from the data you already own.
Explore an expansive breadth of other data.
* In this age of technological advancement, the massive IoT world unleashes endless use cases through capturing and
harnessing data from connected sensors.
* Logstash is the common event collection backbone for ingestion of data shipped from mobile devices to intelligent
homes, connected vehicles, healthcare sensors, and many other industry specific applications.
* https://www.elastic.co/elasticon/2015/sf/if-it-moves-measure-it-logging-iot-with-elk[Watch] as Logstash, in
conjunction with the broader ELK stack, centralizes and enriches sensor data to gain deeper knowledge regarding a
residential home.
[float]
== Easily Enrich Everything
The better the data, the better the knowledge. Clean and transform your data during ingestion to gain near real-time
insights immediately at index or output time. Logstash comes out-of-the-box with many aggregations and mutations along
with pattern matching, geo mapping, and dynamic lookup capabilities.
* <<plugins-filters-grok,Grok>> is the bread and butter of Logstash filters and is used ubiquitously to derive
structure out of unstructured data. Enjoy a wealth of integrated patterns aimed to help quickly resolve web, systems,
networking, and other types of event formats.
* Expand your horizons by deciphering <<plugins-filters-geoip,geo coordinates>> from IP addresses, normalizing
<<plugins-filters-date,date>> complexity, simplifying <<plugins-filters-kv,key-value pairs>> and
<<plugins-filters-csv,CSV>> data, <<plugins-filters-anonymize,anonymizing>> sensitive information, and further
enriching your data with <<plugins-filters-translate,local lookups>> or Elasticsearch
<<plugins-filters-elasticsearch,queries>>.
* Codecs are often used to ease the processing of common event structures like <<plugins-codecs-json,JSON>>
and <<plugins-codecs-multiline,multiline>> events.
[float]
== Choose Your Stash
Route your data where it matters most. Unlock various downstream analytical and operational use cases by storing,
analyzing, and taking action on your data.
[cols="a,a"]
|=======================================================================
|
*Analysis*
@ -116,9 +116,9 @@ analyzing, and taking action on your data.
* <<plugins-outputs-s3,S3>>
* <<plugins-outputs-google_cloud_storage,Google Cloud Storage>>
|
*Monitoring*
* <<plugins-outputs-nagios,Nagios>>
* <<plugins-outputs-ganglia,Ganglia>>
@ -127,7 +127,7 @@ analyzing, and taking action on your data.
* <<plugins-outputs-datadog,Datadog>>
* <<plugins-outputs-cloudwatch,CloudWatch>>
|
*Alerting*

View file

@ -1,4 +1,4 @@
== Glossary
Logstash Glossary
apache ::
@ -9,7 +9,7 @@ agent ::
broker ::
An intermediary used in a multi-tiered Logstash deployment which allows a queueing mechanism to be used. Examples of brokers are Redis, RabbitMQ, and Apache Kafka. This pattern is a common method of building fault-tolerance into a Logstash architecture.
buffer::
Within Logstash, a temporary storage area where events can queue up, waiting to be processed. The default queue size is 20 events, but it is not recommended to increase this, as Logstash is not designed to operate as a queueing mechanism.
@ -27,7 +27,7 @@ conditional::
In a computer programming context, a control flow which executes certain actions based on true/false values of a statement (called the condition). Often expressed in the form of "if ... then ... (elseif ...) else". Logstash has built-in conditionals to allow users control of the plugin pipeline.
elasticsearch::
An open-source, Lucene-based, RESTful search and analytics engine written in Java, with supported clients in various languages such as Perl, Python, Ruby, Java, etc.
event::
In Logstash parlance, a single unit of information, containing a timestamp plus additional data. An event arrives via an input, and is subsequently parsed, timestamped, and passed through the Logstash pipeline.
@ -39,7 +39,7 @@ file::
A resource storing binary data (which might be text, image, application, etc.) on a physical storage media. In the Logstash context, a common input source which monitors a growing collection of text-based log lines.
filter::
An intermediary processing mechanism in the Logstash pipeline. Typically, filters act upon event data after it has been ingested via inputs, by mutating, enriching, and/or modifying the data according to configuration rules. The second phase of the typical Logstash pipeline (inputs->filters->outputs).
fluentd::
Like Logstash, another open-source tool for collecting logs and events, with plugins to extend functionality.
@ -60,7 +60,7 @@ indexer::
Refers to a Logstash instance which is tasked with interfacing with an Elasticsearch cluster in order to index event data.
input::
The means for ingesting data into Logstash. Inputs allow users to pull data from files, network sockets, other applications, etc. The initial phase of the typical Logstash pipeline (inputs->filters->outputs).
jar / jarfile::
A packaging method for Java libraries. Since Logstash runs on the JRuby runtime environment, it is possible to use these Java libraries to provide extra functionality to Logstash.
@ -69,7 +69,7 @@ java::
An object-oriented programming language popular for its flexibility, extendability and portability.
jruby::
JRuby is a 100% Java implementation of the Ruby programming language, which allows Ruby to run in the JVM. Logstash typically runs in JRuby, which provides it with a fast, extensible runtime environment.
kibana::
A visual tool for viewing time-based data which has been stored in Elasticsearch. Kibana features a powerful set of functionality based on panels which query Elasticsearch in different ways.
@ -87,7 +87,7 @@ lumberjack::
A protocol for shipping logs from one location to another, in a secure and optimized manner. Also the (deprecated) name of a software application, now known as Logstash Forwarder (LSF).
output::
The means for passing event data out of Logstash into other applications, network endpoints, files, etc. The last phase of the typical Logstash pipeline (inputs->filters->outputs).
pipeline::
A term used to describe the flow of events through the Logstash workflow. The pipeline typically consists of a series of inputs, filters, and outputs.
@ -129,4 +129,4 @@ type::
In Elasticsearch, a type can be compared to a table in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed. To index documents, it is required to specify both an index and a type.
worker::
The filter thread model used by Logstash, where each worker receives an event and applies all filters, in order, before emitting the event to the output queue. This allows scalability across CPUs because many filters are CPU intensive (provided that the filters are thread-safe).

View file

@ -1,13 +1,13 @@
[[community-maintainer]]
== Logstash Plugins Community Maintainer Guide
This document, to be read by new Maintainers, should explain their responsibilities. It was inspired by the
http://rfc.zeromq.org/spec:22[C4] document from the ZeroMQ project. This document is subject to change and suggestions
through Pull Requests and issues are strongly encouraged.
=== Contribution Guidelines
For general guidance around contributing to Logstash Plugins, see the
https://www.elastic.co/guide/en/logstash/current/contributing-to-logstash.html[_Contributing to Logstash_] section.
=== Document Goals
@ -16,58 +16,58 @@ To help make the Logstash plugins community participation easy with positive fe
To increase diversity.
To reduce code review, merge and release dependencies on the core team by providing support and tools to the Community and
Maintainers.
To support the natural life cycle of a plugin.
To codify the roles and responsibilities of Maintainers and Contributors, with specific focus on patch testing, code
review, merging and release.
=== Development Workflow
All Issues and Pull Requests must be tracked using the Github issue tracker.
The plugin uses the http://www.apache.org/licenses/LICENSE-2.0[Apache 2.0 license]. Maintainers should check whether a
patch introduces code which has an incompatible license. Patch ownership and copyright are defined in the Elastic
https://www.elastic.co/contributor-agreement[Contributor License Agreement] (CLA).
==== Terminology
A "Contributor" is a role a person assumes when providing a patch. Contributors will not have commit access to the
repository. They need to sign the Elastic https://www.elastic.co/contributor-agreement[Contributor License Agreement]
A "Contributor" is a role a person assumes when providing a patch. Contributors will not have commit access to the
repository. They need to sign the Elastic https://www.elastic.co/contributor-agreement[Contributor License Agreement]
before a patch can be reviewed. Contributors can add themselves to the plugin Contributor list.
A "Maintainer" is a role a person assumes when maintaining a plugin and keeping it healthy, including triaging issues, and
A "Maintainer" is a role a person assumes when maintaining a plugin and keeping it healthy, including triaging issues, and
reviewing and merging patches.
==== Patch Requirements
A patch is a minimal and accurate answer to exactly one identified and agreed upon problem. It must conform to the code
style guidelines and must include RSpec tests that verify the fitness of the solution.
A patch will be automatically tested by a CI system that will report on the Pull Request status.
A patch CLA will be automatically verified and reported on the Pull Request status.
A patch commit message has a single short (less than 50 character) first line summarizing the change, a blank second line,
and any additional lines as necessary for change explanation and rationale.
A patch is mergeable when it satisfies the above requirements and has been reviewed positively by at least one other
person.
==== Development Process
A user will log an issue on the issue tracker describing the problem they face or observe with as much detail as possible.
To work on an issue, a Contributor forks the plugin repository and then works on their forked repository and submits a
patch by creating a pull request back to the plugin.
Maintainers must not merge patches where the author has not signed the CLA.
Before a patch can be accepted it should be reviewed. Maintainers should merge accepted patches without delay.
Maintainers should not merge their own patches except in exceptional cases, such as non-responsiveness from other
Maintainers or core team for an extended period (more than 2 weeks).
Reviewers' comments should not be based on personal preferences.
@ -80,42 +80,42 @@ Review non-source changes such as documentation in the same way as source code c
==== Branch Management
The plugin has a master branch that always holds the latest in-progress version and should always build. Topic branches
should be kept to a minimum.
=== Versioning Plugins
Logstash core and its plugins have separate product development lifecycles. Hence the versioning and release strategy for
the core and plugins do not have to be aligned. In fact, this was one of our goals during the great separation of plugins
work in Logstash 1.5.
At times, there will be changes in the Logstash core API which will require a mass update of plugins to reflect the changes
in core. However, this does not happen frequently.
For plugins, we would like to adhere to a versioning and release strategy that can better inform our users about any
breaking changes to the Logstash configuration formats and functionality.
Plugin releases follow a three-part numbering scheme X.Y.Z, where X denotes a major release version which may break
compatibility with existing configuration or functionality. Y denotes releases which include features which are backward
compatible. Z denotes releases which include bug fixes and patches.
==== Changing the version
The version can be changed in the gemspec, which needs to be associated with a changelog entry. Following this, we can publish
the gem to RubyGems.org manually. At this point only the core developers can publish a gem.
==== Labeling
Labeling is a critical aspect of maintaining plugins. All issues in GitHub should be labeled correctly so it can:
* Provide good feedback to users/developers
* Help prioritize changes
* Be used in release notes
Most labels are self-explanatory, but here's a quick recap of a few important labels:
* `bug`: Labels an issue as an unintentional defect
* `needs details`: If the issue reporter has incomplete details, please ask them for more info and label as needs
details.
* `missing cla`: Contributor License Agreement is missing and patch cannot be accepted without it
* `adopt me`: Ask for help from the community to take over this issue
@ -125,8 +125,8 @@ details.
=== Logging
Although it's important not to bog down performance with excessive logging, debug level logs can be immensely helpful when
diagnosing and troubleshooting issues with Logstash. Please remember to liberally add debug logs wherever it makes sense
as users will be forever gracious.
[source,shell]
@ -136,13 +136,13 @@ as users will be forever gracious.
[qanda]
Why is a https://www.elastic.co/contributor-agreement[CLA] required?::
We ask this of all Contributors in order to assure our users of the origin and continuing existence of the code. We
are not asking Contributors to assign copyright to us, but to give us the right to distribute a Contributor's code
without restriction.
Please make sure the CLA is signed by every Contributor prior to reviewing PRs and commits.::
Contributors only need to sign the CLA once and should sign with the same email as used in Github. If a Contributor
signs the CLA after a PR is submitted, they can refresh the automated CLA checker by pushing another
comment on the PR after 5 minutes of signing.
=== Community Administration
@ -151,5 +151,5 @@ The core team is there to support the plugin Maintainers and overall ecosystem.
Maintainers should propose Contributors to become a Maintainer.
Contributors and Maintainers should follow the Elastic Community https://www.elastic.co/community/codeofconduct[Code of
Conduct]. The core team should block or ban "bad actors".

View file

@ -1,32 +1,32 @@
[[multiline]]
=== Managing Multiline Events
Several use cases generate events that span multiple lines of text. In order to correctly handle these multiline events,
Logstash needs to know how to tell which lines are part of a single event.
Multiline event processing is complex and relies on proper event ordering. The best way to guarantee ordered log
processing is to implement the processing as early in the pipeline as possible. The preferred tool in the Logstash
pipeline is the {logstash}plugins-codecs-multiline.html[multiline codec], which merges lines from a single input using
a simple set of rules.
The most important aspects of configuring either multiline plugin are the following:
* The `pattern` option specifies a regular expression. Lines that match the specified regular expression are considered
either continuations of a previous line or the start of a new multiline event. You can use
{logstash}plugins-filters-grok.html[grok] regular expression templates with this configuration option.
* The `what` option takes two values: `previous` or `next`. The `previous` value specifies that lines that match the
value in the `pattern` option are part of the previous line. The `next` value specifies that lines that match the value
in the `pattern` option are part of the following line.
* The `negate` option applies the multiline codec to lines that
_do not_ match the regular expression specified in the `pattern` option.
See the full documentation for the {logstash}plugins-codecs-multiline.html[multiline codec] or the
{logstash}plugins-filters-multiline.html[multiline filter] plugin for more information on configuration options.
NOTE: For more complex needs, the {logstash}plugins-filters-multiline.html[multiline filter] performs a similar task at
the filter stage of processing, where the Logstash instance aggregates multiple inputs.
The multiline filter plugin is not thread-safe. Avoid using multiple filter workers with the multiline filter. You can
track the progress of upgrades to the functionality of the multiline codec at
https://github.com/logstash-plugins/logstash-codec-multiline/issues/10[this Github issue].
==== Examples of Multiline Plugin Configuration
@ -39,7 +39,7 @@ The examples in this section cover the following use cases:
===== Java Stack Traces
Java stack traces consist of multiple lines, with each line after the initial line beginning with whitespace, as in
this example:
[source,java]
@ -64,7 +64,7 @@ This configuration merges any line that begins with whitespace up to the previou
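A sketch of a codec configuration that merges whitespace-indented continuation lines into the previous line (assuming `stdin` input; adjust the input to match your source):

----------------------------------
input {
  stdin {
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}
----------------------------------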
===== Line Continuations
Several programming languages use the `\` character at the end of a line to denote that the line continues, as in this
example:
[source,c]
@ -87,11 +87,11 @@ This configuration merges any line that ends with the `\` character with the fol
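A sketch of a codec configuration that joins a line ending in `\` with the line that follows it (again assuming `stdin` input):

----------------------------------
input {
  stdin {
    codec => multiline {
      pattern => "\\$"
      what => "next"
    }
  }
}
----------------------------------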
===== Timestamps
Activity logs from services such as Elasticsearch typically begin with a timestamp, followed by information on the
specific activity, as in this example:
[source,shell]
[2015-08-24 11:49:14,389][INFO ][env ] [Letha] using [1] data paths, mounts [[/
(/dev/disk1)]], net usable_space [34.5gb], net total_space [118.9gb], types [hfs]
To consolidate these lines into a single event in Logstash, use the following configuration for the multiline codec:
@ -108,5 +108,5 @@ input {
}
}
This configuration uses the `negate` option to specify that any line that does not begin with a timestamp belongs to
the previous line.

View file

@ -4,10 +4,10 @@
The Logstash <<working-with-plugins,plugin manager>> was introduced in the 1.5 release. This section discusses setting up
local repositories of plugins for use on systems without access to the Internet.
The procedures in this section require a staging machine running Logstash that has access to a public or private Rubygems
server. This staging machine downloads and packages the files used for offline installation.
See the <<private-rubygem,Private Gem Repositories>> section for information on setting up your own private
Rubygems server.
Users who can work with a larger Logstash artifact size can use the *Logstash (All Plugins)* download link from the
@ -17,15 +17,15 @@ all available plugins. You can distribute this bundle to all nodes without furth
[float]
=== Building the Offline Package
Working with offline plugins requires you to create an _offline package_, which is a compressed file that contains all of
the plugins your offline Logstash installation requires, along with the dependencies for those plugins.
. Create the offline package with the `bin/plugin pack` subcommand.
+
When you run the `bin/plugin pack` subcommand, Logstash creates a compressed bundle that contains all of the currently
installed plugins and the dependencies for those plugins. By default, the compressed bundle is a GZipped TAR file when you
run the `bin/plugin pack` subcommand on a UNIX machine. By default, when you run the `bin/plugin pack` subcommand on a
Windows machine, the compressed bundle is a ZIP file. See <<managing-packs,Managing Plugin Packs>> for details on changing
these default behaviors.
+
NOTE: Downloading all dependencies for the specified plugins may take some time, depending on the plugins listed.
@ -36,7 +36,7 @@ NOTE: Downloading all dependencies for the specified plugins may take some time,
[float]
=== Install or Update a local plugin
To install or update a local plugin, use the `--local` option with the install and update commands, as in the following
examples:
.Installing a local plugin
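[source,shell]
----------------------------------
# A sketch only; the plugin name below is an illustration, not a required value.
bin/plugin install --local logstash-input-jdbc
----------------------------------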

View file

@ -1,15 +1,27 @@
[[working-with-plugins]]
== Working with plugins
Logstash has a rich collection of input, filter, codec and output plugins. Plugins are available as self-contained
packages called gems and hosted on RubyGems.org. The plugin manager, accessed via the `bin/plugin` script, is used to manage the
lifecycle of plugins in your Logstash deployment. You can install, uninstall, and upgrade plugins using the Command Line
Interface (CLI) commands described below.
NOTE: Some sections here are for advanced users
[float]
[[listing-plugins]]
=== Listing plugins
Logstash release packages bundle common plugins so you can use them out of the box. To list the plugins currently
available in your deployment:
[source,shell]
----------------------------------
@ -30,7 +42,13 @@ bin/plugin list --group output <4>
[[installing-plugins]]
=== Adding plugins to your deployment
The most common situation when dealing with plugin installation is when you have access to the internet. Using this
method, you can retrieve plugins hosted on the public repository (RubyGems.org) and install them on top of your Logstash
installation.
[source,shell]
----------------------------------
@ -43,7 +61,12 @@ Once the plugin is successfully installed, you can start using it in your config
[float]
==== Advanced: Adding a locally built plugin
In some cases, you want to install plugins that have not yet been released and are not hosted on RubyGems.org. Logstash
gives you the option to install a locally built plugin that is packaged as a Ruby gem. Using a file location:
[source,shell]
----------------------------------
@ -54,7 +77,12 @@ bin/plugin install /path/to/logstash-output-kafka-1.0.0.gem
[float]
==== Advanced: Using `--pluginpath`
Using the `--pluginpath` flag, you can load plugin source code located on your file system. Typically this is used by
developers who are iterating on a custom plugin and want to test it before creating a Ruby gem.
[source,shell]
----------------------------------
@ -65,7 +93,12 @@ bin/logstash --pluginpath /opt/shared/lib/logstash/input/my-custom-plugin-code.r
[float]
=== Updating plugins
Plugins have their own release cycle and are often released independently of Logstash's core release cycle. Using the
update subcommand, you can get the latest version of a plugin or update to a particular version.
[source,shell]
----------------------------------
@ -91,7 +124,13 @@ bin/plugin uninstall logstash-output-kafka
[float]
=== Proxy Support
The previous sections relied on Logstash being able to communicate with RubyGems.org. In certain environments, a
forwarding proxy is used to handle HTTP requests. Logstash plugins can be installed and updated through a proxy by
setting the `HTTP_PROXY` environment variable:
[source,shell]
----------------------------------

53
docs/static/private-gem-repo.asciidoc vendored Normal file
View file

@ -0,0 +1,53 @@
[[private-rubygem]]
=== Private Gem Repositories
The Logstash plugin manager connects to a Ruby gems repository to install and update Logstash plugins. By default, this
repository is http://rubygems.org.
Some use cases are unable to use the default repository, as in the following examples:
* A firewall blocks access to the default repository.
* You are developing your own plugins locally.
* Airgap requirements on the local system.
When you use a custom gem repository, be sure to make plugin dependencies available.
Several open source projects enable you to run your own plugin server, among them:
* https://github.com/geminabox/geminabox[Geminabox]
* https://github.com/PierreRambaud/gemirro[Gemirro]
* https://gemfury.com/[Gemfury]
* http://www.jfrog.com/open-source/[Artifactory]
==== Editing the Gemfile
The gemfile is a configuration file that specifies information required for plugin management. Each gemfile has a
`source` line that specifies a location for plugin content.
By default, the gemfile's `source` line reads:
[source,shell]
----------
# This is a Logstash generated Gemfile.
# If you modify this file manually all comments and formatting will be lost.
source "https://rubygems.org"
----------
To change the source, edit the `source` line to contain your preferred source, as in the following example:
[source,shell]
----------
# This is a Logstash generated Gemfile.
# If you modify this file manually all comments and formatting will be lost.
source "https://my.private.repository"
----------
After saving the new version of the gemfile, use <<working-with-plugins,plugin management commands>> normally.
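For example (a sketch reusing a plugin name from the examples above), installing a plugin now resolves it against the
repository named in your `source` line:
[source,shell]
----------
bin/plugin install logstash-output-kafka
----------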
The following links contain further material on setting up some commonly used repositories:
* https://github.com/geminabox/geminabox/blob/master/README.markdown[Geminabox]
* https://www.jfrog.com/confluence/display/RTF/RubyGems+Repositories[Artifactory]
* Running a http://guides.rubygems.org/run-your-own-gem-server/[rubygems mirror]

View file

@ -4,16 +4,16 @@
[float]
== General
* {lsissue}2376[Issue 2376]: Added ability to install and upgrade Logstash plugins without requiring internet
connectivity.
* {lsissue}3576[Issue 3576]: Support alternate or private Ruby gems server to install and update plugins.
* {lsissue}3451[Issue 3451]: Added ability to reliably shutdown Logstash when there is a stall in event processing. This
option can be enabled by passing `--allow-unsafe-shutdown` flag while starting Logstash. Please be aware that any in-
flight events will be lost when shutdown happens.
* {lsissue}4222[Issue 4222]: Fixed a memory leak which could be triggered when events having a date were serialized to
string.
* Added JDBC input to default package.
* {lsissue}3243[Issue 3243]: Adding `--debug` to `--configtest` now shows the configuration in blocks annotated by source
config file. Very useful when using multiple config files in a directory.
* {lsissue}4130[Issue 4130]: Reset default worker threads to 1 when using non thread-safe filters like multiline.
* Fixed file permissions for the `logrotate` configuration file.
@ -25,29 +25,29 @@ config file. Very useful when using multiple config files in a directory.
[float]
=== Twitter
* https://github.com/logstash-plugins/logstash-input-twitter/issues/21[Issue 21]: Added an option to fetch data from the
sample Twitter streaming endpoint.
* https://github.com/logstash-plugins/logstash-input-twitter/issues/22[Issue 22]: Added hashtags, symbols and
user_mentions as data for the non extended tweet event.
* https://github.com/logstash-plugins/logstash-input-twitter/issues/20[Issue 20]: Added an option to filter per location
and language.
* https://github.com/logstash-plugins/logstash-input-twitter/issues/11[Issue 11]: Added an option to stream data from a
list of users.
[float]
=== Beats
* https://github.com/logstash-plugins/logstash-input-beats/issues/10[Issue 10]: Properly handle multiline events from
multiple sources, originating from Filebeat.
[float]
=== File
* https://github.com/logstash-plugins/logstash-input-file/issues/44[Issue 44]: Properly handle multiline events from
multiple sources.
[float]
=== Eventlog
* https://github.com/logstash-plugins/logstash-input-eventlog/issues/11[Issue 11]: Change the underlying library to
capture Event Logs from Windows more reliably.
[float]
== Output
@ -56,6 +56,6 @@ capture Event Logs from Windows more reliably.
=== Elasticsearch
* Improved the default template to use doc_values wherever possible.
* Improved the default template to disable fielddata on analyzed string fields.
* https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/260[Issue 260]: Added New setting: timeout.
This lets you control the behavior of a slow/stuck request to Elasticsearch that could be, for example, caused by network,
firewall, or load balancer issues.

View file

@ -5,8 +5,8 @@ We also have repositories available for APT and YUM based distributions. Note
that we only provide binary packages, but no source packages, as the packages
are created as part of the Logstash build.
We have split the Logstash package repositories by version into separate urls
to avoid accidental upgrades across major or minor versions. For all 1.5.x
releases use 1.5 as version number, for 1.4.x use 1.4, etc.
We use the PGP key

View file

@ -77,4 +77,3 @@ of workers by passing a command line flag such as:
[source,shell]
bin/logstash -w 1

View file

@ -1,35 +0,0 @@
input {
tcp {
type => "apache"
port => 3333
}
}
filter {
if [type] == "apache" {
grok {
# See the following URL for a complete list of named patterns
# logstash/grok ships with by default:
# https://github.com/logstash/logstash/tree/master/patterns
#
# The grok filter will use the below pattern and on successful match use
# any captured values as new fields in the event.
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
# Try to pull the timestamp from the 'timestamp' field (parsed above with
# grok). The apache time format looks like: "18/Aug/2011:05:44:34 -0700"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
}
output {
elasticsearch {
# Setting 'embedded' will run a real elasticsearch server inside logstash.
# This option below saves you from having to run a separate process just
# for ElasticSearch, so you can get started quicker!
embedded => true
}
}

View file

@ -1,33 +0,0 @@
input {
tcp {
type => "apache"
port => 3333
}
}
filter {
if [type] == "apache" {
grok {
# See the following URL for a complete list of named patterns
# logstash/grok ships with by default:
# https://github.com/logstash/logstash/tree/master/patterns
#
# The grok filter will use the below pattern and on successful match use
# any captured values as new fields in the event.
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
# Try to pull the timestamp from the 'timestamp' field (parsed above with
# grok). The apache time format looks like: "18/Aug/2011:05:44:34 -0700"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
}
output {
# Use stdout in debug mode again to see what logstash makes of the event.
stdout {
codec => rubydebug
}
}

View file

@ -1 +0,0 @@
129.92.249.70 - - [18/Aug/2011:06:00:14 -0700] "GET /style2.css HTTP/1.1" 200 1820 "http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"

View file

@ -1,25 +0,0 @@
input {
stdin {
# A type is a label applied to an event. It is used later with filters
# to restrict what filters are run against each event.
type => "human"
}
}
output {
# Print each event to stdout.
stdout {
# Enabling 'rubydebug' codec on the stdout output will make logstash
# pretty-print the entire event as something similar to a JSON representation.
codec => rubydebug
}
# You can have multiple outputs. All events generally to all outputs.
# Output events to elasticsearch
elasticsearch {
# Setting 'embedded' will run a real elasticsearch server inside logstash.
# This option below saves you from having to run a separate process just
# for ElasticSearch, so you can get started quicker!
embedded => true
}
}

View file

@ -1,16 +0,0 @@
input {
stdin {
# A type is a label applied to an event. It is used later with filters
# to restrict what filters are run against each event.
type => "human"
}
}
output {
# Print each event to stdout.
stdout {
# Enabling 'rubydebug' codec on the stdout output will make logstash
# pretty-print the entire event as something similar to a JSON representation.
codec => rubydebug
}
}

View file

@ -1,101 +0,0 @@
---
title: Logstash 10-Minute Tutorial
layout: content_right
---
# Logstash 10-minute Tutorial
## Step 1 - Download
### Download logstash:
* [logstash-%VERSION%.tar.gz](https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz)
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz
### Unpack it
tar -xzf logstash-%VERSION%.tar.gz
cd logstash-%VERSION%
### Requirements:
* Java
### The Secret:
Logstash is written in JRuby, but I release standalone jar files for easy
deployment, so you don't need to download JRuby or most any other dependencies.
I bake as much as possible into the single release file.
## Step 2 - A hello world.
### Download this config file:
* [hello.conf](hello.conf)
### Run it:
bin/logstash agent -f hello.conf
Type stuff on standard input. Press enter. Watch what event Logstash sees.
Press ^C to kill it.
## Step 3 - Add ElasticSearch
### Download this config file:
* [hello-search.conf](hello-search.conf)
### Run it:
bin/logstash agent -f hello-search.conf
Same config as step 2, but now we are also writing events to ElasticSearch. Do
a search for `*` (all):
curl 'http://localhost:9200/_search?pretty=1&q=*'
### Download
* [apache-parse.conf](apache-parse.conf)
* [apache_log.1](apache_log.1) (a single apache log line)
### Run it
bin/logstash agent -f apache-parse.conf
Logstash will now be listening on TCP port 3333. Send an Apache log message at it:
nc localhost 3333 < apache_log.1
The expected output can be viewed here: [step-5-output.txt](step-5-output.txt)
## Step 6 - real world example + search
Same as the previous step, but we'll output to ElasticSearch now.
### Download
* [apache-elasticsearch.conf](apache-elasticsearch.conf)
* [apache_log.2.bz2](apache_log.2.bz2) (2 days of apache logs)
### Run it
bin/logstash agent -f apache-elasticsearch.conf
Logstash should be all set for you now. Start feeding it logs:
bzip2 -d apache_log.2.bz2
nc localhost 3333 < apache_log.2
## Want more?
For further learning, try these:
* [Watch a presentation on logstash](http://www.youtube.com/embed/RuUFnog29M4)
* [Getting started 'standalone' guide](http://logstash.net/docs/%VERSION%/tutorials/getting-started-simple)
* [Getting started 'centralized' guide](http://logstash.net/docs/%VERSION%/tutorials/getting-started-centralized) -
learn how to build out your logstash infrastructure and centralize your logs.
* [Dive into the docs](http://logstash.net/docs/%VERSION%/)

View file

@ -1,17 +0,0 @@
{
"type" => "apache",
"clientip" => "129.92.249.70",
"ident" => "-",
"auth" => "-",
"timestamp" => "18/Aug/2011:06:00:14 -0700",
"verb" => "GET",
"request" => "/style2.css",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "1820",
"referrer" => "http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html",
"agent" => "\"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\"",
"@timestamp" => "2011-08-18T13:00:14.000Z",
"host" => "127.0.0.1",
"message" => "129.92.249.70 - - [18/Aug/2011:06:00:14 -0700] \"GET /style2.css HTTP/1.1\" 200 1820 \"http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html\" \"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\"\n"
}

View file

@ -1,436 +0,0 @@
= Getting Started with Logstash
== Introduction
Logstash is a tool for receiving, processing and outputting logs. All kinds of logs. System logs, webserver logs, error logs, application logs and just about anything you can throw at it. Sounds great, eh?
Using Elasticsearch as a backend datastore, and kibana as a frontend reporting tool, Logstash acts as the workhorse, creating a powerful pipeline for storing, querying and analyzing your logs. With an arsenal of built-in inputs, filters, codecs and outputs, you can harness some powerful functionality with a small amount of effort. So, let's get started!
=== Prerequisite: Java
The only prerequisite required by Logstash is a Java runtime. You can check that you have it installed by running the command `java -version` in your shell. Here's something similar to what you might see:
----
> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
----
It is recommended to run a recent version of Java in order to ensure the greatest success in running Logstash.
It's fine to run an open-source version such as OpenJDK: +
http://openjdk.java.net/
Or you can use the official Oracle version: +
http://www.oracle.com/technetwork/java/index.html
Once you have verified the existence of Java on your system, we can move on!
== Up and Running!
=== Logstash in two commands
First, we're going to download the 'logstash' binary and run it with a very simple configuration.
----
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz
----
Now you should have the file named 'logstash-%VERSION%.tar.gz' on your local filesystem. Let's unpack it:
----
tar zxvf logstash-%VERSION%.tar.gz
cd logstash-%VERSION%
----
Here, we are telling the *tar* command that we are sending it a gzipped file (*z* flag), that we would like to extract the file (*x* flag), that we would like to do so verbosely (*v* flag), and that we will provide a filename for *tar* (*f* flag).
Now let's run it:
----
bin/logstash -e 'input { stdin { } } output { stdout {} }'
----
Now type something into your command prompt, and you will see it output by Logstash:
----
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
----
OK, that's interesting... We ran Logstash with an input called "stdin", and an output named "stdout", and Logstash basically echoed back whatever we typed in some sort of structured format. Note that specifying the *-e* command line flag allows Logstash to accept a configuration directly from the command line. This is especially useful for quickly testing configurations without having to edit a file between iterations.
Let's try a slightly fancier example. First, you should exit Logstash by issuing a 'CTRL-D' command (or 'CTRL-C Enter') in the shell in which it is running. Now run Logstash again with the following command:
----
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
----
And then try another test input, typing the text "goodnight moon":
----
goodnight moon
{
"message" => "goodnight moon",
"@timestamp" => "2013-11-20T23:48:05.335Z",
"@version" => "1",
"host" => "my-laptop"
}
----
So, by re-configuring the "stdout" output (adding a "codec"), we can change the output of Logstash. By adding inputs, outputs and filters to your configuration, it's possible to massage the log data in many ways, in order to maximize flexibility of the stored data when you are querying it.
== Storing logs with Elasticsearch
Now, you're probably saying, "that's all fine and dandy, but typing all my logs into Logstash isn't really an option, and merely seeing them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the messages we send into Logstash. If you don't have Elasticsearch already installed, you can http://www.elasticsearch.org/download/[download the RPM or DEB package], or install manually by downloading the current release tarball, by issuing the following four commands:
----
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
tar zxvf elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
cd elasticsearch-%ELASTICSEARCH_VERSION%/
./bin/elasticsearch
----
NOTE: This tutorial specifies running Logstash %VERSION% with Elasticsearch %ELASTICSEARCH_VERSION%. Each release of Logstash has a *recommended* version of Elasticsearch to pair with. Make sure the versions match based on the http://www.elasticsearch.org/overview/logstash[Logstash version] you're running!
More detailed information on installing and configuring Elasticsearch can be found on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html[The Elasticsearch reference pages]. However, for the purposes of Getting Started with Logstash, the default installation and configuration of Elasticsearch should be sufficient.
Now that we have Elasticsearch running on port 9200 (we do, right?), Logstash can be simply configured to use Elasticsearch as its backend. The defaults for both Logstash and Elasticsearch are fairly sane and well thought out, so we can omit the optional configurations within the elasticsearch output:
----
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
----
Type something, and Logstash will process it as before (this time you won't see any output, since we don't have the stdout output configured)
----
you know, for logs
----
You can confirm that ES actually received the data by making a curl request and inspecting the return:
----
curl 'http://localhost:9200/_search?pretty'
----
which should return something like this:
----
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-2013.11.21",
"_type" : "logs",
"_id" : "2ijaoKqARqGvbMgP3BspJA",
"_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"}
} ]
}
}
----
Congratulations! You've successfully stashed logs in Elasticsearch via Logstash.
=== Elasticsearch Plugins (an aside)
Another very useful tool for querying your Logstash data (and Elasticsearch in general) is the Elasticsearch-kopf plugin. Here is more information on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html[Elasticsearch plugins]. To install elasticsearch-kopf, simply issue the following command in your Elasticsearch directory (the same one in which you ran Elasticsearch earlier):
----
bin/plugin -install lmenezes/elasticsearch-kopf
----
Now you can browse to http://localhost:9200/_plugin/kopf[http://localhost:9200/_plugin/kopf] to browse your Elasticsearch data, settings and mappings!
=== Multiple Outputs
As a quick exercise in configuring multiple Logstash outputs, let's invoke Logstash again, using both the 'stdout' as well as the 'elasticsearch' output:
----
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
----
Typing a phrase will now echo back to your terminal, as well as save in Elasticsearch! (Feel free to verify this using curl, kibana or elasticsearch-kopf).
=== Default - Daily Indices
You might notice that Logstash was smart enough to create a new index in Elasticsearch... The default index name is in the form of 'logstash-YYYY.MM.DD', which essentially creates one index per day. At midnight (UTC), Logstash will automagically rotate the index to a fresh new one, with the new current day's timestamp. This allows you to keep windows of data, based on how far retroactively you'd like to query your log data. Of course, you can always archive (or re-index) your data to an alternate location, where you are able to query further into the past. If you'd like to simply delete old indices after a certain time period, you can use the https://github.com/elasticsearch/curator[Elasticsearch Curator tool].
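For example, you can list the daily indices Logstash has created so far (a sketch, assuming Elasticsearch 1.0 or later is listening on localhost:9200):
----
curl 'http://localhost:9200/_cat/indices?v'
----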
== Moving On
Now you're ready for more advanced configurations. At this point, it makes sense for a quick discussion of some of the core features of Logstash, and how they interact with the Logstash engine.
=== The Life of an Event
Inputs, Outputs, Codecs and Filters are at the heart of the Logstash configuration. By creating a pipeline of event processing, Logstash is able to extract the relevant data from your logs and make it available to elasticsearch, in order to efficiently query your data. To get you thinking about the various options available in Logstash, let's discuss some of the more common configurations currently in use. For more details, read about http://logstash.net/docs/latest/life-of-an-event[the Logstash event pipeline].
==== Inputs
Inputs are the mechanism for passing log data to Logstash. Some of the more useful, commonly-used ones are:
* *file*: reads from a file on the filesystem, much like the UNIX command "tail -0F"
* *syslog*: listens on the well-known port 514 for syslog messages and parses according to RFC3164 format
* *redis*: reads from a redis server, using both redis channels and also redis lists. Redis is often used as a "broker" in a centralized Logstash installation, which queues Logstash events from remote Logstash "shippers".
* *lumberjack*: processes events sent in the lumberjack protocol. Now called https://github.com/elasticsearch/logstash-forwarder[logstash-forwarder].
==== Filters
Filters are used as intermediary processing devices in the Logstash chain. They are often combined with conditionals in order to perform a certain action on an event, if it matches particular criteria. Some useful filters:
* *grok*: parses arbitrary text and structures it. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable. With 120 patterns shipped built-in to Logstash, it's more than likely you'll find one that meets your needs!
* *mutate*: The mutate filter allows you to do general mutations to fields. You can rename, remove, replace, and modify fields in your events.
* *drop*: drop an event completely, for example, 'debug' events.
* *clone*: make a copy of an event, possibly adding or removing fields.
* *geoip*: adds information about geographical location of IP addresses (and displays amazing charts in kibana)
==== Outputs
Outputs are the final phase of the Logstash pipeline. An event may pass through multiple outputs during processing, but once all outputs are complete, the event has finished its execution. Some commonly used outputs include:
* *elasticsearch*: If you're planning to save your data in an efficient, convenient and easily queryable format... Elasticsearch is the way to go. Period. Yes, we're biased :)
* *file*: writes event data to a file on disk.
* *graphite*: sends event data to graphite, a popular open source tool for storing and graphing metrics. http://graphite.wikidot.com/
* *statsd*: a service which "listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services". If you're already using statsd, this could be useful for you!
==== Codecs
Codecs are basically stream filters which can operate as part of an input, or an output. Codecs allow you to easily separate the transport of your messages from the serialization process. Popular codecs include 'json', 'msgpack' and 'plain' (text).
* *json*: encode / decode data in JSON format
* *multiline*: Takes multiple-line text events and merges them into a single event, e.g. java exception and stacktrace messages
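A codec is configured inside the input or output that uses it. For example, a minimal sketch that decodes JSON lines read from stdin:
----
input {
  stdin { codec => json }
}
----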
For the complete list of (current) configurations, visit the Logstash "plugin configuration" section of the http://www.elasticsearch.org/overview/logstash[Logstash documentation page].
== More fun with Logstash
=== Persistent Configuration files
Specifying configurations on the command line using '-e' is only so helpful, and more advanced setups will require more lengthy, long-lived configurations. First, let's create a simple configuration file, and invoke Logstash using it. Create a file named "logstash-simple.conf" and save it in the same directory as Logstash.
----
input { stdin { } }
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
Then, run this command:
----
bin/logstash -f logstash-simple.conf
----
Et voilà! Logstash will read in the configuration file you just created and run as in the example we saw earlier. Note that we used the '-f' to read in the file, rather than the '-e' to read the configuration from the command line. This is a very simple case, of course, so let's move on to some more complex examples.
=== Testing Your Configuration Files
After creating a new or complex configuration file, it can be helpful to quickly test that the file is formatted correctly. We can verify our configuration file is formatted correctly by using the *--configtest* flag.
----
bin/logstash -f logstash-simple.conf --configtest
----
=== Filters
Filters are an in-line processing mechanism which provide the flexibility to slice and dice your data to fit your needs. Let's see one in action, namely the *grok filter*.
----
input { stdin { } }
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
Run Logstash with this configuration:
----
bin/logstash -f logstash-filter.conf
----
Now paste this line into the terminal (so it will be processed by the stdin input):
----
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
----
You should see something returned to STDOUT which looks like this:
----
{
"message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
"@timestamp" => "2013-12-11T08:01:45.000Z",
"@version" => "1",
"host" => "cadenza",
"clientip" => "127.0.0.1",
"ident" => "-",
"auth" => "-",
"timestamp" => "11/Dec/2013:00:01:45 -0800",
"verb" => "GET",
"request" => "/xampp/status.php",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "3891",
"referrer" => "\"http://cadenza/xampp/navi.php\"",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
----
As you can see, Logstash (with help from the *grok* filter) was able to parse the log line (which happens to be in Apache "combined log" format) and break it up into many different discrete bits of information. This will be extremely useful later when we start querying and analyzing our log data... for example, we'll be able to run reports on HTTP response codes, IP addresses, referrers, etc. very easily. There are quite a few grok patterns included with Logstash out-of-the-box, so it's quite likely if you're attempting to parse a fairly common log format, someone has already done the work for you. For more details, see the list of https://github.com/logstash/logstash/blob/master/patterns/grok-patterns[logstash grok patterns] on github.
The other filter used in this example is the *date* filter. This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you're ingesting the log data). You'll notice that the @timestamp field in this example is set to December 11, 2013, even though Logstash is ingesting the event at some point afterwards. This is handy when backfilling logs, for example... the ability to tell Logstash "use this value as the timestamp for this event". For non-English installations, you may need to specify the locale in the date filter (for example, `locale => "en"`).
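A minimal sketch of a date filter with an explicit locale (the match pattern is the same one used in the example above):
----
filter {
  date {
    match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    locale => "en"
  }
}
----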
== Useful Examples
=== Apache logs (from files)
Now, let's configure something actually *useful*... apache2 access log files! We are going to read the input from a file on the localhost, and use a *conditional* to process the event according to our needs. First, create a file called something like 'logstash-apache.conf' with the following contents (you'll need to change the log's file path to suit your needs):
----
input {
file {
path => "/tmp/access_log"
start_position => "beginning"
}
}
filter {
if [path] =~ "access" {
mutate { replace => { "type" => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => localhost
}
stdout { codec => rubydebug }
}
----
Then, create the file you configured above (in this example, "/tmp/access_log") with the following log lines as contents (or use some from your own webserver):
----
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"
98.83.179.51 - - [18/May/2011:19:35:08 -0700] "GET /css/main.css HTTP/1.1" 200 1837 "http://www.safesand.com/information.htm" "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
----
Now run it with the -f flag as in the last example:
----
bin/logstash -f logstash-apache.conf
----
You should be able to see your apache log data in Elasticsearch now! You'll notice that Logstash opened the file you configured, and read through it, processing any events it encountered. Any additional lines logged to this file will also be captured, processed by Logstash as events and stored in Elasticsearch. As an added bonus, they will be stashed with the field "type" set to "apache_access" (this is done by the `mutate` filter in the configuration above, which sets the "type" field to "apache_access").
In this configuration, Logstash is only watching the apache access_log, but it's easy enough to watch both the access_log and the error_log (actually, any file matching '*log'), by changing one line in the above configuration, like this:
----
input {
file {
path => "/tmp/*_log"
...
----
Now, rerun Logstash, and you will see both the error and access logs processed via Logstash. However, if you inspect your data (using elasticsearch-kopf, perhaps), you will see that the access_log was broken up into discrete fields, but not the error_log. That's because we used a "grok" filter to match the standard combined apache log format and automatically split the data into separate fields. Wouldn't it be nice *if* we could control how a line was parsed, based on its format? Well, we can...
Also, you might have noticed that Logstash did not reprocess the events which were already seen in the access_log file. Logstash is able to save its position in files, only processing new lines as they are added to the file. Neat!
=== Conditionals
Now we can build on the previous example, where we introduced the concept of a *conditional*. A conditional should be familiar to most Logstash users, in the general sense. You may use 'if', 'else if' and 'else' statements, as in many other programming languages. Let's label each event according to which file it appeared in (access_log, error_log and other random files which end with "log").
----
input {
file {
path => "/tmp/*_log"
}
}
filter {
if [path] =~ "access" {
mutate { replace => { type => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
} else if [path] =~ "error" {
mutate { replace => { type => "apache_error" } }
} else {
mutate { replace => { type => "random_logs" } }
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
You'll notice we've labeled all events using the "type" field, but we didn't actually parse the "error" or "random" files... There are so many types of error logs that it's better left as an exercise for you, depending on the logs you're seeing.
=== Syslog
OK, now we can move on to another incredibly useful example: *syslog*. Syslog is one of the most common use cases for Logstash, and one it handles exceedingly well (as long as the log lines conform roughly to RFC3164 :). Syslog is the de facto UNIX networked logging standard, sending messages from client machines to a local file, or to a centralized log server via rsyslog. For this example, you won't need a functioning syslog instance; we'll fake it from the command line, so you can get a feel for what happens.
First, let's make a simple configuration file for Logstash + syslog, called 'logstash-syslog.conf'.
----
input {
tcp {
port => 5000
type => syslog
}
udp {
port => 5000
type => syslog
}
}
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
Run it as normal:
----
bin/logstash -f logstash-syslog.conf
----
Normally, a client machine would connect to the Logstash instance on port 5000 and send its message. In this simplified case, we're simply going to telnet to Logstash and enter a log line (similar to how we entered log lines into STDIN earlier). First, open another shell window to interact with the Logstash syslog input and type the following command:
----
telnet localhost 5000
----
You can copy and paste the following lines as samples (feel free to try some of your own, but keep in mind they might not parse if the grok filter is not correct for your data):
----
Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]
Dec 23 14:42:56 louis named[16000]: client 199.48.164.7#64817: query (cache) 'amsterdamboothuren.com/MX/IN' denied
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
----
Now you should see the output of Logstash in your original shell as it processes and parses messages!
----
{
"message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"@timestamp" => "2013-12-23T22:30:01.000Z",
"@version" => "1",
"type" => "syslog",
"host" => "0:0:0:0:0:0:0:1:52617",
"syslog_timestamp" => "Dec 23 14:30:01",
"syslog_hostname" => "louis",
"syslog_program" => "CRON",
"syslog_pid" => "619",
"syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"received_at" => "2013-12-23 22:49:22 UTC",
"received_from" => "0:0:0:0:0:0:0:1:52617",
"syslog_severity_code" => 5,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "notice"
}
----
Congratulations! You're well on your way to being a real Logstash power user. You should be comfortable configuring, running and sending events to Logstash, but there's much more to explore.

View file

@ -1,201 +0,0 @@
---
title: Just Enough RabbitMQ - logstash
layout: content_right
---
While configuring your RabbitMQ broker is out of scope for logstash, it's important
to understand how logstash uses RabbitMQ. To do that, we need to understand a
little about AMQP.
You should also consider reading
[this](http://www.rabbitmq.com/tutorials/amqp-concepts.html) at the RabbitMQ
website.
# Exchanges, queues and bindings; OH MY!
You can get a long way by understanding a few key terms.
## Exchanges
Exchanges are for message **producers**. In Logstash, we map these to
**outputs**. Logstash puts messages on exchanges. There are many types of
exchanges and they are discussed below.
## Queues
Queues are for message **consumers**. In Logstash, we map these to inputs.
Logstash reads messages from queues. Optionally, queues can consume only a
subset of messages. This is done with "routing keys".
## Bindings
Just having a producer and a consumer is not enough. We must `bind` a queue to
an exchange. When we bind a queue to an exchange, we can optionally provide a
routing key. Routing keys are discussed below.
## Broker
A broker is simply the AMQP server software. There are several brokers, but this
tutorial will cover the most common (and arguably popular), [RabbitMQ](http://www.rabbitmq.com).
# Routing Keys
Simply put, routing keys are somewhat like tags for messages. In practice, they
are hierarchical in nature with each level separated by a dot:
- `messages.servers.production`
- `sports.atlanta.baseball`
- `company.myorg.mydepartment`
Routing keys are really handy with a tool like logstash where you
can programmatically define the routing key for a given event using the metadata that logstash provides:
- `logs.servers.production.host1`
- `logs.servers.development.host1.syslog`
- `logs.servers.application_foo.critical`
From a consumer/queue perspective, routing keys also support two types of wildcards - `#` and `*`.
- `*` (asterisk) matches any single word.
- `#` (hash) matches any number of words and behaves like a traditional wildcard.
Using the above examples, if you wanted to bind to an exchange and see messages
for just production, you would use the routing key `logs.servers.production.*`.
If you wanted to see messages for host1, regardless of environment you could
use `logs.servers.*.host1.#`.
Wildcards can be a bit confusing but a good general rule to follow is to use
`*` in places where you need wildcards for a known element. Use `#` when you
need to match any remaining placeholders. Note that wildcards in routing keys
only make sense on the consumer/queue binding, not in the publishing/exchange
side.
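For example, a consumer that only wants production logs could bind with a wildcard key like this (a sketch; the exchange, queue and host names are illustrative):

    input {
      rabbitmq {
        host     => "my_rabbitmq_server"
        exchange => "logs"                        # illustrative exchange name
        queue    => "production_logs"             # illustrative queue name
        key      => "logs.servers.production.*"   # wildcard routing key used for the binding
      }
    }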
We'll get into some of that neat stuff below. For now, it's enough to
understand the general idea behind routing keys.
# Exchange types
There are three primary types of exchanges that you'll see.
## Direct
A direct exchange is one that is probably most familiar to people. A message
comes in and, assuming there is a queue bound, the message is picked up. You
can have multiple queues bound to the same direct exchange. The best way to
understand this pattern is a pool of workers (queues) that read from a direct
exchange to get units of work. Only one consumer will see a given message in a
direct exchange.
You can set routing keys on messages published to a direct exchange. This
allows you to have workers that do different tasks read from the same global
pool of messages yet consume only the ones they know how to handle.
The RabbitMQ concepts guide (linked below) does a good job of describing this
visually
[here](http://www.rabbitmq.com/img/tutorials/intro/exchange-direct.png)
## Fanout
Fanouts are another type of exchange. Unlike direct exchanges, every queue
bound to a fanout exchange will see the same messages. This is best described
as a PUB/SUB pattern. This is helpful when you need to broadcast messages to
multiple interested parties.
Fanout exchanges do NOT support routing keys. All bound queues see all
messages.
## Topic
Topic exchanges are a special type of fanout exchange. Fanout exchanges don't
support routing keys. Topic exchanges do support them. Just like a fanout
exchange, all bound queues see all messages with the additional filter of the
routing key.
# RabbitMQ in logstash
As stated earlier, in Logstash, Outputs publish to Exchanges. Inputs read from
Queues that are bound to Exchanges. Logstash uses the `bunny` RabbitMQ library for
interaction with a broker. Logstash endeavors to expose as much of the
configuration for both exchanges and queues as possible. There are many different tunables
that you might be concerned with setting - including things like message
durability or persistence of declared queues/exchanges. See the relevant input
and output documentation for RabbitMQ for a full list of tunables.
# Sample configurations, tips, tricks and gotchas
There are several examples in the logstash source directory of RabbitMQ usage,
however a few general rules might help eliminate any issues.
## Check your bindings
If logstash is publishing the messages and logstash is consuming the messages,
the `exchange` value for the input should match the `name` in the output.
sender agent
input { stdin { type => "test" } }
output {
rabbitmq {
exchange => "test_exchange"
host => "my_rabbitmq_server"
exchange_type => "fanout"
}
}
receiver agent
input {
rabbitmq {
queue => "test_queue"
host => "my_rabbitmq_server"
exchange => "test_exchange" # This matches the exchange declared above
}
}
output { stdout { debug => true }}
## Message persistence
By default, logstash will attempt to ensure that you don't lose any messages.
This is reflected in the RabbitMQ default settings as well. However there are
cases where you might not want this. A good example is where RabbitMQ is not your
primary method of shipping.
In the following example, we use RabbitMQ as a sniffing interface. Our primary
destination is the embedded ElasticSearch instance. We have a secondary RabbitMQ
output that we use for duplicating messages. However we disable persistence and
durability on this interface so that messages don't pile up waiting for
delivery. We only use RabbitMQ when we want to watch messages in realtime.
Additionally, we're going to leverage routing keys so that we can optionally
filter incoming messages to subsets of hosts. The exercise of getting messages
to this logstash agent is left up to the user.
input {
# some input definition here
}
output {
elasticsearch { embedded => true }
rabbitmq {
exchange => "logtail"
host => "my_rabbitmq_server"
exchange_type => "topic" # We use topic here to enable pub/sub with routing keys
key => "logs.%{host}"
durable => false # If rabbitmq restarts, the exchange disappears.
auto_delete => true # If logstash disconnects, the exchange goes away
persistent => false # Messages are not persisted to disk
}
}
Now if you want to stream logs in realtime, you can use the programming
language of your choice to bind a queue to the `logtail` exchange. If you do
not specify a routing key, you will see every message that comes in to
logstash. However, you can specify a routing key like `logs.apache1` and see
only messages from host `apache1`.
Note that any logstash variable is valid in the key definition. This allows you
to create really complex routing key hierarchies for advanced filtering.
Note that RabbitMQ has specific rules about durability and persistence matching
on both the queue and exchange. You should read the RabbitMQ documentation to
make sure you don't crash your RabbitMQ server with messages awaiting someone
to pick them up.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 31 KiB

View file

@ -1,84 +0,0 @@
---
title: Metrics from Logs - logstash
layout: content_right
---
# Pull metrics from logs
Logs are more than just text. How many customers signed up today? How many HTTP
errors happened this week? When was your last puppet run?
Apache logs give you the http response code and bytes sent - that's useful in a
graph. Metrics occur in logs so frequently there are piles of tools available to
help process them.
Logstash can help (and even replace some tools you might already be using).
## Example: Replacing Etsy's Logster
[Etsy](https://github.com/etsy) has some excellent open source tools. One of
them, [logster](https://github.com/etsy/logster), is meant to help you pull
metrics from logs and ship them to [graphite](http://graphite.wikidot.com/) so
you can make pretty graphs of those metrics.
One sample Logster parser pulls http response codes out of your
apache logs: [SampleLogster.py](https://github.com/etsy/logster/blob/master/logster/parsers/SampleLogster.py)
The above code is roughly 50 lines of Python and solves only one specific
problem, and only for apache logs: counting http response codes by major number (1xx,
2xx, 3xx, etc). To be completely fair, you could shrink the code required for
a Logster parser, but size is not strictly the point, here.
## Keep it simple
Logstash can do more than the above, more simply, and without much coding skill:
input {
file {
path => "/var/log/apache/access.log"
type => "apache-access"
}
}
    filter {
      if [type] == "apache-access" {
        grok {
          match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
      }
    }
output {
statsd {
# Count one hit every event by response
increment => "apache.response.%{response}"
}
}
The above uses grok to parse fields out of apache logs and the statsd
output to increment counters based on the response code. Of course, now that we
are parsing apache logs fully, we can trivially add additional metrics:
output {
statsd {
# Count one hit every event by response
increment => "apache.response.%{response}"
# Use the 'bytes' field from the apache log as the count value.
count => [ "apache.bytes", "%{bytes}" ]
}
}
Now adding additional metrics is just one more line in your logstash config
file. BTW, the 'statsd' output writes to another Etsy tool,
[statsd](https://github.com/etsy/statsd), which helps build counters/latency
data and ship it to graphite for graphing.
Using the logstash config above and a bunch of apache access requests, you might end up
with a graph that looks like this:
![apache response codes graphed with graphite, fed data with logstash](media/frontend-response-codes.png)
The point made above is not "logstash is better than Logster" - the point is
that logstash is a general-purpose log management and pipelining tool and that
while you can centralize logs with logstash, you can read, modify, and write
them to and from just about anywhere.

View file

@ -1,118 +0,0 @@
---
title: ZeroMQ - logstash
layout: content_right
---
*ZeroMQ support in Logstash is currently in an experimental phase. As such, parts of this document are subject to change.*
# ZeroMQ
Simply put, ZeroMQ (0mq) is a socket on steroids. This makes it a perfect complement to Logstash - a pipe on steroids.
ZeroMQ allows you to easily create sockets of various types for moving data around. These sockets are referred to in ZeroMQ by the behavior of each side of the socket pair:
* PUSH/PULL
* REQ/REP
* PUB/SUB
* ROUTER/DEALER
There is also a `PAIR` socket type as well.
Additionally, the socket type is independent of the connection method. A PUB/SUB socket pair could have the SUB side of the socket be a listener and the PUB side a connecting client. This makes it very easy to fit ZeroMQ into various firewalled architectures.
Note that this is not a full-fledged tutorial on ZeroMQ. It is a tutorial on how Logstash uses ZeroMQ.
# ZeroMQ and logstash
In the spirit of ZeroMQ, Logstash takes these socket type pairs and uses them to create topologies with some very simple rules that make usage very easy to understand:
* The receiving end of a socket pair is always a logstash input
* The sending end of a socket pair is always a logstash output
* By default, inputs `bind`/listen and outputs `connect`
* Logstash refers to the socket pairs as topologies and mirrors the naming scheme from ZeroMQ
* By default, ZeroMQ inputs listen on all interfaces on port 2120, ZeroMQ outputs connect to `localhost` on port 2120
The currently understood Logstash topologies for ZeroMQ inputs and outputs are:
* `pushpull`
* `pubsub`
* `pair`
We have found from various discussions that these three topologies will cover most users' needs. We hope to expose the full span of ZeroMQ socket types as time goes on.
By keeping the options simple, this allows you to get started VERY easily with what are normally complex message flows. No more confusion over `exchanges` and `queues` and `brokers`. If you need to add fanout capability to your flow, you can simply use the following configs:
* _node agent lives at 192.168.1.2_
* _indexer agent lives at 192.168.1.1_
# Node agent config
input { stdin { type => "test-stdin-input" } }
output { zeromq { topology => "pubsub" address => "tcp://192.168.1.1:2120" } }
# Indexer agent config
input { zeromq { topology => "pubsub" } }
output { stdout { debug => true }}
If for some reason you need connections to initiate from the indexer because of firewall rules:
# Node agent config - now listening on all interfaces port 2120
input { stdin { type => "test-stdin-input" } }
output { zeromq { topology => "pubsub" address => "tcp://*:2120" mode => "server" } }
# Indexer agent config
input { zeromq { topology => "pubsub" address => "tcp://192.168.1.2" mode => "client" } }
output { stdout { debug => true }}
As stated above, by default `inputs` always start as listeners and `outputs` always start as initiators. Please don't confuse what happens once the socket is connected with the direction of the connection. ZeroMQ separates connection from topology. In the second case of the above configs, once the two sockets are connected, regardless of who initiated the connection, the direction of the message flow itself is fixed. The indexer is reading events from the node.
# Which topology to use
The choice of topology can be broken down very easily based on need
## one to one
Use `pair` topology. On the output side, specify the ipaddress and port of the input side.
## broadcast
Use `pubsub`
If you need to broadcast ALL messages to multiple hosts that each need to see all events, use `pubsub`. Note that all events are broadcast to all subscribers. When using `pubsub` you might also want to investigate the `topic` configuration option which allows subscribers to see only a subset of messages.
## Filter workers
Use `pushpull`
In `pushpull`, ZeroMQ automatically load balances to all connected peers. This means that no peer sees the same message as any other peer.
# What's with the address format?
ZeroMQ supports multiple types of transports:
* inproc:// (unsupported by logstash due to threading)
* tcp:// (exactly what it sounds like)
* ipc:// (probably useless in logstash)
* pgm:// and epgm:// (a multicast format - only usable with PUB and SUB socket types)
For pretty much all cases, you'll be using `tcp://` transports with Logstash.
## Topic - applies to `pubsub`
This option mimics the routing keys functionality in AMQP. Imagine you have a network of receivers but only a subset of the messages need to be seen by a subset of the hosts. You can use this option as a routing key to facilitate that:
# This output is a PUB
output {
zeromq { topology => "pubsub" topic => "logs.production.%{host}" }
}
# This input is a SUB
# I only care about db1 logs
input { zeromq { type => "db1logs" address => "tcp://<ipaddress>:2120" topic => "logs.production.db1"}}
One thing important to note about 0mq PUBSUB and topics is that all filtering is done on the subscriber side. The subscriber will get ALL messages but discard any that don't match the topic.
Also important to note is that 0mq doesn't do topic in the same sense as an AMQP broker might. When a SUB socket gets a message, it compares the first bytes of the message against the topic. However, this isn't always flexible depending on the format of your message. The common practice then, is to send a 0mq multipart message and make the first part the topic. The next parts become the actual message body.
This approach is how Logstash handles it. When using PUBSUB, Logstash will send a multipart message where the first part is the name of the topic and the second part is the event. This is important to know if you are sending to a SUB input from sources other than Logstash.
# sockopts
Sockopts is not you choosing between blue or black socks. ZeroMQ supports setting various flags or options on sockets. In the interest of minimizing configuration syntax, these are _hidden_ behind a logstash configuration element called `sockopts`. You probably won't need to tune these for most cases. If you do need to tune them, you'll probably set the following:
## ZMQ::HWM - sets the high water mark
The high water mark is the maximum number of messages a given socket pair can have in its internal queue. Essentially, use this as a throttle.
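A sketch of what setting it might look like, following the `sockopts` element described above (the exact value format can vary by Logstash version, so treat this as illustrative):

    output {
      zeromq {
        topology => "pushpull"
        address  => "tcp://192.168.1.1:2120"
        # Illustrative: cap the socket's internal queue at 1000 messages
        sockopts => [ "ZMQ::HWM", 1000 ]
      }
    }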
## ZMQ::SWAP_SIZE
TODO
## ZMQ::IDENTITY
TODO