Cleanup docs directory

Remove old, unused markdown docs
Bring dir structure to mirror logstash-docs repo
This commit is contained in:
Suyog Rao 2015-12-16 14:54:34 -08:00
parent 0084c00b38
commit d2d0bd765c
61 changed files with 347 additions and 2578 deletions

.gitignore (3 lines changed)

@ -26,3 +26,6 @@ spec/reports
rspec.xml
.install-done
.vendor
integration_run
.mvn/


@ -1,322 +0,0 @@
---
title: Configuration Language - Logstash
layout: content_right
---
# Logstash Config Language
The Logstash config language aims to be simple.
There are 3 main sections: inputs, filters, and outputs. Each section contains
the configuration for the plugins available in that section.
Example:
# This is a comment. You should use comments to describe
# parts of your configuration.
input {
...
}
filter {
...
}
output {
...
}
## Filters and Ordering
For a given event, filters are applied in the order of their appearance in the
configuration file.
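For example, in the snippet below (a sketch using the grok and date filters shown later in these docs), grok always runs before date because it appears first:

    filter {
      grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
      date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
    }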
## Comments
Comments are the same as in Ruby, Perl, and Python: they start with a '#' character.
Example:
# this is a comment
input { # comments can appear at the end of a line, too
# ...
}
## Plugins
The input, filter and output sections all let you configure plugins. Plugin
configuration consists of the plugin name followed by a block of settings for
that plugin. For example, how about two file inputs:
input {
file {
path => "/var/log/messages"
type => "syslog"
}
file {
path => "/var/log/apache/access.log"
type => "apache"
}
}
The above configures two separate file inputs. Both set two
configuration settings each: 'path' and 'type'. Each plugin has different
settings for configuring it; see the documentation for your plugin to
learn what settings are available and what they mean. For example, the
[file input][fileinput] documentation explains the meanings of the
path and type settings.
[fileinput]: inputs/file
## Value Types
The documentation for a plugin may enforce a configuration field having a
certain type. Examples include boolean, string, array, number, hash,
etc.
### <a name="boolean"></a>Boolean
A boolean must be either `true` or `false`. Note the lack of quotes around
`true` and `false`.
Examples:
debug => true
### <a name="string"></a>String
A string must be a single value.
Example:
name => "Hello world"
Single, unquoted words are valid as strings, too, but you should use quotes.
### <a name="number"></a>Number
Numbers must be valid numerics (floating point or integer are OK).
Example:
port => 33
### <a name="array"></a>Array
An array can hold a single value or multiple values. If you specify the same
field multiple times, the values are appended to the array.
Examples:
path => [ "/var/log/messages", "/var/log/*.log" ]
path => "/data/mysql/mysql.log"
The above makes 'path' a 3-element array including all 3 strings.
### <a name="hash"></a>Hash
A hash uses basically the same syntax as Ruby hashes.
The key and value are simply pairs, such as:
match => {
"field1" => "value1"
"field2" => "value2"
...
}
## <a name="eventdependent"></a>Event Dependent Configuration
The logstash agent is a processing pipeline with 3 stages: inputs -> filters ->
outputs. Inputs generate events, filters modify them, outputs ship them
elsewhere.
All events have properties. For example, an apache access log would have things
like status code (200, 404), request path ("/", "index.html"), HTTP verb
(GET, POST), client IP address, etc. Logstash calls these properties "fields."
Some of the configuration options in Logstash require the existence of fields in
order to function. Because inputs generate events, there are no fields to
evaluate within the input block--they do not exist yet!
Because of their dependency on events and fields, the following configuration
options will only work within filter and output blocks.
**IMPORTANT: Field references, sprintf format, and conditionals, described below,
will not work in an input block.**
### <a name="fieldreferences"></a>Field References
In many cases, it is useful to be able to refer to a field by name. To do this,
you can use the Logstash field reference syntax.
By way of example, let us suppose we have this event:
{
"agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
"ip": "192.168.24.44",
"request": "/index.html",
"response": {
"status": 200,
"bytes": 52353
},
"ua": {
"os": "Windows 7"
}
}
- the syntax to access fields is `[fieldname]`.
- if you are only referring to a **top-level field**, you can omit the `[]` and
simply say `fieldname`.
- in the case of **nested fields**, like the "os" field above, you need
the full path to that field: `[ua][os]`.
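For example, a later filter could test the nested "os" field from the event above (conditionals are covered below; this is just a sketch):

    filter {
      if [ua][os] == "Windows 7" {
        mutate { add_tag => "windows" }
      }
    }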
### <a name="sprintf"></a>sprintf format
This syntax is also used in what Logstash calls 'sprintf format'. This format
allows you to refer to field values from within other strings. For example, the
statsd output has an 'increment' setting, to allow you to keep a count of
apache logs by status code:
output {
statsd {
increment => "apache.%{[response][status]}"
}
}
You can also do time formatting in this sprintf format. Instead of specifying a
field name, use the `+FORMAT` syntax where `FORMAT` is a
[time format](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html).
For example, if you want to use the file output to write to logs based on the
hour and the 'type' field:
output {
file {
path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
}
}
### <a name="conditionals"></a>Conditionals
Sometimes you only want a filter or output to process an event under
certain conditions. For that, you'll want to use a conditional!
Conditionals in Logstash look and act the same way they do in programming
languages. You have `if`, `else if` and `else` statements. Conditionals may be
nested if you need that.
The syntax is as follows:
if EXPRESSION {
...
} else if EXPRESSION {
...
} else {
...
}
What's an expression? Comparison tests, boolean logic, etc!
The following comparison operators are supported:
* equality, etc: ==, !=, <, >, <=, >=
* regexp: =~, !~
* inclusion: in, not in
The following boolean operators are supported:
* and, or, nand, xor
The following unary operators are supported:
* !
Expressions may contain expressions. Expressions may be negated with `!`.
Expressions may be grouped with parentheses `(...)`. Expressions can be long
and complex.
For example, if we want to remove the field `secret` if the field
`action` has a value of `login`:
filter {
if [action] == "login" {
mutate { remove => "secret" }
}
}
The above uses the field reference syntax to get the value of the
`action` field. It is compared against the text `login` and, if equal,
allows the mutate filter to delete the field named `secret`.
How about a more complex example?
* alert nagios of any apache events with status 5xx
* record any 4xx status to elasticsearch
* record all status code hits via statsd
How about telling nagios of any http event that has a status code of 5xx?
output {
if [type] == "apache" {
if [status] =~ /^5\d\d/ {
nagios { ... }
} else if [status] =~ /^4\d\d/ {
elasticsearch { ... }
}
statsd { increment => "apache.%{status}" }
}
}
You can also do multiple expressions in a single condition:
output {
# Send production errors to pagerduty
if [loglevel] == "ERROR" and [deployment] == "production" {
pagerduty {
...
}
}
}
You can test whether a field was present, regardless of its value:
if [exception_message] {
# If the event has an exception_message field, set the level
mutate { add_field => { "level" => "ERROR" } }
}
Here are some examples for testing with the in conditional:
filter {
if [foo] in [foobar] {
mutate { add_tag => "field in field" }
}
if [foo] in "foo" {
mutate { add_tag => "field in string" }
}
if "hello" in [greeting] {
mutate { add_tag => "string in field" }
}
if [foo] in ["hello", "world", "foo"] {
mutate { add_tag => "field in list" }
}
if [missing] in [alsomissing] {
mutate { add_tag => "shouldnotexist" }
}
if !("foo" in ["hello", "world"]) {
mutate { add_tag => "shouldexist" }
}
}
Or, to test if grok was successful:
output {
if "_grokparsefailure" not in [tags] {
elasticsearch { ... }
}
}
## Further Reading
For more information, see [the plugin docs index](index)


@ -1,59 +0,0 @@
---
title: Logstash Contrib plugins
layout: content_right
---
# contrib plugins
As logstash has grown, we've accumulated a massive repository of plugins. With
well over 100 plugins, it became difficult for the project maintainers to
support everything effectively.
In order to improve the quality of popular plugins, we've moved the
less-commonly-used plugins to a separate repository we're calling "contrib".
Concentrating common plugin usage into core solves a few problems, most notably
user complaints about the size of logstash releases, support/maintenance costs,
etc.
It is our intent that this separation will improve life for users. If it
doesn't, please file a bug so we can work to address it!
If a plugin is available in the 'contrib' package, the documentation for that
plugin will note this boldly at the top of that plugin's documentation.
Contrib plugins reside in a [separate github project](https://github.com/elasticsearch/logstash-contrib).
# Packaging
At present, the contrib modules are available as a tarball.
# Automated Installation
The `bin/plugin` script will handle the installation for you:
cd /path/to/logstash
bin/plugin install contrib
# Manual Installation
The contrib plugins can be extracted on top of an existing Logstash installation.
For example, if I've extracted `logstash-%VERSION%.tar.gz` into `/path`, e.g.
cd /path
tar zxf ~/logstash-%VERSION%.tar.gz
It will have a `/path/logstash-%VERSION%` directory, e.g.
$ ls
logstash-%VERSION%
The method to install the contrib tarball is identical.
cd /path
wget http://download.elasticsearch.org/logstash/logstash/logstash-contrib-%VERSION%.tar.gz
tar zxf logstash-contrib-%VERSION%.tar.gz
This will install the contrib plugins in the same directory as the core
install. These plugins will be available to logstash the next time it starts.


@ -1,250 +0,0 @@
require "rubygems"
require "erb"
require "optparse"
require "kramdown" # markdown parser
$: << Dir.pwd
$: << File.join(File.dirname(__FILE__), "..", "lib")
require "logstash/config/mixin"
require "logstash/inputs/base"
require "logstash/codecs/base"
require "logstash/filters/base"
require "logstash/outputs/base"
require "logstash/version"
class LogStashConfigDocGenerator
COMMENT_RE = /^ *#(?: (.*)| *$)/
def initialize
@rules = {
COMMENT_RE => lambda { |m| add_comment(m[1]) },
/^ *class.*< *LogStash::(Outputs|Filters|Inputs|Codecs)::(Base|Threadable)/ => \
lambda { |m| set_class_description },
/^ *config +[^=].*/ => lambda { |m| add_config(m[0]) },
/^ *milestone .*/ => lambda { |m| set_milestone(m[0]) },
/^ *config_name .*/ => lambda { |m| set_config_name(m[0]) },
/^ *flag[( ].*/ => lambda { |m| add_flag(m[0]) },
/^ *(class|def|module) / => lambda { |m| clear_comments },
}
if File.exists?("build/contrib_plugins")
@contrib_list = File.read("build/contrib_plugins").split("\n")
else
@contrib_list = []
end
end
def parse(string)
clear_comments
buffer = ""
string.split(/\r\n|\n/).each do |line|
# Join long lines
if line =~ COMMENT_RE
# nothing
else
# Join extended lines
if line =~ /(, *$)|(\\$)|(\[ *$)/
buffer += line.gsub(/\\$/, "")
next
end
end
line = buffer + line
buffer = ""
@rules.each do |re, action|
m = re.match(line)
if m
action.call(m)
end
end # RULES.each
end # string.split("\n").each
end # def parse
def set_class_description
@class_description = @comments.join("\n")
clear_comments
end # def set_class_description
def add_comment(comment)
return if comment == "encoding: utf-8"
@comments << comment
end # def add_comment
def add_config(code)
# I just care about the 'config :name' part
code = code.sub(/,.*/, "")
# call the code, which calls 'config' in this class.
# This will let us align comments with config options.
name, opts = eval(code)
# TODO(sissel): This hack is only required until regexp configs
# are gone from logstash.
name = name.to_s unless name.is_a?(Regexp)
description = Kramdown::Document.new(@comments.join("\n")).to_html
@attributes[name][:description] = description
clear_comments
end # def add_config
def add_flag(code)
# call the code, which calls 'config' in this class.
# This will let us align comments with config options.
#p :code => code
fixed_code = code.gsub(/ do .*/, "")
#p :fixedcode => fixed_code
name, description = eval(fixed_code)
@flags[name] = description
clear_comments
end # def add_flag
def set_config_name(code)
name = eval(code)
@name = name
end # def set_config_name
def set_milestone(code)
@milestone = eval(code)
end
# pretend to be the config DSL and just get the name
def config(name, opts={})
return name, opts
end # def config
# Pretend to support the flag DSL
def flag(*args, &block)
name = args.first
description = args.last
return name, description
end # def flag
# pretend to be the config dsl's 'config_name' method
def config_name(name)
return name
end # def config_name
# pretend to be the config dsl's 'milestone' method
def milestone(m)
return m
end # def milestone
def clear_comments
@comments.clear
end # def clear_comments
def generate(file, settings)
@class_description = ""
@milestone = ""
@comments = []
@attributes = Hash.new { |h,k| h[k] = {} }
@flags = {}
# local scoping for the monkeypatch below
attributes = @attributes
# Monkeypatch the 'config' method to capture
# Note, this monkeypatch requires us do the config processing
# one at a time.
#LogStash::Config::Mixin::DSL.instance_eval do
#define_method(:config) do |name, opts={}|
#p name => opts
#attributes[name].merge!(opts)
#end
#end
# Loading the file will trigger the config dsl which should
# collect all the config settings.
load file
# parse base first
parse(File.new(File.join(File.dirname(file), "base.rb"), "r").read)
# Now parse the real library
code = File.new(file).read
# inputs either inherit from Base or Threadable.
if code =~ /\< LogStash::Inputs::Threadable/
parse(File.new(File.join(File.dirname(file), "threadable.rb"), "r").read)
end
if code =~ /include LogStash::PluginMixins/
mixin = code.gsub(/.*include LogStash::PluginMixins::(\w+)\s.*/m, '\1')
mixin.gsub!(/(.)([A-Z])/, '\1_\2')
mixin.downcase!
parse(File.new(File.join(File.dirname(file), "..", "plugin_mixins", "#{mixin}.rb")).read)
end
parse(code)
puts "Generating docs for #{file}"
if @name.nil?
$stderr.puts "Missing 'config_name' setting in #{file}?"
return nil
end
klass = LogStash::Config::Registry.registry[@name]
if klass.ancestors.include?(LogStash::Inputs::Base)
section = "input"
elsif klass.ancestors.include?(LogStash::Filters::Base)
section = "filter"
elsif klass.ancestors.include?(LogStash::Outputs::Base)
section = "output"
elsif klass.ancestors.include?(LogStash::Codecs::Base)
section = "codec"
end
template_file = File.join(File.dirname(__FILE__), "plugin-doc.html.erb")
template = ERB.new(File.new(template_file).read, nil, "-")
is_contrib_plugin = @contrib_list.include?(file)
# descriptions are assumed to be markdown
description = Kramdown::Document.new(@class_description).to_html
klass.get_config.each do |name, settings|
@attributes[name].merge!(settings)
end
sorted_attributes = @attributes.sort { |a,b| a.first.to_s <=> b.first.to_s }
klassname = LogStash::Config::Registry.registry[@name].to_s
name = @name
synopsis_file = File.join(File.dirname(__FILE__), "plugin-synopsis.html.erb")
synopsis = ERB.new(File.new(synopsis_file).read, nil, "-").result(binding)
if settings[:output]
dir = File.join(settings[:output], section + "s")
path = File.join(dir, "#{name}.html")
Dir.mkdir(settings[:output]) if !File.directory?(settings[:output])
Dir.mkdir(dir) if !File.directory?(dir)
File.open(path, "w") do |out|
html = template.result(binding)
html.gsub!("%VERSION%", LOGSTASH_VERSION)
html.gsub!("%PLUGIN%", @name)
out.puts(html)
end
else
puts template.result(binding)
end
end # def generate
end # class LogStashConfigDocGenerator
if __FILE__ == $0
opts = OptionParser.new
settings = {}
opts.on("-o DIR", "--output DIR",
"Directory to output to; optional. If not specified, " \
"we write to stdout.") do |val|
settings[:output] = val
end
args = opts.parse(ARGV)
args.each do |arg|
gen = LogStashConfigDocGenerator.new
gen.generate(arg, settings)
end
end


@ -1,108 +0,0 @@
---
title: How to extend - logstash
layout: content_right
---
# Add a new filter
This document shows you how to add a new filter to logstash.
For a general overview of how to add a new plugin, see [the extending
logstash](.) overview.
## Write code.
Let's write a 'hello world' filter. This filter will replace the 'message' in
the event with "Hello world!"
First, logstash expects plugins in a certain directory structure: `logstash/TYPE/PLUGIN_NAME.rb`
Since we're creating a filter, let's mkdir this:
mkdir -p logstash/filters/
cd logstash/filters
Now add the code:
# Call this file 'foo.rb' (in logstash/filters, as above)
require "logstash/filters/base"
require "logstash/namespace"
class LogStash::Filters::Foo < LogStash::Filters::Base
# Setting the config_name here is required. This is how you
# configure this filter from your logstash config.
#
# filter {
# foo { ... }
# }
config_name "foo"
# New plugins should start life at milestone 1.
milestone 1
# Replace the message with this value.
config :message, :validate => :string
public
def register
# nothing to do
end # def register
public
def filter(event)
# return nothing unless there's an actual filter event
return unless filter?(event)
if @message
# Replace the event message with our message as configured in the
# config file.
event["message"] = @message
end
# filter_matched should go in the last line of our successful code
filter_matched(event)
end # def filter
end # class LogStash::Filters::Foo
## Add it to your configuration
For this simple example, let's just use stdin input and stdout output.
The config file looks like this:
input {
stdin { type => "foo" }
}
filter {
if [type] == "foo" {
foo {
message => "Hello world!"
}
}
}
output {
stdout { }
}
Call this file 'example.conf'
## Tell logstash about it.
Depending on how you installed logstash, you have a few ways of including this
plugin.
You can use the agent's `--pluginpath` flag to specify where the root of your
plugin tree is. In our case, it's the current directory.
% bin/logstash --pluginpath your/plugin/root -f example.conf
## Example running
In the example below, I typed in "the quick brown fox" after running the
command.
% bin/logstash -f example.conf
the quick brown fox
2011-05-12T01:05:09.495000Z stdin://snack.home/: Hello world!
The output is the standard logstash stdout output, but in this case our "the
quick brown fox" message was replaced with "Hello world!"
All done! :)


@ -1,91 +0,0 @@
---
title: How to extend - logstash
layout: content_right
---
# Extending logstash
You can add your own input, output, or filter plugins to logstash.
If you're looking to extend logstash today, please look at the existing plugins.
## Good examples of plugins
* [inputs/tcp](https://github.com/logstash/logstash/blob/master/lib/logstash/inputs/tcp.rb)
* [filters/multiline](https://github.com/logstash/logstash/blob/master/lib/logstash/filters/multiline.rb)
* [outputs/mongodb](https://github.com/logstash/logstash/blob/master/lib/logstash/outputs/mongodb.rb)
## Common concepts
* The `config_name` sets the name used in the config file.
* The `milestone` sets the milestone number of the plugin. See <../plugin-milestones> for more info.
* The `config` lines define config options.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
### Required modules
All plugins should require the Logstash module.
require 'logstash/namespace'
### Plugin name
Every plugin must have a name set with the `config_name` method. If this
is not specified, the plugin will fail to load with an error.
### Milestones
Every plugin needs a milestone set using `milestone`. See
<../plugin-milestones> for more info.
### Config lines
The `config` lines define configuration options and are constructed like
so:
config :host, :validate => :string, :default => "0.0.0.0"
The name of the option is specified, here `:host` and then the
attributes of the option. They can include `:validate`, `:default`,
`:required` (a Boolean `true` or `false`), `:deprecated` (also a
Boolean), and `:obsolete` (a String value).
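For example, a few illustrative option declarations combining these attributes (the option names here are made up, not taken from any real plugin):

    config :port, :validate => :number, :required => true
    config :user, :validate => :string, :default => "nobody"
    config :ssl, :validate => :boolean, :default => false, :deprecated => true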
## Inputs
All inputs require the LogStash::Inputs::Base class:
require 'logstash/inputs/base'
Inputs have two methods: `register` and `run`.
* Each input runs as its own thread.
* The `run` method is expected to run forever (see the sketch below).
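Putting these pieces together, a minimal input might look like the following sketch (the plugin name and its single setting are hypothetical):

    require "logstash/inputs/base"
    require "logstash/namespace"

    class LogStash::Inputs::Example < LogStash::Inputs::Base
      config_name "example"
      milestone 1

      # Seconds to sleep between generated events.
      config :interval, :validate => :number, :default => 1

      public
      def register
        # nothing to initialize for this example
      end # def register

      public
      def run(queue)
        # run forever, pushing one event onto the pipeline queue per interval
        loop do
          event = LogStash::Event.new("message" => "example heartbeat")
          decorate(event)
          queue << event
          sleep @interval
        end
      end # def run
    end # class LogStash::Inputs::Example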
## Filters
All filters require the LogStash::Filters::Base class:
require 'logstash/filters/base'
Filters have two methods: `register` and `filter`.
* The `filter` method gets an event.
* Call `event.cancel` to drop the event.
* To modify an event, simply make changes to the event you are given.
* The return value is ignored.
## Outputs
All outputs require the LogStash::Outputs::Base class:
require 'logstash/outputs/base'
Outputs have two methods: `register` and `receive`.
* The `register` method is called per plugin instantiation. Do any of your initialization here.
* The `receive` method is called when an event gets pushed to your output (see the sketch below).
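A similarly minimal output might look like this sketch (again, the plugin name is hypothetical):

    require "logstash/outputs/base"
    require "logstash/namespace"

    class LogStash::Outputs::Example < LogStash::Outputs::Base
      config_name "example"
      milestone 1

      public
      def register
        # set up any connections or buffers here
      end # def register

      public
      def receive(event)
        # called once per event; here we simply print the message field
        puts event["message"]
      end # def receive
    end # class LogStash::Outputs::Example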
## Example: a new filter
Learn by example how to [add a new filter to logstash](example-add-a-new-filter)


@ -1,45 +0,0 @@
---
title: Command-line flags - logstash
layout: content_right
---
# Command-line flags
## Agent
The logstash agent has the following flags (also try using the '--help' flag)
<dl>
<dt> -f, --config CONFIGFILE </dt>
<dd> Load the logstash config from a specific file, directory, or a
wildcard. If given a directory or wildcard, config files will be read
from the directory in alphabetical order. </dd>
<dt> -e CONFIGSTRING </dt>
<dd> Use the given string as the configuration data. Same syntax as the
config file. If no input is specified, 'stdin { type => stdin }' is the
default. If no output is specified, 'stdout { debug => true }' is the
default. </dd>
<dt> -w, --filterworkers COUNT </dt>
<dd> Run COUNT filter workers (default: 1) </dd>
<dt> -l, --log FILE </dt>
<dd> Log to a given path. Default is to log to stdout </dd>
<dt> --verbose </dt>
<dd> Increase verbosity to the first level, less verbose.</dd>
<dt> --debug </dt>
<dd> Increase verbosity to the last level, more verbose.</dd>
<dt> -v </dt>
<dd> *DEPRECATED: see --verbose/debug* Increase verbosity. There are multiple levels of verbosity available with
'-vv' currently being the highest </dd>
<dt> --pluginpath PLUGIN_PATH </dt>
<dd> A colon-delimited path to find other logstash plugins in </dd>
</dl>
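For example, a typical agent invocation that combines several of these flags might look like this (the paths are illustrative):

    % bin/logstash agent -f /etc/logstash/conf.d -w 2 -l /var/log/logstash.log --verbose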
## Web
<dl>
<dt> -a, --address ADDRESS </dt>
<dd>Address on which to start webserver. Default is 0.0.0.0.</dd>
<dt> -p, --port PORT</dt>
<dd>Port on which to start webserver. Default is 9292.</dd>
</dl>


@ -1,28 +0,0 @@
#!/usr/bin/env ruby
require "erb"
if ARGV.size != 1
$stderr.puts "No path given to search for plugin docs"
$stderr.puts "Usage: #{$0} plugin_doc_dir"
exit 1
end
def plugins(glob)
files = Dir.glob(glob)
names = files.collect { |f| File.basename(f).gsub(".html", "") }
return names.sort
end # def plugins
basedir = ARGV[0]
docs = {
"inputs" => plugins(File.join(basedir, "inputs/*.html")),
"codecs" => plugins(File.join(basedir, "codecs/*.html")),
"filters" => plugins(File.join(basedir, "filters/*.html")),
"outputs" => plugins(File.join(basedir, "outputs/*.html")),
}
template_path = File.join(File.dirname(__FILE__), "index.html.erb")
template = File.new(template_path).read
erb = ERB.new(template, nil, "-")
puts erb.result(binding)


@ -1,46 +0,0 @@
---
title: Learn - logstash
layout: content_right
---
# What is Logstash?
Logstash is a tool for managing your logs.
It helps you take logs and other event data from your systems and move it into
a central place. Logstash is open source and completely free. You can find
support on the discussion forum and on IRC.
For an overview of Logstash and why you would use it, you should watch the
presentation I gave at CarolinaCon 2011:
[video here](http://carolinacon.blip.tv/file/5105901/). This presentation covers
Logstash, how you can use it, some alternatives, logging best practices,
parsing tools, etc. Video also below:
<!--
<embed src="http://blip.tv/play/gvE9grjcdQI" type="application/x-shockwave-flash" width="480" height="296" allowscriptaccess="always" allowfullscreen="true"></embed>
The slides are available online here: [slides](http://goo.gl/68c62). The slides
include speaker notes (click 'actions' then 'speaker notes').
-->
<iframe width="480" height="296" src="http://www.youtube.com/embed/RuUFnog29M4" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
The slides are available online here: [slides](http://semicomplete.com/presentations/logstash-puppetconf-2012/).
## Getting Help
There's [documentation](.) here on this site. If that isn't sufficient, you can
use the discussion [forum](https://discuss.elastic.co/c/logstash). Further, there is also
an IRC channel - #logstash on irc.freenode.org.
If you find a bug or have a feature request, file them
on [github](https://github.com/elasticsearch/logstash/issues). (Honestly though, if you prefer email or irc
for such things, that works for me, too.)
## Download It
[Download logstash-%VERSION%](https://download.elastic.co/logstash/logstash/logstash-%VERSION%.tar.gz)
## What's next?
Try this [guide](tutorials/getting-started-with-logstash) for a simple
real-world example getting started using Logstash.


@ -1,109 +0,0 @@
---
title: the life of an event - logstash
layout: content_right
---
# the life of an event
The logstash agent is an event pipeline.
## The Pipeline
The logstash agent is a processing pipeline with 3 stages: inputs -> filters ->
outputs. Inputs generate events, filters modify them, outputs ship them
elsewhere.
Internal to logstash, events are passed from each phase using internal queues.
It is implemented with a 'SizedQueue' in Ruby. SizedQueue allows a bounded
maximum of items in the queue such that any writes to the queue will block if
the queue is full at maximum capacity.
Logstash sets each queue size to 20. This means only 20 events can be pending
for the next phase - this helps reduce data loss and in general keeps
logstash from trying to act as a data storage system. These internal queues are not
for storing messages long-term.
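As a rough illustration of this blocking behavior, here is a standalone Ruby sketch (not Logstash code) using the same SizedQueue class:

    require "thread"

    queue = SizedQueue.new(20)        # bounded, like logstash's internal queues

    producer = Thread.new do
      25.times { |i| queue.push(i) }  # blocks once 20 items are pending
    end

    sleep 1                           # give the producer time to fill the queue
    puts queue.size                   # => 20; the producer is now blocked

    25.times { queue.pop }            # draining the queue unblocks the producer
    producer.join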
## Fault Tolerance
Starting at outputs, here's what happens when things break.
An output can fail or have problems because of some downstream cause, such as
full disk, permissions problems, temporary network failures, or service
outages. Most outputs should keep retrying to ship any events that were
involved in the failure.
If an output is failing, the output thread will wait until this output is
healthy again and able to successfully send the message. Therefore, the output
queue will stop being read from by this output and will eventually fill up with
events and block new events from being written to this queue.
A full output queue means filters will block trying to write to the output
queue. Because filters will be stuck, blocked writing to the output queue, they
will stop reading from the filter queue which will eventually cause the filter
queue (input -> filter) to fill up.
A full filter queue will cause inputs to block when writing to the filters.
This will cause each input to block, causing each input to stop processing new
data from wherever that input is getting new events.
In ideal circumstances, this behaves like the TCP window closing to 0: no new
data is sent because the receiver hasn't finished processing the current queue
of data, but as soon as the downstream (output) problem is resolved, messages
will begin flowing again.
## Thread Model
The thread model in logstash is currently:
input threads | filter worker threads | output worker
Filters are optional, so you will have this model if you have no filters
defined:
input threads | output worker
Each input runs in a thread by itself. This allows busier inputs to not be
blocked by slower ones, etc. It also allows for easier containment of scope
because each input has a thread.
The filter thread model is a 'worker' model where each worker receives an event
and applies all filters, in order, before emitting that to the output queue.
This allows scalability across CPUs because many filters are CPU intensive
(provided the filters themselves are thread safe).
The default number of filter workers is 1, but you can increase this number
with the '-w' flag on the agent.
The output worker model is currently a single thread. Outputs will receive
events in the order they are defined in the config file.
Outputs may decide to buffer events temporarily before publishing them,
possibly in a separate thread. One example of this is the elasticsearch output
which will buffer events and flush them all at once, in a separate thread. This
mechanism (buffering many events + writing in a separate thread) can improve
performance so the logstash pipeline isn't stalled waiting for a response from
elasticsearch.
## Consequences and Expectations
Small queue sizes mean that logstash simply blocks and stalls safely during
times of load or other temporary pipeline problems. There are two alternatives
to this - unlimited queue length and dropping messages. Unlimited queues grow
unbounded and eventually exceed memory, causing a crash which loses all of
those messages. Dropping messages is also an undesirable behavior in most cases.
At a minimum, logstash will probably have 3 threads (2 if you have no filters):
one input, one filter worker, and one output thread.
If you see logstash using multiple CPUs, this is likely why. If you want to
know more about what each thread is doing, you should read this:
<http://www.semicomplete.com/blog/geekery/debugging-java-performance.html>.
Threads in java have names, and you can use jstack and top to figure out who is
using what resources. The URL above will help you learn how to do this.
On Linux platforms, logstash will label all the threads it can with something
descriptive. Inputs will show up as "<inputname" and filter workers as
"|worker" and outputs as ">outputworker" (or something similar). Other threads
may be labeled as well, and are intended to help you identify their purpose
should you wonder why they are consuming resources!


@ -1,60 +0,0 @@
---
title: Logging tools comparisons - logstash
layout: content_right
---
# Logging tools comparison
The information below is provided as "best effort" and is not strictly intended
as a complete source of truth. If the information below is unclear or incorrect, please
email the logstash-users list (or send a pull request with the fix) :)
Where feasible, this document will also provide information on how you can use
logstash with these other projects.
# logstash
Primary goal: Make log/event data and analytics accessible.
Overview: Where your logs come from, how you store them, or what you do with
them is up to you. Logstash exists to help make such actions easier and faster.
It provides you a simple event pipeline for taking events and logs from any
input, manipulating them with filters, and sending them to any output. Inputs
can be files, network, message brokers, etc. Filters are date and string
parsers, grep-like, etc. Outputs are data stores (elasticsearch, mongodb, etc),
message systems (rabbitmq, stomp, etc), network (tcp, syslog), etc.
It also provides a web interface for doing search and analytics on your
logs.
# graylog2
[http://graylog2.org/](http://graylog2.org)
_Overview to be written_
You can use graylog2 with logstash by using the 'gelf' output to send logstash
events to a graylog2 server. This gives you logstash's excellent input and
filter features while still being able to use the graylog2 web interface.
# whoops
[whoops site](http://www.whoopsapp.com/)
_Overview to be written_
A logstash output to whoops is coming soon - <https://logstash.jira.com/browse/LOGSTASH-133>
# flume
[flume site](https://github.com/cloudera/flume/wiki)
Flume is primarily a transport system aimed at reliably copying logs from
application servers to HDFS.
You can use it with logstash by having a syslog sink configured to shoot logs
at a logstash syslog input.
# scribe
_Overview to be written_


@ -1,41 +0,0 @@
---
title: Plugin Milestones - logstash
layout: content_right
---
# Plugin Milestones
Plugins (inputs/outputs/filters/codecs) have a milestone label in logstash.
This is to provide an indicator to the end-user as to the kinds of changes
a given plugin could have between logstash releases.
The desire here is to allow plugin developers to quickly iterate on possible
new plugins while conveying to the end-user a set of expectations about that
plugin.
## Milestone 1
Plugins at this milestone need your feedback to improve! Plugins at this
milestone may change between releases as the community figures out the best way
for the plugin to behave and be configured.
## Milestone 2
Plugins at this milestone are more likely to have backwards-compatibility to
previous releases than do Milestone 1 plugins. This milestone also indicates
a greater level of in-the-wild usage by the community than the previous
milestone.
## Milestone 3
Plugins at this milestone have strong promises towards backwards-compatibility.
This is enforced with automated tests to ensure behavior and configuration are
consistent across releases.
## Milestone 0
This milestone appears at the bottom of the page because it is very
infrequently used.
This milestone marker is used to generally indicate that a plugin has no
active code maintainer nor does it have support from the community in terms
of getting help.


@ -1,64 +0,0 @@
---
title: release notes for %VERSION%
layout: content_right
---
# %VERSION% - Release Notes
This document is targeted at existing users of Logstash who are upgrading from
an older version to version %VERSION%. This document is intended to supplement
the [changelog
file](https://github.com/elasticsearch/logstash/blob/v%VERSION%/CHANGELOG) by
providing more details on certain changes.
### tarball
With Logstash 1.4.0, we stopped shipping the jar file and started shipping a
tarball instead.
Past releases have been a single jar file which included all Ruby and Java
library dependencies to eliminate deployment pains. We still ship all
the dependencies for you! The jar file served us well, but over time we found
Java's default heap size, garbage collector, and other settings weren't well
suited to Logstash.
In order to provide better Java defaults, we've changed to releasing a tarball
(.tar.gz) that includes all the same dependencies. What does this mean to you?
Instead of running `java -jar logstash.jar ...` you run `bin/logstash ...` (for
Windows users, `bin/logstash.bat`)
One pleasant side effect of using a tarball is that the Logstash code itself is
much more accessible and able to satisfy any curiosity you may have.
The new way to do things is:
* Download logstash tarball
* Unpack it (`tar -zxf logstash-%VERSION%.tar.gz`)
* `cd logstash-%VERSION%`
* Run it: `bin/logstash ...`
The old way to run logstash of `java -jar logstash.jar` is now replaced with
`bin/logstash`. The command line arguments are exactly the same after that.
For example:
# Old way:
`% java -jar logstash-1.3.3-flatjar.jar agent -f logstash.conf`
# New way:
`% bin/logstash agent -f logstash.conf`
### plugins
Logstash has grown brilliantly over the past few years with great contributions
from the community. With 165 plugins, it became hard for us (the Logstash
engineering team) to reliably support all the wonderful technologies in each
contributed plugin. We combed through all the plugins and picked the ones we
felt strongly we could support, and those now ship by default with Logstash.
All the other plugins are now available in a contrib package. All plugins
continue to be open source and free, of course! Installing plugins is very easy:
....
% cd /path/to/logstash-%VERSION%/
% bin/plugin install [PLUGIN_NAME]
....


@ -1,35 +0,0 @@
---
title: repositories - logstash
layout: content_right
---
# Logstash repositories
We also have Logstash available as APT and YUM repositories.
Our public signing key can be found on the [Elasticsearch packages apt GPG signing key page](https://packages.elasticsearch.org/GPG-KEY-elasticsearch)
## Apt based distributions
Add the key:
wget -O - https://packages.elasticsearch.org/GPG-KEY-elasticsearch | apt-key add -
Add the repo to /etc/apt/sources.list
deb http://packages.elasticsearch.org/logstash/1.4/debian stable main
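Then refresh the package index and install the package (run as root, or prefix with sudo):

    apt-get update
    apt-get install logstash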
## YUM based distributions
Add the key:
rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch
Add the repo to /etc/yum.repos.d/ directory
[logstash-1.4]
name=logstash repository for 1.4.x packages
baseurl=https://packages.elasticsearch.org/logstash/1.4/centos
gpgcheck=1
gpgkey=https://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
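With that repository definition saved as a file under /etc/yum.repos.d/ (for example, logstash.repo), install the package as usual:

    yum install logstash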

[Nine binary image files changed in this commit (33 KiB to 172 KiB each); image diffs not shown.]


@ -1,15 +1,27 @@
[[working-with-plugins]]
== Working with plugins
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
Logstash has a rich collection of input, filter, codec and output plugins. Plugins are available as self-contained packages called gems and hosted on RubyGems.org. The plugin manager, accessed via the `bin/plugin` script, is used to manage the lifecycle of plugins in your Logstash deployment. You can install, uninstall and upgrade plugins using the Command Line Interface (CLI) described below.
NOTE: Some sections here are for advanced users
=======
Logstash has a rich collection of input, filter, codec and output plugins. Plugins are available as self-contained
packages called gems and hosted on RubyGems.org. The plugin manager, accessed via the `bin/plugin` script, is used to manage the
lifecycle of plugins in your Logstash deployment. You can install, uninstall and upgrade plugins using the Command Line
Interface (CLI) described below.
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[float]
[[listing-plugins]]
=== Listing plugins
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
Logstash release packages bundle common plugins so you can use them out of the box. To list the plugins currently available in your deployment:
=======
Logstash release packages bundle common plugins so you can use them out of the box. To list the plugins currently
available in your deployment:
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[source,shell]
----------------------------------
@ -30,7 +42,13 @@ bin/plugin list --group output <4>
[[installing-plugins]]
=== Adding plugins to your deployment
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
The most common situation when dealing with plugin installation is when you have access to the internet. Using this method, you will be able to retrieve plugins hosted on the public repository (RubyGems.org) and install them on top of your Logstash installation.
=======
The most common situation when dealing with plugin installation is when you have access to the internet. Using this method,
you will be able to retrieve plugins hosted on the public repository (RubyGems.org) and install them on top of your Logstash
installation.
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[source,shell]
----------------------------------
@ -43,7 +61,12 @@ Once the plugin is successfully installed, you can start using it in your config
[float]
==== Advanced: Adding a locally built plugin
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
In some cases, you want to install plugins which have not yet been released and are not hosted on RubyGems.org. Logstash provides you the option to install a locally built plugin which is packaged as a ruby gem. Using a file location:
=======
In some cases, you want to install plugins which have not yet been released and are not hosted on RubyGems.org. Logstash
provides you the option to install a locally built plugin which is packaged as a ruby gem. Using a file location:
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[source,shell]
----------------------------------
@ -54,7 +77,12 @@ bin/plugin install /path/to/logstash-output-kafka-1.0.0.gem
[float]
==== Advanced: Using `--pluginpath`
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
Using the `--pluginpath` flag, you can load plugin source code located on your file system. Typically this is used by developers who are iterating on a custom plugin and want to test it before creating a ruby gem.
=======
Using the `--pluginpath` flag, you can load plugin source code located on your file system. Typically this is used by
developers who are iterating on a custom plugin and want to test it before creating a ruby gem.
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[source,shell]
----------------------------------
@ -65,7 +93,12 @@ bin/logstash --pluginpath /opt/shared/lib/logstash/input/my-custom-plugin-code.r
[float]
=== Updating plugins
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
Plugins have their own release cycle and are often released independent of Logstash's core release cycle. Using the update sub-command you can get the latest version of, or update to a particular version of, the plugin.
=======
Plugins have their own release cycle and are often released independent of Logstash's core release cycle. Using the update
subcommand you can get the latest version of, or update to a particular version of, the plugin.
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[source,shell]
----------------------------------
@ -91,7 +124,13 @@ bin/plugin uninstall logstash-output-kafka
[float]
=== Proxy Support
<<<<<<< HEAD:docs/asciidoc/static/plugin-manager.asciidoc
The previous sections relied on Logstash being able to communicate with RubyGems.org. In certain environments, a forwarding proxy is used to handle HTTP requests. Logstash plugins can be installed and updated through a proxy by setting the `HTTP_PROXY` environment variable:
=======
The previous sections relied on Logstash being able to communicate with RubyGems.org. In certain environments, a forwarding
proxy is used to handle HTTP requests. Logstash plugins can be installed and updated through a proxy by setting the
`HTTP_PROXY` environment variable:
>>>>>>> 9477db2... Cleanup docs directory:docs/static/plugin-manager.asciidoc
[source,shell]
----------------------------------

docs/static/private-gem-repo.asciidoc (new file, 53 lines)

@ -0,0 +1,53 @@
[[private-rubygem]]
=== Private Gem Repositories
The Logstash plugin manager connects to a Ruby gems repository to install and update Logstash plugins. By default, this
repository is http://rubygems.org.
Some use cases are unable to use the default repository, as in the following examples:
* A firewall blocks access to the default repository.
* You are developing your own plugins locally.
* Airgap requirements on the local system.
When you use a custom gem repository, be sure to make plugin dependencies available.
Several open source projects enable you to run your own plugin server, among them:
* https://github.com/geminabox/geminabox[Geminabox]
* https://github.com/PierreRambaud/gemirro[Gemirro]
* https://gemfury.com/[Gemfury]
* http://www.jfrog.com/open-source/[Artifactory]
==== Editing the Gemfile
The Gemfile is a configuration file that specifies information required for plugin management. Each Gemfile has a
`source` line that specifies a location for plugin content.
By default, the gemfile's `source` line reads:
[source,shell]
----------
# This is a Logstash generated Gemfile.
# If you modify this file manually all comments and formatting will be lost.
source "https://rubygems.org"
----------
To change the source, edit the `source` line to contain your preferred source, as in the following example:
[source,shell]
----------
# This is a Logstash generated Gemfile.
# If you modify this file manually all comments and formatting will be lost.
source "https://my.private.repository"
----------
After saving the new version of the gemfile, use <<working-with-plugins,plugin management commands>> normally.
The following links contain further material on setting up some commonly used repositories:
* https://github.com/geminabox/geminabox/blob/master/README.markdown[Geminabox]
* https://www.jfrog.com/confluence/display/RTF/RubyGems+Repositories[Artifactory]
* Running a http://guides.rubygems.org/run-your-own-gem-server/[rubygems mirror]


@ -77,4 +77,3 @@ of workers by passing a command line flag such as:
[source,shell]
bin/logstash `-w 1`


@ -1,35 +0,0 @@
input {
tcp {
type => "apache"
port => 3333
}
}
filter {
if [type] == "apache" {
grok {
# See the following URL for a complete list of named patterns
# logstash/grok ships with by default:
# https://github.com/logstash/logstash/tree/master/patterns
#
# The grok filter will use the below pattern and on successful match use
# any captured values as new fields in the event.
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
# Try to pull the timestamp from the 'timestamp' field (parsed above with
# grok). The apache time format looks like: "18/Aug/2011:05:44:34 -0700"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
}
output {
elasticsearch {
# Setting 'embedded' will run a real elasticsearch server inside logstash.
# This option below saves you from having to run a separate process just
# for ElasticSearch, so you can get started quicker!
embedded => true
}
}


@ -1,33 +0,0 @@
input {
tcp {
type => "apache"
port => 3333
}
}
filter {
if [type] == "apache" {
grok {
# See the following URL for a complete list of named patterns
# logstash/grok ships with by default:
# https://github.com/logstash/logstash/tree/master/patterns
#
# The grok filter will use the below pattern and on successful match use
# any captured values as new fields in the event.
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
# Try to pull the timestamp from the 'timestamp' field (parsed above with
# grok). The apache time format looks like: "18/Aug/2011:05:44:34 -0700"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
}
output {
# Use stdout in debug mode again to see what logstash makes of the event.
stdout {
codec => rubydebug
}
}


@ -1 +0,0 @@
129.92.249.70 - - [18/Aug/2011:06:00:14 -0700] "GET /style2.css HTTP/1.1" 200 1820 "http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"


@ -1,25 +0,0 @@
input {
stdin {
# A type is a label applied to an event. It is used later with filters
# to restrict what filters are run against each event.
type => "human"
}
}
output {
# Print each event to stdout.
stdout {
# Enabling 'rubydebug' codec on the stdout output will make logstash
# pretty-print the entire event as something similar to a JSON representation.
codec => rubydebug
}
# You can have multiple outputs. All events generally go to all outputs.
# Output events to elasticsearch
elasticsearch {
# Setting 'embedded' will run a real elasticsearch server inside logstash.
# This option below saves you from having to run a separate process just
# for ElasticSearch, so you can get started quicker!
embedded => true
}
}


@ -1,16 +0,0 @@
input {
stdin {
# A type is a label applied to an event. It is used later with filters
# to restrict what filters are run against each event.
type => "human"
}
}
output {
# Print each event to stdout.
stdout {
# Enabling 'rubydebug' codec on the stdout output will make logstash
# pretty-print the entire event as something similar to a JSON representation.
codec => rubydebug
}
}


@ -1,101 +0,0 @@
---
title: Logstash 10-Minute Tutorial
layout: content_right
---
# Logstash 10-minute Tutorial
## Step 1 - Download
### Download logstash:
* [logstash-%VERSION%.tar.gz](https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz)
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz
### Unpack it
tar -xzf logstash-%VERSION%.tar.gz
cd logstash-%VERSION%
### Requirements:
* Java
### The Secret:
Logstash is written in JRuby, but I release standalone jar files for easy
deployment, so you don't need to download JRuby or most any other dependencies.
I bake as much as possible into the single release file.
## Step 2 - A hello world.
### Download this config file:
* [hello.conf](hello.conf)
### Run it:
bin/logstash agent -f hello.conf
Type stuff on standard input. Press enter. Watch what event Logstash sees.
Press ^C to kill it.
## Step 3 - Add ElasticSearch
### Download this config file:
* [hello-search.conf](hello-search.conf)
### Run it:
bin/logstash agent -f hello-search.conf
Same config as step 2, but now we are also writing events to ElasticSearch. Do
a search for `*` (all):
curl 'http://localhost:9200/_search?pretty=1&q=*'
### Download
* [apache-parse.conf](apache-parse.conf)
* [apache_log.1](apache_log.1) (a single apache log line)
### Run it
bin/logstash agent -f apache-parse.conf
Logstash will now be listening on TCP port 3333. Send an Apache log message at it:
nc localhost 3333 < apache_log.1
The expected output can be viewed here: [step-5-output.txt](step-5-output.txt)
## Step 6 - real world example + search
Same as the previous step, but we'll output to ElasticSearch now.
### Download
* [apache-elasticsearch.conf](apache-elasticsearch.conf)
* [apache_log.2.bz2](apache_log.2.bz2) (2 days of apache logs)
### Run it
bin/logstash agent -f apache-elasticsearch.conf
Logstash should be all set for you now. Start feeding it logs:
bzip2 -d apache_log.2.bz2
nc localhost 3333 < apache_log.2
## Want more?
For further learning, try these:
* [Watch a presentation on logstash](http://www.youtube.com/embed/RuUFnog29M4)
* [Getting started 'standalone' guide](http://logstash.net/docs/%VERSION%/tutorials/getting-started-simple)
* [Getting started 'centralized' guide](http://logstash.net/docs/%VERSION%/tutorials/getting-started-centralized) -
learn how to build out your logstash infrastructure and centralize your logs.
* [Dive into the docs](http://logstash.net/docs/%VERSION%/)


@ -1,17 +0,0 @@
{
"type" => "apache",
"clientip" => "129.92.249.70",
"ident" => "-",
"auth" => "-",
"timestamp" => "18/Aug/2011:06:00:14 -0700",
"verb" => "GET",
"request" => "/style2.css",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "1820",
"referrer" => "http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html",
"agent" => "\"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\"",
"@timestamp" => "2011-08-18T13:00:14.000Z",
"host" => "127.0.0.1",
"message" => "129.92.249.70 - - [18/Aug/2011:06:00:14 -0700] \"GET /style2.css HTTP/1.1\" 200 1820 \"http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html\" \"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\"\n"
}


@ -1,436 +0,0 @@
= Getting Started with Logstash
== Introduction
Logstash is a tool for receiving, processing and outputting logs. All kinds of logs. System logs, webserver logs, error logs, application logs and just about anything you can throw at it. Sounds great, eh?
Using Elasticsearch as a backend datastore, and kibana as a frontend reporting tool, Logstash acts as the workhorse, creating a powerful pipeline for storing, querying and analyzing your logs. With an arsenal of built-in inputs, filters, codecs and outputs, you can harness some powerful functionality with a small amount of effort. So, let's get started!
=== Prerequisite: Java
The only prerequisite required by Logstash is a Java runtime. You can check that you have it installed by running the command `java -version` in your shell. Here's something similar to what you might see:
----
> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
----
It is recommended to run a recent version of Java in order to ensure the greatest success in running Logstash.
It's fine to run an open-source version such as OpenJDK: +
http://openjdk.java.net/
Or you can use the official Oracle version: +
http://www.oracle.com/technetwork/java/index.html
Once you have verified the existence of Java on your system, we can move on!
== Up and Running!
=== Logstash in two commands
First, we're going to download the 'logstash' binary and run it with a very simple configuration.
----
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz
----
Now you should have the file named 'logstash-%VERSION%.tar.gz' on your local filesystem. Let's unpack it:
----
tar zxvf logstash-%VERSION%.tar.gz
cd logstash-%VERSION%
----
Here, we are telling the *tar* command that we are sending it a gzipped file (*z* flag), that we would like to extract the file (*x* flag), that we would like to do so verbosely (*v* flag), and that we will provide a filename for *tar* (*f* flag).
Now let's run it:
----
bin/logstash -e 'input { stdin { } } output { stdout {} }'
----
Now type something into your command prompt, and you will see it output by Logstash:
----
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
----
OK, that's interesting... We ran Logstash with an input called "stdin", and an output named "stdout", and Logstash basically echoed back whatever we typed in some sort of structured format. Note that specifying the *-e* command line flag allows Logstash to accept a configuration directly from the command line. This is especially useful for quickly testing configurations without having to edit a file between iterations.
Let's try a slightly fancier example. First, you should exit Logstash by issuing a 'CTRL-D' command (or 'CTRL-C Enter') in the shell in which it is running. Now run Logstash again with the following command:
----
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
----
And then try another test input, typing the text "goodnight moon":
----
goodnight moon
{
"message" => "goodnight moon",
"@timestamp" => "2013-11-20T23:48:05.335Z",
"@version" => "1",
"host" => "my-laptop"
}
----
So, by re-configuring the "stdout" output (adding a "codec"), we can change the output of Logstash. By adding inputs, outputs and filters to your configuration, it's possible to massage the log data in many ways, in order to maximize flexibility of the stored data when you are querying it.
== Storing logs with Elasticsearch
Now, you're probably saying, "that's all fine and dandy, but typing all my logs into Logstash isn't really an option, and merely seeing them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the messages we send into Logstash. If you don't have Elasticsearch already installed, you can http://www.elasticsearch.org/download/[download the RPM or DEB package], or install manually by downloading the current release tarball, by issuing the following four commands:
----
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
tar zxvf elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
cd elasticsearch-%ELASTICSEARCH_VERSION%/
./bin/elasticsearch
----
NOTE: This tutorial specifies running Logstash %VERSION% with Elasticsearch %ELASTICSEARCH_VERSION%. Each release of Logstash has a *recommended* version of Elasticsearch to pair with. Make sure the versions match based on the http://www.elasticsearch.org/overview/logstash[Logstash version] you're running!
More detailed information on installing and configuring Elasticsearch can be found on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html[The Elasticsearch reference pages]. However, for the purposes of Getting Started with Logstash, the default installation and configuration of Elasticsearch should be sufficient.
Now that we have Elasticsearch running on port 9200 (we do, right?), Logstash can be simply configured to use Elasticsearch as its backend. The defaults for both Logstash and Elasticsearch are fairly sane and well thought out, so we can omit the optional configurations within the elasticsearch output:
----
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
----
Type something, and Logstash will process it as before (this time you won't see any output, since we don't have the stdout output configured).
----
you know, for logs
----
You can confirm that ES actually received the data by making a curl request and inspecting the return:
----
curl 'http://localhost:9200/_search?pretty'
----
which should return something like this:
----
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-2013.11.21",
"_type" : "logs",
"_id" : "2ijaoKqARqGvbMgP3BspJA",
"_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"}
} ]
}
}
----
Congratulations! You've successfully stashed logs in Elasticsearch via Logstash.
=== Elasticsearch Plugins (an aside)
Another very useful tool for querying your Logstash data (and Elasticsearch in general) is the Elasticsearch-kopf plugin. Here is more information on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html[Elasticsearch plugins]. To install elasticsearch-kopf, simply issue the following command in your Elasticsearch directory (the same one in which you ran Elasticsearch earlier):
----
bin/plugin -install lmenezes/elasticsearch-kopf
----
Now you can navigate to http://localhost:9200/_plugin/kopf[http://localhost:9200/_plugin/kopf] to browse your Elasticsearch data, settings and mappings!
=== Multiple Outputs
As a quick exercise in configuring multiple Logstash outputs, let's invoke Logstash again, using both the 'stdout' as well as the 'elasticsearch' output:
----
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
----
Typing a phrase will now echo back to your terminal, as well as be saved in Elasticsearch! (Feel free to verify this using curl, kibana or elasticsearch-kopf).
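If you prefer the command line for that check, a simple query string search against Elasticsearch works too; this is just a sketch, assuming one of your test events contained the word "logs":
----
curl 'http://localhost:9200/_search?pretty&q=message:logs'
----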
=== Default - Daily Indices
You might notice that Logstash was smart enough to create a new index in Elasticsearch... The default index name is in the form of 'logstash-YYYY.MM.DD', which essentially creates one index per day. At midnight (UTC), Logstash will automagically rotate the index to a fresh new one, with the new current day's timestamp. This allows you to keep windows of data, based on how far retroactively you'd like to query your log data. Of course, you can always archive (or re-index) your data to an alternate location, where you are able to query further into the past. If you'd like to simply delete old indices after a certain time period, you can use the https://github.com/elasticsearch/curator[Elasticsearch Curator tool].
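If you'd rather control the index name yourself, the elasticsearch output exposes an `index` setting; here is a minimal sketch that simply spells out the daily default (the date pattern is evaluated per event, so changing it changes how often indices rotate):
----
output {
  elasticsearch {
    host => localhost
    # Same shape as the default; change the date pattern to rotate differently.
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
----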
== Moving On
Now you're ready for more advanced configurations. At this point, it makes sense to quickly discuss some of the core features of Logstash, and how they interact with the Logstash engine.
=== The Life of an Event
Inputs, Outputs, Codecs and Filters are at the heart of the Logstash configuration. By creating a pipeline of event processing, Logstash is able to extract the relevant data from your logs and make it available to elasticsearch, in order to efficiently query your data. To get you thinking about the various options available in Logstash, let's discuss some of the more common configurations currently in use. For more details, read about http://logstash.net/docs/latest/life-of-an-event[the Logstash event pipeline].
==== Inputs
Inputs are the mechanism for passing log data to Logstash. Some of the more useful, commonly-used ones are:
* *file*: reads from a file on the filesystem, much like the UNIX command "tail -0F"
* *syslog*: listens on the well-known port 514 for syslog messages and parses according to RFC3164 format
* *redis*: reads from a redis server, using both redis channels and also redis lists. Redis is often used as a "broker" in a centralized Logstash installation, which queues Logstash events from remote Logstash "shippers".
* *lumberjack*: processes events sent in the lumberjack protocol. Now called https://github.com/elasticsearch/logstash-forwarder[logstash-forwarder].
==== Filters
Filters are used as intermediary processing devices in the Logstash chain. They are often combined with conditionals in order to perform a certain action on an event, if it matches particular criteria. Some useful filters:
* *grok*: parses arbitrary text and structures it. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable. With 120 patterns shipped built-in to Logstash, it's more than likely you'll find one that meets your needs!
* *mutate*: The mutate filter allows you to do general mutations to fields. You can rename, remove, replace, and modify fields in your events.
* *drop*: drop an event completely, for example, 'debug' events.
* *clone*: make a copy of an event, possibly adding or removing fields.
* *geoip*: adds information about geographical location of IP addresses (and displays amazing charts in kibana)
==== Outputs
Outputs are the final phase of the Logstash pipeline. An event may pass through multiple outputs during processing, but once all outputs are complete, the event has finished its execution. Some commonly used outputs include:
* *elasticsearch*: If you're planning to save your data in an efficient, convenient and easily queryable format... Elasticsearch is the way to go. Period. Yes, we're biased :)
* *file*: writes event data to a file on disk.
* *graphite*: sends event data to graphite, a popular open source tool for storing and graphing metrics. http://graphite.wikidot.com/
* *statsd*: a service which "listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services". If you're already using statsd, this could be useful for you!
==== Codecs
Codecs are basically stream filters which can operate as part of an input, or an output. Codecs allow you to easily separate the transport of your messages from the serialization process. Popular codecs include 'json', 'msgpack' and 'plain' (text).
* *json*: encode / decode data in JSON format
* *multiline*: takes multiple-line text events and merges them into a single event, e.g. Java exception and stacktrace messages (see the sketch below)
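As a quick sketch of a codec in action, here is what a multiline codec on a file input might look like; the path and pattern are illustrative assumptions (indented lines get folded into the previous event, a common shape for Java stack traces):
----
input {
  file {
    path => "/var/log/someapp.log"
    codec => multiline {
      # Any line starting with whitespace belongs to the previous event.
      pattern => "^\s"
      what => "previous"
    }
  }
}
----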
For the complete list of (current) configurations, visit the Logstash "plugin configuration" section of the http://www.elasticsearch.org/overview/logstash[Logstash documentation page].
== More fun with Logstash
=== Persistent Configuration files
Specifying configurations on the command line using '-e' is only so helpful, and more advanced setups will require more lengthy, long-lived configurations. First, let's create a simple configuration file, and invoke Logstash using it. Create a file named "logstash-simple.conf" and save it in the same directory as Logstash.
----
input { stdin { } }
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
Then, run this command:
----
bin/logstash -f logstash-simple.conf
----
Et voilà! Logstash will read in the configuration file you just created and run as in the example we saw earlier. Note that we used the '-f' flag to read the configuration from a file, rather than the '-e' flag to read it from the command line. This is a very simple case, of course, so let's move on to some more complex examples.
=== Testing Your Configuration Files
After creating a new or complex configuration file, it can be helpful to quickly test that the file is formatted correctly. We can verify our configuration file is formatted correctly by using the *--configtest* flag.
----
bin/logstash -f logstash-simple.conf --configtest
----
=== Filters
Filters are an in-line processing mechanism that provides the flexibility to slice and dice your data to fit your needs. Let's see one in action, namely the *grok filter*.
----
input { stdin { } }
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
Run Logstash with this configuration:
----
bin/logstash -f logstash-filter.conf
----
Now paste this line into the terminal (so it will be processed by the stdin input):
----
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
----
You should see something returned to STDOUT which looks like this:
----
{
"message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
"@timestamp" => "2013-12-11T08:01:45.000Z",
"@version" => "1",
"host" => "cadenza",
"clientip" => "127.0.0.1",
"ident" => "-",
"auth" => "-",
"timestamp" => "11/Dec/2013:00:01:45 -0800",
"verb" => "GET",
"request" => "/xampp/status.php",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "3891",
"referrer" => "\"http://cadenza/xampp/navi.php\"",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
----
As you can see, Logstash (with help from the *grok* filter) was able to parse the log line (which happens to be in Apache "combined log" format) and break it up into many different discrete bits of information. This will be extremely useful later when we start querying and analyzing our log data... for example, we'll be able to run reports on HTTP response codes, IP addresses, referrers, etc. very easily. There are quite a few grok patterns included with Logstash out-of-the-box, so it's quite likely if you're attempting to parse a fairly common log format, someone has already done the work for you. For more details, see the list of https://github.com/logstash/logstash/blob/master/patterns/grok-patterns[logstash grok patterns] on github.
The other filter used in this example is the *date* filter. This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you're ingesting the log data). You'll notice that the @timestamp field in this example is set to December 11, 2013, even though Logstash is ingesting the event at some point afterwards. This is handy when backfilling logs, for example... the ability to tell Logstash "use this value as the timestamp for this event". For non-English installations, you may need to specify the locale in the date filter (locale => en), as in the sketch below.
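For example, a date filter with the locale spelled out might look like this (a minimal sketch of the same filter used above):
----
filter {
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    # Forces English month abbreviations (Dec, Jan, ...) regardless of system locale.
    locale => "en"
  }
}
----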
== Useful Examples
=== Apache logs (from files)
Now, let's configure something actually *useful*... apache2 access log files! We are going to read the input from a file on the localhost, and use a *conditional* to process the event according to our needs. First, create a file called something like 'logstash-apache.conf' with the following contents (you'll need to change the log's file path to suit your needs):
----
input {
file {
path => "/tmp/access_log"
start_position => "beginning"
}
}
filter {
if [path] =~ "access" {
mutate { replace => { "type" => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => localhost
}
stdout { codec => rubydebug }
}
----
Then, create the file you configured above (in this example, "/tmp/access_log") with the following log lines as contents (or use some from your own webserver):
----
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"
98.83.179.51 - - [18/May/2011:19:35:08 -0700] "GET /css/main.css HTTP/1.1" 200 1837 "http://www.safesand.com/information.htm" "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
----
Now run it with the -f flag as in the last example:
----
bin/logstash -f logstash-apache.conf
----
You should be able to see your apache log data in Elasticsearch now! You'll notice that Logstash opened the file you configured, and read through it, processing any events it encountered. Any additional lines logged to this file will also be captured, processed by Logstash as events and stored in Elasticsearch. As an added bonus, they will be stashed with the field "type" set to "apache_access" (this is done by the mutate filter in the configuration above, which replaces the value of the "type" field).
In this configuration, Logstash is only watching the apache access_log, but it's easy enough to watch both the access_log and the error_log (actually, any file matching '*_log'), by changing one line in the above configuration, like this:
----
input {
file {
path => "/tmp/*_log"
...
----
Now, rerun Logstash, and you will see both the error and access logs processed via Logstash. However, if you inspect your data (using elasticsearch-kopf, perhaps), you will see that the access_log was broken up into discrete fields, but not the error_log. That's because we used a "grok" filter to match the standard combined apache log format and automatically split the data into separate fields. Wouldn't it be nice *if* we could control how a line was parsed, based on its format? Well, we can...
Also, you might have noticed that Logstash did not reprocess the events which were already seen in the access_log file. Logstash is able to save its position in files, only processing new lines as they are added to the file. Neat!
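That saved position lives in a 'sincedb' file. If, while experimenting, you want Logstash to re-read a file from the top on every run, one common trick is to point the file input's sincedb somewhere disposable; a sketch, for testing only:
----
input {
  file {
    path => "/tmp/access_log"
    start_position => "beginning"
    # Throws away the saved position, so the whole file is re-read each run.
    sincedb_path => "/dev/null"
  }
}
----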
=== Conditionals
Now we can build on the previous example, where we introduced the concept of a *conditional*. A conditional should be familiar to most Logstash users, in the general sense. You may use 'if', 'else if' and 'else' statements, as in many other programming languages. Let's label each event according to which file it appeared in (access_log, error_log and other random files which end with "log").
----
input {
file {
path => "/tmp/*_log"
}
}
filter {
if [path] =~ "access" {
mutate { replace => { type => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
} else if [path] =~ "error" {
mutate { replace => { type => "apache_error" } }
} else {
mutate { replace => { type => "random_logs" } }
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
You'll notice we've labeled all events using the "type" field, but we didn't actually parse the "error" or "random" files... There are so many types of error logs that it's better left as an exercise for you, depending on the logs you're seeing.
=== Syslog
OK, now we can move on to another incredibly useful example: *syslog*. Syslog is one of the most common use cases for Logstash, and one it handles exceedingly well (as long as the log lines conform roughly to RFC3164 :). Syslog is the de facto UNIX networked logging standard, sending messages from client machines to a local file, or to a centralized log server via rsyslog. For this example, you won't need a functioning syslog instance; we'll fake it from the command line, so you can get a feel for what happens.
First, let's make a simple configuration file for Logstash + syslog, called 'logstash-syslog.conf'.
----
input {
tcp {
port => 5000
type => syslog
}
udp {
port => 5000
type => syslog
}
}
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
----
Run it as normal:
----
bin/logstash -f logstash-syslog.conf
----
Normally, a client machine would connect to the Logstash instance on port 5000 and send its message. In this simplified case, we're simply going to telnet to Logstash and enter a log line (similar to how we entered log lines into STDIN earlier). First, open another shell window to interact with the Logstash syslog input and type the following command:
----
telnet localhost 5000
----
You can copy and paste the following lines as samples (feel free to try some of your own, but keep in mind they might not parse if the grok filter is not correct for your data):
----
Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]
Dec 23 14:42:56 louis named[16000]: client 199.48.164.7#64817: query (cache) 'amsterdamboothuren.com/MX/IN' denied
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
----
Now you should see the output of Logstash in your original shell as it processes and parses messages!
----
{
"message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"@timestamp" => "2013-12-23T22:30:01.000Z",
"@version" => "1",
"type" => "syslog",
"host" => "0:0:0:0:0:0:0:1:52617",
"syslog_timestamp" => "Dec 23 14:30:01",
"syslog_hostname" => "louis",
"syslog_program" => "CRON",
"syslog_pid" => "619",
"syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"received_at" => "2013-12-23 22:49:22 UTC",
"received_from" => "0:0:0:0:0:0:0:1:52617",
"syslog_severity_code" => 5,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "notice"
}
----
Congratulations! You're well on your way to being a real Logstash power user. You should be comfortable configuring, running and sending events to Logstash, but there's much more to explore.

View file

@ -1,201 +0,0 @@
---
title: Just Enough RabbitMQ - logstash
layout: content_right
---
While configuring your RabbitMQ broker is out of scope for logstash, it's important
to understand how logstash uses RabbitMQ. To do that, we need to understand a
little about AMQP.
You should also consider reading
[this](http://www.rabbitmq.com/tutorials/amqp-concepts.html) at the RabbitMQ
website.
# Exchanges, queues and bindings; OH MY!
You can get a long way by understanding a few key terms.
## Exchanges
Exchanges are for message **producers**. In Logstash, we map these to
**outputs**. Logstash puts messages on exchanges. There are many types of
exchanges and they are discussed below.
## Queues
Queues are for message **consumers**. In Logstash, we map these to **inputs**.
Logstash reads messages from queues. Optionally, queues can consume only a
subset of messages. This is done with "routing keys".
## Bindings
Just having a producer and a consumer is not enough. We must `bind` a queue to
an exchange. When we bind a queue to an exchange, we can optionally provide a
routing key. Routing keys are discussed below.
## Broker
A broker is simply the AMQP server software. There are several brokers, but this
tutorial will cover the most common (and arguably popular), [RabbitMQ](http://www.rabbitmq.com).
# Routing Keys
Simply put, routing keys are somewhat like tags for messages. In practice, they
are hierarchical in nature, with each level separated by a dot:
- `messages.servers.production`
- `sports.atlanta.baseball`
- `company.myorg.mydepartment`
Routing keys are really handy with a tool like logstash where you
can programmatically define the routing key for a given event using the metadata that logstash provides:
- `logs.servers.production.host1`
- `logs.servers.development.host1.syslog`
- `logs.servers.application_foo.critical`
From a consumer/queue perspective, routing keys also support two types of wildcards - `#` and `*`.
- `*` (asterisk) matches any single word.
- `#` (hash) matches any number of words and behaves like a traditional wildcard.
Using the above examples, if you wanted to bind to an exchange and see messages
for just production, you would use the routing key `logs.servers.production.*`.
If you wanted to see messages for host1, regardless of environment, you could
use `logs.servers.*.host1.#`.
Wildcards can be a bit confusing but a good general rule to follow is to use
`*` in places where you need wildcards for a known element. Use `#` when you
need to match any remaining placeholders. Note that wildcards in routing keys
only make sense on the consumer/queue binding, not in the publishing/exchange
side.
We'll get into some of that neat stuff below. For now, it's enough to
understand the general idea behind routing keys.
# Exchange types
There are three primary types of exchanges that you'll see.
## Direct
A direct exchange is one that is probably most familiar to people. A message
comes in and, assuming there is a queue bound, the message is picked up. You
can have multiple queues bound to the same direct exchange. The best way to
understand this pattern is pool of workers (queues) that read from a direct
exchange to get units of work. Only one consumer will see a given message in a
direct exchange.
You can set routing keys on messages published to a direct exchange. This
allows you to have workers that do different tasks read from the same global
pool of messages yet consume only the ones they know how to handle.
The RabbitMQ concepts guide (linked below) does a good job of describing this
visually
[here](http://www.rabbitmq.com/img/tutorials/intro/exchange-direct.png)
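As a rough sketch of that worker-pool pattern on the logstash side (the exchange name and key here are made up for illustration), the publishing output might look like:

    output {
      rabbitmq {
        exchange => "work_exchange"
        exchange_type => "direct"
        host => "my_rabbitmq_server"
        # Workers bind their queues with the key for the task type they handle.
        key => "tasks.%{type}"
      }
    }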
## Fanout
Fanouts are another type of exchange. Unlike direct exchanges, every queue
bound to a fanout exchange will see the same messages. This is best described
as a PUB/SUB pattern. This is helpful when you need to broadcast messages to
multiple interested parties.
Fanout exchanges do NOT support routing keys. All bound queues see all
messages.
## Topic
Topic exchanges are a special type of fanout exchange. Fanout exchanges don't
support routing keys. Topic exchanges do support them. Just like a fanout
exchange, all bound queues see all messages with the additional filter of the
routing key.
# RabbitMQ in logstash
As stated earlier, in Logstash, Outputs publish to Exchanges. Inputs read from
Queues that are bound to Exchanges. Logstash uses the `bunny` RabbitMQ library for
interaction with a broker. Logstash endeavors to expose as much of the
configuration for both exchanges and queues as possible. There are many different tunables
that you might be concerned with setting - including things like message
durability or persistence of declared queues/exchanges. See the relevant input
and output documentation for RabbitMQ for a full list of tunables.
# Sample configurations, tips, tricks and gotchas
There are several examples of RabbitMQ usage in the logstash source directory;
however, a few general rules might help eliminate any issues.
## Check your bindings
If logstash is publishing the messages and logstash is consuming the messages,
the `exchange` value for the input should match the `exchange` declared in the output.
sender agent
input { stdin { type => "test" } }
output {
rabbitmq {
exchange => "test_exchange"
host => "my_rabbitmq_server"
exchange_type => "fanout"
}
}
receiver agent
input {
rabbitmq {
queue => "test_queue"
host => "my_rabbitmq_server"
exchange => "test_exchange" # This matches the exchange declared above
}
}
output { stdout { debug => true }}
## Message persistence
By default, logstash will attempt to ensure that you don't lose any messages.
This is reflected in the RabbitMQ default settings as well. However there are
cases where you might not want this. A good example is where RabbitMQ is not your
primary method of shipping.
In the following example, we use RabbitMQ as a sniffing interface. Our primary
destination is the embedded ElasticSearch instance. We have a secondary RabbitMQ
output that we use for duplicating messages. However we disable persistence and
durability on this interface so that messages don't pile up waiting for
delivery. We only use RabbitMQ when we want to watch messages in realtime.
Additionally, we're going to leverage routing keys so that we can optionally
filter incoming messages to subsets of hosts. The exercise of getting messages
to this logstash agent is left up to the user.
input {
# some input definition here
}
output {
elasticsearch { embedded => true }
rabbitmq {
exchange => "logtail"
host => "my_rabbitmq_server"
exchange_type => "topic" # We use topic here to enable pub/sub with routing keys
key => "logs.%{host}"
durable => false # If rabbitmq restarts, the exchange disappears.
auto_delete => true # If logstash disconnects, the exchange goes away
persistent => false # Messages are not persisted to disk
}
}
Now if you want to stream logs in realtime, you can use the programming
language of your choice to bind a queue to the `logtail` exchange. If you do
not specify a routing key, you will see every message that comes in to
logstash. However, you can specify a routing key like `logs.apache1` and see
only messages from host `apache1`.
Note that any logstash variable is valid in the key definition. This allows you
to create really complex routing key hierarchies for advanced filtering.
Note that RabbitMQ has specific rules about durability and persistence matching
on both the queue and exchange. You should read the RabbitMQ documentation to
make sure you don't crash your RabbitMQ server with messages awaiting someone
to pick them up.
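As a sketch of what "matching" settings can look like (option names follow the output example above; whether the input accepts the same flags is an assumption, so check the plugin docs for your version), a durable setup might be:

    # Output side: durable exchange, messages persisted to disk
    output {
      rabbitmq {
        exchange => "logs_exchange"
        exchange_type => "direct"
        host => "my_rabbitmq_server"
        durable => true
        persistent => true
      }
    }

    # Input side: durable queue bound to the same exchange
    input {
      rabbitmq {
        queue => "logs_queue"
        exchange => "logs_exchange"
        host => "my_rabbitmq_server"
        durable => true
      }
    }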

Binary file not shown.


View file

@ -1,84 +0,0 @@
---
title: Metrics from Logs - logstash
layout: content_right
---
# Pull metrics from logs
Logs are more than just text. How many customers signed up today? How many HTTP
errors happened this week? When was your last puppet run?
Apache logs give you the http response code and bytes sent - that's useful in a
graph. Metrics occur in logs so frequently there are piles of tools available to
help process them.
Logstash can help (and even replace some tools you might already be using).
## Example: Replacing Etsy's Logster
[Etsy](https://github.com/etsy) has some excellent open source tools. One of
them, [logster](https://github.com/etsy/logster), is meant to help you pull
metrics from logs and ship them to [graphite](http://graphite.wikidot.com/) so
you can make pretty graphs of those metrics.
One sample logster parser pulls http response codes out of your
apache logs: [SampleLogster.py](https://github.com/etsy/logster/blob/master/logster/parsers/SampleLogster.py)
The above code is roughly 50 lines of python and solves only one specific
problem, and only for apache logs: counting http response codes by major number (1xx,
2xx, 3xx, etc). To be completely fair, you could shrink the code required for
a Logster parser, but size is not strictly the point, here.
## Keep it simple
Logstash can do more than the above, more simply, and without much coding skill:
input {
file {
path => "/var/log/apache/access.log"
type => "apache-access"
}
}
filter {
grok {
type => "apache-access"
pattern => "%{COMBINEDAPACHELOG}"
}
}
output {
statsd {
# Count one hit every event by response
increment => "apache.response.%{response}"
}
}
The above uses grok to parse fields out of apache logs and the statsd
output to increment counters based on the response code. Of course, now that we
are parsing apache logs fully, we can trivially add additional metrics:
output {
statsd {
# Count one hit every event by response
increment => "apache.response.%{response}"
# Use the 'bytes' field from the apache log as the count value.
count => [ "apache.bytes", "%{bytes}" ]
}
}
Now adding additional metrics is just one more line in your logstash config
file. BTW, the 'statsd' output writes to another Etsy tool,
[statsd](https://github.com/etsy/statsd), which helps build counters/latency
data and ship it to graphite for graphing.
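If you also want latency-style metrics, the statsd output has a `timing` setting that follows the same style as `count` above; this sketch assumes your grok pattern also extracts a (hypothetical) `request_time` field, which the stock COMBINEDAPACHELOG pattern does not provide:

    output {
      statsd {
        # Count one hit every event by response
        increment => "apache.response.%{response}"
        # Hypothetical field: needs a log format and pattern that capture request time.
        timing => [ "apache.servetime", "%{request_time}" ]
      }
    }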
Using the logstash config above and a bunch of apache access requests, you might end up
with a graph that looks like this:
![apache response codes graphed with graphite, fed data with logstash](media/frontend-response-codes.png)
The point made above is not "logstash is better than Logster" - the point is
that logstash is a general-purpose log management and pipelining tool and that
while you can centralize logs with logstash, you can read, modify, and write
them to and from just about anywhere.

View file

@ -1,118 +0,0 @@
---
title: ZeroMQ - logstash
layout: content_right
---
*ZeroMQ support in Logstash is currently in an experimental phase. As such, parts of this document are subject to change.*
# ZeroMQ
Simply put, ZeroMQ (0mq) is a socket on steroids. This makes it a perfect complement to Logstash - a pipe on steroids.
ZeroMQ allows you to easily create sockets of various types for moving data around. These sockets are referred to in ZeroMQ by the behavior of each side of the socket pair:
* PUSH/PULL
* REQ/REP
* PUB/SUB
* ROUTER/DEALER
There is also a `PAIR` socket type as well.
Additionally, the socket type is independent of the connection method. A PUB/SUB socket pair could have the SUB side of the socket be a listener and the PUB side a connecting client. This makes it very easy to fit ZeroMQ into various firewalled architectures.
Note that this is not a full-fledged tutorial on ZeroMQ. It is a tutorial on how Logstash uses ZeroMQ.
# ZeroMQ and logstash
In the spirit of ZeroMQ, Logstash takes these socket type pairs and uses them to create topologies with some very simple rules that make usage easy to understand:
* The receiving end of a socket pair is always a logstash input
* The sending end of a socket pair is always a logstash output
* By default, inputs `bind`/listen and outputs `connect`
* Logstash refers to the socket pairs as topologies and mirrors the naming scheme from ZeroMQ
* By default, ZeroMQ inputs listen on all interfaces on port 2120, and ZeroMQ outputs connect to `localhost` on port 2120
The currently understood Logstash topologies for ZeroMQ inputs and outputs are:
* `pushpull`
* `pubsub`
* `pair`
We have found from various discussions that these three topologies will cover most users' needs. We hope to expose the full span of ZeroMQ socket types as time goes on.
By keeping the options simple, this allows you to get started VERY easily with what are normally complex message flows. No more confusion over `exchanges` and `queues` and `brokers`. If you need to add fanout capability to your flow, you can simply use the following configs:
* _node agent lives at 192.168.1.2_
* _indexer agent lives at 192.168.1.1_
# Node agent config
input { stdin { type => "test-stdin-input" } }
output { zeromq { topology => "pubsub" address => "tcp://192.168.1.1:2120" } }
# Indexer agent config
input { zeromq { topology => "pubsub" } }
output { stdout { debug => true }}
If for some reason you need connections to initiate from the indexer because of firewall rules:
# Node agent config - now listening on all interfaces port 2120
input { stdin { type => "test-stdin-input" } }
output { zeromq { topology => "pubsub" address => "tcp://*:2120" mode => "server" } }
# Indexer agent config
input { zeromq { topology => "pubsub" address => "tcp://192.168.1.2" mode => "client" } }
output { stdout { debug => true }}
As stated above, by default `inputs` always start as listeners and `outputs` always start as initiators. Please don't confuse what happens once the socket is connected with the direction of the connection. ZeroMQ separates connection from topology. In the second case of the above configs, once the two sockets are connected, regardless of who initiated the connection, the message flow itself is unchanged: the indexer reads events from the node.
# Which topology to use
The choice of topology can be broken down very easily based on need:
## one to one
Use `pair` topology. On the output side, specify the ipaddress and port of the input side.
## broadcast
Use `pubsub`
If you need to broadcast ALL messages to multiple hosts that each need to see all events, use `pubsub`. Note that all events are broadcast to all subscribers. When using `pubsub` you might also want to investigate the `topic` configuration option which allows subscribers to see only a subset of messages.
## Filter workers
Use `pushpull`
In `pushpull`, ZeroMQ automatically load balances to all connected peers. This means that no peer sees the same message as any other peer.
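A sketch of one way to lay that out, mirroring the server/client example above (addresses and ports are placeholders):

    # Sender agent - binds on port 2120; connected workers pull their share
    input { stdin { type => "test-stdin-input" } }
    output { zeromq { topology => "pushpull" address => "tcp://*:2120" mode => "server" } }

    # Worker agent config - run one of these on each worker host
    input { zeromq { topology => "pushpull" address => "tcp://192.168.1.2:2120" mode => "client" } }
    output { stdout { debug => true } }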
# What's with the address format?
ZeroMQ supports multiple types of transports:
* inproc:// (unsupported by logstash due to threading)
* tcp:// (exactly what it sounds like)
* ipc:// (probably useless in logstash)
* pgm:// and epgm:// (a multicast format - only usable with PUB and SUB socket types)
For pretty much all cases, you'll be using `tcp://` transports with Logstash.
## Topic - applies to `pubsub`
This opt mimics the routing keys functionality in AMQP. Imagine you have a network of receivers but only a subset of the messages need to be seen by a subset of the hosts. You can use this option as a routing key to facilitate that:
# This output is a PUB
output {
zeromq { topology => "pubsub" topic => "logs.production.%{host}" }
}
# This input is a SUB
# I only care about db1 logs
input { zeromq { type => "db1logs" address => "tcp://<ipaddress>:2120" topic => "logs.production.db1"}}
One thing important to note about 0mq PUBSUB and topics is that all filtering is done on the subscriber side. The subscriber will get ALL messages but discard any that don't match the topic.
Also important to note is that 0mq doesn't do topic in the same sense as an AMQP broker might. When a SUB socket gets a message, it compares the first bytes of the message against the topic. However, this isn't always flexible, depending on the format of your message. The common practice, then, is to send a 0mq multipart message and make the first part the topic. The next parts become the actual message body.
This approach is how Logstash handles it. When using PUBSUB, Logstash will send a multipart message where the first part is the name of the topic and the second part is the event. This is important to know if you are sending to a SUB input from sources other than Logstash.
# sockopts
Sockopts is not you choosing between blue or black socks. ZeroMQ supports setting various flags or options on sockets. In the interest of minimizing configuration syntax, these are _hidden_ behind a logstash configuration element called `sockopts`. You probably won't need to tune these for most cases. If you do need to tune them, you'll probably set the following:
## ZMQ::HWM - sets the high water mark
The high water mark is the maximum number of messages a given socket pair can have in its internal queue. Essentially, you can use this to throttle.
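A sketch of what setting it might look like; the exact `sockopts` syntax here is an assumption, so check the zeromq input/output documentation for your version before relying on it:

    output {
      zeromq {
        topology => "pushpull"
        # Assumed syntax: a hash of ZeroMQ option names to values.
        # Caps the socket's internal queue at 1000 messages.
        sockopts => { "ZMQ::HWM" => 1000 }
      }
    }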
## ZMQ::SWAP_SIZE
TODO
## ZMQ::IDENTITY
TODO