mirror of
https://github.com/elastic/logstash.git
synced 2025-04-24 14:47:19 -04:00
Single kv hashes look like Ruby hashes, and people get confused because they don't actually support commas, to separate each pair. This adds a more explicit example under the preferred multi-line hash example.
1084 lines
34 KiB
Text
[[configuration]]
|
|
== Configuring Logstash
|
|
|
|
To configure Logstash, you create a config file that specifies which plugins you want to use and settings for each plugin.
|
|
You can reference event fields in a configuration and use conditionals to process events when they meet certain
|
|
criteria. When you run Logstash, you use the `-f` flag to specify your config file.
|
|
|
|
Let's step through creating a simple config file and using it to run Logstash. Create a file named "logstash-simple.conf" and save it in the same directory as Logstash.
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
input { stdin { } }
|
|
output {
|
|
elasticsearch { hosts => ["localhost:9200"] }
|
|
stdout { codec => rubydebug }
|
|
}
|
|
----------------------------------
|
|
|
|
Then, run Logstash and specify the configuration file with the `-f` flag.
|
|
|
|
[source,shell]
|
|
----------------------------------
|
|
bin/logstash -f logstash-simple.conf
|
|
----------------------------------
|
|
|
|
Et voilà! Logstash reads the specified configuration file and outputs to both Elasticsearch and stdout. Before we
|
|
move on to some <<config-examples,more complex examples>>, let's take a closer look at what's in a config file.
|
|
|
|
[[configuration-file-structure]]
|
|
=== Structure of a Config File
|
|
|
|
A Logstash config file has a separate section for each type of plugin you want to add to the event processing pipeline. For example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
# This is a comment. You should use comments to describe
|
|
# parts of your configuration.
|
|
input {
|
|
...
|
|
}
|
|
|
|
filter {
|
|
...
|
|
}
|
|
|
|
output {
|
|
...
|
|
}
|
|
----------------------------------
|
|
|
|
Each section contains the configuration options for one or more plugins. If you specify
|
|
multiple filters, they are applied in the order of their appearance in the configuration file.
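
As a minimal sketch of why order matters, assume events whose `message` field holds an Apache combined-format log line. The `date` filter here depends on the `timestamp` field that `grok` extracts, so `grok` has to appear first:

[source,ruby]
----------------------------------
filter {
  # grok runs first and extracts fields, including "timestamp", from the raw message
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # date runs second, so the "timestamp" field already exists when it is parsed
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
----------------------------------

If the two filters were swapped, `date` would run before the `timestamp` field existed and would have nothing to parse.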
|
|
|
|
|
|
[float]
|
|
[[plugin_configuration]]
|
|
=== Plugin Configuration
|
|
|
|
The configuration of a plugin consists of the plugin name followed
|
|
by a block of settings for that plugin. For example, this input section configures two file inputs:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
input {
|
|
file {
|
|
path => "/var/log/messages"
|
|
type => "syslog"
|
|
}
|
|
|
|
file {
|
|
path => "/var/log/apache/access.log"
|
|
type => "apache"
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
In this example, two settings are configured for each of the file inputs: 'path' and 'type'.
|
|
|
|
The settings you can configure vary according to the plugin type. For information about each plugin, see <<input-plugins,Input Plugins>>, <<output-plugins, Output Plugins>>, <<filter-plugins,Filter Plugins>>, and <<codec-plugins,Codec Plugins>>.
|
|
|
|
[float]
|
|
[[plugin-value-types]]
|
|
=== Value Types
|
|
|
|
A plugin can require that the value for a setting be a
|
|
certain type, such as boolean, list, or hash. The following value
|
|
types are supported.
|
|
|
|
[[array]]
|
|
==== Array
|
|
|
|
This type is now mostly deprecated in favor of using a standard type like `string` with the plugin defining the `:list => true` property for better type checking. It is still needed to handle lists of hashes or mixed types where type checking is not desired.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
users => [ {id => 1 name => "bob"}, {id => 2 name => "jane"} ]
|
|
----------------------------------
|
|
|
|
[[list]]
|
|
[float]
|
|
==== Lists
|
|
|
|
Not a type in and of itself, but a property types can have.
|
|
This makes it possible to type check multiple values.
|
|
Plugin authors can enable list checking by specifying `:list => true` when declaring an argument.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
path => [ "/var/log/messages", "/var/log/*.log" ]
|
|
uris => [ "http://elastic.co", "http://example.net" ]
|
|
----------------------------------
|
|
|
|
This example configures `path`, which is a `string`, to be a list that contains an element for each of the two strings. It also configures the `uris` parameter to be a list of URIs, failing if any of the URIs provided are not valid.
|
|
|
|
|
|
[[boolean]]
|
|
[float]
|
|
==== Boolean
|
|
|
|
A boolean must be either `true` or `false`. Note that the `true` and `false` keywords
|
|
are not enclosed in quotes.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
ssl_enable => true
|
|
----------------------------------
|
|
|
|
[[bytes]]
|
|
[float]
|
|
==== Bytes
|
|
|
|
A bytes field is a string field that represents a valid unit of bytes. It is a
|
|
convenient way to declare specific sizes in your plugin options. Both SI (k M G T P E Z Y)
|
|
and Binary (Ki Mi Gi Ti Pi Ei Zi Yi) units are supported. Binary units are in
|
|
base-1024 and SI units are in base-1000. This field is case-insensitive
|
|
and accepts space between the value and the unit. If no unit is specified, the integer string
|
|
represents the number of bytes.
|
|
|
|
Examples:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
my_bytes => "1113" # 1113 bytes
|
|
my_bytes => "10MiB" # 10485760 bytes
|
|
my_bytes => "100kib" # 102400 bytes
|
|
my_bytes => "180 mb" # 180000000 bytes
|
|
----------------------------------
|
|
|
|
[[codec]]
|
|
[float]
|
|
==== Codec
|
|
|
|
A codec is the name of the Logstash codec used to represent the data. Codecs can be
|
|
used in both inputs and outputs.
|
|
|
|
Input codecs provide a convenient way to decode your data before it enters the input.
|
|
Output codecs provide a convenient way to encode your data before it leaves the output.
|
|
Using an input or output codec eliminates the need for a separate filter in your Logstash pipeline.
|
|
|
|
A list of available codecs can be found at the <<codec-plugins,Codec Plugins>> page.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
codec => "json"
|
|
----------------------------------
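
For illustration, a codec is set on the plugin it applies to. The following sketch assumes JSON documents arriving on stdin, one per line, and uses only the `json` and `rubydebug` codecs:

[source,ruby]
----------------------------------
input {
  # Decode each line read from stdin as a JSON document
  stdin { codec => "json" }
}

output {
  # Encode each event in the rubydebug format before printing it
  stdout { codec => rubydebug }
}
----------------------------------

With this arrangement no separate filter is needed just to parse the JSON input.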
|
|
|
|
[[hash]]
|
|
[float]
|
|
==== Hash
|
|
|
|
A hash is a collection of key value pairs specified in the format `"field1" => "value1"`.
|
|
Note that multiple key value entries are separated by spaces rather than commas.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
match => {
|
|
"field1" => "value1"
|
|
"field2" => "value2"
|
|
...
|
|
}
|
|
# or as a single line. No commas between entries:
|
|
match => { "field1" => "value1" "field2" => "value2" }
|
|
----------------------------------
|
|
|
|
[[number]]
|
|
[float]
|
|
==== Number
|
|
|
|
Numbers must be valid numeric values (floating point or integer).
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
port => 33
|
|
----------------------------------
|
|
|
|
[[password]]
|
|
[float]
|
|
==== Password
|
|
|
|
A password is a string with a single value that is not logged or printed.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
my_password => "password"
|
|
----------------------------------
|
|
|
|
[[uri]]
|
|
[float]
|
|
==== URI
|
|
|
|
A URI can be anything from a full URL like 'http://elastic.co/' to a simple identifier
|
|
like 'foobar'. If the URI contains a password such as 'http://user:pass@example.net' the password
|
|
portion of the URI will not be logged or printed.
|
|
|
|
Example:
|
|
[source,js]
|
|
----------------------------------
|
|
my_uri => "http://foo:bar@example.net"
|
|
----------------------------------
|
|
|
|
|
|
[[path]]
|
|
[float]
|
|
==== Path
|
|
|
|
A path is a string that represents a valid operating system path.
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
my_path => "/tmp/logstash"
|
|
----------------------------------
|
|
|
|
[[string]]
|
|
[float]
|
|
==== String
|
|
|
|
A string must be a single character sequence. Note that string values are
|
|
enclosed in quotes, either double or single.
|
|
|
|
===== Escape Sequences
|
|
|
|
By default, escape sequences are not enabled. If you wish to use escape
|
|
sequences in quoted strings, you will need to set
|
|
`config.support_escapes: true` in your `logstash.yml`. When `true`, quoted
|
|
strings (double and single) will have this transformation:
|
|
|
|
|===========================
| Text | Result
| \r   | carriage return (ASCII 13)
| \n   | new line (ASCII 10)
| \t   | tab (ASCII 9)
| \\   | backslash (ASCII 92)
| \"   | double quote (ASCII 34)
| \'   | single quote (ASCII 39)
|===========================
|
|
|
|
Example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
name => "Hello world"
|
|
name => 'It\'s a beautiful day'
|
|
----------------------------------
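
As a sketch of what the setting changes, the following filter adds a field whose value contains `\t`. The field name `columns` is hypothetical and chosen only for this illustration:

[source,ruby]
----------------------------------
filter {
  mutate {
    # With config.support_escapes: true, \t is turned into a tab character.
    # With the default (escapes disabled), the value keeps a literal
    # backslash followed by the letter t.
    add_field => { "columns" => "first\tsecond" }
  }
}
----------------------------------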
|
|
|
|
[float]
|
|
[[comments]]
|
|
=== Comments
|
|
|
|
Comments are the same as in Perl, Ruby, and Python. A comment starts with a `#` character and does not need to be at the beginning of a line. For example:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
# this is a comment
|
|
|
|
input { # comments can appear at the end of a line, too
|
|
# ...
|
|
}
|
|
----------------------------------
|
|
|
|
[[event-dependent-configuration]]
|
|
=== Accessing Event Data and Fields in the Configuration
|
|
|
|
The Logstash agent is a processing pipeline with three stages: inputs -> filters ->
|
|
outputs. Inputs generate events, filters modify them, outputs ship them
|
|
elsewhere.
|
|
|
|
All events have properties. For example, an Apache access log would have things
|
|
like status code (200, 404), request path ("/", "index.html"), HTTP verb
|
|
(GET, POST), client IP address, etc. Logstash calls these properties "fields."
|
|
|
|
Some of the configuration options in Logstash require the existence of fields in
|
|
order to function. Because inputs generate events, there are no fields to
|
|
evaluate within the input block--they do not exist yet!
|
|
|
|
Because of their dependency on events and fields, the following configuration
|
|
options will only work within filter and output blocks.
|
|
|
|
IMPORTANT: Field references, sprintf format and conditionals, described below,
|
|
will not work in an input block.
|
|
|
|
[float]
|
|
[[logstash-config-field-references]]
|
|
==== Field References
|
|
|
|
It is often useful to be able to refer to a field by name. To do this,
|
|
you can use the Logstash field reference syntax.
|
|
|
|
The syntax to access a field is `[fieldname]`. If you are referring to a
|
|
**top-level field**, you can omit the `[]` and simply use `fieldname`.
|
|
To refer to a **nested field**, you specify
|
|
the full path to that field: `[top-level field][nested field]`.
|
|
|
|
For example, the following event has five top-level fields (agent, ip, request, response,
|
|
ua) and three nested fields (status, bytes, os).
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
{
|
|
"agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
|
|
"ip": "192.168.24.44",
|
|
"request": "/index.html",
|
|
"response": {
|
|
"status": 200,
|
|
"bytes": 52353
|
|
},
|
|
"ua": {
|
|
"os": "Windows 7"
|
|
}
|
|
}
|
|
|
|
----------------------------------
|
|
|
|
To reference the `os` field, you specify `[ua][os]`. To reference a top-level
|
|
field such as `request`, you can simply specify the field name.
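
A short sketch of the syntax in use, assuming events shaped like the one above. The target field name `os_name` and the tag `windows` are hypothetical:

[source,ruby]
----------------------------------
filter {
  # [ua][os] addresses the "os" field nested inside "ua"
  if [ua][os] == "Windows 7" {
    mutate {
      # Move the nested value to a new top-level field
      rename => { "[ua][os]" => "os_name" }
      add_tag => [ "windows" ]
    }
  }
}
----------------------------------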
|
|
|
|
[float]
|
|
[[sprintf]]
|
|
==== sprintf format
|
|
|
|
The field reference format is also used in what Logstash calls 'sprintf format'. This format
|
|
enables you to refer to field values from within other strings. For example, the
|
|
statsd output has an 'increment' setting that enables you to keep a count of
|
|
apache logs by status code:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
output {
|
|
statsd {
|
|
increment => "apache.%{[response][status]}"
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
Similarly, you can convert the timestamp in the `@timestamp` field into a string. Instead of specifying a field name inside the curly braces, use the `+FORMAT` syntax where `FORMAT` is a http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html[time format].
|
|
|
|
For example, if you want to use the file output to write to logs based on the
|
|
event's date and hour and the `type` field:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
output {
|
|
file {
|
|
path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
|
|
}
|
|
}
|
|
----------------------------------
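
Field values and the `+FORMAT` syntax can be combined in any setting that accepts sprintf format. As a sketch, assuming every event has a lowercase `type` field, the Elasticsearch output could write to one index per type and day. The `logs-` prefix is a hypothetical naming choice:

[source,ruby]
----------------------------------
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Index name is built from the event's type field and its @timestamp date
    index => "logs-%{type}-%{+YYYY.MM.dd}"
  }
}
----------------------------------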
|
|
|
|
[float]
|
|
[[conditionals]]
|
|
==== Conditionals
|
|
|
|
Sometimes you only want to filter or output an event under
|
|
certain conditions. For that, you can use a conditional.
|
|
|
|
Conditionals in Logstash look and act the same way they do in programming
|
|
languages. Conditionals support `if`, `else if` and `else` statements
|
|
and can be nested.
|
|
|
|
The conditional syntax is:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
if EXPRESSION {
|
|
...
|
|
} else if EXPRESSION {
|
|
...
|
|
} else {
|
|
...
|
|
}
|
|
----------------------------------
|
|
|
|
What's an expression? Comparison tests, boolean logic, and so on!
|
|
|
|
You can use the following comparison operators:
|
|
|
|
* equality: `==`, `!=`, `<`, `>`, `<=`, `>=`
|
|
* regexp: `=~`, `!~` (checks a pattern on the right against a string value on the left)
|
|
* inclusion: `in`, `not in`
|
|
|
|
The supported boolean operators are:
|
|
|
|
* `and`, `or`, `nand`, `xor`
|
|
|
|
The supported unary operators are:
|
|
|
|
* `!`
|
|
|
|
Expressions can be long and complex. Expressions can contain other expressions,
|
|
you can negate expressions with `!`, and you can group them with parentheses `(...)`.
|
|
|
|
For example, the following conditional uses the mutate filter to remove the field `secret` if the field
|
|
`action` has a value of `login`:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
filter {
|
|
if [action] == "login" {
|
|
mutate { remove_field => "secret" }
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
You can specify multiple expressions in a single condition:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
output {
|
|
# Send production errors to pagerduty
|
|
if [loglevel] == "ERROR" and [deployment] == "production" {
|
|
pagerduty {
|
|
...
|
|
}
|
|
}
|
|
}
|
|
----------------------------------
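
The regexp operator, grouping, and negation can be combined in the same way. The following sketch reuses the `loglevel` and `deployment` fields. The tag name and the list of deployment names are hypothetical:

[source,ruby]
----------------------------------
filter {
  # Match ERROR or FATAL exactly, but skip events from non-production deployments
  if [loglevel] =~ /^(ERROR|FATAL)$/ and !([deployment] in ["test", "staging"]) {
    mutate { add_tag => [ "needs_attention" ] }
  }
}
----------------------------------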
|
|
|
|
You can use the `in` operator to test whether a field contains a specific string, key, or (for lists) element:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
filter {
|
|
if [foo] in [foobar] {
|
|
mutate { add_tag => "field in field" }
|
|
}
|
|
if [foo] in "foo" {
|
|
mutate { add_tag => "field in string" }
|
|
}
|
|
if "hello" in [greeting] {
|
|
mutate { add_tag => "string in field" }
|
|
}
|
|
if [foo] in ["hello", "world", "foo"] {
|
|
mutate { add_tag => "field in list" }
|
|
}
|
|
if [missing] in [alsomissing] {
|
|
mutate { add_tag => "shouldnotexist" }
|
|
}
|
|
if !("foo" in ["hello", "world"]) {
|
|
mutate { add_tag => "shouldexist" }
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
You use the `not in` conditional the same way. For example,
|
|
you could use `not in` to only route events to Elasticsearch
|
|
when `grok` is successful:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
output {
|
|
if "_grokparsefailure" not in [tags] {
|
|
elasticsearch { ... }
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
You can check for the existence of a specific field, but there's currently no way to differentiate between a field that
|
|
doesn't exist versus a field that's simply false. The expression `if [foo]` returns `false` when:
|
|
|
|
* `[foo]` doesn't exist in the event,
|
|
* `[foo]` exists in the event, but is false, or
|
|
* `[foo]` exists in the event, but is null
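
A minimal sketch of acting on this check with the `!` operator. The field name `environment` and the tag are hypothetical, and the branch runs in all three cases listed above:

[source,ruby]
----------------------------------
filter {
  # True when [environment] is missing, false, or null
  if ![environment] {
    mutate { add_tag => [ "no_environment" ] }
  }
}
----------------------------------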
|
|
|
|
For more complex examples, see <<using-conditionals, Using Conditionals>>.
|
|
|
|
[float]
|
|
[[metadata]]
|
|
==== The @metadata field
|
|
|
|
In Logstash 1.5 and later, there is a special field called `@metadata`. The contents
|
|
of `@metadata` will not be part of any of your events at output time, which
|
|
makes it great to use for conditionals, or extending and building event fields
|
|
with field reference and sprintf formatting.
|
|
|
|
The following configuration file will yield events from STDIN. Whatever is
|
|
typed will become the `message` field in the event. The `mutate` filters in the
|
|
filter block will add a few fields, some nested in the `@metadata` field.
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
input { stdin { } }
|
|
|
|
filter {
|
|
mutate { add_field => { "show" => "This data will be in the output" } }
|
|
mutate { add_field => { "[@metadata][test]" => "Hello" } }
|
|
mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
|
|
}
|
|
|
|
output {
|
|
if [@metadata][test] == "Hello" {
|
|
stdout { codec => rubydebug }
|
|
}
|
|
}
|
|
|
|
----------------------------------
|
|
|
|
Let's see what comes out:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
|
|
$ bin/logstash -f ../test.conf
|
|
Pipeline main started
|
|
asdf
|
|
{
|
|
"@timestamp" => 2016-06-30T02:42:51.496Z,
|
|
"@version" => "1",
|
|
"host" => "example.com",
|
|
"show" => "This data will be in the output",
|
|
"message" => "asdf"
|
|
}
|
|
|
|
----------------------------------
|
|
|
|
The "asdf" typed in became the `message` field contents, and the conditional
|
|
successfully evaluated the contents of the `test` field nested within the
|
|
`@metadata` field. But the output did not show a field called `@metadata`, or
|
|
its contents.
|
|
|
|
The `rubydebug` codec allows you to reveal the contents of the `@metadata` field
|
|
if you add a config flag, `metadata => true`:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
stdout { codec => rubydebug { metadata => true } }
|
|
----------------------------------
|
|
|
|
Let's see what the output looks like with this change:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
$ bin/logstash -f ../test.conf
|
|
Pipeline main started
|
|
asdf
|
|
{
|
|
"@timestamp" => 2016-06-30T02:46:48.565Z,
|
|
"@metadata" => {
|
|
"test" => "Hello",
|
|
"no_show" => "This data will not be in the output"
|
|
},
|
|
"@version" => "1",
|
|
"host" => "example.com",
|
|
"show" => "This data will be in the output",
|
|
"message" => "asdf"
|
|
}
|
|
----------------------------------
|
|
|
|
Now you can see the `@metadata` field and its sub-fields.
|
|
|
|
IMPORTANT: Only the `rubydebug` codec allows you to show the contents of the
|
|
`@metadata` field.
|
|
|
|
Make use of the `@metadata` field any time you need a temporary field but do not
|
|
want it to be in the final output.
|
|
|
|
Perhaps one of the most common use cases for this new field is with the `date`
|
|
filter and having a temporary timestamp.
|
|
|
|
This configuration file has been simplified, but uses the timestamp format
|
|
common to Apache and Nginx web servers. In the past, you'd have to delete
|
|
the timestamp field yourself, after using it to overwrite the `@timestamp`
|
|
field. With the `@metadata` field, this is no longer necessary:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
input { stdin { } }
|
|
|
|
filter {
|
|
grok { match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ] }
|
|
date { match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ] }
|
|
}
|
|
|
|
output {
|
|
stdout { codec => rubydebug }
|
|
}
|
|
----------------------------------
|
|
|
|
Notice that this configuration puts the extracted date into the
|
|
`[@metadata][timestamp]` field in the `grok` filter. Let's feed this
|
|
configuration a sample date string and see what comes out:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
$ bin/logstash -f ../test.conf
|
|
Pipeline main started
|
|
02/Mar/2014:15:36:43 +0100
|
|
{
|
|
"@timestamp" => 2014-03-02T14:36:43.000Z,
|
|
"@version" => "1",
|
|
"host" => "example.com",
|
|
"message" => "02/Mar/2014:15:36:43 +0100"
|
|
}
|
|
----------------------------------
|
|
|
|
That's it! No extra fields in the output, and a cleaner config file because you
|
|
do not have to delete a "timestamp" field after conversion in the `date` filter.
|
|
|
|
Another use case is the CouchDB Changes input plugin (see
|
|
https://github.com/logstash-plugins/logstash-input-couchdb_changes).
|
|
This plugin automatically captures CouchDB document field metadata into the
|
|
`@metadata` field within the input plugin itself. When the events pass through
|
|
to be indexed by Elasticsearch, the Elasticsearch output plugin allows you to
|
|
specify the `action` (delete, update, insert, etc.) and the `document_id`, like
|
|
this:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
output {
|
|
elasticsearch {
|
|
action => "%{[@metadata][action]}"
|
|
document_id => "%{[@metadata][_id]}"
|
|
hosts => ["example.com"]
|
|
index => "index_name"
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
[[environment-variables]]
|
|
=== Using Environment Variables in the Configuration
|
|
|
|
==== Overview
|
|
|
|
* You can set environment variable references in the configuration for Logstash plugins by using `${var}`.
|
|
* At Logstash startup, each reference will be replaced by the value of the environment variable.
|
|
* The replacement is case-sensitive.
|
|
* References to undefined variables raise a Logstash configuration error.
|
|
* You can give a default value by using the form `${var:default value}`. Logstash uses the default value if the
|
|
environment variable is undefined.
|
|
* You can add environment variable references in any plugin option type: string, number, boolean, array, or hash.
|
|
* Environment variables are immutable. If you update the environment variable, you'll have to restart Logstash to pick up the updated value.
|
|
|
|
==== Examples
|
|
|
|
The following examples show you how to use environment variables to set the values of some commonly used
|
|
configuration options.
|
|
|
|
===== Setting the TCP Port
|
|
|
|
Here's an example that uses an environment variable to set the TCP port:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
input {
|
|
tcp {
|
|
port => "${TCP_PORT}"
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
Now let's set the value of `TCP_PORT`:
|
|
|
|
[source,shell]
|
|
----
|
|
export TCP_PORT=12345
|
|
----
|
|
|
|
At startup, Logstash uses the following configuration:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
input {
|
|
tcp {
|
|
port => 12345
|
|
}
|
|
}
|
|
----------------------------------
|
|
|
|
If the `TCP_PORT` environment variable is not set, Logstash returns a configuration error.
|
|
|
|
You can fix this problem by specifying a default value:
|
|
|
|
[source,ruby]
|
|
----
|
|
input {
|
|
tcp {
|
|
port => "${TCP_PORT:54321}"
|
|
}
|
|
}
|
|
----
|
|
|
|
Now, instead of returning a configuration error if the variable is undefined, Logstash uses the default:
|
|
|
|
[source,ruby]
|
|
----
|
|
input {
|
|
tcp {
|
|
port => 54321
|
|
}
|
|
}
|
|
----
|
|
|
|
If the environment variable is defined, Logstash uses the value specified for the variable instead of the default.
|
|
|
|
===== Setting the Value of a Tag
|
|
|
|
Here's an example that uses an environment variable to set the value of a tag:
|
|
|
|
[source,ruby]
|
|
----
|
|
filter {
|
|
mutate {
|
|
add_tag => [ "tag1", "${ENV_TAG}" ]
|
|
}
|
|
}
|
|
----
|
|
|
|
Let's set the value of `ENV_TAG`:
|
|
|
|
[source,shell]
|
|
----
|
|
export ENV_TAG="tag2"
|
|
----
|
|
|
|
At startup, Logstash uses the following configuration:
|
|
|
|
[source,ruby]
|
|
----
|
|
filter {
|
|
mutate {
|
|
add_tag => [ "tag1", "tag2" ]
|
|
}
|
|
}
|
|
----
|
|
|
|
===== Setting a File Path
|
|
|
|
Here's an example that uses an environment variable to set the path to a log file:
|
|
|
|
[source,ruby]
|
|
----
|
|
filter {
|
|
mutate {
|
|
add_field => {
|
|
"my_path" => "${HOME}/file.log"
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
Let's set the value of `HOME`:
|
|
|
|
[source,shell]
|
|
----
|
|
export HOME="/path"
|
|
----
|
|
|
|
At startup, Logstash uses the following configuration:
|
|
|
|
[source,ruby]
|
|
----
|
|
filter {
|
|
mutate {
|
|
add_field => {
|
|
"my_path" => "/path/file.log"
|
|
}
|
|
}
|
|
}
|
|
----
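
The default-value form and embedding a reference inside a longer string can plausibly be combined. This sketch assumes that substitution inside a list element behaves as in the tag example, and uses a hypothetical variable named `ES_HOST`:

[source,ruby]
----
output {
  elasticsearch {
    # Falls back to "localhost:9200" when ES_HOST is not set in the environment
    hosts => ["${ES_HOST:localhost}:9200"]
  }
}
----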
|
|
|
|
|
|
[[config-examples]]
|
|
=== Logstash Configuration Examples
|
|
The following examples illustrate how you can configure Logstash to filter events, process Apache logs and syslog messages, and use conditionals to control what events are processed by a filter or output.
|
|
|
|
TIP: If you need help building grok patterns, try out the
|
|
{kibana-ref}/xpack-grokdebugger.html[Grok Debugger]. The Grok Debugger is an
|
|
{xpack} feature under the Basic License and is therefore *free to use*.
|
|
|
|
[float]
|
|
[[filter-example]]
|
|
==== Configuring Filters
|
|
Filters are an in-line processing mechanism that provides the flexibility to slice and dice your data to fit your needs. Let's take a look at some filters in action. The following configuration file sets up the `grok` and `date` filters.
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
input { stdin { } }
|
|
|
|
filter {
|
|
grok {
|
|
match => { "message" => "%{COMBINEDAPACHELOG}" }
|
|
}
|
|
date {
|
|
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
|
|
}
|
|
}
|
|
|
|
output {
|
|
elasticsearch { hosts => ["localhost:9200"] }
|
|
stdout { codec => rubydebug }
|
|
}
|
|
----------------------------------
|
|
|
|
Run Logstash with this configuration:
|
|
|
|
[source,shell]
|
|
----------------------------------
|
|
bin/logstash -f logstash-filter.conf
|
|
----------------------------------
|
|
|
|
Now, paste the following line into your terminal and press Enter so it will be
|
|
processed by the stdin input:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
|
|
----------------------------------
|
|
|
|
You should see something returned to stdout that looks like this:
|
|
|
|
[source,ruby]
|
|
----------------------------------
|
|
{
|
|
"message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
|
|
"@timestamp" => "2013-12-11T08:01:45.000Z",
|
|
"@version" => "1",
|
|
"host" => "cadenza",
|
|
"clientip" => "127.0.0.1",
|
|
"ident" => "-",
|
|
"auth" => "-",
|
|
"timestamp" => "11/Dec/2013:00:01:45 -0800",
|
|
"verb" => "GET",
|
|
"request" => "/xampp/status.php",
|
|
"httpversion" => "1.1",
|
|
"response" => "200",
|
|
"bytes" => "3891",
|
|
"referrer" => "\"http://cadenza/xampp/navi.php\"",
|
|
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
|
|
}
|
|
----------------------------------
|
|
|
|
As you can see, Logstash (with help from the `grok` filter) was able to parse the log line (which happens to be in Apache "combined log" format) and break it up into many different discrete bits of information. This is extremely useful once you start querying and analyzing your log data. For example, you'll be able to easily run reports on HTTP response codes, IP addresses, referrers, and so on. There are quite a few grok patterns included with Logstash out-of-the-box, so it's quite likely that if you need to parse a common log format, someone has already done the work for you. For more information, see the list of https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns[Logstash grok patterns] on GitHub.
|
|
|
|
The other filter used in this example is the `date` filter. This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you're ingesting the log data). You'll notice that the `@timestamp` field in this example is set to December 11, 2013, even though Logstash is ingesting the event at some point afterwards. This is handy when backfilling logs. It gives you the ability to tell Logstash "use this value as the timestamp for this event".
|
|
|
|
[float]
|
|
==== Processing Apache Logs
|
|
Let's do something that's actually *useful*: process apache2 access log files! We are going to read the input from a file on the localhost, and use a <<conditionals,conditional>> to process the event according to our needs. First, create a file called something like 'logstash-apache.conf' with the following contents (you can change the log's file path to suit your needs):
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
input {
|
|
file {
|
|
path => "/tmp/access_log"
|
|
start_position => "beginning"
|
|
}
|
|
}
|
|
|
|
filter {
|
|
if [path] =~ "access" {
|
|
mutate { replace => { "type" => "apache_access" } }
|
|
grok {
|
|
match => { "message" => "%{COMBINEDAPACHELOG}" }
|
|
}
|
|
}
|
|
date {
|
|
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
|
|
}
|
|
}
|
|
|
|
output {
|
|
elasticsearch {
|
|
hosts => ["localhost:9200"]
|
|
}
|
|
stdout { codec => rubydebug }
|
|
}
|
|
|
|
----------------------------------
|
|
|
|
Then, create the input file you configured above (in this example, "/tmp/access_log") with the following log entries (or use some from your own webserver):
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
|
|
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"
|
|
98.83.179.51 - - [18/May/2011:19:35:08 -0700] "GET /css/main.css HTTP/1.1" 200 1837 "http://www.safesand.com/information.htm" "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
|
|
----------------------------------
|
|
|
|
Now, run Logstash with the -f flag to pass in the configuration file:
|
|
|
|
[source,shell]
|
|
----------------------------------
|
|
bin/logstash -f logstash-apache.conf
|
|
----------------------------------
|
|
|
|
Now you should see your Apache log data in Elasticsearch! Logstash opened and read the specified input file, processing each event it encountered. Any additional lines logged to this file will also be captured, processed by Logstash as events, and stored in Elasticsearch. As an added bonus, they are stashed with the field `type` set to `apache_access` (this is done by the `mutate { replace => { "type" => "apache_access" } }` line in the filter configuration).
|
|
|
|
In this configuration, Logstash is only watching the Apache access_log, but it's easy enough to watch both the access_log and the error_log (actually, any file matching `*_log`), by changing one line in the above configuration:
|
|
|
|
[source,js]
|
|
----------------------------------
|
|
input {
|
|
file {
|
|
path => "/tmp/*_log"
|
|
...
|
|
----------------------------------

When you restart Logstash, it will process both the error and access logs. However, if you inspect your data (using elasticsearch-kopf, perhaps), you'll see that the access_log is broken up into discrete fields, but the error_log isn't. That's because we used a `grok` filter to match the standard combined Apache log format and automatically split the data into separate fields. Wouldn't it be nice *if* we could control how a line was parsed, based on its format? Well, we can...

Note that Logstash did not reprocess the events that were already seen in the access_log file. When reading from a file, Logstash saves its position and only processes new lines as they are added. Neat!
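
To make the "saves its position" idea concrete, here is a minimal Python sketch of resuming a file read from a remembered byte offset. It is only an illustration of the concept, not how the file input is implemented (the real plugin keeps its bookkeeping in a sincedb file and handles many more cases). The helper name `read_new_lines` and the truncation check are assumptions made for this example.

```python
import os


def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for complete lines added to `path` since `offset`."""
    # If the file is now shorter than the saved offset (for example, it was
    # truncated or rotated), start again from the beginning.
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline, so that a half-written line is
    # left in place and picked up on the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling `read_new_lines("/tmp/access_log", saved_offset)` repeatedly, and storing the returned offset between calls, yields each complete line once, which is the behavior described above.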

[float]
[[using-conditionals]]
==== Using Conditionals
You use conditionals to control which events are processed by a filter or output. For example, you could label each event according to which file it appeared in (access_log, error_log, and other random files that end with "log").

[source,ruby]
----------------------------------
input {
  file {
    path => "/tmp/*_log"
  }
}

filter {
  if [path] =~ "access" {
    mutate { replace => { type => "apache_access" } }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  } else if [path] =~ "error" {
    mutate { replace => { type => "apache_error" } }
  } else {
    mutate { replace => { type => "random_logs" } }
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
----------------------------------

This example labels all events using the `type` field, but doesn't actually parse the `error` or `random` files. There are so many types of error logs that how they should be parsed really depends on what logs you're working with.
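
The branching in the filter block can be read as an ordinary if / else-if / else chain in which the first matching branch wins. The Python sketch below mirrors that logic, assuming that `=~` with a string on the right-hand side behaves as an unanchored regular-expression match. The function name `label_event` is hypothetical and exists only for this illustration.

```python
import re


def label_event(path):
    """Return the type label the filter block above would assign for `path`."""
    # `[path] =~ "access"` is treated here as an unanchored regex search.
    if re.search("access", path):
        return "apache_access"
    elif re.search("error", path):
        return "apache_error"
    # Anything else that matched the input glob falls through to the default.
    return "random_logs"
```

Because the branches are checked in order, a path that contains both "access" and "error" is labeled `apache_access`.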

Similarly, you can use conditionals to direct events to particular outputs. For example, you could:

* alert Nagios of any Apache events with status 5xx
* record any 4xx status to Elasticsearch
* record all status code hits via statsd

To tell Nagios about any HTTP event that has a 5xx status code, you
first need to check the value of the `type` field. If it's `apache`, then you can
check whether the `status` field contains a 5xx error. If it does, send the event to Nagios. If it isn't
a 5xx error, check whether the `status` field contains a 4xx error. If so, send the event to Elasticsearch.
Finally, send all Apache status codes to statsd no matter what the `status` field contains:

[source,js]
----------------------------------
output {
  if [type] == "apache" {
    if [status] =~ /^5\d\d/ {
      nagios { ... }
    } else if [status] =~ /^4\d\d/ {
      elasticsearch { ... }
    }
    statsd { increment => "apache.%{status}" }
  }
}
----------------------------------
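
As a rough way to check your understanding of the nesting above, the following Python sketch returns the names of the outputs an event would reach. It assumes events are plain dictionaries with `type` and `status` keys. The function name `route` and the returned output names are illustrative only and are not part of Logstash.

```python
import re


def route(event):
    """Return the list of outputs the conditional block above would send `event` to."""
    outputs = []
    if event.get("type") == "apache":
        status = str(event.get("status", ""))
        if re.search(r"^5\d\d", status):
            outputs.append("nagios")
        elif re.search(r"^4\d\d", status):
            outputs.append("elasticsearch")
        # statsd sits outside the inner if/else, so every apache event reaches it.
        outputs.append("statsd")
    return outputs
```

Note that an event whose `type` is not `apache` reaches none of the three outputs, and a 2xx or 3xx Apache event reaches only statsd.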

[float]
==== Processing Syslog Messages
Syslog is one of the most common use cases for Logstash, and one it handles exceedingly well (as long as the log lines conform roughly to RFC 3164). Syslog is the de facto UNIX networked logging standard, sending messages from client machines to a local file, or to a centralized log server via rsyslog. For this example, you won't need a functioning syslog instance; we'll fake it from the command line so you can get a feel for what happens.

First, let's make a simple configuration file for Logstash + syslog, called 'logstash-syslog.conf'.

[source,ruby]
----------------------------------
input {
  tcp {
    port => 5000
    type => syslog
  }
  udp {
    port => 5000
    type => syslog
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
----------------------------------

Run Logstash with this new configuration:

[source,ruby]
----------------------------------
bin/logstash -f logstash-syslog.conf
----------------------------------

Normally, a client machine would connect to the Logstash instance on port 5000 and send its message. For this example, we'll just telnet to Logstash and enter a log line (similar to how we entered log lines into STDIN earlier). Open another shell window to interact with the Logstash syslog input and enter the following command:

[source,ruby]
----------------------------------
telnet localhost 5000
----------------------------------

Copy and paste the following lines as samples. (Feel free to try some of your own, but keep in mind they might not parse if the `grok` filter is not correct for your data.)

[source,ruby]
----------------------------------
Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]
Dec 23 14:42:56 louis named[16000]: client 199.48.164.7#64817: query (cache) 'amsterdamboothuren.com/MX/IN' denied
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
----------------------------------
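
If telnet isn't available on your machine, a few lines of Python can stand in for it. The sketch below opens a TCP connection and writes each sample as a newline-terminated line, on the assumption that the `tcp` input splits its stream into events on newlines. The helper name `send_lines` and its default host and port (taken from the example configuration) are choices made for this illustration.

```python
import socket


def send_lines(lines, host="localhost", port=5000):
    """Send each string in `lines` to host:port as one newline-terminated line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        for line in lines:
            conn.sendall(line.encode("utf-8") + b"\n")
```

For example, `send_lines(["Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]"])` sends the first sample line to a Logstash instance listening on port 5000.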

Now you should see the output of Logstash in your original shell as it processes and parses messages!

[source,ruby]
----------------------------------
{
  "message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
  "@timestamp" => "2013-12-23T22:30:01.000Z",
  "@version" => "1",
  "type" => "syslog",
  "host" => "0:0:0:0:0:0:0:1:52617",
  "syslog_timestamp" => "Dec 23 14:30:01",
  "syslog_hostname" => "louis",
  "syslog_program" => "CRON",
  "syslog_pid" => "619",
  "syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
  "received_at" => "2013-12-23 22:49:22 UTC",
  "received_from" => "0:0:0:0:0:0:0:1:52617",
  "syslog_severity_code" => 5,
  "syslog_facility_code" => 1,
  "syslog_facility" => "user-level",
  "syslog_severity" => "notice"
}
----------------------------------
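
The `syslog_timestamp`, `syslog_hostname`, `syslog_program`, `syslog_pid`, and `syslog_message` fields in this output come from the named captures in the `grok` pattern. To experiment with that pattern outside Logstash, the Python sketch below approximates it with a plain regular expression. The sub-patterns are simplified stand-ins for the real grok definitions (for example, `\S+` for `SYSLOGHOST` and `\d+` for `POSINT`), so treat it as a rough model rather than an exact equivalent. The names `SYSLOG_LINE` and `parse_syslog` are hypothetical.

```python
import re

# Simplified stand-in for the grok expression in logstash-syslog.conf.
SYSLOG_LINE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>.*?)"           # DATA: as few characters as possible
    r"(?:\[(?P<syslog_pid>\d+)\])?: "    # optional [pid], then ": "
    r"(?P<syslog_message>.*)"            # GREEDYDATA: the rest of the line
)


def parse_syslog(line):
    """Return a dict of captured fields, or None if the line does not match."""
    m = SYSLOG_LINE.match(line)
    if m is None:
        return None
    # Leave out optional captures that did not take part in the match.
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

Running `parse_syslog` on the sample lines is a quick way to see why the `rsyslogd` line, which has no `[pid]` part, produces no `syslog_pid` field.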
|