Updates master branch with new tutorial and Getting Started
This commit is contained in:
parent 30f435e911
commit 4e8bd0f15a

3 changed files with 536 additions and 196 deletions

docs/asciidoc/static/advanced-pipeline.asciidoc (new file, 500 lines)
@@ -0,0 +1,500 @@
[[advanced-pipeline]]
== Setting Up an Advanced Logstash Pipeline

In most use cases, a Logstash pipeline has one or more input, filter, and output plugins. The scenarios in this section
build Logstash configuration files to specify these plugins and discuss what each plugin is doing.

The Logstash configuration file defines your _Logstash pipeline_. When you start a Logstash instance, use the
`-f <path/to/file>` option to specify the configuration file that defines that instance's pipeline.

A Logstash pipeline has two required elements, `input` and `output`, and one optional element, `filter`. The input
plugins consume data from a source, the filter plugins modify the data as you specify, and the output plugins write
the data to a destination.

image::static/images/basic_logstash_pipeline.png[]

The following text represents the skeleton of a pipeline configuration:

[source,shell]
# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}

This skeleton is non-functional, because the input and output sections don't have any valid options defined. The
examples in this tutorial build configuration files to address specific use cases.

Paste the skeleton into a file named `first-pipeline.conf` in your home Logstash directory.
[[parsing-into-es]]
=== Parsing Apache Logs into Elasticsearch

This example creates a Logstash pipeline that takes Apache web logs as input, parses those logs to create specific,
named fields from the logs, and writes the parsed data to an Elasticsearch cluster.

You can download the sample data set used in this example http://tbd.co/groksample.log[here]. Unpack this file.

[float]
[[configuring-file-input]]
==== Configuring Logstash for File Input

To start your Logstash pipeline, configure the Logstash instance to read from a file using the
{logstash}plugins-inputs-file.html[file] input plugin.

Edit the `first-pipeline.conf` file to add the following text:

[source,json]
input {
    file {
        path => "/path/to/groksample.log"
        start_position => beginning <1>
    }
}

<1> The default behavior of the file input plugin is to monitor a file for new information, in a manner similar to the
UNIX `tail -f` command. To change this default behavior and process the entire file, you need to specify the position
where Logstash starts processing the file.

Replace `/path/to/` with the actual path to the location of `groksample.log` in your file system.
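Before moving on, you may want to confirm that the sample file is readable at the path you configured. A minimal sanity
check, using the placeholder path from the example above:

[source,shell]
ls -l /path/to/groksample.log
wc -l /path/to/groksample.log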
[float]
[[configuring-grok-filter]]
==== Parsing Web Logs with the Grok Filter Plugin

The {logstash}plugins-filters-grok.html[`grok`] filter plugin is one of several plugins that are available by default in
Logstash. For details on how to manage Logstash plugins, see the <<working-with-plugins,reference documentation>> for
the plugin manager.
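If you want to confirm that the `grok` filter plugin is present in your installation before you continue, the plugin
manager can list installed plugins. A quick check, assuming the `bin/plugin` script shipped with your Logstash release
supports the `list` command:

[source,shell]
bin/plugin list | grep grok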
Because the `grok` filter plugin looks for patterns in the incoming log data, configuration requires you to make
decisions about how to identify the patterns that are of interest to your use case. A representative line from the web
server log sample looks like this:

[source,shell]
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png
HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel
Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. In this tutorial, use
the `%{COMBINEDAPACHELOG}` grok pattern, which structures lines from the Apache log using the following schema:

[horizontal]
*Information*:: *Field Name*
IP Address:: `clientip`
User ID:: `ident`
User Authentication:: `auth`
Timestamp:: `timestamp`
HTTP Verb:: `verb`
Request body:: `request`
HTTP Version:: `httpversion`
HTTP Status Code:: `response`
Bytes served:: `bytes`
Referrer URL:: `referrer`
User agent:: `agent`

Edit the `first-pipeline.conf` file to add the following text:
[source,json]
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}

After processing, the sample line has the following JSON representation:

[source,json]
{
    "clientip" : "83.149.9.216",
    "ident" : "-",
    "auth" : "-",
    "timestamp" : "04/Jan/2015:05:13:42 +0000",
    "verb" : "GET",
    "request" : "/presentations/logstash-monitorama-2013/images/kibana-search.png",
    "httpversion" : "1.1",
    "response" : "200",
    "bytes" : "203023",
    "referrer" : "http://semicomplete.com/presentations/logstash-monitorama-2013/",
    "agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
}
[float]
[[indexing-parsed-data-into-elasticsearch]]
==== Indexing Parsed Data into Elasticsearch

Now that the web logs are broken down into specific fields, the Logstash pipeline can index the data into an
Elasticsearch cluster. Edit the `first-pipeline.conf` file to add the following text after the `input` section:

[source,json]
output {
    elasticsearch {
        protocol => "http"
    }
}

With this configuration, Logstash uses the HTTP protocol to connect to Elasticsearch, by default on `localhost:9200`.

NOTE: Connecting to a local Elasticsearch instance with default settings is acceptable for development work, but
production environments typically require the hosts to be specified explicitly. For the purposes of this example,
however, the default behavior is sufficient.
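Before starting the pipeline, it can be worth confirming that Elasticsearch is reachable. A quick check, assuming
Elasticsearch is running locally on its default port:

[source,shell]
curl -XGET 'localhost:9200/?pretty'

If Elasticsearch is up, this returns a small JSON document that includes the node name and version information.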
[float]
[[configuring-geoip-plugin]]
==== Enhancing Your Data with the Geoip Filter Plugin

In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing
data. As an example, the {logstash}plugins-filters-geoip.html[`geoip`] plugin looks up IP addresses, derives geographic
location information from the addresses, and adds that location information to the logs.

Configure your Logstash instance to use the `geoip` filter plugin by adding the following lines to the `filter` section
of the `first-pipeline.conf` file:

[source,json]
geoip {
    source => "clientip"
}

The `geoip` plugin configuration requires data that is already defined as separate fields. Make sure that the `geoip`
section is after the `grok` section of the configuration file.

Specify the name of the field that contains the IP address to look up. In this tutorial, the field name is `clientip`.
[float]
[[testing-initial-pipeline]]
==== Testing Your Initial Pipeline

At this point, your `first-pipeline.conf` file has input, filter, and output sections properly configured, and looks
like this:

[source,shell]
input {
    file {
        path => "/path/to/groksample.log"
        start_position => beginning
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        protocol => "http"
    }
    stdout {}
}

To verify your configuration, run the following command:

[source,shell]
bin/logstash -f first-pipeline.conf --configtest

The `--configtest` option parses your configuration file and reports any errors. When the configuration file passes
the configuration test, start Logstash with the following command:

[source,shell]
bin/logstash -f first-pipeline.conf

Try a test query to Elasticsearch based on the fields created by the `grok` filter plugin:

[source,shell]
curl -XGET 'localhost:9200/logstash-$DATE/_search?q=response=401'

Replace $DATE with the current date, in YYYY.MM.DD format.
Since our sample has just one 401 HTTP response, we get one hit back:

[source,shell]
{"took":2,
"timed_out":false,
"_shards":{"total":5,
"successful":5,
"failed":0},
"hits":{"total":1,
"max_score":1.5351382,
"hits":[{"_index":"logstash-2015.07.30",
"_type":"logs",
"_id":"AU7gqOky1um3U6ZomFaF",
"_score":1.5351382,
"_source":{"message":"83.149.9.216 - - [04/Jan/2015:05:13:45 +0000] \"GET /presentations/logstash-monitorama-2013/images/frontend-response-codes.png HTTP/1.1\" 200 52878 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"@version":"1",
"@timestamp":"2015-07-30T20:30:41.265Z",
"host":"localhost",
"path":"/path/to/logstash-tutorial-dataset",
"clientip":"83.149.9.216",
"ident":"-",
"auth":"-",
"timestamp":"04/Jan/2015:05:13:45 +0000",
"verb":"GET",
"request":"/presentations/logstash-monitorama-2013/images/frontend-response-codes.png",
"httpversion":"1.1",
"response":"200",
"bytes":"52878",
"referrer":"\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
"agent":"\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}
}]
}
}
Try another search for the geographic information derived from the IP address:

[source,shell]
curl -XGET 'localhost:9200/logstash-$DATE/_search?q=geoip.city_name=Buffalo'

Replace $DATE with the current date, in YYYY.MM.DD format.

Only one of the log entries comes from Buffalo, so the query produces a single response:

[source,shell]
{"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0},
"hits":{"total":1,
"max_score":1.03399,
"hits":[{"_index":"logstash-2015.07.31",
"_type":"logs",
"_id":"AU7mK3CVSiMeBsJ0b_EP",
"_score":1.03399,
"_source":{
"message":"108.174.55.234 - - [04/Jan/2015:05:27:45 +0000] \"GET /?flav=rss20 HTTP/1.1\" 200 29941 \"-\" \"-\"",
"@version":"1",
"@timestamp":"2015-07-31T22:11:22.347Z",
"host":"localhost",
"path":"/path/to/logstash-tutorial-dataset",
"clientip":"108.174.55.234",
"ident":"-",
"auth":"-",
"timestamp":"04/Jan/2015:05:27:45 +0000",
"verb":"GET",
"request":"/?flav=rss20",
"httpversion":"1.1",
"response":"200",
"bytes":"29941",
"referrer":"\"-\"",
"agent":"\"-\"",
"geoip":{
"ip":"108.174.55.234",
"country_code2":"US",
"country_code3":"USA",
"country_name":"United States",
"continent_code":"NA",
"region_name":"NY",
"city_name":"Buffalo",
"postal_code":"14221",
"latitude":42.9864,
"longitude":-78.7279,
"dma_code":514,
"area_code":716,
"timezone":"America/New_York",
"real_region_name":"New York",
"location":[-78.7279,42.9864]
}
}
}]
}
}
[[multiple-input-output-plugins]]
=== Multiple Input and Output Plugins

The information you need to manage often comes from several disparate sources, and use cases can require multiple
destinations for your data. Your Logstash pipeline can use multiple input and output plugins to handle these
requirements.

This example creates a Logstash pipeline that takes input from a Twitter feed and the Logstash Forwarder client, then
sends the information to an Elasticsearch cluster as well as writing the information directly to a file.

[float]
[[twitter-configuration]]
==== Reading from a Twitter feed

To add a Twitter feed, you need several pieces of information:

* A _consumer key_, which uniquely identifies your Twitter app (Logstash, in this case).
* A _consumer secret_, which serves as the password for your Twitter app.
* One or more _keywords_ to search for in the incoming feed.
* An _oauth token_, which identifies the Twitter account using this app.
* An _oauth token secret_, which serves as the password of the Twitter account.

Visit https://dev.twitter.com/apps to set up a Twitter account and generate your consumer key and secret, as well as
your OAuth token and secret.
Use this information to add the following lines to the `input` section of the `first-pipeline.conf` file:

[source,json]
twitter {
    consumer_key =>
    consumer_secret =>
    keywords =>
    oauth_token =>
    oauth_token_secret =>
}
[float]
[[configuring-lsf]]
==== The Logstash Forwarder

The https://github.com/elastic/logstash-forwarder[Logstash Forwarder] is a lightweight, resource-friendly tool that
collects logs from files on the server and forwards these logs to your Logstash instance for processing. The
Logstash Forwarder uses a secure protocol called _lumberjack_ to communicate with your Logstash instance. The
lumberjack protocol is designed for reliability and low latency. The Logstash Forwarder uses the computing resources of
the machine hosting the source data, and the Lumberjack input plugin minimizes the resource demands on the Logstash
instance.

NOTE: In a typical use case, the Logstash Forwarder client runs on a separate machine from the machine running your
Logstash instance. For the purposes of this tutorial, both Logstash and the Logstash Forwarder will be running on the
same machine.

The default Logstash configuration includes the {logstash}plugins-inputs-lumberjack.html[Lumberjack input plugin],
which is designed to be resource-friendly. To install the Logstash Forwarder on your data source machine, install the
appropriate package from the main Logstash https://www.elastic.co/downloads/logstash[product page].

Create a configuration file for the Logstash Forwarder similar to the following example:

[source,json]
{
    "network": {
        "servers": [ "localhost:5043" ],

        "ssl ca": "/path/to/localhost.crt", <1>

        "timeout": 15
    },
    "files": [
        {
            "paths": [
                "/path/to/sample-log" <2>
            ],
            "fields": { "type": "apache" }
        }
    ]
}

<1> Path to the SSL certificate for the Logstash instance.
<2> Path to the file or files that the Logstash Forwarder processes.

Save this configuration file as `logstash-forwarder.conf`.
Configure your Logstash instance to use the Lumberjack input plugin by adding the following lines to the `input` section
of the `first-pipeline.conf` file:

[source,json]
lumberjack {
    port => "5043"
    ssl_certificate => "/path/to/ssl-cert" <1>
    ssl_key => "/path/to/ssl-key" <2>
}

<1> Path to the SSL certificate that the Logstash instance uses to authenticate itself to Logstash Forwarder.
<2> Path to the key for the SSL certificate.
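If you do not already have a certificate and key for this connection, one way to generate a self-signed pair for local
testing is with `openssl`. The file names and the `localhost` subject below are placeholder choices for this tutorial
setup, not values required by Logstash:

[source,shell]
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=localhost" \
    -keyout /path/to/ssl-key -out /path/to/ssl-cert

Point the Logstash Forwarder's `"ssl ca"` setting and the Lumberjack input's `ssl_certificate` at the generated
certificate, and `ssl_key` at the generated key.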
[float]
[[logstash-file-output]]
==== Writing Logstash Data to a File

You can configure your Logstash pipeline to write data directly to a file with the
{logstash}plugins-outputs-file.html[`file`] output plugin.

Configure your Logstash instance to use the `file` output plugin by adding the following lines to the `output` section
of the `first-pipeline.conf` file:

[source,json]
file {
    path => "/path/to/target/file"
}
[float]
[[multiple-es-nodes]]
==== Writing to multiple Elasticsearch nodes

Writing to multiple Elasticsearch nodes lightens the resource demands on a given Elasticsearch node, as well as
providing redundant points of entry into the cluster when a particular node is unavailable.

To configure your Logstash instance to write to multiple Elasticsearch nodes, edit the `output` section of the
`first-pipeline.conf` file to read:

[source,json]
output {
    elasticsearch {
        protocol => "http"
        host => ["IP Address 1", "IP Address 2", "IP Address 3"]
    }
}

Use the IP addresses of three non-master nodes in your Elasticsearch cluster in the `host` parameter. When the `host`
parameter lists multiple IP addresses, Logstash load-balances requests across the list of addresses.
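If you are not sure which of your nodes is the current master, you can ask the cluster directly. A quick check against
any node, assuming the default HTTP port:

[source,shell]
curl -XGET 'localhost:9200/_cat/nodes?v'

The node marked with `*` in the `master` column is the elected master; the remaining nodes are candidates for the
`host` list.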
[float]
[[testing-second-pipeline]]
==== Testing the Pipeline

At this point, your `first-pipeline.conf` file looks like this:

[source,json]
input {
    twitter {
        consumer_key =>
        consumer_secret =>
        keywords =>
        oauth_token =>
        oauth_token_secret =>
    }
    lumberjack {
        port => "5043"
        ssl_certificate => "/path/to/ssl-cert"
        ssl_key => "/path/to/ssl-key"
    }
}
output {
    elasticsearch {
        protocol => "http"
        host => ["IP Address 1", "IP Address 2", "IP Address 3"]
    }
    file {
        path => "/path/to/target/file"
    }
}

Logstash is consuming data from the Twitter feed you configured, receiving data from the Logstash Forwarder, and
indexing this information to three nodes in an Elasticsearch cluster as well as writing to a file.

At the data source machine, run the Logstash Forwarder with the following command:

[source,shell]
logstash-forwarder -config logstash-forwarder.conf

The Logstash Forwarder will attempt to connect on port 5043. Until Logstash starts with an active Lumberjack plugin,
there won't be any answer on that port, so any messages you see regarding failure to connect on that port are normal
for now.

To verify your configuration, run the following command:

[source,shell]
bin/logstash -f first-pipeline.conf --configtest

The `--configtest` option parses your configuration file and reports any errors. When the configuration file passes
the configuration test, start Logstash with the following command:

[source,shell]
bin/logstash -f first-pipeline.conf

Use the `grep` utility to search in the target file to verify that information is present:

[source,shell]
grep Mozilla /path/to/target/file

Run an Elasticsearch query to find the same information in the Elasticsearch cluster:

[source,shell]
curl -XGET 'localhost:9200/logstash-2015.07.30/_search?q=agent=Mozilla'
@@ -1,218 +1,58 @@
[[getting-started-with-logstash]]
== Getting Started with Logstash

This section guides you through the process of installing Logstash and verifying that everything is running properly.
Later sections deal with increasingly complex configurations to address selected use cases.
[[installing-logstash]]
=== Install Logstash

[float]
==== Prerequisite: Java
A Java runtime is required to run Logstash. We recommend running the latest version of Java. At a minimum, you need
Java 7. You can use the http://www.oracle.com/technetwork/java/javase/downloads/index.html[official Oracle distribution]
or an open-source distribution such as http://openjdk.java.net/[OpenJDK].

To check your Java version, run the following command:

[source,shell]
java -version

On systems with Java installed, this command produces output similar to the following:

[source,shell]
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

Once you have verified the existence of Java on your system, we can move on!
[[installing-binary]]
==== Installing from a downloaded binary

Download the https://www.elastic.co/downloads/logstash[Logstash installation file] that matches your host environment.
Unpack the file. On supported Linux operating systems, you can <<package-repositories,use a package manager>> to
install Logstash.

[float]
=== Environment Variables

The Logstash startup script uses environment variables, so you can easily configure your environment if you wish to do
so.

When you start Logstash using the startup script, it launches Java with pre-configured JVM options. Most of the time it
is better to leave those options as they are, but you can pass in extra JVM settings. For example, if you want to
monitor Logstash using JMX, you can add the necessary settings with the `LS_JAVA_OPTS` environment variable and then
start Logstash.

In some cases, you may want to completely override the default JVM options chosen by Logstash and use your own
settings. Setting `JAVA_OPTS` before you start Logstash causes the defaults in the startup scripts to be ignored.

The `LS_HEAP_SIZE` environment variable sets the maximum heap memory allocated to the Logstash Java process. By
default, it is set to `500m`.
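For example, you might set these variables in a Bourne-compatible shell before starting Logstash. The heap size and JMX
settings below are arbitrary illustration values, not recommendations:

[source,shell]
----------------------------------
export LS_HEAP_SIZE="1g"
export LS_JAVA_OPTS="-Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
bin/logstash -e 'input { stdin { } } output { stdout {} }'
----------------------------------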
[float]
=== Up and Running!

To get started, download and extract the Logstash binary and run it with a very simple configuration.

First, download the Logstash tar file:

["source","sh",subs="attributes,callouts"]
----------------------------------
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-{logstash_version}.tar.gz
----------------------------------

Then, unpack 'logstash-{logstash_version}.tar.gz' on your local filesystem:

["source","sh",subs="attributes,callouts"]
----------------------------------
tar -zxvf logstash-{logstash_version}.tar.gz
----------------------------------

[[first-event]]
=== Stashing Your First Event

To test your Logstash installation, run the most basic Logstash pipeline:

["source","sh",subs="attributes,callouts"]
----------------------------------
cd logstash-{logstash_version}
bin/logstash -e 'input { stdin { } } output { stdout {} }'
----------------------------------
The `-e` flag enables you to specify a configuration directly from the command line. Specifying configurations at the
command line lets you quickly test configurations without having to edit a file between iterations.

This pipeline takes input from the standard input, `stdin`, and moves that input to the standard output, `stdout`, in a
structured format. Type hello world at the command prompt to see Logstash respond:

[source,shell]
----------------------------------
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
----------------------------------

Logstash adds timestamp and IP address information to the message. Exit Logstash by issuing a *CTRL-D* command in the
shell where Logstash is running.
Let's try a slightly fancier example. If Logstash is still running, exit it as described above. Then, start Logstash
again with the following command:

[source,ruby]
----------------------------------
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
----------------------------------

Now, enter some more test input:

[source,ruby]
----------------------------------
goodnight moon
{
  "message" => "goodnight moon",
  "@timestamp" => "2013-11-20T23:48:05.335Z",
  "@version" => "1",
  "host" => "my-laptop"
}
----------------------------------

Re-configuring the `stdout` output by adding a "codec" enables you to change what Logstash outputs. By adding inputs,
outputs, and filters to your configuration, you can massage the log data and maximize the flexibility of the stored
data when you query it.
[float]
=== Storing logs with Elasticsearch

Now, you're probably saying, "that's all fine and dandy, but typing all my logs into Logstash isn't really an option,
and merely seeing them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the
messages we send into Logstash. If you don't have Elasticsearch already installed, you can
http://www.elastic.co/download/[download the RPM or DEB package], or install manually by downloading the current
release tarball, by issuing the following four commands:

["source","sh",subs="attributes,callouts"]
----------------------------------
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-{elasticsearch_version}.tar.gz
tar -zxvf elasticsearch-{elasticsearch_version}.tar.gz
cd elasticsearch-{elasticsearch_version}/
./bin/elasticsearch
----------------------------------

NOTE: This tutorial runs Logstash {logstash_version} with Elasticsearch {elasticsearch_version}, although you can use
it with an Elasticsearch cluster running 1.0.0 or later.

You can get started with Logstash using the default Elasticsearch installation and configuration. See the
http://www.elastic.co/guide/en/elasticsearch/reference/current/index.html[Elasticsearch Reference]
for more information about installing and running Elasticsearch.

Now that you have Elasticsearch running on port 9200 (you do, right?), you can easily configure Logstash to use
Elasticsearch as its backend. The defaults for both Logstash and Elasticsearch are fairly sane and well thought out, so
you can omit the optional configurations within the elasticsearch output:

[source,js]
----------------------------------
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
----------------------------------
Type something and Logstash processes it as before. However, this time you won't see any output, since the stdout
output isn't configured:

[source,js]
----------------------------------
you know, for logs
----------------------------------

You can confirm that Elasticsearch actually received the data by submitting a curl request:

[source,js]
----------------------------------
curl 'http://localhost:9200/_search?pretty'
----------------------------------
This should return something like the following:

[source,js]
----------------------------------
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2013.11.21",
      "_type" : "logs",
      "_id" : "2ijaoKqARGHvbMgP3BspJB",
      "_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"}
    } ]
  }
}
----------------------------------

Congratulations! You've successfully stashed logs in Elasticsearch via Logstash.
[float]
==== Multiple Outputs

As a quick exercise in configuring multiple Logstash outputs, let's invoke Logstash again, using both 'stdout' and
'elasticsearch' as outputs:

[source,js]
----------------------------------
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
----------------------------------

Now when you enter a phrase, it is echoed to the terminal and saved in Elasticsearch! (You can verify this using curl
or elasticsearch-kopf.)
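If you prefer the command line for that check, a document count is a quick way to confirm that the entry arrived,
assuming Elasticsearch is still listening on the default local port:

[source,shell]
----------------------------------
curl 'http://localhost:9200/_count?pretty'
----------------------------------

The `count` value should grow by one for each phrase you enter.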
[float]
==== Default - Daily Indices

You might have noticed that Logstash is smart enough to create a new index in Elasticsearch. The default index name is
in the form of `logstash-YYYY.MM.DD`, which essentially creates one index per day. At midnight (UTC), Logstash
automagically rotates the index to a fresh one, with the new current day's timestamp. This allows you to keep windows
of data, based on how far retroactively you'd like to query your log data. Of course, you can always archive (or
re-index) your data to an alternate location so you can query further into the past. If you want to delete old indices
after a certain time period, you can use the
http://www.elastic.co/guide/en/elasticsearch/client/curator/current/index.html[Elasticsearch Curator tool].
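You can see these daily indices for yourself with the cat indices API, again assuming the local Elasticsearch instance
used throughout this section:

[source,shell]
----------------------------------
curl 'http://localhost:9200/_cat/indices?v'
----------------------------------

Each `logstash-YYYY.MM.DD` entry in the listing corresponds to one day of data.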
[float]
=== Moving On

Configuring inputs and outputs from the command line is convenient for getting started and doing quick testing. To move
beyond these simple examples, however, you need to know a bit more about the Logstash event processing pipeline and how
to specify pipeline options in a config file. To learn about the event processing pipeline, see
<<pipeline,Logstash Processing Pipeline>>. To see how to configure more complex pipelines using config files, see
<<configuration, Configuring Logstash>>. The <<advanced-pipeline,Advanced Tutorial>> expands the capabilities of your
Logstash instance to cover broader use cases.
docs/asciidoc/static/images/basic_logstash_pipeline.png (new binary file, 77 KiB; binary file not shown)