Merge pull request #1095 from jamtur01/docfixes
Fixed some issues in the tutorial
commit 6e2531e8e8
1 changed file with 36 additions and 36 deletions
@@ -6,15 +6,15 @@ Logstash is a tool for receiving, processing and outputting logs. All kinds of l
Using Elasticsearch as a backend datastore, and kibana as a frontend reporting tool, Logstash acts as the workhorse, creating a powerful pipeline for storing, querying and analyzing your logs. With an arsenal of built-in inputs, filters, codecs and outputs, you can harness some powerful functionality with a small amount of effort. So, let's get started!
=== Prerequisite: Java
-The only prerequisite required by logstash is a Java runtime. You can check that you have it installed by running the command `java -version` in your shell. Here's something similar to what you might see:
+The only prerequisite required by Logstash is a Java runtime. You can check that you have it installed by running the command `java -version` in your shell. Here's something similar to what you might see:
----
> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
----
-It is recommended to run a recent version of Java in order to ensure the greatest success in running logstash.
+It is recommended to run a recent version of Java in order to ensure the greatest success in running Logstash.
It's fine to run an open-source version such as OpenJDK: +
http://openjdk.java.net/
@@ -26,7 +26,7 @@ Once you have verified the existence of Java on your system, we can move on!
== Up and Running!
=== Logstash in two commands
-First, we're going to download the pre-built logstash binary and run it with a very simple configuration.
+First, we're going to download the 'logstash' binary and run it with a very simple configuration.
----
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-%VERSION%.tar.gz
----
@@ -34,21 +34,21 @@ Now you should have the file named 'logstash-%VERSION%.tar.gz' on your local fil
----
tar zxvf logstash-%VERSION%.tar.gz
cd logstash-%VERSION%
----
+----
Now let's run it:
----
+----
bin/logstash -e 'input { stdin { } } output { stdout {} }'
----
-Now type something into your command prompt, and you will see it output by logstash:
+Now type something into your command prompt, and you will see it output by Logstash:
----
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
----
-OK, that's interesting... We ran logstash with an input called "stdin", and an output named "stdout", and logstash basically echoed back whatever we typed in some sort of structured format. Note that specifying -e allows logstash to accept a configuration directly from the command line. This is especially useful for quickly testing configurations without having to edit a file between iterations.
+OK, that's interesting... We ran Logstash with an input called "stdin", and an output named "stdout", and Logstash basically echoed back whatever we typed in some sort of structured format. Note that specifying the *-e* command line flag allows Logstash to accept a configuration directly from the command line. This is especially useful for quickly testing configurations without having to edit a file between iterations.
-Let's try a slightly fancier example. First, you should exit logstash by issuing a 'CTRL-C' command in the shell in which it is running. Now run logstash again with the following command:
+Let's try a slightly fancier example. First, you should exit Logstash by issuing a 'CTRL-C' command in the shell in which it is running. Now run Logstash again with the following command:
----
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
----
@@ -64,10 +64,10 @@ goodnight moon
}
----
-So, by re-configuring the "stdout" output (adding a "codec"), we can change the output of logstash. By adding inputs, outputs and filters to your configuration, it's possible to massage the log data in many ways, in order to maximize flexibility of the stored data when you are querying it.
+So, by re-configuring the "stdout" output (adding a "codec"), we can change the output of Logstash. By adding inputs, outputs and filters to your configuration, it's possible to massage the log data in many ways, in order to maximize flexibility of the stored data when you are querying it.
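For instance (a minimal sketch, not part of the original change), swapping in the json codec that ships with Logstash renders each event as a JSON document instead:

----
bin/logstash -e 'input { stdin { } } output { stdout { codec => json } }'
----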
== Storing logs with Elasticsearch
-Now, you're probably saying, "that's all fine and dandy, but typing all my logs into logstash isn't really an option, and merely seeing them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the messages we send into logstash. If you don't have Elasticearch already installed, you can http://www.elasticsearch.org/download/[download the RPM or DEB package], or install manually by downloading the current release tarball, by issuing the following four commands:
+Now, you're probably saying, "that's all fine and dandy, but typing all my logs into Logstash isn't really an option, and merely seeing them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the messages we send into Logstash. If you don't have Elasticsearch already installed, you can http://www.elasticsearch.org/download/[download the RPM or DEB package], or install manually by downloading the current release tarball, by issuing the following four commands:
----
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
tar zxvf elasticsearch-%ELASTICSEARCH_VERSION%.tar.gz
@@ -79,12 +79,12 @@ NOTE: This tutorial specifies running Logstash %VERSION% with Elasticsearch %ELA
More detailed information on installing and configuring Elasticsearch can be found on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html[The Elasticsearch reference pages]. However, for the purposes of Getting Started with Logstash, the default installation and configuration of Elasticsearch should be sufficient.
-Now that we have Elasticsearch running on port 9200 (we do, right?), logstash can be simply configured to use Elasticsearch as its backend. The defaults for both logstash and Elasticsearch are fairly sane and well thought out, so we can omit the optional configurations within the elasticsearch output:
+Now that we have Elasticsearch running on port 9200 (we do, right?), Logstash can be simply configured to use Elasticsearch as its backend. The defaults for both Logstash and Elasticsearch are fairly sane and well thought out, so we can omit the optional configurations within the elasticsearch output:
----
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
----
-Type something, and logstash will process it as before (this time you won't see any output, since we don't have the stdout output configured)
+Type something, and Logstash will process it as before (this time you won't see any output, since we don't have the stdout output configured).
----
you know, for logs
----
@@ -117,51 +117,51 @@ which should return something like this:
}
----
-Congratulations! You've successfully stashed logs in Elasticsearch via logstash.
+Congratulations! You've successfully stashed logs in Elasticsearch via Logstash.
=== Elasticsearch Plugins (an aside)
-Another very useful tool for querying your logstash data (and Elasticsearch in general) is the Elasticsearch-kopf plugin. Here is more information on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html[Elasticsearch plugins]. To install elasticsearch-kopf, simply issue the following command in your Elasticsearch directory (the same one in which you ran Elasticsearch earlier):
+Another very useful tool for querying your Logstash data (and Elasticsearch in general) is the Elasticsearch-kopf plugin. Here is more information on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html[Elasticsearch plugins]. To install elasticsearch-kopf, simply issue the following command in your Elasticsearch directory (the same one in which you ran Elasticsearch earlier):
----
bin/plugin -install lmenezes/elasticsearch-kopf
----
Now you can browse to http://localhost:9200/_plugin/kopf[http://localhost:9200/_plugin/kopf] to explore your Elasticsearch data, settings and mappings!
=== Multiple Outputs
-As a quick exercise in configuring multiple Logstash outputs, let's invoke logstash again, using both the 'stdout' as well as the 'elasticsearch' output:
+As a quick exercise in configuring multiple Logstash outputs, let's invoke Logstash again, using both the 'stdout' as well as the 'elasticsearch' output:
----
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
----
Typing a phrase will now echo back to your terminal, as well as save in Elasticsearch! (Feel free to verify this using curl or elasticsearch-kopf).
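A minimal sketch of that verification with curl (the wildcard matches whichever daily indices Logstash has created):

----
curl 'http://localhost:9200/logstash-*/_search?q=*&pretty'
----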
=== Default - Daily Indices
-You might notice that logstash was smart enough to create a new index in Elasticsearch... The default index name is in the form of 'logstash-YYYY.MM.DD', which essentially creates one index per day. At midnight (GMT?), logstash will automagically rotate the index to a fresh new one, with the new current day's timestamp. This allows you to keep windows of data, based on how far retroactively you'd like to query your log data. Of course, you can always archive (or re-index) your data to an alternate location, where you are able to query further into the past. If you'd like to simply delete old indices after a certain time period, you can use the https://github.com/elasticsearch/curator[Elasticsearch Curator tool].
+You might notice that Logstash was smart enough to create a new index in Elasticsearch... The default index name is in the form of 'logstash-YYYY.MM.DD', which essentially creates one index per day. At midnight (GMT?), Logstash will automagically rotate the index to a fresh new one, with the new current day's timestamp. This allows you to keep windows of data, based on how far retroactively you'd like to query your log data. Of course, you can always archive (or re-index) your data to an alternate location, where you are able to query further into the past. If you'd like to simply delete old indices after a certain time period, you can use the https://github.com/elasticsearch/curator[Elasticsearch Curator tool].
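Because each day gets its own index, you can target or drop a single day directly; a brief sketch (both dates are hypothetical):

----
# query one day's index
curl 'http://localhost:9200/logstash-2013.11.21/_search?q=*&pretty'
# delete a day you no longer need
curl -XDELETE 'http://localhost:9200/logstash-2013.10.01'
----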
== Moving On
-Now you're ready for more advanced configurations. At this point, it makes sense for a quick discussion of some of the core features of logstash, and how they interact with the logstash engine.
+Now you're ready for more advanced configurations. At this point, it makes sense for a quick discussion of some of the core features of Logstash, and how they interact with the Logstash engine.
=== The Life of an Event
-Inputs, Outputs, Codecs and Filters are at the heart of the logstash configuration. By creating a pipeline of event processing, logstash is able to extract the relevant data from your logs and make it available to elasticsearch, in order to efficiently query your data. To get you thinking about the various options available in Logstash, let's discuss some of the more common configurations currently in use. For more details, read about http://logstash.net/docs/1.2.2/life-of-an-event[the Logstash event pipeline].
+Inputs, Outputs, Codecs and Filters are at the heart of the Logstash configuration. By creating a pipeline of event processing, Logstash is able to extract the relevant data from your logs and make it available to Elasticsearch, in order to efficiently query your data. To get you thinking about the various options available in Logstash, let's discuss some of the more common configurations currently in use. For more details, read about http://logstash.net/docs/latest/life-of-an-event[the Logstash event pipeline].
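As a minimal sketch of that pipeline shape, every configuration follows this skeleton; the filter section is optional, and the mutate filter here is just a placeholder:

----
input {
  stdin { }
}
filter {
  # zero or more filters run against each event, in order
  mutate { add_tag => [ "seen" ] }
}
output {
  stdout { codec => rubydebug }
}
----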
==== Inputs
-Inputs are the mechanism for passing log data to logstash. Some of the more useful, commonly-used ones are:
+Inputs are the mechanism for passing log data to Logstash. Some of the more useful, commonly-used ones are (a short configuration sketch follows this list):
* *file*: reads from a file on the filesystem, much like the UNIX command "tail -0a"
* *syslog*: listens on the well-known port 514 for syslog messages and parses according to RFC3164 format
-* *redis*: reads from a redis server, using both redis channels and also redis lists. Redis is often used as a "broker" in a centralized logstash installation, which queues logstash events from remote logstash "shippers".
+* *redis*: reads from a redis server, using both redis channels and also redis lists. Redis is often used as a "broker" in a centralized Logstash installation, which queues Logstash events from remote Logstash "shippers".
* *lumberjack*: processes events sent in the lumberjack protocol. Now called https://github.com/elasticsearch/logstash-forwarder[logstash-forwarder].
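As promised above, a short sketch of the first of these: a file input tailing a log (the path and type values are hypothetical):

----
input {
  file {
    path => "/var/log/messages"
    type => "syslog"
  }
}
----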
==== Filters
Filters are used as intermediary processing devices in the Logstash chain. They are often combined with conditionals in order to perform a certain action on an event, if it matches particular criteria. Some useful filters (a short example follows this list):
-* *grok*: parses arbitrary text and structure it. Grok is currently the best way in logstash to parse unstructured log data into something structured and queryable. With 120 patterns shipped built-in to logstash, it's more than likely you'll find one that meets your needs!
+* *grok*: parses arbitrary text and structures it. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable. With 120 patterns shipped built-in to Logstash, it's more than likely you'll find one that meets your needs!
* *mutate*: The mutate filter allows you to do general mutations to fields. You can rename, remove, replace, and modify fields in your events.
* *drop*: drop an event completely, for example, 'debug' events.
* *clone*: make a copy of an event, possibly adding or removing fields.
* *geoip*: adds information about geographical location of IP addresses (and displays amazing charts in kibana)
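To make one of these concrete, a sketch of a mutate filter (field names are hypothetical, and option syntax varies slightly between releases):

----
filter {
  mutate {
    # standardize a field name and drop a field we never want stored
    rename => [ "hostname", "host" ]
    remove_field => [ "secret" ]
  }
}
----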
==== Outputs
-Outputs are the final phase of the logstash pipeline. An event may pass through multiple outputs during processing, but once all outputs are complete, the event has finished its execution. Some commonly used outputs include:
+Outputs are the final phase of the Logstash pipeline. An event may pass through multiple outputs during processing, but once all outputs are complete, the event has finished its execution. Some commonly used outputs include:
* *elasticsearch*: If you're planning to save your data in an efficient, convenient and easily queryable format... Elasticsearch is the way to go. Period. Yes, we're biased :)
* *file*: writes event data to a file on disk.
@@ -174,13 +174,13 @@ Codecs are basically stream filters which can operate as part of an input, or an
* *json*: encode / decode data in JSON format
* *multiline*: Takes multiple-line text events and merges them into a single event, e.g. java exception and stacktrace messages
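A brief sketch of the multiline codec in use (the pattern is illustrative): any line beginning with whitespace is folded into the previous event, which keeps a java stacktrace together.

----
input {
  stdin {
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}
----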
-For the complete list of (current) configurations, visit the logstash "plugin configuration" section of the http://logstash.net/docs/1.2.2/[logstash documentation page].
+For the complete list of (current) configurations, visit the Logstash "plugin configuration" section of the http://logstash.net/docs/latest/[Logstash documentation page].
== More fun with Logstash
=== Persistent Configuration files
-Specifying configurations on the command line using '-e' is only so helpful, and more advanced setups will require more lengthy, long-lived configurations. First, let's create a simple configuration file, and invoke logstash using it. Create a file named "logstash-simple.conf" and save it in the same directory as logstash.
+Specifying configurations on the command line using '-e' is only so helpful, and more advanced setups will require more lengthy, long-lived configurations. First, let's create a simple configuration file, and invoke Logstash using it. Create a file named "logstash-simple.conf" and save it in the same directory as Logstash.
----
input { stdin { } }
@@ -218,7 +218,7 @@ output {
stdout { codec => rubydebug }
}
----
-Run logstash with this configuration:
+Run Logstash with this configuration:
----
bin/logstash -f logstash-filter.conf
@@ -248,9 +248,9 @@ You should see something returned to STDOUT which looks like this:
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
----
-As you can see, logstash (with help from the grok filter) was able to parse the log line (which happens to be in Apache "combined log" format) and break it up into many different discrete bits of information. This will be extremely useful later when we start querying and analyzing our log data... for example, we'll be able to run reports on HTTP response codes, IP addresses, referrers, etc. very easily. There are quite a few grok patterns included with logstash out-of-the-box, so it's quite likely if you're attempting to parse a fairly common log format, someone has already done the work for you. For more details, see the list of https://github.com/logstash/logstash/blob/master/patterns/grok-patterns[logstash grok patterns] on github.
+As you can see, Logstash (with help from the *grok* filter) was able to parse the log line (which happens to be in Apache "combined log" format) and break it up into many different discrete bits of information. This will be extremely useful later when we start querying and analyzing our log data... for example, we'll be able to run reports on HTTP response codes, IP addresses, referrers, etc. very easily. There are quite a few grok patterns included with Logstash out-of-the-box, so it's quite likely if you're attempting to parse a fairly common log format, someone has already done the work for you. For more details, see the list of https://github.com/logstash/logstash/blob/master/patterns/grok-patterns[logstash grok patterns] on github.
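The heavy lifting for a line like this is typically done by the shipped COMBINEDAPACHELOG pattern; a sketch of such a filter (illustrative, not necessarily the exact contents of 'logstash-filter.conf'):

----
filter {
  grok {
    # parse an apache combined-format line into discrete fields
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
----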
-The other filter used in this example is the *date* filter. This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you're ingesting the log data). You'll notice that the @timestamp field in this example is set to December 11, 2013, even though logstash is ingesting the event at some point afterwards. This is handy when backfilling logs, for example... the ability to tell logstash "use this value as the timestamp for this event".
+The other filter used in this example is the *date* filter. This filter parses out a timestamp and uses it as the timestamp for the event (regardless of when you're ingesting the log data). You'll notice that the @timestamp field in this example is set to December 11, 2013, even though Logstash is ingesting the event at some point afterwards. This is handy when backfilling logs, for example... the ability to tell Logstash "use this value as the timestamp for this event".
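A sketch of what such a date filter looks like for apache timestamps (the format string matches entries like 11/Dec/2013:00:01:45 -0800):

----
filter {
  date {
    # parse the "timestamp" field and use it as the event's @timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
----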
== Useful Examples
@@ -297,9 +297,9 @@ Now run it with the -f flag as in the last example:
----
bin/logstash -f logstash-apache.conf
----
-You should be able to see your apache log data in Elasticsearch now! You'll notice that logstash opened the file you configured, and read through it, processing any events it encountered. Any additional lines logged to this file will also be captured, processed by logstash as events and stored in Elasticsearch. As an added bonus, they will be stashed with the field "type" set to "apache_access" (this is done by the type => "apache_access" line in the input configuration).
+You should be able to see your apache log data in Elasticsearch now! You'll notice that Logstash opened the file you configured, and read through it, processing any events it encountered. Any additional lines logged to this file will also be captured, processed by Logstash as events and stored in Elasticsearch. As an added bonus, they will be stashed with the field "type" set to "apache_access" (this is done by the type => "apache_access" line in the input configuration).
-In this configuration, logstash is only watching the apache access_log, but it's easy enough to watch both the access_log and the error_log (actually, any file matching '*log'), by changing one line in the above configuration, like this:
+In this configuration, Logstash is only watching the apache access_log, but it's easy enough to watch both the access_log and the error_log (actually, any file matching '*log'), by changing one line in the above configuration, like this:
----
input {
@@ -307,12 +307,12 @@ input {
path => "/Applications/XAMPP/logs/*_log"
...
----
-Now, rerun logstash, and you will see both the error and access logs processed via logstash. However, if you inspect your data (using elasticsearch-kopf, perhaps), you will see that the access_log was broken up into discrete fields, but not the error_log. That's because we used a "grok" filter to match the standard combined apache log format and automatically split the data into separate fields. Wouldn't it be nice *if* we could control how a line was parsed, based on its format? Well, we can...
+Now, rerun Logstash, and you will see both the error and access logs processed via Logstash. However, if you inspect your data (using elasticsearch-kopf, perhaps), you will see that the access_log was broken up into discrete fields, but not the error_log. That's because we used a "grok" filter to match the standard combined apache log format and automatically split the data into separate fields. Wouldn't it be nice *if* we could control how a line was parsed, based on its format? Well, we can...
-Also, you might have noticed that logstash did not reprocess the events which were already seen in the access_log file. Logstash is able to save its position in files, only processing new lines as they are added to the file. Neat!
+Also, you might have noticed that Logstash did not reprocess the events which were already seen in the access_log file. Logstash is able to save its position in files, only processing new lines as they are added to the file. Neat!
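That position is kept in a "sincedb" file; if you want to pin down where it lives, a hedged sketch (the sincedb path is hypothetical):

----
input {
  file {
    path => "/Applications/XAMPP/logs/*_log"
    # where Logstash remembers how far it has read in each file
    sincedb_path => "/var/lib/logstash/sincedb"
  }
}
----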
=== Conditionals
-Now we can build on the previous example, where we introduced the concept of a *conditional*. A conditional should be familiar to most logstash users, in the general sense. You may use 'if', 'else if' and 'else' statements, as in many other programming languages. Let's label each event according to which file it appeared in (access_log, error_log and other random files which end with "log").
+Now we can build on the previous example, where we introduced the concept of a *conditional*. A conditional should be familiar to most Logstash users, in the general sense. You may use 'if', 'else if' and 'else' statements, as in many other programming languages. Let's label each event according to which file it appeared in (access_log, error_log and other random files which end with "log").
----
input {
@@ -348,7 +348,7 @@ You'll notice we've labeled all events using the "type" field, but we didn't act
=== Syslog
OK, now we can move on to another incredibly useful example: *syslog*. Syslog is one of the most common use cases for Logstash, and one it handles exceedingly well (as long as the log lines conform roughly to RFC3164 :). Syslog is the de facto UNIX networked logging standard, sending messages from client machines to a local file, or to a centralized log server via rsyslog. For this example, you won't need a functioning syslog instance; we'll fake it from the command line, so you can get a feel for what happens.
-First, let's make a simple configuration file for logstash + syslog, called 'logstash-syslog.conf'.
+First, let's make a simple configuration file for Logstash + syslog, called 'logstash-syslog.conf'.
----
input {
@@ -385,7 +385,7 @@ Run it as normal:
----
bin/logstash -f logstash-syslog.conf
----
-Normally, a client machine would connect to the logstash instance on port 5000 and send its message. In this simplified case, we're simply going to telnet to logstash and enter a log line (similar to how we entered log lines into STDIN earlier). First, open another shell window to interact with the logstash syslog input and type the following command:
+Normally, a client machine would connect to the Logstash instance on port 5000 and send its message. In this simplified case, we're simply going to telnet to Logstash and enter a log line (similar to how we entered log lines into STDIN earlier). First, open another shell window to interact with the Logstash syslog input and type the following command:
----
telnet localhost 5000
@@ -406,7 +406,7 @@ Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/polle
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
----
-Now you should see the output of logstash in your original shell as it processes and parses messages!
+Now you should see the output of Logstash in your original shell as it processes and parses messages!
----
{