[[advanced-pipeline]]
=== Parsing Logs with Logstash
In <<first-event>>, you created a basic Logstash pipeline to test your Logstash setup. In the real world, a Logstash
pipeline is a bit more complex: it typically has one or more input, filter, and output plugins.
In this section, you create a Logstash pipeline that uses Filebeat to take Apache web logs as input, parses those
logs to create specific, named fields from the logs, and writes the parsed data to an Elasticsearch cluster. Rather than
defining the pipeline configuration at the command line, you'll define the pipeline in a config file.
To get started, download the
https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz[sample data set] used in this
example, and then unpack the file.
[[configuring-filebeat]]
==== Configuring Filebeat to Send Log Lines to Logstash
Before you create the Logstash pipeline, you'll configure Filebeat to send log lines to Logstash.
The https://github.com/elastic/beats/tree/master/filebeat[Filebeat] client is a lightweight, resource-friendly tool
that collects logs from files on the server and forwards these logs to your Logstash instance for processing.
Filebeat is designed for reliability and low latency. Filebeat has a light resource footprint on the host machine,
and the {logstash-ref}/plugins-inputs-beats.html[`Beats input`] plugin minimizes the resource demands on the Logstash
instance.
NOTE: In a typical use case, Filebeat runs on a separate machine from the machine running your
Logstash instance. For the purposes of this tutorial, Logstash and Filebeat are running on the
same machine.
The default Logstash installation includes the {logstash-ref}/plugins-inputs-beats.html[`Beats input`] plugin. The Beats
input plugin enables Logstash to receive events from the Elastic Beats framework, which means that any Beat written
to work with the Beats framework, such as Packetbeat and Metricbeat, can also send event data to Logstash.
To install Filebeat on your data source machine, download the appropriate package from the Filebeat https://www.elastic.co/downloads/beats/filebeat[product page]. You can also refer to
{filebeat-ref}/filebeat-getting-started.html[Getting Started with Filebeat] in the Beats documentation for additional
installation instructions.
After installing Filebeat, you need to configure it. Open the `filebeat.yml` file located in your Filebeat installation
directory, and replace the contents with the following lines. Make sure `paths` points to the example Apache log file,
`logstash-tutorial.log`, that you downloaded earlier:
[source,yaml]
--------------------------------------------------------------------------------
filebeat.prospectors:
- type: log
  paths:
    - /path/to/file/logstash-tutorial.log <1>
output.logstash:
  hosts: ["localhost:5044"]
--------------------------------------------------------------------------------
<1> Absolute path to the file or files that Filebeat processes.
Save your changes.
To keep the configuration simple, you won't specify TLS/SSL settings as you would in a real-world
scenario.
At the data source machine, run Filebeat with the following command:
[source,shell]
--------------------------------------------------------------------------------
sudo ./filebeat -e -c filebeat.yml -d "publish"
--------------------------------------------------------------------------------
NOTE: If you run Filebeat as root, you need to change ownership of the configuration file (see
{beats-ref}/config-file-permissions.html[Config File Ownership and Permissions]
in the _Beats Platform Reference_).
Filebeat will attempt to connect on port 5044. Until Logstash starts with an active Beats plugin, there
won't be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.
==== Configuring Logstash for Filebeat Input
Next, you create a Logstash configuration pipeline that uses the Beats input plugin to receive
events from Beats.
The following text represents the skeleton of a configuration pipeline:
[source,json]
--------------------------------------------------------------------------------
# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}
--------------------------------------------------------------------------------
This skeleton is non-functional, because the input and output sections don't have any valid options defined.
To get started, copy and paste the skeleton configuration pipeline into a file named `first-pipeline.conf` in your
Logstash home directory.
Next, configure your Logstash instance to use the Beats input plugin by adding the following lines to the `input` section
of the `first-pipeline.conf` file:
[source,json]
--------------------------------------------------------------------------------
    beats {
        port => "5044"
    }
--------------------------------------------------------------------------------
You'll configure Logstash to write to Elasticsearch later. For now, you can add the following line
to the `output` section so that the output is printed to stdout when you run Logstash:
[source,json]
--------------------------------------------------------------------------------
stdout { codec => rubydebug }
--------------------------------------------------------------------------------
When you're done, the contents of `first-pipeline.conf` should look like this:
[source,json]
--------------------------------------------------------------------------------
input {
    beats {
        port => "5044"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
    stdout { codec => rubydebug }
}
--------------------------------------------------------------------------------
To verify your configuration, run the following command:
[source,shell]
--------------------------------------------------------------------------------
bin/logstash -f first-pipeline.conf --config.test_and_exit
--------------------------------------------------------------------------------
The `--config.test_and_exit` option parses your configuration file and reports any errors.
If the configuration file passes the configuration test, start Logstash with the following command:
[source,shell]
--------------------------------------------------------------------------------
bin/logstash -f first-pipeline.conf --config.reload.automatic
--------------------------------------------------------------------------------
The `--config.reload.automatic` option enables automatic config reloading so that you don't have to stop and restart Logstash
every time you modify the configuration file.
As Logstash starts up, you might see one or more warning messages about Logstash ignoring the `pipelines.yml` file. You
can safely ignore this warning. The `pipelines.yml` file is used for running <<multiple-pipelines,multiple pipelines>>
in a single Logstash instance. For the examples shown here, you are running a single pipeline.
If your pipeline is working correctly, you should see a series of events like the following written to the console:
[source,json]
--------------------------------------------------------------------------------
{
"@timestamp" => 2017-11-09T01:44:20.071Z,
"offset" => 325,
"@version" => "1",
"beat" => {
"name" => "My-MacBook-Pro.local",
"hostname" => "My-MacBook-Pro.local",
"version" => "6.0.0"
},
"host" => "My-MacBook-Pro.local",
"prospector" => {
"type" => "log"
},
"source" => "/path/to/file/logstash-tutorial.log",
"message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"tags" => [
[0] "beats_input_codec_plain_applied"
]
}
...
--------------------------------------------------------------------------------
[float]
[[configuring-grok-filter]]
==== Parsing Web Logs with the Grok Filter Plugin
Now you have a working pipeline that reads log lines from Filebeat. However, you'll notice that the format of the log messages
is not ideal. You want to parse the log messages to create specific, named fields from the logs.
To do this, you'll use the `grok` filter plugin.
The {logstash-ref}/plugins-filters-grok.html[`grok`] filter plugin is one of several plugins that are available by default in
Logstash. For details on how to manage Logstash plugins, see the <<working-with-plugins,reference documentation>> for
the plugin manager.
The `grok` filter plugin enables you to parse the unstructured log data into something structured and queryable.
Because the `grok` filter plugin looks for patterns in the incoming log data, configuring the plugin requires you to
make decisions about how to identify the patterns that are of interest to your use case. A representative line from the
web server log sample looks like this:
[source,shell]
--------------------------------------------------------------------------------
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png
HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel
Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
--------------------------------------------------------------------------------
The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. To parse the data, you can use the `%{COMBINEDAPACHELOG}` grok pattern, which structures lines from the Apache log using the following schema:
[horizontal]
*Information*:: *Field Name*
IP Address:: `clientip`
User ID:: `ident`
User Authentication:: `auth`
Timestamp:: `timestamp`
HTTP Verb:: `verb`
Request path:: `request`
HTTP Version:: `httpversion`
HTTP Status Code:: `response`
Bytes served:: `bytes`
Referrer URL:: `referrer`
User agent:: `agent`
TIP: If you need help building grok patterns, try out the
{kibana-ref}/xpack-grokdebugger.html[Grok Debugger]. The Grok Debugger is an
{xpack} feature under the Basic License and is therefore *free to use*.
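
To make the field mapping concrete, the following shell sketch splits the representative log line into the same named fields with a single Bash regular expression. It is only an approximation written for illustration. It is not the definition of `%{COMBINEDAPACHELOG}`, which is assembled from reusable grok patterns and accepts more variations. The script assumes Bash, because it relies on `[[ =~ ]]` and `BASH_REMATCH`.

```shell
#!/usr/bin/env bash
# Illustrative approximation only; the real COMBINEDAPACHELOG grok pattern is more thorough.
line='83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"'

# One capture group per field, in the order listed in the schema above.
re='^([^ ]+) ([^ ]+) ([^ ]+) \[([^]]+)\] "([A-Z]+) ([^ ]+) HTTP/([0-9.]+)" ([0-9]+) ([0-9]+|-) ("[^"]*") ("[^"]*")$'

if [[ $line =~ $re ]]; then
  clientip=${BASH_REMATCH[1]}
  ident=${BASH_REMATCH[2]}
  auth=${BASH_REMATCH[3]}
  timestamp=${BASH_REMATCH[4]}
  verb=${BASH_REMATCH[5]}
  request=${BASH_REMATCH[6]}
  httpversion=${BASH_REMATCH[7]}
  response=${BASH_REMATCH[8]}
  bytes=${BASH_REMATCH[9]}
  referrer=${BASH_REMATCH[10]}   # keeps the surrounding double quotes
  agent=${BASH_REMATCH[11]}      # keeps the surrounding double quotes
  printf 'clientip=%s verb=%s request=%s response=%s bytes=%s\n' \
    "$clientip" "$verb" "$request" "$response" "$bytes"
else
  echo "line did not match" >&2
fi
```

The `referrer` and `agent` captures deliberately keep their double quotes, which is also how those fields appear in the parsed events shown later in this section.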
Edit the `first-pipeline.conf` file and replace the entire `filter` section with the following text:
[source,json]
--------------------------------------------------------------------------------
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
}
--------------------------------------------------------------------------------
When you're done, the contents of `first-pipeline.conf` should look like this:
[source,json]
--------------------------------------------------------------------------------
input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
}
output {
    stdout { codec => rubydebug }
}
--------------------------------------------------------------------------------
Save your changes. Because you've enabled automatic config reloading, you don't have to restart Logstash to
pick up your changes. However, you do need to force Filebeat to read the log file from scratch. To do this,
go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat. Then delete the
Filebeat registry file. For example, run:
[source,shell]
--------------------------------------------------------------------------------
sudo rm data/registry
--------------------------------------------------------------------------------
Since Filebeat stores the state of each file it harvests in the registry, deleting the registry file forces
Filebeat to read all the files it's harvesting from scratch.
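
Because you repeat this reset several times in the tutorial, it can be wrapped in a small helper. This is a sketch that assumes the registry lives at `data/registry` under the Filebeat installation directory, as in the command above. `reset_filebeat_registry` is a hypothetical name, not a Filebeat command, and you may need `sudo` if the file is owned by root. Stop Filebeat before calling it.

```shell
# Hypothetical helper: remove the Filebeat registry so files are re-read from the start.
# $1 is the Filebeat installation directory (defaults to the current directory).
reset_filebeat_registry() {
  registry="${1:-.}/data/registry"
  if [ -f "$registry" ]; then
    rm -- "$registry" && echo "removed $registry"
  else
    echo "no registry file at $registry"
  fi
}

# Usage, from the Filebeat installation directory:
#   reset_filebeat_registry .
```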
Next, restart Filebeat with the following command:
[source,shell]
--------------------------------------------------------------------------------
sudo ./filebeat -e -c filebeat.yml -d "publish"
--------------------------------------------------------------------------------
There might be a slight delay before Filebeat begins processing events if it needs to wait for Logstash to reload the
config file.
After Logstash applies the grok pattern, the events will have the following JSON representation:
[source,json]
--------------------------------------------------------------------------------
{
"request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"offset" => 325,
"auth" => "-",
"ident" => "-",
"verb" => "GET",
"prospector" => {
"type" => "log"
},
"source" => "/path/to/file/logstash-tutorial.log",
"message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"tags" => [
[0] "beats_input_codec_plain_applied"
],
"referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
"@timestamp" => 2017-11-09T02:51:12.416Z,
"response" => "200",
"bytes" => "203023",
"clientip" => "83.149.9.216",
"@version" => "1",
"beat" => {
"name" => "My-MacBook-Pro.local",
"hostname" => "My-MacBook-Pro.local",
"version" => "6.0.0"
},
"host" => "My-MacBook-Pro.local",
"httpversion" => "1.1",
"timestamp" => "04/Jan/2015:05:13:42 +0000"
}
--------------------------------------------------------------------------------
Notice that the event includes the original message, but the log message is also broken down into specific fields.
[float]
[[configuring-geoip-plugin]]
==== Enhancing Your Data with the Geoip Filter Plugin
In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing
data. As an example, the {logstash-ref}/plugins-filters-geoip.html[`geoip`] plugin looks up IP addresses, derives geographic
location information from the addresses, and adds that location information to the logs.
Configure your Logstash instance to use the `geoip` filter plugin by adding the following lines to the `filter` section
of the `first-pipeline.conf` file:
[source,json]
--------------------------------------------------------------------------------
    geoip {
        source => "clientip"
    }
--------------------------------------------------------------------------------
The `geoip` plugin configuration requires you to specify the name of the source field that contains the IP address to look up. In this example, the `clientip` field contains the IP address.
Since filters are evaluated in sequence, make sure that the `geoip` section is after the `grok` section of
the configuration file and that both the `grok` and `geoip` sections are nested within the `filter` section.
When you're done, the contents of `first-pipeline.conf` should look like this:
[source,json]
--------------------------------------------------------------------------------
input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    stdout { codec => rubydebug }
}
--------------------------------------------------------------------------------
Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C),
delete the registry file, and then restart Filebeat with the following command:
[source,shell]
--------------------------------------------------------------------------------
sudo ./filebeat -e -c filebeat.yml -d "publish"
--------------------------------------------------------------------------------
Notice that the event now contains geographic location information:
[source,json]
--------------------------------------------------------------------------------
{
"request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"geoip" => {
"timezone" => "Europe/Moscow",
"ip" => "83.149.9.216",
"latitude" => 55.7485,
"continent_code" => "EU",
"city_name" => "Moscow",
"country_name" => "Russia",
"country_code2" => "RU",
"country_code3" => "RU",
"region_name" => "Moscow",
"location" => {
"lon" => 37.6184,
"lat" => 55.7485
},
"postal_code" => "101194",
"region_code" => "MOW",
"longitude" => 37.6184
},
...
--------------------------------------------------------------------------------
[float]
[[indexing-parsed-data-into-elasticsearch]]
==== Indexing Your Data into Elasticsearch
Now that the web logs are broken down into specific fields, you're ready to get
your data into Elasticsearch.
TIP: You can run Elasticsearch on your own hardware, or use our
https://www.elastic.co/cloud/elasticsearch-service[hosted {es} Service] on
Elastic Cloud. The Elasticsearch Service is available on both AWS and GCP.
https://www.elastic.co/cloud/elasticsearch-service/signup[Try the {es} Service
for free].
The Logstash pipeline can index the data into an
Elasticsearch cluster. Edit the `first-pipeline.conf` file and replace the entire `output` section with the following
text:
[source,json]
--------------------------------------------------------------------------------
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}
--------------------------------------------------------------------------------
With this configuration, Logstash uses the HTTP protocol to connect to Elasticsearch. The above example assumes that
Logstash and Elasticsearch are running on the same machine. You can specify a remote Elasticsearch instance by using
the `hosts` configuration to specify something like `hosts => [ "es-machine:9200" ]`.
At this point, your `first-pipeline.conf` file has input, filter, and output sections properly configured, and looks
something like this:
[source,json]
--------------------------------------------------------------------------------
input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}
--------------------------------------------------------------------------------
Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C),
delete the registry file, and then restart Filebeat with the following command:
[source,shell]
--------------------------------------------------------------------------------
sudo ./filebeat -e -c filebeat.yml -d "publish"
--------------------------------------------------------------------------------
[float]
[[testing-initial-pipeline]]
===== Testing Your Pipeline
Now that the Logstash pipeline is configured to index the data into an
Elasticsearch cluster, you can query Elasticsearch.
Try a test query to Elasticsearch based on the fields created by the `grok` filter plugin.
Replace $DATE with the current date, in YYYY.MM.DD format:
[source,shell]
--------------------------------------------------------------------------------
curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=response:200'
--------------------------------------------------------------------------------
NOTE: The date used in the index name is based on UTC, not the timezone where Logstash is running.
If the query returns `index_not_found_exception`, make sure that `logstash-$DATE` reflects the actual
name of the index. To see a list of available indexes, use this query: `curl 'localhost:9200/_cat/indices?v'`.
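
Rather than typing the date by hand, you can let the shell build the index name. The following is a minimal sketch, assuming a POSIX shell and the `logstash-YYYY.MM.DD` index naming used above. The `-u` flag makes `date` report UTC, which matches how the date in the index name is chosen.

```shell
# Build today's index name from the UTC date.
DATE=$(date -u +%Y.%m.%d)
INDEX="logstash-${DATE}"
echo "$INDEX"

# To search that index on a local Elasticsearch node:
#   curl -XGET "localhost:9200/${INDEX}/_search?pretty"
```

Note the double quotes around the URL in the commented command: inside single quotes, the shell does not expand `${INDEX}`.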
You should get multiple hits back. For example:
[source,json]
--------------------------------------------------------------------------------
{
"took": 50,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 98,
"max_score": 2.793642,
"hits": [
{
"_index": "logstash-2017.11.09",
"_type": "doc",
"_id": "3IzDnl8BW52sR0fx5wdV",
"_score": 2.793642,
"_source": {
"request": "/presentations/logstash-monitorama-2013/images/frontend-response-codes.png",
"agent": """"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"""",
"geoip": {
"timezone": "Europe/Moscow",
"ip": "83.149.9.216",
"latitude": 55.7485,
"continent_code": "EU",
"city_name": "Moscow",
"country_name": "Russia",
"country_code2": "RU",
"country_code3": "RU",
"region_name": "Moscow",
"location": {
"lon": 37.6184,
"lat": 55.7485
},
"postal_code": "101194",
"region_code": "MOW",
"longitude": 37.6184
},
"offset": 2932,
"auth": "-",
"ident": "-",
"verb": "GET",
"prospector": {
"type": "log"
},
"source": "/path/to/file/logstash-tutorial.log",
"message": """83.149.9.216 - - [04/Jan/2015:05:13:45 +0000] "GET /presentations/logstash-monitorama-2013/images/frontend-response-codes.png HTTP/1.1" 200 52878 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"""",
"tags": [
"beats_input_codec_plain_applied"
],
"referrer": """"http://semicomplete.com/presentations/logstash-monitorama-2013/"""",
"@timestamp": "2017-11-09T03:11:35.304Z",
"response": "200",
"bytes": "52878",
"clientip": "83.149.9.216",
"@version": "1",
"beat": {
"name": "My-MacBook-Pro.local",
"hostname": "My-MacBook-Pro.local",
"version": "6.0.0"
},
"host": "My-MacBook-Pro.local",
"httpversion": "1.1",
"timestamp": "04/Jan/2015:05:13:45 +0000"
}
},
...
--------------------------------------------------------------------------------
Try another search for the geographic information derived from the IP address.
Replace $DATE with the current date, in YYYY.MM.DD format:
[source,shell]
--------------------------------------------------------------------------------
curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=geoip.city_name:Buffalo'
--------------------------------------------------------------------------------
A few log entries come from Buffalo, so the query produces the following response:
[source,json]
--------------------------------------------------------------------------------
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2.6390574,
"hits": [
{
"_index": "logstash-2017.11.09",
"_type": "doc",
"_id": "L4zDnl8BW52sR0fx5whY",
"_score": 2.6390574,
"_source": {
"request": "/blog/geekery/disabling-battery-in-ubuntu-vms.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+semicomplete%2Fmain+%28semicomplete.com+-+Jordan+Sissel%29",
"agent": """"Tiny Tiny RSS/1.11 (http://tt-rss.org/)"""",
"geoip": {
"timezone": "America/New_York",
"ip": "198.46.149.143",
"latitude": 42.8864,
"continent_code": "NA",
"city_name": "Buffalo",
"country_name": "United States",
"country_code2": "US",
"dma_code": 514,
"country_code3": "US",
"region_name": "New York",
"location": {
"lon": -78.8781,
"lat": 42.8864
},
"postal_code": "14202",
"region_code": "NY",
"longitude": -78.8781
},
"offset": 22795,
"auth": "-",
"ident": "-",
"verb": "GET",
"prospector": {
"type": "log"
},
"source": "/path/to/file/logstash-tutorial.log",
"message": """198.46.149.143 - - [04/Jan/2015:05:29:13 +0000] "GET /blog/geekery/disabling-battery-in-ubuntu-vms.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+semicomplete%2Fmain+%28semicomplete.com+-+Jordan+Sissel%29 HTTP/1.1" 200 9316 "-" "Tiny Tiny RSS/1.11 (http://tt-rss.org/)"""",
"tags": [
"beats_input_codec_plain_applied"
],
"referrer": """"-"""",
"@timestamp": "2017-11-09T03:11:35.321Z",
"response": "200",
"bytes": "9316",
"clientip": "198.46.149.143",
"@version": "1",
"beat": {
"name": "My-MacBook-Pro.local",
"hostname": "My-MacBook-Pro.local",
"version": "6.0.0"
},
"host": "My-MacBook-Pro.local",
"httpversion": "1.1",
"timestamp": "04/Jan/2015:05:29:13 +0000"
}
},
...
--------------------------------------------------------------------------------
If you are using Kibana to visualize your data, you can also explore the Filebeat data in Kibana:
image::static/images/kibana-filebeat-data.png[Discovering Filebeat data in Kibana]
See the {filebeat-ref}/filebeat-getting-started.html[Filebeat getting started docs] for info about loading the Kibana
index pattern for Filebeat.
You've successfully created a pipeline that uses Filebeat to take Apache web logs as input, parses those logs to
create specific, named fields from the logs, and writes the parsed data to an Elasticsearch cluster. Next, you
learn how to create a pipeline that uses multiple input and output plugins.
[[multiple-input-output-plugins]]
=== Stitching Together Multiple Input and Output Plugins
The information you need to manage often comes from several disparate sources, and use cases can require multiple
destinations for your data. Your Logstash pipeline can use multiple input and output plugins to handle these
requirements.
In this section, you create a Logstash pipeline that takes input from a Twitter feed and the Filebeat client, then
sends the information to an Elasticsearch cluster as well as writing the information directly to a file.
[float]
[[twitter-configuration]]
==== Reading from a Twitter Feed
To add a Twitter feed, you use the {logstash-ref}/plugins-inputs-twitter.html[`twitter`] input plugin. To
configure the plugin, you need several pieces of information:
* A _consumer key_, which uniquely identifies your Twitter app.
* A _consumer secret_, which serves as the password for your Twitter app.
* One or more _keywords_ to search in the incoming feed. The example shows using "cloud" as a keyword, but you can use whatever you want.
* An _oauth token_, which identifies the Twitter account using this app.
* An _oauth token secret_, which serves as the password of the Twitter account.
Visit https://dev.twitter.com/apps[https://dev.twitter.com/apps] to set up a Twitter account and generate your consumer
key and secret, as well as your access token and secret. See the docs for the {logstash-ref}/plugins-inputs-twitter.html[`twitter`] input plugin if you're not sure how to generate these keys.
As you did earlier when you worked on <<advanced-pipeline>>, create a config file (called `second-pipeline.conf`) that
contains the skeleton of a configuration pipeline. If you want, you can reuse the file you created earlier, but make
sure you pass in the correct config file name when you run Logstash.
Add the following lines to the `input` section of the `second-pipeline.conf` file, substituting your values for the
placeholder values shown here:
[source,json]
--------------------------------------------------------------------------------
    twitter {
        consumer_key => "enter_your_consumer_key_here"
        consumer_secret => "enter_your_secret_here"
        keywords => ["cloud"]
        oauth_token => "enter_your_access_token_here"
        oauth_token_secret => "enter_your_access_token_secret_here"
    }
--------------------------------------------------------------------------------
[float]
[[configuring-lsf]]
==== Configuring Filebeat to Send Log Lines to Logstash
As you learned earlier in <<configuring-filebeat>>, the https://github.com/elastic/beats/tree/master/filebeat[Filebeat]
client is a lightweight, resource-friendly tool that collects logs from files on the server and forwards these logs to your
Logstash instance for processing.
After installing Filebeat, you need to configure it. Open the `filebeat.yml` file located in your Filebeat installation
directory, and replace the contents with the following lines. Make sure `paths` points to your syslog:
[source,yaml]
--------------------------------------------------------------------------------
filebeat.prospectors:
- type: log
  paths:
    - /var/log/*.log <1>
  fields:
    type: syslog <2>
output.logstash:
  hosts: ["localhost:5044"]
--------------------------------------------------------------------------------
<1> Absolute path to the file or files that Filebeat processes.
<2> Adds a field called `type` with the value `syslog` to the event.
Save your changes.
To keep the configuration simple, you won't specify TLS/SSL settings as you would in a real-world
scenario.
Configure your Logstash instance to use the Beats input plugin by adding the following lines to the `input` section
of the `second-pipeline.conf` file:
[source,json]
--------------------------------------------------------------------------------
    beats {
        port => "5044"
    }
--------------------------------------------------------------------------------
[float]
[[logstash-file-output]]
==== Writing Logstash Data to a File
You can configure your Logstash pipeline to write data directly to a file with the
{logstash-ref}/plugins-outputs-file.html[`file`] output plugin.
Configure your Logstash instance to use the `file` output plugin by adding the following lines to the `output` section
of the `second-pipeline.conf` file:
[source,json]
--------------------------------------------------------------------------------
    file {
        path => "/path/to/target/file"
    }
--------------------------------------------------------------------------------
[float]
[[multiple-es-nodes]]
==== Writing to Multiple Elasticsearch Nodes
Writing to multiple Elasticsearch nodes lightens the resource demands on a given Elasticsearch node, as well as
providing redundant points of entry into the cluster when a particular node is unavailable.
To configure your Logstash instance to write to multiple Elasticsearch nodes, edit the `output` section of the `second-pipeline.conf` file to read:
[source,json]
--------------------------------------------------------------------------------
output {
    elasticsearch {
        hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
    }
}
--------------------------------------------------------------------------------
In the `hosts` line, use the IP addresses of three non-master nodes in your Elasticsearch cluster. When the `hosts`
parameter lists multiple IP addresses, Logstash load-balances requests across the list of addresses. Also note that
the default port for Elasticsearch is `9200`, so it can be omitted in the configuration above.
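
To illustrate the port rule, here is a small shell sketch that fills in `9200` for any entry that omits a port. It only mimics the behavior described above for plain `host` and `host:port` strings. `with_default_port` is a hypothetical helper, not part of Logstash, the addresses are made-up placeholders, and the sketch does not handle URLs or IPv6 literals.

```shell
# Hypothetical helper: append the default Elasticsearch port when none is given.
with_default_port() {
  case "$1" in
    *:*) printf '%s\n' "$1" ;;        # a port is already present
    *)   printf '%s:9200\n' "$1" ;;   # fall back to the default port
  esac
}

# Placeholder addresses for illustration only.
for host in "10.0.0.1:9200" "10.0.0.2:9201" "10.0.0.3"; do
  with_default_port "$host"
done
```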
[float]
[[testing-second-pipeline]]
===== Testing the Pipeline
At this point, your `second-pipeline.conf` file looks like this:
[source,json]
--------------------------------------------------------------------------------
input {
    twitter {
        consumer_key => "enter_your_consumer_key_here"
        consumer_secret => "enter_your_secret_here"
        keywords => ["cloud"]
        oauth_token => "enter_your_access_token_here"
        oauth_token_secret => "enter_your_access_token_secret_here"
    }
    beats {
        port => "5044"
    }
}
output {
    elasticsearch {
        hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
    }
    file {
        path => "/path/to/target/file"
    }
}
--------------------------------------------------------------------------------
With this configuration, Logstash consumes data from the Twitter feed you configured, receives data from Filebeat, and
indexes this information to three nodes in an Elasticsearch cluster as well as writing it to a file.
At the data source machine, run Filebeat with the following command:
[source,shell]
--------------------------------------------------------------------------------
sudo ./filebeat -e -c filebeat.yml -d "publish"
--------------------------------------------------------------------------------
Filebeat will attempt to connect on port 5044. Until Logstash starts with an active Beats plugin, there
won't be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.
To verify your configuration, run the following command:
[source,shell]
--------------------------------------------------------------------------------
bin/logstash -f second-pipeline.conf --config.test_and_exit
--------------------------------------------------------------------------------
The `--config.test_and_exit` option parses your configuration file and reports any errors. When the configuration file
passes the configuration test, start Logstash with the following command:
[source,shell]
--------------------------------------------------------------------------------
bin/logstash -f second-pipeline.conf
--------------------------------------------------------------------------------
Use the `grep` utility to search in the target file to verify that information is present:
[source,shell]
--------------------------------------------------------------------------------
grep syslog /path/to/target/file
--------------------------------------------------------------------------------
Run an Elasticsearch query to find the same information in the Elasticsearch cluster:
[source,shell]
--------------------------------------------------------------------------------
curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=fields.type:syslog'
--------------------------------------------------------------------------------
Replace $DATE with the current date, in YYYY.MM.DD format.
To see data from the Twitter feed, try this query:
[source,shell]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/logstash-$DATE/_search?pretty&q=client:iphone'
--------------------------------------------------------------------------------
Again, remember to replace $DATE with the current date, in YYYY.MM.DD format.