updating centralized setup to use Redis instead of AMQP

John E. Vincent 2012-07-02 13:39:46 -04:00
parent 7c10696373
commit 373a210f4d


On the server collecting and indexing your logs:
* Download and run Elasticsearch
* Download and run Redis
* Download and run Logstash
## ElasticSearch
ElasticSearch requires Java (it uses Lucene on the backend; if you want to know more, read the ElasticSearch docs).
To start the service, run `bin/elasticsearch -f`. This will run it in the foreground, which is what we want for now so we can watch its output while debugging.
## Redis
Previous versions of this guide used AMQP via RabbitMQ. Due to the complexity of AMQP, as well as performance issues with the Bunny driver we use, we now recommend Redis instead.
Redis has no external dependencies and has a much simpler configuration in Logstash.
Building and installing Redis is fairly straightforward. While this would normally be out of scope for this document, the instructions are so simple that we'll include them here:
- Download Redis from http://redis.io/download (the latest stable release is likely what you want)
- Extract the source, change to the directory and run `make`
- Run Redis with `src/redis-server`
That's it.
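For copy-paste convenience, here are the same steps as a shell session. (The version number below is illustrative; substitute whatever stable tarball you downloaded.)

    # grab a stable tarball from http://redis.io/download first, then:
    tar xzf redis-2.4.15.tar.gz
    cd redis-2.4.15
    make
    src/redis-server
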
## logstash
Follow [this link to download logstash-%VERSION%](http://semicomplete.com/files/logstash/logstash-%VERSION%-monolithic.jar).
Since we're doing a centralized configuration, you'll have two main Logstash agent roles: a shipper and an indexer. You will ship logs from all servers via Redis and have another agent receive those messages, parse them, and index them in Elasticsearch.
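At a glance, here's the flow we're building:

    shipper agents --> Redis list "logstash" --> indexer agent --> Elasticsearch --> Logstash web UI
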
### logstash log shipper
You'll run this agent on every server you want to collect logs from. As with the simple example, we're going to start simple to ensure that events are flowing:

    input {
      stdin {
        type => "stdin-type"
      }
    }

    output {
      # Output events to stdout for debugging. Feel free to remove
      # this output if you don't need it.
      stdout { debug => true debug_format => "json" }

      # Also ship events to Redis, onto a list named by the 'key' value
      redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
    }

Put this in a file and call it 'shipper.conf' (or anything, really), and run:

    java -jar logstash-%VERSION%-monolithic.jar agent -f shipper.conf

This will take anything you type into the console and display it back as an event, formatted as JSON. It will also push each event onto a Redis `list` named after the `key` value you provided.
### Testing the Redis output
To verify that the message made it into Redis, check your Redis window. You should see something like the following:

    [83019] 02 Jul 12:51:02 - Accepted 127.0.0.1:58312
    [83019] 02 Jul 12:51:06 - Client closed connection
    [83019] 02 Jul 12:51:06 - DB 0: 1 keys (0 volatile) in 4 slots HT.

Redis also ships with a CLI tool that you can use to query the data. From your Redis source directory, run the following:
`src/redis-cli`
Once connected, run the following commands:

    redis 127.0.0.1:6379> llen logstash
    (integer) 1
    redis 127.0.0.1:6379> lpop logstash
    "{\"@source\":\"stdin://jvstratusmbp.local/\",\"@type\":\"stdin-type\",\"@tags\":[],\"@fields\":{},\"@timestamp\":\"2012-07-02T17:01:12.278000Z\",\"@source_host\":\"jvstratusmbp.local\",\"@source_path\":\"/\",\"@message\":\"test\"}"
    redis 127.0.0.1:6379> llen logstash
    (integer) 0
    redis 127.0.0.1:6379>

What we've just done is check the length of the list, read and remove the oldest item in the list, and check the length again. This is exactly what Logstash does when it reads from a Redis input (technically, Logstash performs a blocking LPOP). We're essentially using Redis to simulate a queue via the `list` data type.
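You can reproduce this queue behavior by hand in `redis-cli`. (The `"hello"` payload below is just an illustrative string; real shipper events are JSON like the one above.)

    redis 127.0.0.1:6379> rpush logstash "hello"
    (integer) 1
    redis 127.0.0.1:6379> blpop logstash 0
    1) "logstash"
    2) "hello"

`rpush` appends to the tail of the list (which is what the shipper's redis output does), while `blpop` blocks until an item is available and then pops it from the head.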
Go ahead and type a few more entries in the agent window:
- test 1
- test 2
- test 3
As you `lpop`, you should get the entries back in the order you inserted them.
### logstash indexer
This agent will parse and index your logs as they come in over Redis. Here's a sample config based on the previous section. Save this as `indexer.conf`:

    input {
      redis {
        host => "127.0.0.1"
        type => "redis-input"
        # these settings should match the output of the agent
        data_type => "list"
        key => "logstash"
        # We use json_event here since the sender is a logstash agent
        message_format => "json_event"
      }
    }

    output {
      stdout { debug => true debug_format => "json" }

      # If your elasticsearch server is discoverable with multicast, use this:
      # elasticsearch { }

      # If you can't discover using multicast, set the address explicitly:
      elasticsearch {
        host => "127.0.0.1"
      }
    }

The above configuration will attach to Redis and issue a `BLPOP` against the `logstash` list. When an event is received, it will be pulled off and sent to Elasticsearch for indexing.
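A quick way to confirm this: with the indexer running, type a few more events into the shipper window, then check the list length in `redis-cli` again. Since the indexer pops events as fast as they arrive, the list should stay at (or quickly return to) zero:

    redis 127.0.0.1:6379> llen logstash
    (integer) 0
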
Start the indexer the same way as the agent, but specify the `indexer.conf` file:

    java -jar logstash-%VERSION%-monolithic.jar agent -f indexer.conf

To verify that your Logstash indexer is connecting to Elasticsearch properly, you should see a message in your Elasticsearch window similar to the following:

    [2012-07-02 13:14:27,008][INFO ][cluster.service ] [Baron Samedi] added {[Bes][JZQBMR21SUWRNtTMsDV3_g][inet[/192.168.1.194:9301]]{client=true, data=false},}

The names `Bes` and `Baron Samedi` may differ as ES uses random names for nodes.
### Testing the flow
Now we want to test the flow. In your agent window, type something to generate an event.
The indexer should read this and persist it to Elasticsearch. It will also display the event to stdout.
In your Elasticsearch window, you should see something like the following:

    [2012-07-02 13:21:58,982][INFO ][cluster.metadata ] [Baron Samedi] [logstash-2012.07.02] creating index, cause [auto(index api)], shards [5]/[1], mappings []
    [2012-07-02 13:21:59,495][INFO ][cluster.metadata ] [Baron Samedi] [logstash-2012.07.02] update_mapping [stdin-type] (dynamic)

Since indexes are created dynamically, this is the first sign that Logstash was able to write to Elasticsearch. Let's verify the data is there, using the same curl command from the simple tutorial:
`curl -s -XGET http://localhost:9200/logstash-2012.07.02/_search?q=@type:stdin-type`
You may need to modify the date in the index name, as it's based on the date this guide was written.
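Logstash writes to one index per day, named `logstash-YYYY.MM.dd`. For example, if you're following along on July 3rd, the query would be:

    curl -s -XGET http://localhost:9200/logstash-2012.07.03/_search?q=@type:stdin-type
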
Now we can move on to the final step...
## logstash web interface
Run this on the same server as your elasticsearch server.
To run the Logstash web server, just run the jar with `web` as the first argument:

    java -jar logstash-%VERSION%-monolithic.jar web --backend elasticsearch://127.0.0.1/

As with the indexer, you should see the Logstash web interface connect in your Elasticsearch window:

    [2012-07-02 13:28:34,818][INFO ][cluster.service ] [Baron Samedi] added {[Nebulon][kaO6QIojTIav2liuTjGOsA][inet[/192.168.1.194:9302]]{client=true, data=false},}

Just point your browser at http://127.0.0.1:9292/ and start searching logs!
## Distributing the load
At this point we've been simulating a distributed environment on a single machine. If only the world were so easy.
In all of the example configurations, we've been explicitly pointing connections at `127.0.0.1`, even though that's the default host for most network-related plugins.
Since Logstash is so modular, you can install the various components on different systems.
- If you want to give Redis a dedicated host, simply ensure that the `host` attribute in your configurations points to that host (see the sketch below).
- If you want to give Elasticsearch a dedicated host, simply ensure that the `host` attribute in the indexer (and the `--backend` URL for the web interface) is correct as well.
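For example, a shipper pointed at a dedicated Redis box might look like this (`redis.example.com` is an illustrative hostname; substitute your own):

    input {
      stdin {
        type => "stdin-type"
      }
    }

    output {
      # 'redis.example.com' is a placeholder for your dedicated Redis host
      redis { host => "redis.example.com" data_type => "list" key => "logstash" }
    }
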
As with the simple input example, reading from stdin is fairly useless. Check the Logstash documentation for the various inputs offered and mix and match to taste!