Migrated documentation into the main repo

This commit is contained in:
Clinton Gormley 2013-08-29 01:24:34 +02:00
parent b9558edeff
commit 822043347e
316 changed files with 23987 additions and 0 deletions


@ -0,0 +1,230 @@
[[modules-cluster]]
== Cluster
[float]
=== Shards Allocation
Shard allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or when
nodes are added or removed.
The following settings may be used:
`cluster.routing.allocation.allow_rebalance`::
Controls when rebalancing will happen, based on the total
state of all the index shards in the cluster. `always`,
`indices_primaries_active`, and `indices_all_active` are allowed,
defaulting to `indices_all_active` to reduce chatter during
initial recovery.
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are
allowed cluster wide. Defaults to `2`.
`cluster.routing.allocation.node_initial_primaries_recoveries`::
Controls specifically the number of initial primary
recoveries that are allowed per node. Since the local
gateway is used most of the time, these recoveries are fast and more
of them can be handled per node without creating load.
`cluster.routing.allocation.node_concurrent_recoveries`::
How many concurrent recoveries are allowed to happen on a node.
Defaults to `2`.
`cluster.routing.allocation.disable_new_allocation`::
Disables new primary allocations. Note that this will prevent
allocations for newly created indices. This setting is mainly
useful when updated dynamically using the cluster update
settings API.
`cluster.routing.allocation.disable_allocation`::
Disables either primary or replica allocation (does not
apply to newly created primaries; see `disable_new_allocation`
above). Note that a replica will still be promoted to primary if
one does not exist. This setting is mainly useful when
updated dynamically using the cluster update settings API, as shown
in the example after this list.
`cluster.routing.allocation.disable_replica_allocation`::
Disables replica allocation only. Like the previous
setting, it is mainly useful when updated dynamically using the
cluster update settings API.
`indices.recovery.concurrent_streams`::
The number of streams to open (on a *node* level) to recover a
shard from a peer shard. Defaults to `3`.
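For example, a minimal sketch of disabling allocation on a live cluster
before maintenance, using the cluster update settings API (a `transient`
setting does not survive a full cluster restart):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : true
    }
}'
--------------------------------------------------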
[float]
=== Shard Allocation Awareness
Cluster allocation awareness allows you to configure shard and replica
allocation across generic attributes associated with nodes. Let's explain
it through an example:
Assume we have several racks. When we start a node, we can configure an
attribute called `rack_id` (any attribute name works), for example, here
is a sample config:
----------------------
node.rack_id: rack_one
----------------------
The above sets an attribute called `rack_id` for the relevant node with
a value of `rack_one`. Now, we need to configure the `rack_id` attribute
as one of the awareness allocation attributes (set it in the config of
*all* master eligible nodes):
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------------
The above means that the `rack_id` attribute will be used for
awareness based allocation of shards and their replicas. For example,
let's say we start 2 nodes with `node.rack_id` set to `rack_one`, and
deploy a single index with 5 shards and 1 replica. The index will be
fully deployed on the current nodes (5 primary shards and 5 replicas, a
total of 10 shards).
Now, if we start two more nodes with `node.rack_id` set to `rack_two`,
shards will relocate to even out the number of shards across the nodes,
but a shard and its replica will not be allocated to nodes with the same
`rack_id` value.
The awareness attributes can hold several values, for example:
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------
*NOTE*: When using awareness attributes, shards will not be allocated to
nodes that don't have values set for those attributes.
[float]
=== Forced Awareness
Sometimes we know in advance the number of values an awareness
attribute can have, and, moreover, we never want more
replicas than needed to be allocated on a specific group of nodes with
the same awareness attribute value. For that, we can force awareness on
specific attributes.
For example, let's say we have an awareness attribute called `zone`, and
we know we are going to have two zones, `zone1` and `zone2`. Here is how
we can force awareness on a node:
[source,js]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
cluster.routing.allocation.awareness.attributes: zone
-------------------------------------------------------------------
Now, let's say we start 2 nodes with `node.zone` set to `zone1` and
create an index with 5 shards and 1 replica. The index will be created,
but only the 5 primary shards will be allocated (with no replicas). Only
when we start nodes with `node.zone` set to `zone2` will the replicas be
allocated.
[float]
==== Automatic Preference When Searching / GETing
When executing a search, or doing a get, the node receiving the request
will prefer to execute the request on shards that exist on nodes that
have the same attribute values as the executing node.
[float]
==== Realtime Settings Update
The settings can be updated using the <<cluster-update-settings,cluster update settings API>> on a live cluster.
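For example, a sketch of enabling awareness on a running cluster (this
assumes the `rack_id` attribute is already set in each node's config):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "cluster.routing.allocation.awareness.attributes" : "rack_id"
    }
}'
--------------------------------------------------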
[float]
=== Shard Allocation Filtering
Allows control over the allocation of indices on nodes, based on
include/exclude filters. The filters can be set both on the index level
and on the cluster level. Let's start with an example of setting it on
the index level:
Let's say we have 4 nodes, each with a specific attribute called `tag`
associated with it (the name of the attribute can be any name). Each
node has a specific value associated with `tag`. Node 1 has a setting
`node.tag: value1`, node 2 a setting of `node.tag: value2`, and so on.
We can create an index that will only deploy on nodes that have `tag`
set to `value1` and `value2` by setting
`index.routing.allocation.include.tag` to `value1,value2`. For example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.tag" : "value1,value2"
}'
--------------------------------------------------
On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a `tag` of value `value3` by setting
`index.routing.allocation.exclude.tag` to `value3`. For example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.exclude.tag" : "value3"
}'
--------------------------------------------------
From version 0.90, `index.routing.allocation.require.*` can be used to
specify a number of rules, all of which MUST match in order for a shard
to be allocated to a node. This is in contrast to `include`, which will
include a node if ANY rule matches.
The `include`, `exclude` and `require` values can contain simple
wildcards, for example `value1*`. A special attribute name
called `_ip` can be used to match on node IP addresses. In addition, the
`_host` attribute can be used to match on either the node's hostname or
its IP address.
Obviously a node can have several attributes associated with it, and
both the attribute name and value are controlled in the setting. For
example, here is a sample of several node configurations:
[source,js]
--------------------------------------------------
node.group1: group1_value1
node.group2: group2_value4
--------------------------------------------------
In the same manner, `include`, `exclude` and `require` can work against
several attributes, for example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz",
    "index.routing.allocation.require.group4" : "aaa"
}'
--------------------------------------------------
The provided settings can also be updated in real time using the update
settings API, allowing indices (shards) to be "moved" around in real
time.
Cluster wide filtering can also be defined, and updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on its `_ip`
address:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
--------------------------------------------------


@ -0,0 +1,26 @@
[[modules-discovery]]
== Discovery
The discovery module is responsible for discovering nodes within a
cluster, as well as electing a master node.
Note that ElasticSearch is a peer to peer based system in which nodes
communicate with one another directly as operations are delegated or
broadcast. All the main APIs (index, delete, search) do not communicate
with the master node. The responsibility of the master node is to
maintain the global cluster state and to reassign shards when nodes join
or leave the cluster. Each time the cluster state is changed, the new
state is made known to the other nodes in the cluster (the manner
depends on the actual discovery implementation).
[float]
=== Settings
The `cluster.name` setting separates clusters from one another.
The default value for the cluster name is `elasticsearch`, though it is
recommended to change this to reflect the logical group name of the
cluster running.
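For example, in `elasticsearch.yml` (the name shown is just a
placeholder):

[source,js]
--------------------------------------------------
cluster.name: logging-prod
--------------------------------------------------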
include::discovery/ec2.asciidoc[]
include::discovery/zen.asciidoc[]


@ -0,0 +1,82 @@
[[modules-discovery-ec2]]
=== EC2 Discovery
EC2 discovery uses the EC2 APIs to perform automatic discovery,
filling the same role that multicast plays in environments where
multicast is available. Here is a simple sample configuration:
[source,js]
--------------------------------------------------
cloud:
    aws:
        access_key: AKVAIQBF2RECL7FJWGJQ
        secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br

discovery:
    type: ec2
--------------------------------------------------
You'll need to install the `cloud-aws` plugin. Please check the
https://github.com/elasticsearch/elasticsearch-cloud-aws[plugin website]
to find the most up-to-date version to install before (re)starting
elasticsearch.
The following is a list of settings (prefixed with `discovery.ec2`)
that can further control the discovery:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`groups` |Either a comma separated list or array based list of
(security) groups. Only instances with the provided security groups will
be used in the cluster discovery.
|`host_type` |The type of host to use to communicate with other
instances. Can be one of `private_ip`, `public_ip`, `private_dns`,
`public_dns`. Defaults to `private_ip`.
|`availability_zones` |Either a comma separated list or array based list
of availability zones. Only instances within the provided availability
zones will be used in the cluster discovery.
|`any_group` |If set to `false`, will require all security groups to be
present for the instance to be used for the discovery. Defaults to
`true`.
|`ping_timeout` |How long to wait for existing EC2 nodes to reply during
discovery. Defaults to `3s`.
|=======================================================================
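For example, a sketch (the group and zone values are placeholders)
restricting discovery to a single security group and two availability
zones:

[source,js]
--------------------------------------------------
discovery:
    type: ec2
    ec2:
        groups: my-security-group
        availability_zones: us-east-1a,us-east-1b
--------------------------------------------------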
[float]
==== Filtering by Tags
EC2 discovery can also filter the machines to include in the cluster
based on tags (and not just groups). The settings to use include the
`discovery.ec2.tag.` prefix. For example, setting
`discovery.ec2.tag.stage` to `dev` will include only instances with a
tag key of `stage` and a value of `dev`. Setting several tags will
require all of those tags to be present for the instance to be included.
One practical use for tag filtering is when an EC2 cluster contains many
nodes that are not running elasticsearch. In this case (particularly
with high `ping_timeout` values) there is a risk that a new node's
discovery phase will end before it has found the cluster (which will
result in it declaring itself master of a new cluster with the same name
- highly undesirable). Tagging elasticsearch EC2 nodes and then
filtering by that tag will resolve this issue.
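For example, assuming the elasticsearch instances are tagged with
`stage: dev` in EC2:

[source,js]
--------------------------------------------------
discovery:
    type: ec2
    ec2:
        tag:
            stage: dev
--------------------------------------------------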
[float]
==== Region
The `cloud.aws.region` can be set to a region and will automatically use
the relevant settings for both `ec2` and `s3`. The available values are:
`us-east-1`, `us-west-1`, `ap-southeast-1`, `eu-west-1`.
[float]
==== Automatic Node Attributes
Though not dependent on actually using `ec2` as the discovery type
(the `cloud-aws` plugin must still be installed), the plugin can
automatically add node attributes relating to EC2 (for example, the
availability zone) that can be used with the awareness allocation
feature. To enable it, set `cloud.node.auto_attributes` to `true` in the
settings.


@ -0,0 +1,145 @@
[[modules-discovery-zen]]
=== Zen Discovery
Zen discovery is the built in discovery module for elasticsearch and
the default. It provides both multicast and unicast discovery, and is
easily extended to support cloud environments.
The zen discovery is integrated with other modules, for example, all
communication between nodes is done using the
<<modules-transport,transport>> module.
It is separated into several sub modules, which are explained below:
[float]
==== Ping
This is the process where a node uses the discovery mechanisms to find
other nodes. Both multicast and unicast based discovery are supported
(and they can be used in conjunction).
[float]
===== Multicast
Multicast ping discovery of other nodes is done by sending one or more
multicast requests which existing nodes will receive and
respond to. It provides the following settings with the
`discovery.zen.ping.multicast` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`group` |The group address to use. Defaults to `224.2.2.4`.
|`port` |The port to use. Defaults to `54328`.
|`ttl` |The ttl of the multicast message. Defaults to `3`.
|`address` |The address to bind to, defaults to `null` which means it
will bind to all available network interfaces.
|=======================================================================
Multicast can be disabled by setting `multicast.enabled` to `false`.
[float]
===== Unicast
Unicast discovery allows discovery to be performed when multicast is
not enabled. It basically requires a list of hosts that will act
as gossip routers. It provides the following settings with the
`discovery.zen.ping.unicast` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`hosts` |Either an array setting or a comma delimited setting. Each
value is either in the form of `host:port`, or in the form of
`host[port1-port2]`.
|=======================================================================
The unicast discovery uses the
<<modules-transport,transport>> module to
perform the discovery.
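For example, a minimal sketch (the host names are placeholders) that
disables multicast and relies on two gossip hosts:

[source,js]
--------------------------------------------------
discovery:
    zen:
        ping:
            multicast:
                enabled: false
            unicast:
                hosts: host1:9300,host2:9300
--------------------------------------------------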
[float]
==== Master Election
As part of the initial ping process a master of the cluster is either
elected or joined to. This is done automatically. The
`discovery.zen.ping_timeout` setting (which defaults to `3s`) allows the
election to be tuned to handle slow or congested networks
(higher values lower the chance of failure). Note that this setting was
changed from 0.15.1 onwards; prior to that it was called
`discovery.zen.initial_ping_timeout`.
Nodes can be excluded from becoming a master by setting `node.master` to
`false`. Note that once a node is a client node (`node.client` set to
`true`), it will not be allowed to become a master (`node.master` is
automatically set to `false`).
The `discovery.zen.minimum_master_nodes` setting controls the minimum
number of master eligible nodes a node should "see" in order to operate
within the cluster. It's recommended to set it to a value higher than 1
when running more than 2 nodes in the cluster, as shown below.
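For example, a sketch for a cluster of three master eligible nodes,
requiring a quorum of two to be visible:

[source,js]
--------------------------------------------------
discovery:
    zen:
        minimum_master_nodes: 2
--------------------------------------------------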
[float]
==== Fault Detection
There are two fault detection processes running. The first is by the
master, pinging all the other nodes in the cluster to verify that they
are alive. On the other end, each node pings the master to verify that
it is still alive, or whether an election process needs to be initiated.
The following settings control the fault detection process using the
`discovery.zen.fd` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`ping_interval` |How often a node gets pinged. Defaults to `1s`.
|`ping_timeout` |How long to wait for a ping response, defaults to
`30s`.
|`ping_retries` |How many ping failures / timeouts cause a node to be
considered failed. Defaults to `3`.
|=======================================================================
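For example, a sketch that tightens the ping timeout (the values are
illustrative, not recommendations):

[source,js]
--------------------------------------------------
discovery:
    zen:
        fd:
            ping_interval: 1s
            ping_timeout: 10s
            ping_retries: 3
--------------------------------------------------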
[float]
==== External Multicast
The multicast discovery also supports external multicast requests for
discovering nodes. An external client can send a request to the
multicast IP/group and port, in the form of:
[source,js]
--------------------------------------------------
{
    "request" : {
        "cluster_name": "test_cluster"
    }
}
--------------------------------------------------
The response will be similar to the node info response (with node level
information only, including transport/http addresses and node
attributes):
[source,js]
--------------------------------------------------
{
    "response" : {
        "cluster_name" : "test_cluster",
        "transport_address" : "...",
        "http_address" : "...",
        "attributes" : {
            "..."
        }
    }
}
--------------------------------------------------
Note that internal multicast discovery can be disabled while external
multicast requests keep working: keep
`discovery.zen.ping.multicast.enabled` set to `true` (the default), but
set `discovery.zen.ping.multicast.ping.enabled` to `false`.


@ -0,0 +1,74 @@
[[modules-gateway]]
== Gateway
The gateway module allows one to store the state of the cluster meta
data across full cluster restarts. The cluster meta data mainly holds
all the indices created with their respective (index level) settings and
explicit type mappings.
Each time the cluster meta data changes (for example, when an index is
added or deleted), those changes will be persisted using the gateway.
When the cluster first starts up, the state will be read from the
gateway and applied.
The gateway set on the node level will automatically control the index
gateway that will be used. For example, if the `fs` gateway is used,
then each index created on the node will automatically use its own
respective index level `fs` gateway. In this case, if an index should
not persist its state, it should be explicitly set to `none` (which is
the only other value it can be set to).
The default gateway used is the
<<modules-gateway-local,local>> gateway.
[float]
=== Recovery After Nodes / Time
In many cases, the actual cluster meta data should only be recovered
after specific nodes have started in the cluster, or a timeout has
passed. This is handy when restarting the cluster, and each node's local
index storage still exists and can be reused rather than recovered from
the gateway (which reduces the time it takes to recover from the
gateway).
The `gateway.recover_after_nodes` setting (which accepts a number)
controls how many data and master eligible nodes must be present in the
cluster before recovery starts. The `gateway.recover_after_data_nodes`
and `gateway.recover_after_master_nodes` settings work in a similar
fashion, except they consider only the number of data nodes and only the
number of master eligible nodes respectively. The
`gateway.recover_after_time` setting (which accepts a time value) sets
the time to wait before recovery happens once all
`gateway.recover_after...nodes` conditions are met.
The `gateway.expected_nodes` setting controls how many data and master
eligible nodes are expected to be in the cluster; once met, the
`recover_after_time` is ignored and recovery starts. The
`gateway.expected_data_nodes` and `gateway.expected_master_nodes`
settings are also supported. For example, the setting:
[source,js]
--------------------------------------------------
gateway:
    recover_after_nodes: 1
    recover_after_time: 5m
    expected_nodes: 2
--------------------------------------------------
in a cluster expected to have 2 nodes will cause recovery to start 5
minutes after the first node is up; but once there are 2 nodes in the
cluster, recovery will begin immediately (without waiting).
Note that once the meta data has been recovered from the gateway (which
indices to create, mappings and so on), this setting is no longer
effective until the next full restart of the cluster.
Operations are blocked while the cluster meta data has not been
recovered, in order not to mix them with the actual cluster meta data
that will be recovered once the conditions are met.
include::gateway/local.asciidoc[]
include::gateway/fs.asciidoc[]
include::gateway/hadoop.asciidoc[]
include::gateway/s3.asciidoc[]


@ -0,0 +1,39 @@
[[modules-gateway-fs]]
=== Shared FS Gateway
*The shared FS gateway is deprecated and will be removed in a future
version. Please use the
<<modules-gateway-local,local gateway>>
instead.*
The file system based gateway stores the cluster meta data and indices
in a *shared* file system. Note that since elasticsearch is a
distributed system, the file system must be shared between all the
nodes. Here is an example config to enable it:
[source,js]
--------------------------------------------------
gateway:
    type: fs
--------------------------------------------------
[float]
==== location
The location where the gateway stores the cluster state can be set using
the `gateway.fs.location` setting. By default, it will be stored under
the `work` directory. Note that the `work` directory is considered a
temporary directory with ElasticSearch (meaning it is safe to `rm -rf`
it). Since the default location of the persistent gateway is under
`work`, *it should be changed*.
When explicitly specifying the `gateway.fs.location`, each node will
append its `cluster.name` to the provided location. It means that the
location provided can safely support several clusters.
[float]
==== concurrent_streams
The `gateway.fs.concurrent_streams` setting throttles the number of
streams (per node) opened against the shared gateway when performing the
snapshot operation. It defaults to `5`.


@ -0,0 +1,36 @@
[[modules-gateway-hadoop]]
=== Hadoop Gateway
*The hadoop gateway is deprecated and will be removed in a future
version. Please use the
<<modules-gateway-local,local gateway>>
instead.*
The hadoop (HDFS) based gateway stores the cluster meta data and indices
data in hadoop. Hadoop support is provided as a plugin; installation is
explained https://github.com/elasticsearch/elasticsearch-hadoop[here],
or download the hadoop plugin and place it under the `plugins`
directory. Here is an example config to enable it:
[source,js]
--------------------------------------------------
gateway:
    type: hdfs
    hdfs:
        uri: hdfs://myhost:8022
--------------------------------------------------
[float]
==== Settings
The hadoop gateway requires two simple settings. The `gateway.hdfs.uri`
setting controls the URI used to connect to the hadoop cluster, for
example: `hdfs://myhost:8022`. The `gateway.hdfs.path` setting controls
the path under which the gateway will store the data.
[float]
==== concurrent_streams
The `gateway.hdfs.concurrent_streams` setting throttles the number of
streams (per node) opened against the shared gateway when performing the
snapshot operation. It defaults to `5`.


@ -0,0 +1,31 @@
[[modules-gateway-local]]
=== Local Gateway
The local gateway allows for recovery of the full cluster state and
indices from the local storage of each node, and does not require a
common node level shared storage.
Note that, different from the shared gateway types, persistence to the
local gateway is *not* done in an async manner. Once an operation is
performed, the data is there for the local gateway to recover in case
of full cluster failure.
It is important to configure the `gateway.recover_after_nodes` setting
to include most of the nodes expected to be started after a full cluster
restart. This will ensure that the latest cluster state is recovered.
For example:
[source,js]
--------------------------------------------------
gateway:
    recover_after_nodes: 1
    recover_after_time: 5m
    expected_nodes: 2
--------------------------------------------------
Note that to backup/snapshot the full cluster state it is recommended
that the local storage for all nodes be copied (in theory not all are
required, just enough to guarantee a copy of each shard has been copied,
depending on the replication settings) while disabling flush.
Shared storage such as S3 can be used to keep the different nodes'
copies in one place, though it does come at a price of more IO.


@ -0,0 +1,51 @@
[[modules-gateway-s3]]
=== S3 Gateway
*The S3 gateway is deprecated and will be removed in a future version.
Please use the <<modules-gateway-local,local
gateway>> instead.*
The S3 based gateway allows long term, reliable, async persistence of
the cluster state and indices directly to Amazon S3. Here is how it can
be configured:
[source,js]
--------------------------------------------------
cloud:
    aws:
        access_key: AKVAIQBF2RECL7FJWGJQ
        secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br

gateway:
    type: s3
    s3:
        bucket: bucket_name
--------------------------------------------------
You'll need to install the `cloud-aws` plugin, by running
`bin/plugin install cloud-aws` before (re)starting elasticsearch.
The following is a list of settings (prefixed with `gateway.s3`) that
can further control the S3 gateway:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`chunk_size` |Big files are broken down into chunks (to overcome the
AWS 5GB limit and to allow concurrent snapshotting). Defaults to `100m`.
|=======================================================================
[float]
==== concurrent_streams
The `gateway.s3.concurrent_streams` setting throttles the number of
streams (per node) opened against the shared gateway when performing the
snapshot operation. It defaults to `5`.
[float]
==== Region
The `cloud.aws.region` can be set to a region and will automatically use
the relevant settings for both `ec2` and `s3`. The available values are:
`us-east-1`, `us-west-1`, `ap-southeast-1`, `eu-west-1`.


@ -0,0 +1,51 @@
[[modules-http]]
== HTTP
The http module exposes the *elasticsearch* APIs
over HTTP.
The http mechanism is completely asynchronous in nature, meaning that
there is no blocking thread waiting for a response. The benefit of using
asynchronous communication for HTTP is solving the
http://en.wikipedia.org/wiki/C10k_problem[C10k problem].
When possible, consider using
http://en.wikipedia.org/wiki/Keepalive#HTTP_Keepalive[HTTP keep alive]
when connecting, for better performance, and try to get your favorite
client not to use
http://en.wikipedia.org/wiki/Chunked_transfer_encoding[HTTP chunking].
[float]
=== Settings
The following are the settings that can be configured for HTTP:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`http.port` |A bind port range. Defaults to `9200-9300`.
|`http.max_content_length` |The max content length of an HTTP request
body. Defaults to `100mb`.
|`http.max_initial_line_length` |The max length of an HTTP URL. Defaults
to `4kb`.
|`http.compression` |Support for compression when possible (with
Accept-Encoding). Defaults to `false`.
|`http.compression_level` |Defines the compression level to use.
Defaults to `6`.
|=======================================================================
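For example, a sketch enabling compression and raising the content limit
in `elasticsearch.yml` (the values are illustrative):

[source,js]
--------------------------------------------------
http.max_content_length: 200mb
http.compression: true
http.compression_level: 6
--------------------------------------------------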
It also uses the common
<<modules-network,network settings>>.
[float]
=== Disable HTTP
The http module can be completely disabled and not started by setting
`http.enabled` to `false`. This makes sense on data nodes when dedicated
non <<modules-node,data nodes>> accept the HTTP
requests and communicate with the data nodes using the internal
<<modules-transport,transport>>.


@ -0,0 +1,75 @@
[[modules-indices]]
== Indices
The indices module controls settings that are globally managed
for all indices.
[float]
=== Indexing Buffer
The indexing buffer setting controls how much memory will be
allocated for the indexing process. It is a global setting that applies
to all the different shards allocated on a specific node.
The `indices.memory.index_buffer_size` setting accepts either a
percentage or a byte size value. It defaults to `10%`, meaning that
`10%` of the total memory allocated to a node will be used as the
indexing buffer size. This amount is then divided between all the
different shards. Also, if a percentage is used, the
`min_index_buffer_size` (defaults to `48mb`) and `max_index_buffer_size`
(unbounded by default) settings can be set, as sketched below.
The `indices.memory.min_shard_index_buffer_size` setting sets a hard
lower limit for the memory allocated per shard for its own indexing
buffer. It defaults to `4mb`.
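For example, a sketch bounding the percentage based buffer (the values
are illustrative):

[source,js]
--------------------------------------------------
indices.memory.index_buffer_size: 10%
indices.memory.min_index_buffer_size: 48mb
indices.memory.max_index_buffer_size: 1gb
indices.memory.min_shard_index_buffer_size: 4mb
--------------------------------------------------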
[float]
=== TTL interval
The `indices.ttl.interval` setting, which can be set dynamically,
controls how often expired documents will be automatically deleted. The
default value is `60s`.
The deletions are processed in bulk. You can set
`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
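For example, a sketch of both settings in `elasticsearch.yml` (the
values shown are the defaults):

[source,js]
--------------------------------------------------
indices.ttl.interval: 60s
indices.ttl.bulk_size: 10000
--------------------------------------------------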
See also <<mapping-ttl-field>>.
[float]
=== Recovery
The following settings can be set to manage recovery policy:
[horizontal]
`indices.recovery.concurrent_streams`::
defaults to `3`.
`indices.recovery.file_chunk_size`::
defaults to `512kb`.
`indices.recovery.translog_ops`::
defaults to `1000`.
`indices.recovery.translog_size`::
defaults to `512kb`.
`indices.recovery.compress`::
defaults to `true`.
`indices.recovery.max_bytes_per_sec`::
since 0.90.1, defaults to `20mb`.
`indices.recovery.max_size_per_sec`::
deprecated from 0.90.1. Replaced by `indices.recovery.max_bytes_per_sec`.
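For example, a sketch raising the recovery throttle on a cluster running
0.90.1 or later (the values are illustrative):

[source,js]
--------------------------------------------------
indices.recovery.max_bytes_per_sec: 40mb
indices.recovery.concurrent_streams: 3
--------------------------------------------------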
[float]
=== Store level throttling
The following settings can be set to control store throttling:
[horizontal]
`indices.store.throttle.type`::
could be `merge` (default), `none` or `all`. See <<index-modules-store>>.
`indices.store.throttle.max_bytes_per_sec`::
defaults to `20mb`.
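For example, a sketch throttling merge IO across the node (the rate is
illustrative):

[source,js]
--------------------------------------------------
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 20mb
--------------------------------------------------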


@ -0,0 +1,34 @@
[[modules-jmx]]
== JMX
[float]
=== REMOVED AS OF v0.90
Use the stats APIs instead.
The JMX module exposes node information through
http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/[JMX].
JMX can be used by either
http://en.wikipedia.org/wiki/JConsole[jconsole] or
http://en.wikipedia.org/wiki/VisualVM[VisualVM].
Exposed JMX data includes both node level information, as well as
instantiated indices and shards on the specific node. This is a work in
progress, with each version exposing more information.
[float]
=== jmx.domain
The domain under which JMX will register can be set using the
`jmx.domain` setting. It defaults to `{elasticsearch}`.
[float]
=== jmx.create_connector
An RMI connector can be started to accept JMX requests. This can be
enabled by setting `jmx.create_connector` to `true`. An RMI connector
does come with its own overhead, so make sure you really need it.
When an RMI connector is created, the `jmx.port` setting provides a port
range for the ports the RMI connector can open. By default,
it is set to `9400-9500`.


@ -0,0 +1,69 @@
[[modules-memcached]]
== memcached
The memcached module allows the *elasticsearch*
APIs to be exposed over the memcached protocol (as closely
as possible).
It is provided as a plugin called `transport-memcached`; installation
is explained
https://github.com/elasticsearch/elasticsearch-transport-memcached[here].
Another option is to download the memcached plugin and place it
under the `plugins` directory.
The memcached module supports both the binary and the text protocol,
automatically detecting the correct one to use.
[float]
=== Mapping REST to Memcached Protocol
Memcached commands are mapped to REST and handled by the same generic
REST layer in elasticsearch. Here is a list of the memcached commands
supported:
[float]
==== GET
The memcached `GET` command maps to a REST `GET`. The key used is the
URI (with parameters). The main downside is the fact that the memcached
`GET` does not allow a body in the request (and `SET` does not allow a
result to be returned...). For this reason, most REST APIs (like search)
also accept the "source" as a URI parameter.
[float]
==== SET
The memcached `SET` command maps to a REST `POST`. The key used is the
URI (with parameters), and the body maps to the REST body.
[float]
==== DELETE
The memcached `DELETE` command maps to a REST `DELETE`. The key used is
the URI (with parameters).
[float]
==== QUIT
The memcached `QUIT` command is supported and disconnects the client.
[float]
=== Settings
The following are the settings that can be configured for memcached:
[cols="<,<",options="header",]
|===============================================================
|Setting |Description
|`memcached.port` |A bind port range. Defaults to `11211-11311`.
|===============================================================
It also uses the common
<<modules-network,network settings>>.
[float]
=== Disable memcached
The memcached module can be completely disabled and not started by
setting `memcached.enabled` to `false`. By default it is enabled once
the plugin is detected.


@ -0,0 +1,88 @@
[[modules-network]]
== Network Settings
There are several modules within a Node that use network based
configuration, for example, the
<<modules-transport,transport>> and
<<modules-http,http>> modules. Node level
network settings allow setting common settings that will be shared among
all network based modules (unless explicitly overridden in each module).
The `network.bind_host` setting controls the host that the different
network components will bind to. By default, the bind host will be
`anyLocalAddress` (typically `0.0.0.0` or `::0`).
The `network.publish_host` setting controls the host the node
will publish itself as within the cluster, so other nodes will be able
to connect to it. Of course, this can't be the `anyLocalAddress`; by
default, it will be the first non loopback address (if possible), or the
local address.
The `network.host` setting is a simple setting to automatically set both
`network.bind_host` and `network.publish_host` to the same host value.
Both settings can be configured with either an explicit host address
or host name. The settings also accept the logical values explained
in the following table:
[cols="<,<",options="header",]
|=======================================================================
|Logical Host Setting Value |Description
|`_local_` |Will be resolved to the local ip address.
|`_non_loopback_` |The first non loopback address.
|`_non_loopback:ipv4_` |The first non loopback IPv4 address.
|`_non_loopback:ipv6_` |The first non loopback IPv6 address.
|`_[networkInterface]_` |Resolves to the ip address of the provided
network interface. For example `_en0_`.
|`_[networkInterface]:ipv4_` |Resolves to the ipv4 address of the
provided network interface. For example `_en0:ipv4_`.
|`_[networkInterface]:ipv6_` |Resolves to the ipv6 address of the
provided network interface. For example `_en0:ipv6_`.
|=======================================================================
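For example, a sketch binding to the IPv4 address of a specific network
interface (the interface name is a placeholder):

[source,js]
--------------------------------------------------
network.host: _en0:ipv4_
--------------------------------------------------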
When the `cloud-aws` plugin is installed, the following are also allowed
as valid network host settings:
[cols="<,<",options="header",]
|==================================================================
|EC2 Host Value |Description
|`_ec2:privateIpv4_` |The private IP address (ipv4) of the machine.
|`_ec2:privateDns_` |The private host of the machine.
|`_ec2:publicIpv4_` |The public IP address (ipv4) of the machine.
|`_ec2:publicDns_` |The public host of the machine.
|`_ec2_` |Less verbose option for the private ip address.
|`_ec2:privateIp_` |Less verbose option for the private ip address.
|`_ec2:publicIp_` |Less verbose option for the public ip address.
|==================================================================
[float]
=== TCP Settings
Any component that uses TCP (like the HTTP, Transport and Memcached
modules) shares the following settings:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`network.tcp.no_delay` |Enable or disable tcp no delay setting.
Defaults to `true`.
|`network.tcp.keep_alive` |Enable or disable tcp keep alive. By default
not explicitly set.
|`network.tcp.reuse_address` |Should an address be reused or not.
Defaults to `true` on non-windows machines.
|`network.tcp.send_buffer_size` |The size of the tcp send buffer size
(in size setting format). By default not explicitly set.
|`network.tcp.receive_buffer_size` |The size of the tcp receive buffer
size (in size setting format). By default not explicitly set.
|=======================================================================


@ -0,0 +1,32 @@
[[modules-node]]
== Node
*elasticsearch* allows a node to be configured to either store data
locally or not. Storing data locally basically means that shards of
different indices are allowed to be allocated on that node. By default,
each node is considered to be a data node; this can be turned off by
setting `node.data` to `false`.
This is a powerful setting, allowing the simple creation of smart load
balancers that take part in some of the different API processing. Let's
take an example:
We can start a whole cluster of data nodes which do not even start an
HTTP transport by setting `http.enabled` to `false`. Such nodes will
communicate with one another using the
<<modules-transport,transport>> module. In front
of the cluster we can start one or more "non data" nodes which will
start with HTTP enabled. All HTTP communication will be performed
through these "non data" nodes, as sketched below.
The first benefit of this setup is the ability to create smart load
balancers. These "non data" nodes are still part of the cluster, and
they redirect operations exactly to the node that holds the relevant
data. The other benefit is the fact that, for scatter / gather based
operations (such as search), these nodes will take part in the
processing, since they will start the scatter process and perform the
actual gather processing.
This leaves the data nodes free to do the heavy duty of indexing and
searching, without needing to process HTTP requests (parsing), overload
the network, or perform the gather processing.


@ -0,0 +1,245 @@
[[modules-plugins]]
== Plugins
[float]
=== Plugins
Plugins are a way to enhance the basic elasticsearch functionality in a
custom manner. They range from adding custom mapping types, custom
analyzers (in a more built in fashion), native scripts, custom discovery
and more.
[float]
==== Installing plugins
Installing plugins can either be done manually by placing them under the
`plugins` directory, or using the `plugin` script. Several plugins can
be found under the https://github.com/elasticsearch[elasticsearch]
organization in GitHub, starting with `elasticsearch-`.
Starting from 0.90.2, installing plugins typically takes the form of
`plugin --install <org>/<user/component>/<version>`. The plugins will be
automatically downloaded in this case from `download.elasticsearch.org`,
and in case they don't exist there, from maven (central and sonatype).
Note that when the plugin is located in the maven central or sonatype
repository, `<org>` is the artifact `groupId` and `<user/component>` is
the `artifactId`.
For prior versions, the older form is
`plugin -install <org>/<user/component>/<version>`.
A plugin can also be installed directly by specifying the URL for it,
for example:
`bin/plugin --url file://path/to/plugin --install plugin-name` or
`bin/plugin -url file://path/to/plugin -install plugin-name` for older
versions.
Starting from 0.90.2, you can run `bin/plugin -h` for more information
about plugins.
[float]
==== Site Plugins
Plugins can have "sites" in them, any plugin that exists under the
`plugins` directory with a `_site` directory, its content will be
statically served when hitting `/_plugin/[plugin_name]/` url. Those can
be added even after the process has started.
Installed plugins that do not contain any java related content, will
automatically be detected as site plugins, and their content will be
moved under `_site`.
The ability to install plugins from GitHub makes it easy to install site
plugins hosted there by downloading the actual repo. For example,
running:
[source,js]
--------------------------------------------------
# From 0.90.2
bin/plugin --install mobz/elasticsearch-head
bin/plugin --install lukas-vlcek/bigdesk
# From a prior version
bin/plugin -install mobz/elasticsearch-head
bin/plugin -install lukas-vlcek/bigdesk
--------------------------------------------------
will install both of those site plugins, with `elasticsearch-head`
available under `http://localhost:9200/_plugin/head/` and `bigdesk`
available under `http://localhost:9200/_plugin/bigdesk/`.
[float]
==== Mandatory Plugins
If you rely on some plugins, you can define mandatory plugins using the
`plugin.mandatory` attribute, for example, here is a sample config:
[source,js]
--------------------------------------------------
plugin.mandatory: mapper-attachments,lang-groovy
--------------------------------------------------
For safety reasons, if a mandatory plugin is not installed, the node
will not start.
[float]
==== Installed Plugins
A list of the currently loaded plugins can be retrieved using the
<<cluster-nodes-info,Node Info API>>.
[float]
=== Known Plugins
[float]
==== Analysis Plugins
* https://github.com/yakaz/elasticsearch-analysis-combo/[Combo Analysis
Plugin] (by Olivier Favre, Yakaz)
* https://github.com/elasticsearch/elasticsearch-analysis-smartcn[Smart
Chinese Analysis Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-analysis-icu[ICU
Analysis plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-analysis-stempel[Stempel
(Polish) Analysis plugin] (by elasticsearch team)
* https://github.com/chytreg/elasticsearch-analysis-morfologik[Morfologik
(Polish) Analysis plugin] (by chytreg)
* https://github.com/medcl/elasticsearch-analysis-ik[IK Analysis Plugin]
(by Medcl)
* https://github.com/medcl/elasticsearch-analysis-mmseg[Mmseg Analysis
Plugin] (by Medcl)
* https://github.com/jprante/elasticsearch-analysis-hunspell[Hunspell
Analysis Plugin] (by Jörg Prante)
* https://github.com/elasticsearch/elasticsearch-analysis-kuromoji[Japanese
(Kuromoji) Analysis plugin] (by elasticsearch team).
* https://github.com/suguru/elasticsearch-analysis-japanese[Japanese
Analysis plugin] (by suguru).
* https://github.com/imotov/elasticsearch-analysis-morphology[Russian
and English Morphological Analysis Plugin] (by Igor Motov)
* https://github.com/medcl/elasticsearch-analysis-pinyin[Pinyin Analysis
Plugin] (by Medcl)
* https://github.com/medcl/elasticsearch-analysis-string2int[String2Integer
Analysis Plugin] (by Medcl)
* https://github.com/barminator/elasticsearch-analysis-annotation[Annotation
Analysis Plugin] (by Michal Samek)
[float]
==== River Plugins
* https://github.com/elasticsearch/elasticsearch-river-couchdb[CouchDB
River Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-river-wikipedia[Wikipedia
River Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-river-twitter[Twitter
River Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-river-rabbitmq[RabbitMQ
River Plugin] (by elasticsearch team)
* https://github.com/domdorn/elasticsearch-river-activemq/[ActiveMQ
River Plugin] (by Dominik Dorn)
* https://github.com/albogdano/elasticsearch-river-amazonsqs[Amazon SQS
River Plugin] (by Alex Bogdanovski)
* https://github.com/xxBedy/elasticsearch-river-csv[CSV River Plugin]
(by Martin Bednar)
* http://www.pilato.fr/dropbox/[Dropbox River Plugin] (by David Pilato)
* http://www.pilato.fr/fsriver/[FileSystem River Plugin] (by David
Pilato)
* https://github.com/sksamuel/elasticsearch-river-hazelcast[Hazelcast
River Plugin] (by Steve Samuel)
* https://github.com/jprante/elasticsearch-river-jdbc[JDBC River Plugin]
(by Jörg Prante)
* https://github.com/qotho/elasticsearch-river-jms[JMS River Plugin] (by
Steve Sarandos)
* https://github.com/tlrx/elasticsearch-river-ldap[LDAP River Plugin]
(by Tanguy Leroux)
* https://github.com/richardwilly98/elasticsearch-river-mongodb/[MongoDB
River Plugin] (by Richard Louapre)
* https://github.com/sksamuel/elasticsearch-river-neo4j[Neo4j River
Plugin] (by Steve Samuel)
* https://github.com/jprante/elasticsearch-river-oai/[Open Archives
Initiative (OAI) River Plugin] (by Jörg Prante)
* https://github.com/sksamuel/elasticsearch-river-redis[Redis River
Plugin] (by Steve Samuel)
* http://dadoonet.github.com/rssriver/[RSS River Plugin] (by David
Pilato)
* https://github.com/adamlofts/elasticsearch-river-sofa[Sofa River
Plugin] (by adamlofts)
* https://github.com/javanna/elasticsearch-river-solr/[Solr River
Plugin] (by Luca Cavanna)
* https://github.com/sunnygleason/elasticsearch-river-st9[St9 River
Plugin] (by Sunny Gleason)
* https://github.com/endgameinc/elasticsearch-river-kafka[Kafka River
Plugin] (by Endgame Inc.)
* https://github.com/obazoud/elasticsearch-river-git[Git River Plugin] (by Olivier Bazoud)
[float]
==== Transport Plugins
* https://github.com/elasticsearch/elasticsearch-transport-wares[Servlet
transport] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-transport-memcached[Memcached
transport plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-transport-thrift[Thrift
Transport] (by elasticsearch team)
* https://github.com/tlrx/transport-zeromq[ZeroMQ transport layer
plugin] (by Tanguy Leroux)
* https://github.com/sonian/elasticsearch-jetty[Jetty HTTP transport
plugin] (by Sonian Inc.)
[float]
==== Scripting Plugins
* https://github.com/elasticsearch/elasticsearch-lang-python[Python
language Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-lang-javascript[JavaScript
language Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-lang-groovy[Groovy lang
Plugin] (by elasticsearch team)
* https://github.com/hiredman/elasticsearch-lang-clojure[Clojure
Language Plugin] (by Kevin Downey)
[float]
==== Site Plugins
* https://github.com/lukas-vlcek/bigdesk[BigDesk Plugin] (by Lukáš Vlček)
* https://github.com/mobz/elasticsearch-head[Elasticsearch Head Plugin]
(by Ben Birch)
* https://github.com/royrusso/elasticsearch-HQ[ElasticSearch HQ] (by Roy
Russo)
* https://github.com/karmi/elasticsearch-paramedic[Paramedic Plugin] (by
Karel Minařík)
* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy
Plugin] (by Zachary Tong)
* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor
Plugin] (by Zachary Tong)
* https://github.com/andrewvc/elastic-hammer[Hammer Plugin] (by Andrew
Cholakian)
[float]
==== Misc Plugins
* https://github.com/elasticsearch/elasticsearch-mapper-attachments[Mapper
Attachments Type plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-hadoop[Hadoop Plugin]
(by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-cloud-aws[AWS Cloud
Plugin] (by elasticsearch team)
* https://github.com/mattweber/elasticsearch-mocksolrplugin[ElasticSearch
Mock Solr Plugin] (by Matt Weber)
* https://github.com/spinscale/elasticsearch-suggest-plugin[Suggester
Plugin] (by Alexander Reelsen)
* https://github.com/medcl/elasticsearch-partialupdate[ElasticSearch
PartialUpdate Plugin] (by Medcl)
* https://github.com/sonian/elasticsearch-zookeeper[ZooKeeper Discovery
Plugin] (by Sonian Inc.)
* https://github.com/derryx/elasticsearch-changes-plugin[ElasticSearch
Changes Plugin] (by Thomas Peuss)
* http://tlrx.github.com/elasticsearch-view-plugin[ElasticSearch View
Plugin] (by Tanguy Leroux)
* https://github.com/viniciusccarvalho/elasticsearch-newrelic[ElasticSearch
New Relic Plugin] (by Vinicius Carvalho)
* https://github.com/endgameinc/elasticsearch-term-plugin[Terms
Component Plugin] (by Endgame Inc.)
* https://github.com/carrot2/elasticsearch-carrot2[carrot2 Plugin]:
Results clustering with carrot2 (by Dawid Weiss)


@ -0,0 +1,242 @@
[[modules-scripting]]
== Scripting
The scripting module allows scripts to be used to evaluate custom
expressions. For example, scripts can be used to return "script fields"
as part of a search request, or to evaluate a custom score for a query,
and so on.
The scripting module uses http://mvel.codehaus.org/[mvel] by default as
the scripting language, with some extensions. mvel is used since it is
extremely fast and very simple to use, and in most cases simple
expressions are needed (for example, mathematical equations).
Additional `lang` plugins allow scripts to be executed in
different languages. Currently supported plugins are `lang-javascript`
for JavaScript, `lang-groovy` for Groovy, and `lang-python` for Python.
Wherever a `script` parameter can be used, a `lang` parameter
(on the same level) can be provided to define the language of the
script. The `lang` options are `mvel`, `js`, `groovy`, `python`, and
`native`.
[float]
=== Default Scripting Language
The default scripting language (assuming no `lang` parameter is
provided) is `mvel`. In order to change it, set `script.default_lang`
to the appropriate language.
[float]
=== Preloaded Scripts
Scripts can always be provided as part of the relevant API, but they can
also be preloaded by placing them under `config/scripts` and then
referencing them by the script name (instead of providing the full
script). This helps reduce the amount of data passed between the client
and the nodes.
The name of the script is derived from the hierarchy of directories it
exists under, and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.
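For example, a sketch referencing the preloaded script above from a
search request (the index name `test` is a placeholder):

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/test/_search -d '{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "my_field" : {
            "lang" : "python",
            "script" : "group1_group2_test"
        }
    }
}'
--------------------------------------------------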
[float]
=== Native (Java) Scripts
Even though `mvel` is pretty fast, native Java based scripts can be
registered for faster execution.
In order to provide a native script, a `NativeScriptFactory` needs to be
implemented that constructs the script to be executed. There are
two main types: one that extends `AbstractExecutableScript` and one that
extends `AbstractSearchScript` (probably the one most users will extend,
with additional helper classes in `AbstractLongSearchScript`,
`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`).
Scripts can be registered through settings: for example, setting
`script.native.my.type` to `sample.MyNativeScriptFactory` will
register a script named `my`. Another option is, in a plugin, to access
the `ScriptModule` and call `registerScript` on it.
Executing the script is done by specifying the `lang` as `native`, and
the name of the script as the `script`.
Note that the scripts need to be in the classpath of elasticsearch. One
simple way to do it is to create a directory under `plugins` (choose a
descriptive name), and place the jar / class files there; they will be
automatically loaded.
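For example, a sketch using the factory named above (the class
`sample.MyNativeScriptFactory` is a placeholder) — register it in
`elasticsearch.yml`, then invoke it by name:

[source,js]
--------------------------------------------------
script.native.my.type: sample.MyNativeScriptFactory
--------------------------------------------------

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_search -d '{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "my_field" : {
            "lang" : "native",
            "script" : "my"
        }
    }
}'
--------------------------------------------------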
[float]
=== Score
In all scripts that can be used in facets, the current document's score
is accessible via `doc.score`.
[float]
=== Document Fields
Most scripting revolves around the use of specific document field data.
The `doc['field_name']` syntax can be used to access specific field data
within a document (the document in question is usually derived from the
context in which the script is used). Document fields are very fast to
access, since they end up being loaded into memory (all the relevant
field values/tokens are loaded into memory).
The following data can be extracted from a field:
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it's a short type, it will be short.
|`doc['field_name'].values` |The native array values of the field. For
example, if it's a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.
|`doc['field_name'].lat` |The latitude of a geo point type.
|`doc['field_name'].lon` |The longitude of a geo point type.
|`doc['field_name'].lats` |The latitudes of a geo point type.
|`doc['field_name'].lons` |The longitudes of a geo point type.
|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in miles)
of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].geohashDistance(geohash)` |The distance (in miles)
of this geo point field from the provided geohash.
|`doc['field_name'].geohashDistanceInKm(geohash)` |The distance (in km)
of this geo point field from the provided geohash.
|=======================================================================
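For example, a sketch of a search body with a script field reading a doc
value (the field name `my_field_name` is a placeholder):

[source,js]
--------------------------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "doubled" : {
            "script" : "doc['my_field_name'].value * 2"
        }
    }
}
--------------------------------------------------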
[float]
=== Stored Fields
Stored fields can also be accessed when executing a script. Note that
they are much slower to access than document fields, but are not
loaded into memory. They can be accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.
[float]
=== Source Field
The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.
[float]
=== mvel Built In Functions
There are several built in functions that can be used within scripts.
They include:
[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`time()` |The current time in milliseconds.
|`sin(a)` |Returns the trigonometric sine of an angle.
|`cos(a)` |Returns the trigonometric cosine of an angle.
|`tan(a)` |Returns the trigonometric tangent of an angle.
|`asin(a)` |Returns the arc sine of a value.
|`acos(a)` |Returns the arc cosine of a value.
|`atan(a)` |Returns the arc tangent of a value.
|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians.
|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.
|`exp(a)` |Returns Euler's number _e_ raised to the power of value.
|`log(a)` |Returns the natural logarithm (base _e_) of a value.
|`log10(a)` |Returns the base 10 logarithm of a value.
|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.
|`cbrt(a)` |Returns the cube root of a double value.
|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.
|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.
|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.
|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.
|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (r,_theta_).
|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.
|`round(a)` |Returns the closest _int_ to the argument.
|`random()` |Returns a random _double_ value.
|`abs(a)` |Returns the absolute value of a value.
|`max(a, b)` |Returns the greater of two values.
|`min(a, b)` |Returns the smaller of two values.
|`ulp(d)` |Returns the size of an ulp of the argument.
|`signum(d)` |Returns the signum function of the argument.
|`sinh(x)` |Returns the hyperbolic sine of a value.
|`cosh(x)` |Returns the hyperbolic cosine of a value.
|`tanh(x)` |Returns the hyperbolic tangent of a value.
|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================
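As an illustrative sketch, these functions can be combined freely inside
a script; for example, a `custom_score` query that dampens a
(hypothetical) numeric `popularity` field with `log`:
[source,js]
--------------------------------------------------
{
    "query" : {
        "custom_score" : {
            "query" : { "match_all" : {} },
            "script" : "_score * log(doc['popularity'].value + 1)"
        }
    }
}
--------------------------------------------------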
[float]
=== Arithmetic precision in MVEL
When dividing two numbers in an MVEL-based script, the engine adheres to
the default behaviour of Java. This means that if you divide two
integers (you might have configured the fields as integer in the
mapping), the result will also be an integer. So if a calculation like
`1/num` happens in your script and `num` is an integer with the value
`8`, the result is `0` even though you were expecting `0.125`. You may
need to enforce precision by explicitly using a double, like `1.0/num`,
in order to get the expected result.
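Here is a sketch showing both variants side by side, assuming an integer
field called `num`; only the second script returns a fractional result:
[source,js]
--------------------------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "integer_division" : { "script" : "1/doc['num'].value" },
        "double_division"  : { "script" : "1.0/doc['num'].value" }
    }
}
--------------------------------------------------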
[[modules-threadpool]]
== Thread Pool
A node holds several thread pools in order to improve how threads and
memory consumption are managed within a node. There are several pools,
but the important ones include:
[horizontal]
`index`::
For index/delete operations, defaults to `fixed` type since
`0.90.0`, size `# of available processors`. (previously type `cached`)
`search`::
For count/search operations, defaults to `fixed` type since
`0.90.0`, size `3x # of available processors`. (previously type
`cached`)
`get`::
For get operations, defaults to `fixed` type since `0.90.0`,
size `# of available processors`. (previously type `cached`)
`bulk`::
For bulk operations, defaults to `fixed` type since `0.90.0`,
size `# of available processors`. (previously type `cached`)
`warmer`::
For segment warm-up operations, defaults to `scaling` since
`0.90.0` with a `5m` keep-alive. (previously type `cached`)
`refresh`::
For refresh operations, defaults to `scaling` since
`0.90.0` with a `5m` keep-alive. (previously type `cached`)
A specific thread pool can be changed by setting its type and
type-specific parameters; for example, to change the `index` thread pool
to the `blocking` type:
[source,js]
--------------------------------------------------
threadpool:
index:
type: blocking
min: 1
size: 30
wait_time: 30s
--------------------------------------------------
NOTE: you can update threadpool settings live using
<<cluster-update-settings>>.
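For example, here is a sketch of such a live update using the cluster
update settings API (the `threadpool.search.size` setting shown is
illustrative; which parameters can be changed at runtime depends on the
pool type):
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "threadpool.search.size" : 30
    }
}'
--------------------------------------------------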
[float]
=== Thread pool types
The following are the types of thread pools that can be used and their
respective parameters:
[float]
==== `cached`
The `cached` thread pool is an unbounded thread pool that will spawn a
thread if there are pending requests. Here is an example of how to set
it:
[source,js]
--------------------------------------------------
threadpool:
index:
type: cached
--------------------------------------------------
[float]
==== `fixed`
The `fixed` thread pool holds a fixed number of threads to handle
requests, with a queue (optionally bounded) for pending requests that
have no threads to service them.
The `size` parameter controls the number of threads, and defaults to the
number of cores times 5.
The `queue_size` parameter controls the size of the queue for pending
requests that have no threads to execute them. By default, it is set to
`-1`, which means it is unbounded. When a request comes in and the queue
is full, the `reject_policy` parameter controls how the pool behaves:
the default, `abort`, will simply fail the request, while setting it to
`caller` will cause the request to execute on an IO thread, allowing you
to throttle execution on the networking layer.
[source,js]
--------------------------------------------------
threadpool:
index:
type: fixed
size: 30
queue_size: 1000
reject_policy: caller
--------------------------------------------------
[float]
==== `blocking`
The `blocking` pool allows you to configure the number of threads via
the `min` (defaults to `1`) and `size` (defaults to the number of cores
times 5) parameters.
It also has a backlog queue with a default `queue_size` of `1000`. Once
the queue is full, it will wait for the provided `wait_time` (defaults
to `60s`) on the calling IO thread, and fail the request if it has not
been executed by then.
[source,js]
--------------------------------------------------
threadpool:
index:
type: blocking
min: 1
size: 30
wait_time: 30s
--------------------------------------------------
[[modules-thrift]]
== Thrift
The thrift transport module allows the REST interface of elasticsearch
to be exposed over thrift. Thrift should provide better performance than
HTTP. Since thrift provides both the wire protocol and the transport, it
should also be simpler to use (though it is lacking in
documentation...).
Using thrift requires installing the `transport-thrift` plugin, located
https://github.com/elasticsearch/elasticsearch-transport-thrift[here].
The thrift
https://github.com/elasticsearch/elasticsearch-transport-thrift/blob/master/elasticsearch.thrift[schema]
can be used to generate thrift clients.
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`thrift.port` |The port to bind to. Defaults to `9500-9600`.
|`thrift.frame` |Defaults to `-1`, which means no framing. Set to a
higher value to specify the frame size (like `15mb`).
|=======================================================================
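For example, a sketch of setting a fixed port and enabling framing in
the node configuration (the values shown are illustrative):
[source,js]
--------------------------------------------------
thrift.port: 9500
thrift.frame: 15mb
--------------------------------------------------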
[[modules-transport]]
== Transport
The transport module is used for internal communication between nodes
within the cluster. Each call that goes from one node to the other uses
the transport module (for example, when an HTTP GET request is processed
by one node, and should actually be processed by another node that holds
the data).
The transport mechanism is completely asynchronous in nature, meaning
that there is no blocking thread waiting for a response. Besides helping
to solve the http://en.wikipedia.org/wiki/C10k_problem[C10k problem],
asynchronous communication is also the ideal solution for scatter
(broadcast) / gather operations such as search in elasticsearch.
[float]
=== TCP Transport
The TCP transport is an implementation of the transport module using
TCP. It allows for the following settings:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`transport.tcp.port` |A bind port range. Defaults to `9300-9400`.
|`transport.tcp.connect_timeout` |The socket connect timeout setting (in
time setting format). Defaults to `2s`.
|`transport.tcp.compress` |Set to `true` to enable compression (LZF)
between all nodes. Defaults to `false`.
|=======================================================================
It also uses the common
<<modules-network,network settings>>.
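For example, a sketch of tuning these settings in the node configuration
(the values shown are illustrative):
[source,js]
--------------------------------------------------
transport.tcp.port: 9300-9400
transport.tcp.connect_timeout: 2s
transport.tcp.compress: true
--------------------------------------------------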
[float]
=== Local Transport
This is a handy transport to use when running integration tests within
the JVM. It is automatically enabled when using
`NodeBuilder#local(true)`.
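For example, a minimal sketch of starting such a local node in a test
using the Java API (assuming the elasticsearch jar is on the classpath):
[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;

public class LocalTransportExample {
    public static void main(String[] args) {
        // local(true) wires the node to the JVM level local transport
        Node node = nodeBuilder().local(true).node();
        Client client = node.client();

        // ... index documents and run assertions against `client` ...

        node.close();
    }
}
--------------------------------------------------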