Migrated documentation into the main repo

This commit is contained in:
Clinton Gormley 2013-08-29 01:24:34 +02:00
parent b9558edeff
commit 822043347e
316 changed files with 23987 additions and 0 deletions


@ -0,0 +1,230 @@
[[modules-cluster]]
== Cluster
[float]
=== Shards Allocation
Shard allocation is the process of allocating shards to nodes. This can
happen during initial recovery, replica allocation, rebalancing, or when
nodes are added or removed.
The following settings may be used:
`cluster.routing.allocation.allow_rebalance`::
Controls when rebalancing will happen, based on the total
state of all the index shards in the cluster. `always`,
`indices_primaries_active`, and `indices_all_active` are allowed,
defaulting to `indices_all_active` to reduce chatter during
initial recovery.
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are
allowed cluster wide. Defaults to `2`.
`cluster.routing.allocation.node_initial_primaries_recoveries`::
Controls specifically the number of initial primary
recoveries that are allowed per node. Since the local
gateway is used most of the time, these recoveries are fast and more
of them can be handled per node without creating load.
`cluster.routing.allocation.node_concurrent_recoveries`::
How many concurrent recoveries are allowed to happen on a node.
Defaults to `2`.
`cluster.routing.allocation.disable_new_allocation`::
Disables new primary allocations. Note that this will prevent
allocations for newly created indices. This setting is mainly
useful when updated dynamically using the cluster update
settings API.
`cluster.routing.allocation.disable_allocation`::
Disables either primary or replica allocation (does not
apply to newly created primaries; see `disable_new_allocation`
above). Note that a replica will still be promoted to primary if
one does not exist. This setting is mainly useful when
updated dynamically using the cluster update settings API, as shown
in the example after this list.
`cluster.routing.allocation.disable_replica_allocation`::
Disables replica allocation only. Like the previous
setting, it is mainly useful when updated dynamically using the
cluster update settings API.
`indices.recovery.concurrent_streams`::
The number of streams to open (on a *node* level) to recover a
shard from a peer shard. Defaults to `3`.
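For example, a minimal sketch of disabling allocation on a live cluster
before maintenance, using the cluster update settings API (a `transient`
setting does not survive a full cluster restart):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : true
    }
}'
--------------------------------------------------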
[float]
=== Shard Allocation Awareness
Cluster allocation awareness allows you to configure shard and replica
allocation across generic attributes associated with nodes. Let's explain
it through an example:
Assume we have several racks. When we start a node, we can configure an
attribute called `rack_id` (any attribute name works), for example, here
is a sample config:
----------------------
node.rack_id: rack_one
----------------------
The above sets an attribute called `rack_id` for the relevant node with
a value of `rack_one`. Now, we need to configure the `rack_id` attribute
as one of the awareness allocation attributes (set it in the config of
*all* master eligible nodes):
--------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------------
The above means that the `rack_id` attribute will be used for
awareness based allocation of shards and their replicas. For example,
let's say we start 2 nodes with `node.rack_id` set to `rack_one`, and
deploy a single index with 5 shards and 1 replica. The index will be
fully deployed on the current nodes (5 primary shards and 5 replicas, a
total of 10 shards).
Now, if we start two more nodes with `node.rack_id` set to `rack_two`,
shards will relocate to even out the number of shards across the nodes,
but a shard and its replica will not be allocated to nodes with the same
`rack_id` value.
The awareness attributes can hold several values, for example:
-------------------------------------------------------------
cluster.routing.allocation.awareness.attributes: rack_id,zone
-------------------------------------------------------------
*NOTE*: When using awareness attributes, shards will not be allocated to
nodes that don't have values set for those attributes.
[float]
=== Forced Awareness
Sometimes we know in advance the number of values an awareness
attribute can have, and, moreover, we never want more
replicas than needed to be allocated on a specific group of nodes with
the same awareness attribute value. For that, we can force awareness on
specific attributes.
For example, let's say we have an awareness attribute called `zone`, and
we know we are going to have two zones, `zone1` and `zone2`. Here is how
we can force awareness on a node:
[source,js]
-------------------------------------------------------------------
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
cluster.routing.allocation.awareness.attributes: zone
-------------------------------------------------------------------
Now, let's say we start 2 nodes with `node.zone` set to `zone1` and
create an index with 5 shards and 1 replica. The index will be created,
but only the 5 primary shards will be allocated (with no replicas). Only
when we start nodes with `node.zone` set to `zone2` will the replicas be
allocated.
[float]
==== Automatic Preference When Searching / GETing
When executing a search, or doing a get, the node receiving the request
will prefer to execute the request on shards that exist on nodes that
have the same attribute values as the executing node.
[float]
==== Realtime Settings Update
The settings can be updated using the <<cluster-update-settings,cluster update settings API>> on a live cluster.
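For example, a sketch of enabling awareness on a running cluster (this
assumes the `rack_id` attribute is already set in each node's config):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "cluster.routing.allocation.awareness.attributes" : "rack_id"
    }
}'
--------------------------------------------------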
[float]
=== Shard Allocation Filtering
Allows control over the allocation of indices on nodes, based on
include/exclude filters. The filters can be set both on the index level
and on the cluster level. Let's start with an example of setting it on
the index level:
Let's say we have 4 nodes, each with a specific attribute called `tag`
associated with it (the name of the attribute can be any name). Each
node has a specific value associated with `tag`. Node 1 has a setting
`node.tag: value1`, node 2 a setting of `node.tag: value2`, and so on.
We can create an index that will only deploy on nodes that have `tag`
set to `value1` and `value2` by setting
`index.routing.allocation.include.tag` to `value1,value2`. For example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.tag" : "value1,value2"
}'
--------------------------------------------------
On the other hand, we can create an index that will be deployed on all
nodes except for nodes with a `tag` of value `value3` by setting
`index.routing.allocation.exclude.tag` to `value3`. For example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.exclude.tag" : "value3"
}'
--------------------------------------------------
From version 0.90, `index.routing.allocation.require.*` can be used to
specify a number of rules, all of which MUST match in order for a shard
to be allocated to a node. This is in contrast to `include`, which will
include a node if ANY rule matches.
The `include`, `exclude` and `require` values can contain simple
wildcards, for example `value1*`. A special attribute name
called `_ip` can be used to match on node IP addresses. In addition, the
`_host` attribute can be used to match on either the node's hostname or
its IP address.
Obviously a node can have several attributes associated with it, and
both the attribute name and value are controlled in the setting. For
example, here is a sample of several node configurations:
[source,js]
--------------------------------------------------
node.group1: group1_value1
node.group2: group2_value4
--------------------------------------------------
In the same manner, `include`, `exclude` and `require` can work against
several attributes, for example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz",
    "index.routing.allocation.require.group4" : "aaa"
}'
--------------------------------------------------
The provided settings can also be updated in real time using the update
settings API, allowing indices (shards) to be "moved" around in real
time.
Cluster wide filtering can also be defined, and updated in real time
using the cluster update settings API. This setting can come in handy
for things like decommissioning nodes (even if the replica count is set
to 0). Here is a sample of how to decommission a node based on its `_ip`
address:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
--------------------------------------------------


@ -0,0 +1,26 @@
[[modules-discovery]]
== Discovery
The discovery module is responsible for discovering nodes within a
cluster, as well as electing a master node.
Note that ElasticSearch is a peer to peer based system in which nodes
communicate with one another directly as operations are delegated or
broadcast. All the main APIs (index, delete, search) do not communicate
with the master node. The responsibility of the master node is to
maintain the global cluster state and to reassign shards when nodes join
or leave the cluster. Each time the cluster state is changed, the new
state is made known to the other nodes in the cluster (the manner
depends on the actual discovery implementation).
[float]
=== Settings
The `cluster.name` setting separates clusters from one another.
The default value for the cluster name is `elasticsearch`, though it is
recommended to change this to reflect the logical group name of the
cluster running.
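For example, in `elasticsearch.yml` (the name shown is just a
placeholder):

[source,js]
--------------------------------------------------
cluster.name: logging-prod
--------------------------------------------------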
include::discovery/ec2.asciidoc[]
include::discovery/zen.asciidoc[]


@ -0,0 +1,82 @@
[[modules-discovery-ec2]]
=== EC2 Discovery
EC2 discovery uses the EC2 APIs to perform automatic discovery,
filling the same role that multicast plays in environments where
multicast is available. Here is a simple sample configuration:
[source,js]
--------------------------------------------------
cloud:
    aws:
        access_key: AKVAIQBF2RECL7FJWGJQ
        secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br

discovery:
    type: ec2
--------------------------------------------------
You'll need to install the `cloud-aws` plugin. Please check the
https://github.com/elasticsearch/elasticsearch-cloud-aws[plugin website]
to find the most up-to-date version to install before (re)starting
elasticsearch.
The following is a list of settings (prefixed with `discovery.ec2`)
that can further control the discovery:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`groups` |Either a comma separated list or array based list of
(security) groups. Only instances with the provided security groups will
be used in the cluster discovery.
|`host_type` |The type of host to use to communicate with other
instances. Can be one of `private_ip`, `public_ip`, `private_dns`,
`public_dns`. Defaults to `private_ip`.
|`availability_zones` |Either a comma separated list or array based list
of availability zones. Only instances within the provided availability
zones will be used in the cluster discovery.
|`any_group` |If set to `false`, will require all security groups to be
present for the instance to be used for the discovery. Defaults to
`true`.
|`ping_timeout` |How long to wait for existing EC2 nodes to reply during
discovery. Defaults to `3s`.
|=======================================================================
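For example, a sketch (the group and zone values are placeholders)
restricting discovery to a single security group and two availability
zones:

[source,js]
--------------------------------------------------
discovery:
    type: ec2
    ec2:
        groups: my-security-group
        availability_zones: us-east-1a,us-east-1b
--------------------------------------------------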
[float]
==== Filtering by Tags
EC2 discovery can also filter the machines to include in the cluster
based on tags (and not just groups). The settings to use include the
`discovery.ec2.tag.` prefix. For example, setting
`discovery.ec2.tag.stage` to `dev` will include only instances with a
tag key of `stage` and a value of `dev`. Setting several tags will
require all of those tags to be present for the instance to be included.
One practical use for tag filtering is when an EC2 cluster contains many
nodes that are not running elasticsearch. In this case (particularly
with high `ping_timeout` values) there is a risk that a new node's
discovery phase will end before it has found the cluster (which will
result in it declaring itself master of a new cluster with the same name
- highly undesirable). Tagging elasticsearch EC2 nodes and then
filtering by that tag will resolve this issue.
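For example, assuming the elasticsearch instances are tagged with
`stage: dev` in EC2:

[source,js]
--------------------------------------------------
discovery:
    type: ec2
    ec2:
        tag:
            stage: dev
--------------------------------------------------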
[float]
==== Region
The `cloud.aws.region` can be set to a region and will automatically use
the relevant settings for both `ec2` and `s3`. The available values are:
`us-east-1`, `us-west-1`, `ap-southeast-1`, `eu-west-1`.
[float]
==== Automatic Node Attributes
Though not dependent on actually using `ec2` as the discovery type
(the `cloud-aws` plugin must still be installed), the plugin can
automatically add node attributes relating to EC2 (for example, the
availability zone) that can be used with the awareness allocation
feature. To enable it, set `cloud.node.auto_attributes` to `true` in the
settings.


@ -0,0 +1,145 @@
[[modules-discovery-zen]]
=== Zen Discovery
Zen discovery is the built in discovery module for elasticsearch and
the default. It provides both multicast and unicast discovery, and is
easily extended to support cloud environments.
The zen discovery is integrated with other modules, for example, all
communication between nodes is done using the
<<modules-transport,transport>> module.
It is separated into several sub modules, which are explained below:
[float]
==== Ping
This is the process where a node uses the discovery mechanisms to find
other nodes. Both multicast and unicast based discovery are supported
(and they can be used in conjunction).
[float]
===== Multicast
Multicast ping discovery of other nodes is done by sending one or more
multicast requests which existing nodes will receive and
respond to. It provides the following settings with the
`discovery.zen.ping.multicast` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`group` |The group address to use. Defaults to `224.2.2.4`.
|`port` |The port to use. Defaults to `54328`.
|`ttl` |The ttl of the multicast message. Defaults to `3`.
|`address` |The address to bind to, defaults to `null` which means it
will bind to all available network interfaces.
|=======================================================================
Multicast can be disabled by setting `multicast.enabled` to `false`.
[float]
===== Unicast
Unicast discovery allows discovery to be performed when multicast is
not enabled. It basically requires a list of hosts that will act
as gossip routers. It provides the following settings with the
`discovery.zen.ping.unicast` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`hosts` |Either an array setting or a comma delimited setting. Each
value is either in the form of `host:port`, or in the form of
`host[port1-port2]`.
|=======================================================================
The unicast discovery uses the
<<modules-transport,transport>> module to
perform the discovery.
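For example, a minimal sketch (the host names are placeholders) that
disables multicast and relies on two gossip hosts:

[source,js]
--------------------------------------------------
discovery:
    zen:
        ping:
            multicast:
                enabled: false
            unicast:
                hosts: host1:9300,host2:9300
--------------------------------------------------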
[float]
==== Master Election
As part of the initial ping process a master of the cluster is either
elected or joined to. This is done automatically. The
`discovery.zen.ping_timeout` setting (which defaults to `3s`) allows the
election to be tuned to handle slow or congested networks
(higher values lower the chance of failure). Note that this setting was
changed from 0.15.1 onwards; prior to that it was called
`discovery.zen.initial_ping_timeout`.
Nodes can be excluded from becoming a master by setting `node.master` to
`false`. Note that once a node is a client node (`node.client` set to
`true`), it will not be allowed to become a master (`node.master` is
automatically set to `false`).
The `discovery.zen.minimum_master_nodes` setting controls the minimum
number of master eligible nodes a node should "see" in order to operate
within the cluster. It's recommended to set it to a value higher than 1
when running more than 2 nodes in the cluster, as shown below.
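For example, a sketch for a cluster of three master eligible nodes,
requiring a quorum of two to be visible:

[source,js]
--------------------------------------------------
discovery:
    zen:
        minimum_master_nodes: 2
--------------------------------------------------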
[float]
==== Fault Detection
There are two fault detection processes running. The first is by the
master, pinging all the other nodes in the cluster to verify that they
are alive. On the other end, each node pings the master to verify that
it is still alive, or whether an election process needs to be initiated.
The following settings control the fault detection process using the
`discovery.zen.fd` prefix:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`ping_interval` |How often a node gets pinged. Defaults to `1s`.
|`ping_timeout` |How long to wait for a ping response, defaults to
`30s`.
|`ping_retries` |How many ping failures / timeouts cause a node to be
considered failed. Defaults to `3`.
|=======================================================================
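For example, a sketch that tightens the ping timeout (the values are
illustrative, not recommendations):

[source,js]
--------------------------------------------------
discovery:
    zen:
        fd:
            ping_interval: 1s
            ping_timeout: 10s
            ping_retries: 3
--------------------------------------------------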
[float]
==== External Multicast
The multicast discovery also supports external multicast requests for
discovering nodes. An external client can send a request to the
multicast IP/group and port, in the form of:
[source,js]
--------------------------------------------------
{
    "request" : {
        "cluster_name": "test_cluster"
    }
}
--------------------------------------------------
The response will be similar to the node info response (with node level
information only, including transport/http addresses and node
attributes):
[source,js]
--------------------------------------------------
{
    "response" : {
        "cluster_name" : "test_cluster",
        "transport_address" : "...",
        "http_address" : "...",
        "attributes" : {
            "..."
        }
    }
}
--------------------------------------------------
Note that internal multicast discovery can be disabled while external
multicast requests keep working: keep
`discovery.zen.ping.multicast.enabled` set to `true` (the default), but
set `discovery.zen.ping.multicast.ping.enabled` to `false`.


@ -0,0 +1,74 @@
[[modules-gateway]]
== Gateway
The gateway module allows one to store the state of the cluster meta
data across full cluster restarts. The cluster meta data mainly holds
all the indices created with their respective (index level) settings and
explicit type mappings.
Each time the cluster meta data changes (for example, when an index is
added or deleted), those changes will be persisted using the gateway.
When the cluster first starts up, the state will be read from the
gateway and applied.
The gateway set on the node level will automatically control the index
gateway that will be used. For example, if the `fs` gateway is used,
then each index created on the node will automatically use its own
respective index level `fs` gateway. In this case, if an index should
not persist its state, it should be explicitly set to `none` (which is
the only other value it can be set to).
The default gateway used is the
<<modules-gateway-local,local>> gateway.
[float]
=== Recovery After Nodes / Time
In many cases, the actual cluster meta data should only be recovered
after specific nodes have started in the cluster, or a timeout has
passed. This is handy when restarting the cluster, and each node's local
index storage still exists and can be reused rather than recovered from
the gateway (which reduces the time it takes to recover from the
gateway).
The `gateway.recover_after_nodes` setting (which accepts a number)
controls how many data and master eligible nodes must be present in the
cluster before recovery starts. The `gateway.recover_after_data_nodes`
and `gateway.recover_after_master_nodes` settings work in a similar
fashion, except they consider only the number of data nodes and only the
number of master eligible nodes respectively. The
`gateway.recover_after_time` setting (which accepts a time value) sets
the time to wait before recovery happens once all
`gateway.recover_after...nodes` conditions are met.
The `gateway.expected_nodes` setting controls how many data and master
eligible nodes are expected to be in the cluster; once met, the
`recover_after_time` is ignored and recovery starts. The
`gateway.expected_data_nodes` and `gateway.expected_master_nodes`
settings are also supported. For example, the setting:
[source,js]
--------------------------------------------------
gateway:
    recover_after_nodes: 1
    recover_after_time: 5m
    expected_nodes: 2
--------------------------------------------------
in a cluster expected to have 2 nodes will cause recovery to start 5
minutes after the first node is up; but once there are 2 nodes in the
cluster, recovery will begin immediately (without waiting).
Note that once the meta data has been recovered from the gateway (which
indices to create, mappings and so on), this setting is no longer
effective until the next full restart of the cluster.
Operations are blocked while the cluster meta data has not been
recovered, in order not to mix them with the actual cluster meta data
that will be recovered once the conditions are met.
include::gateway/local.asciidoc[]
include::gateway/fs.asciidoc[]
include::gateway/hadoop.asciidoc[]
include::gateway/s3.asciidoc[]


@ -0,0 +1,39 @@
[[modules-gateway-fs]]
=== Shared FS Gateway
*The shared FS gateway is deprecated and will be removed in a future
version. Please use the
<<modules-gateway-local,local gateway>>
instead.*
The file system based gateway stores the cluster meta data and indices
in a *shared* file system. Note that since elasticsearch is a
distributed system, the file system must be shared between all the
nodes. Here is an example config to enable it:
[source,js]
--------------------------------------------------
gateway:
    type: fs
--------------------------------------------------
[float]
==== location
The location where the gateway stores the cluster state can be set using
the `gateway.fs.location` setting. By default, it will be stored under
the `work` directory. Note that the `work` directory is considered a
temporary directory with ElasticSearch (meaning it is safe to `rm -rf`
it). Since the default location of the persistent gateway is under
`work`, *it should be changed*.
When explicitly specifying the `gateway.fs.location`, each node will
append its `cluster.name` to the provided location. It means that the
location provided can safely support several clusters.
[float]
==== concurrent_streams
The `gateway.fs.concurrent_streams` setting throttles the number of
streams (per node) opened against the shared gateway when performing the
snapshot operation. It defaults to `5`.


@ -0,0 +1,36 @@
[[modules-gateway-hadoop]]
=== Hadoop Gateway
*The hadoop gateway is deprecated and will be removed in a future
version. Please use the
<<modules-gateway-local,local gateway>>
instead.*
The hadoop (HDFS) based gateway stores the cluster meta data and indices
data in hadoop. Hadoop support is provided as a plugin; installation is
explained https://github.com/elasticsearch/elasticsearch-hadoop[here],
or download the hadoop plugin and place it under the `plugins`
directory. Here is an example config to enable it:
[source,js]
--------------------------------------------------
gateway:
    type: hdfs
    hdfs:
        uri: hdfs://myhost:8022
--------------------------------------------------
[float]
==== Settings
The hadoop gateway requires two simple settings. The `gateway.hdfs.uri`
setting controls the URI used to connect to the hadoop cluster, for
example: `hdfs://myhost:8022`. The `gateway.hdfs.path` setting controls
the path under which the gateway will store the data.
[float]
==== concurrent_streams
The `gateway.hdfs.concurrent_streams` setting throttles the number of
streams (per node) opened against the shared gateway when performing the
snapshot operation. It defaults to `5`.


@ -0,0 +1,31 @@
[[modules-gateway-local]]
=== Local Gateway
The local gateway allows for recovery of the full cluster state and
indices from the local storage of each node, and does not require a
common node level shared storage.
Note that, different from the shared gateway types, persistence to the
local gateway is *not* done in an async manner. Once an operation is
performed, the data is there for the local gateway to recover in case
of full cluster failure.
It is important to configure the `gateway.recover_after_nodes` setting
to include most of the nodes expected to be started after a full cluster
restart. This will ensure that the latest cluster state is recovered.
For example:
[source,js]
--------------------------------------------------
gateway:
    recover_after_nodes: 1
    recover_after_time: 5m
    expected_nodes: 2
--------------------------------------------------
Note that to backup/snapshot the full cluster state it is recommended
that the local storage for all nodes be copied (in theory not all are
required, just enough to guarantee a copy of each shard has been copied,
depending on the replication settings) while disabling flush.
Shared storage such as S3 can be used to keep the different nodes'
copies in one place, though it does come at a price of more IO.


@ -0,0 +1,51 @@
[[modules-gateway-s3]]
=== S3 Gateway
*The S3 gateway is deprecated and will be removed in a future version.
Please use the <<modules-gateway-local,local
gateway>> instead.*
The S3 based gateway allows long term, reliable, async persistence of
the cluster state and indices directly to Amazon S3. Here is how it can
be configured:
[source,js]
--------------------------------------------------
cloud:
    aws:
        access_key: AKVAIQBF2RECL7FJWGJQ
        secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br

gateway:
    type: s3
    s3:
        bucket: bucket_name
--------------------------------------------------
You'll need to install the `cloud-aws` plugin, by running
`bin/plugin install cloud-aws` before (re)starting elasticsearch.
The following is a list of settings (prefixed with `gateway.s3`) that
can further control the S3 gateway:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`chunk_size` |Big files are broken down into chunks (to overcome the
AWS 5GB limit and to allow concurrent snapshotting). Defaults to `100m`.
|=======================================================================
[float]
==== concurrent_streams
The `gateway.s3.concurrent_streams` setting throttles the number of
streams (per node) opened against the shared gateway when performing the
snapshot operation. It defaults to `5`.
[float]
==== Region
The `cloud.aws.region` can be set to a region and will automatically use
the relevant settings for both `ec2` and `s3`. The available values are:
`us-east-1`, `us-west-1`, `ap-southeast-1`, `eu-west-1`.


@ -0,0 +1,51 @@
[[modules-http]]
== HTTP
The http module exposes the *elasticsearch* APIs
over HTTP.
The http mechanism is completely asynchronous in nature, meaning that
there is no blocking thread waiting for a response. The benefit of using
asynchronous communication for HTTP is solving the
http://en.wikipedia.org/wiki/C10k_problem[C10k problem].
When possible, consider using
http://en.wikipedia.org/wiki/Keepalive#HTTP_Keepalive[HTTP keep alive]
when connecting, for better performance, and try to get your favorite
client not to use
http://en.wikipedia.org/wiki/Chunked_transfer_encoding[HTTP chunking].
[float]
=== Settings
The following are the settings that can be configured for HTTP:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`http.port` |A bind port range. Defaults to `9200-9300`.
|`http.max_content_length` |The max content length of an HTTP request
body. Defaults to `100mb`.
|`http.max_initial_line_length` |The max length of an HTTP URL. Defaults
to `4kb`.
|`http.compression` |Support for compression when possible (with
Accept-Encoding). Defaults to `false`.
|`http.compression_level` |Defines the compression level to use.
Defaults to `6`.
|=======================================================================
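For example, a sketch enabling compression and raising the content limit
in `elasticsearch.yml` (the values are illustrative):

[source,js]
--------------------------------------------------
http.max_content_length: 200mb
http.compression: true
http.compression_level: 6
--------------------------------------------------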
It also uses the common
<<modules-network,network settings>>.
[float]
=== Disable HTTP
The http module can be completely disabled and not started by setting
`http.enabled` to `false`. This makes sense on data nodes when dedicated
non <<modules-node,data nodes>> accept the HTTP
requests and communicate with the data nodes using the internal
<<modules-transport,transport>>.


@ -0,0 +1,75 @@
[[modules-indices]]
== Indices
The indices module controls settings that are globally managed
for all indices.
[float]
=== Indexing Buffer
The indexing buffer setting controls how much memory will be
allocated for the indexing process. It is a global setting that applies
to all the different shards allocated on a specific node.
The `indices.memory.index_buffer_size` setting accepts either a
percentage or a byte size value. It defaults to `10%`, meaning that
`10%` of the total memory allocated to a node will be used as the
indexing buffer size. This amount is then divided between all the
different shards. Also, if a percentage is used, the
`min_index_buffer_size` (defaults to `48mb`) and `max_index_buffer_size`
(unbounded by default) settings can be set, as sketched below.
The `indices.memory.min_shard_index_buffer_size` setting sets a hard
lower limit for the memory allocated per shard for its own indexing
buffer. It defaults to `4mb`.
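For example, a sketch bounding the percentage based buffer (the values
are illustrative):

[source,js]
--------------------------------------------------
indices.memory.index_buffer_size: 10%
indices.memory.min_index_buffer_size: 48mb
indices.memory.max_index_buffer_size: 1gb
indices.memory.min_shard_index_buffer_size: 4mb
--------------------------------------------------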
[float]
=== TTL interval
The `indices.ttl.interval` setting, which can be set dynamically,
controls how often expired documents will be automatically deleted. The
default value is `60s`.
The deletions are processed in bulk. You can set
`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
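For example, a sketch of both settings in `elasticsearch.yml` (the
values shown are the defaults):

[source,js]
--------------------------------------------------
indices.ttl.interval: 60s
indices.ttl.bulk_size: 10000
--------------------------------------------------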
See also <<mapping-ttl-field>>.
[float]
=== Recovery
The following settings can be set to manage recovery policy:
[horizontal]
`indices.recovery.concurrent_streams`::
defaults to `3`.
`indices.recovery.file_chunk_size`::
defaults to `512kb`.
`indices.recovery.translog_ops`::
defaults to `1000`.
`indices.recovery.translog_size`::
defaults to `512kb`.
`indices.recovery.compress`::
defaults to `true`.
`indices.recovery.max_bytes_per_sec`::
since 0.90.1, defaults to `20mb`.
`indices.recovery.max_size_per_sec`::
deprecated from 0.90.1. Replaced by `indices.recovery.max_bytes_per_sec`.
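For example, a sketch raising the recovery throttle on a cluster running
0.90.1 or later (the values are illustrative):

[source,js]
--------------------------------------------------
indices.recovery.max_bytes_per_sec: 40mb
indices.recovery.concurrent_streams: 3
--------------------------------------------------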
[float]
=== Store level throttling
The following settings can be set to control store throttling:
[horizontal]
`indices.store.throttle.type`::
could be `merge` (default), `none` or `all`. See <<index-modules-store>>.
`indices.store.throttle.max_bytes_per_sec`::
defaults to `20mb`.
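For example, a sketch throttling merge IO across the node (the rate is
illustrative):

[source,js]
--------------------------------------------------
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 20mb
--------------------------------------------------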


@ -0,0 +1,34 @@
[[modules-jmx]]
== JMX
[float]
=== REMOVED AS OF v0.90
Use the stats APIs instead.
The JMX module exposes node information through
http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/[JMX].
JMX can be used by either
http://en.wikipedia.org/wiki/JConsole[jconsole] or
http://en.wikipedia.org/wiki/VisualVM[VisualVM].
Exposed JMX data includes both node level information, as well as
instantiated indices and shards on the specific node. This is a work in
progress, with each version exposing more information.
[float]
=== jmx.domain
The domain under which JMX will register can be set using the
`jmx.domain` setting. It defaults to `{elasticsearch}`.
[float]
=== jmx.create_connector
An RMI connector can be started to accept JMX requests. This can be
enabled by setting `jmx.create_connector` to `true`. An RMI connector
does come with its own overhead, so make sure you really need it.
When an RMI connector is created, the `jmx.port` setting provides a port
range for the ports the RMI connector can open. By default,
it is set to `9400-9500`.


@ -0,0 +1,69 @@
[[modules-memcached]]
== memcached
The memcached module allows the *elasticsearch*
APIs to be exposed over the memcached protocol (as closely
as possible).
It is provided as a plugin called `transport-memcached`; installation
is explained
https://github.com/elasticsearch/elasticsearch-transport-memcached[here].
Another option is to download the memcached plugin and place it
under the `plugins` directory.
The memcached module supports both the binary and the text protocol,
automatically detecting the correct one to use.
[float]
=== Mapping REST to Memcached Protocol
Memcached commands are mapped to REST and handled by the same generic
REST layer in elasticsearch. Here is a list of the memcached commands
supported:
[float]
==== GET
The memcached `GET` command maps to a REST `GET`. The key used is the
URI (with parameters). The main downside is the fact that the memcached
`GET` does not allow a body in the request (and `SET` does not allow a
result to be returned...). For this reason, most REST APIs (like search)
also accept the "source" as a URI parameter.
[float]
==== SET
The memcached `SET` command maps to a REST `POST`. The key used is the
URI (with parameters), and the body maps to the REST body.
[float]
==== DELETE
The memcached `DELETE` command maps to a REST `DELETE`. The key used is
the URI (with parameters).
[float]
==== QUIT
The memcached `QUIT` command is supported and disconnects the client.
[float]
=== Settings
The following are the settings that can be configured for memcached:
[cols="<,<",options="header",]
|===============================================================
|Setting |Description
|`memcached.port` |A bind port range. Defaults to `11211-11311`.
|===============================================================
It also uses the common
<<modules-network,network settings>>.
[float]
=== Disable memcached
The memcached module can be completely disabled and not started by
setting `memcached.enabled` to `false`. By default it is enabled once
the plugin is detected.


@ -0,0 +1,88 @@
[[modules-network]]
== Network Settings
There are several modules within a Node that use network based
configuration, for example, the
<<modules-transport,transport>> and
<<modules-http,http>> modules. Node level
network settings allow setting common settings that will be shared among
all network based modules (unless explicitly overridden in each module).
The `network.bind_host` setting controls the host that the different
network components will bind to. By default, the bind host will be
`anyLocalAddress` (typically `0.0.0.0` or `::0`).
The `network.publish_host` setting controls the host the node
will publish itself as within the cluster, so other nodes will be able
to connect to it. Of course, this can't be the `anyLocalAddress`; by
default, it will be the first non loopback address (if possible), or the
local address.
The `network.host` setting is a simple setting to automatically set both
`network.bind_host` and `network.publish_host` to the same host value.
Both settings can be configured with either an explicit host address
or host name. The settings also accept the logical values explained
in the following table:
[cols="<,<",options="header",]
|=======================================================================
|Logical Host Setting Value |Description
|`_local_` |Will be resolved to the local ip address.
|`_non_loopback_` |The first non loopback address.
|`_non_loopback:ipv4_` |The first non loopback IPv4 address.
|`_non_loopback:ipv6_` |The first non loopback IPv6 address.
|`_[networkInterface]_` |Resolves to the ip address of the provided
network interface. For example `_en0_`.
|`_[networkInterface]:ipv4_` |Resolves to the ipv4 address of the
provided network interface. For example `_en0:ipv4_`.
|`_[networkInterface]:ipv6_` |Resolves to the ipv6 address of the
provided network interface. For example `_en0:ipv6_`.
|=======================================================================
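For example, a sketch binding to the IPv4 address of a specific network
interface (the interface name is a placeholder):

[source,js]
--------------------------------------------------
network.host: _en0:ipv4_
--------------------------------------------------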
When the `cloud-aws` plugin is installed, the following are also allowed
as valid network host settings:
[cols="<,<",options="header",]
|==================================================================
|EC2 Host Value |Description
|`_ec2:privateIpv4_` |The private IP address (ipv4) of the machine.
|`_ec2:privateDns_` |The private host of the machine.
|`_ec2:publicIpv4_` |The public IP address (ipv4) of the machine.
|`_ec2:publicDns_` |The public host of the machine.
|`_ec2_` |Less verbose option for the private ip address.
|`_ec2:privateIp_` |Less verbose option for the private ip address.
|`_ec2:publicIp_` |Less verbose option for the public ip address.
|==================================================================
[float]
=== TCP Settings
Any component that uses TCP (like the HTTP, Transport and Memcached
modules) shares the following settings:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`network.tcp.no_delay` |Enable or disable tcp no delay setting.
Defaults to `true`.
|`network.tcp.keep_alive` |Enable or disable tcp keep alive. By default
not explicitly set.
|`network.tcp.reuse_address` |Should an address be reused or not.
Defaults to `true` on non-windows machines.
|`network.tcp.send_buffer_size` |The size of the tcp send buffer size
(in size setting format). By default not explicitly set.
|`network.tcp.receive_buffer_size` |The size of the tcp receive buffer
size (in size setting format). By default not explicitly set.
|=======================================================================


@ -0,0 +1,32 @@
[[modules-node]]
== Node
*elasticsearch* allows a node to be configured to either store data
locally or not. Storing data locally basically means that shards of
different indices are allowed to be allocated on that node. By default,
each node is considered to be a data node; this can be turned off by
setting `node.data` to `false`.
This is a powerful setting, allowing the simple creation of smart load
balancers that take part in some of the different API processing. Let's
take an example:
We can start a whole cluster of data nodes which do not even start an
HTTP transport by setting `http.enabled` to `false`. Such nodes will
communicate with one another using the
<<modules-transport,transport>> module. In front
of the cluster we can start one or more "non data" nodes which will
start with HTTP enabled. All HTTP communication will be performed
through these "non data" nodes, as sketched below.
The first benefit of this setup is the ability to create smart load
balancers. These "non data" nodes are still part of the cluster, and
they redirect operations exactly to the node that holds the relevant
data. The other benefit is the fact that, for scatter / gather based
operations (such as search), these nodes will take part in the
processing, since they will start the scatter process and perform the
actual gather processing.
This leaves the data nodes free to do the heavy duty of indexing and
searching, without needing to process HTTP requests (parsing), overload
the network, or perform the gather processing.


@ -0,0 +1,245 @@
[[modules-plugins]]
== Plugins
[float]
=== Plugins
Plugins are a way to enhance the basic elasticsearch functionality in a
custom manner. They range from adding custom mapping types, custom
analyzers (in a more built in fashion), native scripts, custom discovery
and more.
[float]
==== Installing plugins
Installing plugins can either be done manually by placing them under the
`plugins` directory, or using the `plugin` script. Several plugins can
be found under the https://github.com/elasticsearch[elasticsearch]
organization in GitHub, starting with `elasticsearch-`.
Starting from 0.90.2, installing plugins typically takes the form of
`plugin --install <org>/<user/component>/<version>`. The plugins will be
automatically downloaded in this case from `download.elasticsearch.org`,
and in case they don't exist there, from maven (central and sonatype).
Note that when the plugin is located in the maven central or sonatype
repository, `<org>` is the artifact `groupId` and `<user/component>` is
the `artifactId`.
For prior versions, the older form is
`plugin -install <org>/<user/component>/<version>`.
A plugin can also be installed directly by specifying the URL for it,
for example:
`bin/plugin --url file://path/to/plugin --install plugin-name` or
`bin/plugin -url file://path/to/plugin -install plugin-name` for older
versions.
Starting from 0.90.2, you can run `bin/plugin -h` for more information
about plugins.
[float]
==== Site Plugins
Plugins can have "sites" in them, any plugin that exists under the
`plugins` directory with a `_site` directory, its content will be
statically served when hitting `/_plugin/[plugin_name]/` url. Those can
be added even after the process has started.
Installed plugins that do not contain any java related content, will
automatically be detected as site plugins, and their content will be
moved under `_site`.
The ability to install plugins from GitHub makes it easy to install site
plugins hosted there by downloading the actual repo. For example,
running:
[source,js]
--------------------------------------------------
# From 0.90.2
bin/plugin --install mobz/elasticsearch-head
bin/plugin --install lukas-vlcek/bigdesk
# From a prior version
bin/plugin -install mobz/elasticsearch-head
bin/plugin -install lukas-vlcek/bigdesk
--------------------------------------------------
will install both of those site plugins, with `elasticsearch-head`
available under `http://localhost:9200/_plugin/head/` and `bigdesk`
available under `http://localhost:9200/_plugin/bigdesk/`.
[float]
==== Mandatory Plugins
If you rely on some plugins, you can define mandatory plugins using the
`plugin.mandatory` attribute, for example, here is a sample config:
[source,js]
--------------------------------------------------
plugin.mandatory: mapper-attachments,lang-groovy
--------------------------------------------------
For safety reasons, if a mandatory plugin is not installed, the node
will not start.
[float]
==== Installed Plugins
A list of the currently loaded plugins can be retrieved using the
<<cluster-nodes-info,Node Info API>>.
[float]
=== Known Plugins
[float]
==== Analysis Plugins
* https://github.com/yakaz/elasticsearch-analysis-combo/[Combo Analysis
Plugin] (by Olivier Favre, Yakaz)
* https://github.com/elasticsearch/elasticsearch-analysis-smartcn[Smart
Chinese Analysis Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-analysis-icu[ICU
Analysis plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-analysis-stempel[Stempel
(Polish) Analysis plugin] (by elasticsearch team)
* https://github.com/chytreg/elasticsearch-analysis-morfologik[Morfologik
(Polish) Analysis plugin] (by chytreg)
* https://github.com/medcl/elasticsearch-analysis-ik[IK Analysis Plugin]
(by Medcl)
* https://github.com/medcl/elasticsearch-analysis-mmseg[Mmseg Analysis
Plugin] (by Medcl)
* https://github.com/jprante/elasticsearch-analysis-hunspell[Hunspell
Analysis Plugin] (by Jörg Prante)
* https://github.com/elasticsearch/elasticsearch-analysis-kuromoji[Japanese
(Kuromoji) Analysis plugin] (by elasticsearch team).
* https://github.com/suguru/elasticsearch-analysis-japanese[Japanese
Analysis plugin] (by suguru).
* https://github.com/imotov/elasticsearch-analysis-morphology[Russian
and English Morphological Analysis Plugin] (by Igor Motov)
* https://github.com/medcl/elasticsearch-analysis-pinyin[Pinyin Analysis
Plugin] (by Medcl)
* https://github.com/medcl/elasticsearch-analysis-string2int[String2Integer
Analysis Plugin] (by Medcl)
* https://github.com/barminator/elasticsearch-analysis-annotation[Annotation
Analysis Plugin] (by Michal Samek)
[float]
==== River Plugins
* https://github.com/elasticsearch/elasticsearch-river-couchdb[CouchDB
River Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-river-wikipedia[Wikipedia
River Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-river-twitter[Twitter
River Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-river-rabbitmq[RabbitMQ
River Plugin] (by elasticsearch team)
* https://github.com/domdorn/elasticsearch-river-activemq/[ActiveMQ
River Plugin] (by Dominik Dorn)
* https://github.com/albogdano/elasticsearch-river-amazonsqs[Amazon SQS
River Plugin] (by Alex Bogdanovski)
* https://github.com/xxBedy/elasticsearch-river-csv[CSV River Plugin]
(by Martin Bednar)
* http://www.pilato.fr/dropbox/[Dropbox River Plugin] (by David Pilato)
* http://www.pilato.fr/fsriver/[FileSystem River Plugin] (by David
Pilato)
* https://github.com/sksamuel/elasticsearch-river-hazelcast[Hazelcast
River Plugin] (by Steve Samuel)
* https://github.com/jprante/elasticsearch-river-jdbc[JDBC River Plugin]
(by Jörg Prante)
* https://github.com/qotho/elasticsearch-river-jms[JMS River Plugin] (by
Steve Sarandos)
* https://github.com/tlrx/elasticsearch-river-ldap[LDAP River Plugin]
(by Tanguy Leroux)
* https://github.com/richardwilly98/elasticsearch-river-mongodb/[MongoDB
River Plugin] (by Richard Louapre)
* https://github.com/sksamuel/elasticsearch-river-neo4j[Neo4j River
Plugin] (by Steve Samuel)
* https://github.com/jprante/elasticsearch-river-oai/[Open Archives
Initiative (OAI) River Plugin] (by Jörg Prante)
* https://github.com/sksamuel/elasticsearch-river-redis[Redis River
Plugin] (by Steve Samuel)
* http://dadoonet.github.com/rssriver/[RSS River Plugin] (by David
Pilato)
* https://github.com/adamlofts/elasticsearch-river-sofa[Sofa River
Plugin] (by adamlofts)
* https://github.com/javanna/elasticsearch-river-solr/[Solr River
Plugin] (by Luca Cavanna)
* https://github.com/sunnygleason/elasticsearch-river-st9[St9 River
Plugin] (by Sunny Gleason)
* https://github.com/endgameinc/elasticsearch-river-kafka[Kafka River
Plugin] (by Endgame Inc.)
* https://github.com/obazoud/elasticsearch-river-git[Git River Plugin] (by Olivier Bazoud)
[float]
==== Transport Plugins
* https://github.com/elasticsearch/elasticsearch-transport-wares[Servlet
transport] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-transport-memcached[Memcached
transport plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-transport-thrift[Thrift
Transport] (by elasticsearch team)
* https://github.com/tlrx/transport-zeromq[ZeroMQ transport layer
plugin] (by Tanguy Leroux)
* https://github.com/sonian/elasticsearch-jetty[Jetty HTTP transport
plugin] (by Sonian Inc.)
[float]
==== Scripting Plugins
* https://github.com/elasticsearch/elasticsearch-lang-python[Python
language Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-lang-javascript[JavaScript
language Plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-lang-groovy[Groovy lang
Plugin] (by elasticsearch team)
* https://github.com/hiredman/elasticsearch-lang-clojure[Clojure
Language Plugin] (by Kevin Downey)
[float]
==== Site Plugins
* https://github.com/lukas-vlcek/bigdesk[BigDesk Plugin] (by Lukáš Vlček)
* https://github.com/mobz/elasticsearch-head[Elasticsearch Head Plugin]
(by Ben Birch)
* https://github.com/royrusso/elasticsearch-HQ[ElasticSearch HQ] (by Roy
Russo)
* https://github.com/karmi/elasticsearch-paramedic[Paramedic Plugin] (by
Karel Minařík)
* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy
Plugin] (by Zachary Tong)
* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor
Plugin] (by Zachary Tong)
* https://github.com/andrewvc/elastic-hammer[Hammer Plugin] (by Andrew
Cholakian)
[float]
==== Misc Plugins
* https://github.com/elasticsearch/elasticsearch-mapper-attachments[Mapper
Attachments Type plugin] (by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-hadoop[Hadoop Plugin]
(by elasticsearch team)
* https://github.com/elasticsearch/elasticsearch-cloud-aws[AWS Cloud
Plugin] (by elasticsearch team)
* https://github.com/mattweber/elasticsearch-mocksolrplugin[ElasticSearch
Mock Solr Plugin] (by Matt Weber)
* https://github.com/spinscale/elasticsearch-suggest-plugin[Suggester
Plugin] (by Alexander Reelsen)
* https://github.com/medcl/elasticsearch-partialupdate[ElasticSearch
PartialUpdate Plugin] (by Medcl)
* https://github.com/sonian/elasticsearch-zookeeper[ZooKeeper Discovery
Plugin] (by Sonian Inc.)
* https://github.com/derryx/elasticsearch-changes-plugin[ElasticSearch
Changes Plugin] (by Thomas Peuss)
* http://tlrx.github.com/elasticsearch-view-plugin[ElasticSearch View
Plugin] (by Tanguy Leroux)
* https://github.com/viniciusccarvalho/elasticsearch-newrelic[ElasticSearch
New Relic Plugin] (by Vinicius Carvalho)
* https://github.com/endgameinc/elasticsearch-term-plugin[Terms
Component Plugin] (by Endgame Inc.)
* https://github.com/carrot2/elasticsearch-carrot2[carrot2 Plugin]:
Results clustering with carrot2 (by Dawid Weiss)


@ -0,0 +1,242 @@
[[modules-scripting]]
== Scripting
The scripting module allows scripts to be used to evaluate custom
expressions. For example, scripts can be used to return "script fields"
as part of a search request, or to evaluate a custom score for a query,
and so on.
The scripting module uses http://mvel.codehaus.org/[mvel] by default as
the scripting language, with some extensions. mvel is used since it is
extremely fast and very simple to use, and in most cases simple
expressions are needed (for example, mathematical equations).
Additional `lang` plugins allow scripts to be executed in
different languages. Currently supported plugins are `lang-javascript`
for JavaScript, `lang-groovy` for Groovy, and `lang-python` for Python.
Wherever a `script` parameter can be used, a `lang` parameter
(on the same level) can be provided to define the language of the
script. The `lang` options are `mvel`, `js`, `groovy`, `python`, and
`native`.
[float]
=== Default Scripting Language
The default scripting language (assuming no `lang` parameter is
provided) is `mvel`. In order to change it, set `script.default_lang`
to the appropriate language.
[float]
=== Preloaded Scripts
Scripts can always be provided as part of the relevant API, but they can
also be preloaded by placing them under `config/scripts` and then
referencing them by the script name (instead of providing the full
script). This helps reduce the amount of data passed between the client
and the nodes.
The name of the script is derived from the hierarchy of directories it
exists under, and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.
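For example, a sketch referencing the preloaded script above from a
search request (the index name `test` is a placeholder):

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/test/_search -d '{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "my_field" : {
            "lang" : "python",
            "script" : "group1_group2_test"
        }
    }
}'
--------------------------------------------------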
[float]
=== Native (Java) Scripts
Even though `mvel` is pretty fast, native Java based scripts can be
registered for faster execution.
In order to provide a native script, a `NativeScriptFactory` needs to be
implemented that constructs the script to be executed. There are
two main types: one that extends `AbstractExecutableScript` and one that
extends `AbstractSearchScript` (probably the one most users will extend,
with additional helper classes in `AbstractLongSearchScript`,
`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`).
Scripts can be registered through settings: for example, setting
`script.native.my.type` to `sample.MyNativeScriptFactory` will
register a script named `my`. Another option is, in a plugin, to access
the `ScriptModule` and call `registerScript` on it.
Executing the script is done by specifying the `lang` as `native`, and
the name of the script as the `script`.
Note that the scripts need to be in the classpath of elasticsearch. One
simple way to do it is to create a directory under `plugins` (choose a
descriptive name), and place the jar / class files there; they will be
automatically loaded.
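For example, a sketch using the factory named above (the class
`sample.MyNativeScriptFactory` is a placeholder) — register it in
`elasticsearch.yml`, then invoke it by name:

[source,js]
--------------------------------------------------
script.native.my.type: sample.MyNativeScriptFactory
--------------------------------------------------

[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_search -d '{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "my_field" : {
            "lang" : "native",
            "script" : "my"
        }
    }
}'
--------------------------------------------------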
[float]
=== Score
In all scripts that can be used in facets, the current document's score
is accessible via `doc.score`.
[float]
=== Document Fields
Most scripting revolves around the use of specific document field data.
The `doc['field_name']` syntax can be used to access specific field data
within a document (the document in question is usually derived from the
context in which the script is used). Document fields are very fast to
access, since they end up being loaded into memory (all the relevant
field values/tokens are loaded into memory).
The following data can be extracted from a field:
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it's a short type, it will be short.
|`doc['field_name'].values` |The native array values of the field. For
example, if it's a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.
|`doc['field_name'].lat` |The latitude of a geo point type.
|`doc['field_name'].lon` |The longitude of a geo point type.
|`doc['field_name'].lats` |The latitudes of a geo point type.
|`doc['field_name'].lons` |The longitudes of a geo point type.
|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in miles)
of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].geohashDistance(geohash)` |The distance (in miles)
of this geo point field from the provided geohash.
|`doc['field_name'].geohashDistanceInKm(geohash)` |The distance (in km)
of this geo point field from the provided geohash.
|=======================================================================
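For example, a sketch of a search body with a script field reading a doc
value (the field name `my_field_name` is a placeholder):

[source,js]
--------------------------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "doubled" : {
            "script" : "doc['my_field_name'].value * 2"
        }
    }
}
--------------------------------------------------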
[float]
=== Stored Fields
Stored fields can also be accessed when executing a script. Note that
they are much slower to access than document fields, but are not
loaded into memory. They can be accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.
[float]
=== Source Field
The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.
[float]
=== mvel Built In Functions
There are several built in functions that can be used within scripts.
They include:
[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`time()` |The current time in milliseconds.
|`sin(a)` |Returns the trigonometric sine of an angle.
|`cos(a)` |Returns the trigonometric cosine of an angle.
|`tan(a)` |Returns the trigonometric tangent of an angle.
|`asin(a)` |Returns the arc sine of a value.
|`acos(a)` |Returns the arc cosine of a value.
|`atan(a)` |Returns the arc tangent of a value.
|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians.
|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.
|`exp(a)` |Returns Euler's number _e_ raised to the power of value.
|`log(a)` |Returns the natural logarithm (base _e_) of a value.
|`log10(a)` |Returns the base 10 logarithm of a value.
|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.
|`cbrt(a)` |Returns the cube root of a double value.
|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.
|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.
|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.
|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.
|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (r,_theta_).
|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.
|`round(a)` |Returns the closest _int_ to the argument.
|`random()` |Returns a random _double_ value.
|`abs(a)` |Returns the absolute value of a value.
|`max(a, b)` |Returns the greater of two values.
|`min(a, b)` |Returns the smaller of two values.
|`ulp(d)` |Returns the size of an ulp of the argument.
|`signum(d)` |Returns the signum function of the argument.
|`sinh(x)` |Returns the hyperbolic sine of a value.
|`cosh(x)` |Returns the hyperbolic cosine of a value.
|`tanh(x)` |Returns the hyperbolic tangent of a value.
|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================
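As an illustrative sketch, these functions can be combined freely inside
a script; for example, a `custom_score` query that dampens a
(hypothetical) numeric `popularity` field with `log`:
[source,js]
--------------------------------------------------
{
    "query" : {
        "custom_score" : {
            "query" : { "match_all" : {} },
            "script" : "_score * log(doc['popularity'].value + 1)"
        }
    }
}
--------------------------------------------------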
[float]
=== Arithmetic precision in MVEL
When dividing two numbers in an MVEL-based script, the engine adheres to
the default behaviour of Java. This means that if you divide two
integers (you might have configured the fields as integer in the
mapping), the result will also be an integer. So if a calculation like
`1/num` happens in your script and `num` is an integer with the value
`8`, the result is `0` even though you were expecting `0.125`. You may
need to enforce precision by explicitly using a double, like `1.0/num`,
in order to get the expected result.
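Here is a sketch showing both variants side by side, assuming an integer
field called `num`; only the second script returns a fractional result:
[source,js]
--------------------------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "integer_division" : { "script" : "1/doc['num'].value" },
        "double_division"  : { "script" : "1.0/doc['num'].value" }
    }
}
--------------------------------------------------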
[[modules-threadpool]]
== Thread Pool
A node holds several thread pools in order to improve how threads and
memory consumption are managed within a node. There are several pools,
but the important ones include:
[horizontal]
`index`::
For index/delete operations, defaults to `fixed` type since
`0.90.0`, size `# of available processors`. (previously type `cached`)
`search`::
For count/search operations, defaults to `fixed` type since
`0.90.0`, size `3x # of available processors`. (previously type
`cached`)
`get`::
For get operations, defaults to `fixed` type since `0.90.0`,
size `# of available processors`. (previously type `cached`)
`bulk`::
For bulk operations, defaults to `fixed` type since `0.90.0`,
size `# of available processors`. (previously type `cached`)
`warmer`::
For segment warm-up operations, defaults to `scaling` since
`0.90.0` with a `5m` keep-alive. (previously type `cached`)
`refresh`::
For refresh operations, defaults to `scaling` since
`0.90.0` with a `5m` keep-alive. (previously type `cached`)
A specific thread pool can be changed by setting its type and
type-specific parameters; for example, to change the `index` thread pool
to the `blocking` type:
[source,js]
--------------------------------------------------
threadpool:
index:
type: blocking
min: 1
size: 30
wait_time: 30s
--------------------------------------------------
NOTE: you can update threadpool settings live using
<<cluster-update-settings>>.
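For example, here is a sketch of such a live update using the cluster
update settings API (the `threadpool.search.size` setting shown is
illustrative; which parameters can be changed at runtime depends on the
pool type):
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "threadpool.search.size" : 30
    }
}'
--------------------------------------------------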
[float]
=== Thread pool types
The following are the types of thread pools that can be used and their
respective parameters:
[float]
==== `cached`
The `cached` thread pool is an unbounded thread pool that will spawn a
thread if there are pending requests. Here is an example of how to set
it:
[source,js]
--------------------------------------------------
threadpool:
index:
type: cached
--------------------------------------------------
[float]
==== `fixed`
The `fixed` thread pool holds a fixed number of threads to handle
requests, with a queue (optionally bounded) for pending requests that
have no threads to service them.
The `size` parameter controls the number of threads, and defaults to the
number of cores times 5.
The `queue_size` parameter controls the size of the queue for pending
requests that have no threads to execute them. By default, it is set to
`-1`, which means it is unbounded. When a request comes in and the queue
is full, the `reject_policy` parameter controls how the pool behaves:
the default, `abort`, will simply fail the request, while setting it to
`caller` will cause the request to execute on an IO thread, allowing you
to throttle execution on the networking layer.
[source,js]
--------------------------------------------------
threadpool:
index:
type: fixed
size: 30
queue_size: 1000
reject_policy: caller
--------------------------------------------------
[float]
==== `blocking`
The `blocking` pool allows you to configure the number of threads via
the `min` (defaults to `1`) and `size` (defaults to the number of cores
times 5) parameters.
It also has a backlog queue with a default `queue_size` of `1000`. Once
the queue is full, it will wait for the provided `wait_time` (defaults
to `60s`) on the calling IO thread, and fail the request if it has not
been executed by then.
[source,js]
--------------------------------------------------
threadpool:
index:
type: blocking
min: 1
size: 30
wait_time: 30s
--------------------------------------------------
[[modules-thrift]]
== Thrift
The thrift transport module allows the REST interface of elasticsearch
to be exposed over thrift. Thrift should provide better performance than
HTTP. Since thrift provides both the wire protocol and the transport, it
should also be simpler to use (though it is lacking in
documentation...).
Using thrift requires installing the `transport-thrift` plugin, located
https://github.com/elasticsearch/elasticsearch-transport-thrift[here].
The thrift
https://github.com/elasticsearch/elasticsearch-transport-thrift/blob/master/elasticsearch.thrift[schema]
can be used to generate thrift clients.
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`thrift.port` |The port to bind to. Defaults to `9500-9600`.
|`thrift.frame` |Defaults to `-1`, which means no framing. Set to a
higher value to specify the frame size (like `15mb`).
|=======================================================================
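For example, a sketch of setting a fixed port and enabling framing in
the node configuration (the values shown are illustrative):
[source,js]
--------------------------------------------------
thrift.port: 9500
thrift.frame: 15mb
--------------------------------------------------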
[[modules-transport]]
== Transport
The transport module is used for internal communication between nodes
within the cluster. Each call that goes from one node to the other uses
the transport module (for example, when an HTTP GET request is processed
by one node, and should actually be processed by another node that holds
the data).
The transport mechanism is completely asynchronous in nature, meaning
that there is no blocking thread waiting for a response. Besides helping
to solve the http://en.wikipedia.org/wiki/C10k_problem[C10k problem],
asynchronous communication is also the ideal solution for scatter
(broadcast) / gather operations such as search in elasticsearch.
[float]
=== TCP Transport
The TCP transport is an implementation of the transport module using
TCP. It allows for the following settings:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`transport.tcp.port` |A bind port range. Defaults to `9300-9400`.
|`transport.tcp.connect_timeout` |The socket connect timeout setting (in
time setting format). Defaults to `2s`.
|`transport.tcp.compress` |Set to `true` to enable compression (LZF)
between all nodes. Defaults to `false`.
|=======================================================================
It also uses the common
<<modules-network,network settings>>.
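For example, a sketch of tuning these settings in the node configuration
(the values shown are illustrative):
[source,js]
--------------------------------------------------
transport.tcp.port: 9300-9400
transport.tcp.connect_timeout: 2s
transport.tcp.compress: true
--------------------------------------------------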
[float]
=== Local Transport
This is a handy transport to use when running integration tests within
the JVM. It is automatically enabled when using
`NodeBuilder#local(true)`.
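For example, a minimal sketch of starting such a local node in a test
using the Java API (assuming the elasticsearch jar is on the classpath):
[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;

public class LocalTransportExample {
    public static void main(String[] args) {
        // local(true) wires the node to the JVM level local transport
        Node node = nodeBuilder().local(true).node();
        Client client = node.client();

        // ... index documents and run assertions against `client` ...

        node.close();
    }
}
--------------------------------------------------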