General Architecture
REST and Transport Layers
In general, there are two types of network communication used in Elasticsearch:
- External clients interact with the cluster via the public REST API over HTTP connections; this is referred to as the "REST layer".
- Cluster nodes communicate internally using a binary message format over TCP connections; this is referred to as the "Transport layer".
Cross-cluster replication (CCR) and cross-cluster search (CCS) also use transport messaging for inter-cluster communication. More information on CCR/CCS can be found in the Distributed architecture guide.
REST Layer
Handler registration
All REST handlers exposed by Elasticsearch are registered in ActionModule#initRestHandlers. This method registers all the REST actions with the RestController using #registerHandler(...). These registrations populate a map of routes to RestHandlers to allow routing of incoming HTTP requests to their respective handlers. There are many REST endpoints configured statically in ActionModule, and additional endpoints can be contributed by ActionPlugins by implementing the getRestHandlers method.
Typically, REST actions follow the class naming convention Rest*Action, which makes them easier to find, but not always; the #routes() implementation for each Rest*Action can also be helpful in finding a particular REST action.
When a RestRequest is received, RestController#dispatchRequest uses the request path to identify the destination handler and calls #handleRequest on it. BaseRestHandler is a common base class extended by most Rest*Action implementations.
Handler invocation
The usual flow of a REST request being handled is as follows:
- RestController#dispatchRequest inspects the RestRequest and matches it to a handler using its map of paths to handlers.
- BaseRestHandler#handleRequest performs some basic parameter validation.
- BaseRestHandler calls into BaseRestHandler#prepareRequest, which Rest*Action subclasses implement to define the behavior for a particular action. prepareRequest processes the request parameters to produce a RestChannelConsumer that is ready to execute the action and return the response on a RestChannel.
- BaseRestHandler validates that the handler consumed all the request parameters, throwing an exception if any were left unconsumed.
- BaseRestHandler then supplies the channel to the RestChannelConsumer to begin executing the action. Some handlers, such as the RestBulkAction, consume the request as a stream of chunks to allow incremental processing of large requests.
- The response is written to the RestChannel, either as a single payload or a stream of chunks.
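The flow above can be illustrated with a small self-contained sketch. This is a toy model, not the Elasticsearch API: the class RestDispatchSketch, its nested Request and Handler types, and the "/_hello" route are all hypothetical names chosen for illustration. It shows only the route lookup and the "all parameters must be consumed" check, assuming a plain map from exact paths to handlers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Toy model of route lookup plus the consumed-parameter check. Not the Elasticsearch API. */
class RestDispatchSketch {

    /** Hypothetical request that remembers which parameters a handler has read. */
    static final class Request {
        final String path;
        private final Map<String, String> params;
        private final Set<String> consumed = new HashSet<>();

        Request(String path, Map<String, String> params) {
            this.path = path;
            this.params = params;
        }

        String param(String name) {
            consumed.add(name);
            return params.get(name);
        }

        Set<String> unconsumedParams() {
            Set<String> left = new HashSet<>(params.keySet());
            left.removeAll(consumed);
            return left;
        }
    }

    /** Hypothetical handler: turns a request into a consumer that writes to a channel. */
    interface Handler {
        Consumer<StringBuilder> prepareRequest(Request request);
    }

    private final Map<String, Handler> routes = new HashMap<>();

    void registerHandler(String path, Handler handler) {
        routes.put(path, handler);
    }

    String dispatchRequest(Request request) {
        Handler handler = routes.get(request.path);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for " + request.path);
        }
        Consumer<StringBuilder> action = handler.prepareRequest(request);
        // Reject the request before executing it if any parameter was never read.
        Set<String> unconsumed = request.unconsumedParams();
        if (!unconsumed.isEmpty()) {
            throw new IllegalArgumentException("unrecognized parameters: " + unconsumed);
        }
        StringBuilder channel = new StringBuilder(); // stands in for the response channel
        action.accept(channel);
        return channel.toString();
    }

    /** Builds a controller with one hypothetical route registered. */
    static RestDispatchSketch example() {
        RestDispatchSketch controller = new RestDispatchSketch();
        controller.registerHandler("/_hello", request -> {
            String name = request.param("name");
            return channel -> channel.append("hello ").append(name);
        });
        return controller;
    }

    public static void main(String[] args) {
        System.out.println(example().dispatchRequest(new Request("/_hello", Map.of("name", "es"))));
    }
}
```

A request carrying a parameter that the handler never reads is rejected before the action runs, which is what lets misspelled parameters surface as errors rather than being silently ignored.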
Request interceptor
The RestController accepts a RestInterceptor that can intercept RestRequests and add additional pre-handling. A single RestServerActionPlugin can provide a RestInterceptor implementation, through which all requests are passed. The Security plugin uses this capability to register an interceptor to authorize access to endpoints that require operator privileges, populate the audit logs and perform some additional authentication when required.
HTTP server infrastructure
HTTP traffic is handled by an implementation of an HttpServerTransport. The HttpServerTransport is responsible for binding to a port, handling REST client connections, parsing received requests into RestRequest instances and dispatching those requests to an HttpServerTransport.Dispatcher. The RestController is an implementation of HttpServerTransport.Dispatcher.
The HttpServerTransport is pluggable. There is a single Netty-based implementation of HttpServerTransport, the Netty4HttpServerTransport, but some plugins, such as Security, supply instances of it with additional configuration to implement features like IP filtering or TLS (see Security#getHttpTransports).
Transport Layer
Rest*Action implementations typically translate received requests into an ActionRequest which is dispatched via the NodeClient passed in by the RestController. The NodeClient is the entrypoint into the "transport layer" over which internal cluster actions are coordinated.
Note
Rest*Action classes usually have a corresponding Transport*Action; this naming convention makes it easy to locate the corresponding RestHandler for a TransportAction (e.g. RestGetAction calls TransportGetAction). There are actions for which this pattern does not hold; in those cases you can locate the transport action for a REST action by looking at the NodeClient invocation in the Rest*Action's prepareRequest implementation. It should specify the ActionType being invoked, which can then be used to locate the Transport*Action class that handles it.
Action registration
Elasticsearch contains many TransportActions, configured statically in ActionModule#setupActions. ActionPlugins can contribute additional actions via the getActions method. TransportActions define the request and response types used to invoke the action and the logic for performing the action.
TransportActions that are registered in ActionModule#setupActions (including those supplied by plugins) are locally bound to their ActionType. This map of type -> action bindings is what NodeClient instances use to locate actions in NodeClient#executeLocally.
The actions themselves sometimes dispatch downstream actions to other nodes in the cluster via the transport layer (see TransportService#sendRequest). To be callable in this way, actions must register themselves with the TransportService by calling TransportService#registerRequestHandler. HandledTransportAction is a common parent class that registers an action with the TransportService.
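The type -> action binding can be pictured with a small self-contained sketch. This is a toy model rather than the real classes: ActionRegistrySketch, ActionKey and the GET constant are hypothetical stand-ins, and the synchronous Function-based action is an assumption made to keep the example short (real actions are asynchronous).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of a type -> action registry. Not the Elasticsearch API. */
class ActionRegistrySketch {

    /** Hypothetical stand-in for an action type: a named key fixing request and response types. */
    static final class ActionKey<Req, Resp> {
        final String name;

        ActionKey(String name) {
            this.name = name;
        }
    }

    // Hypothetical key used for illustration only.
    static final ActionKey<String, String> GET = new ActionKey<>("indices:data/read/get");

    private final Map<ActionKey<?, ?>, Function<?, ?>> actions = new HashMap<>();

    /** Binds an action implementation to its key, as registration does at startup. */
    <Req, Resp> void register(ActionKey<Req, Resp> key, Function<Req, Resp> action) {
        actions.put(key, action);
    }

    /** Looks up the action bound to the key and runs it on the local node. */
    @SuppressWarnings("unchecked")
    <Req, Resp> Resp executeLocally(ActionKey<Req, Resp> key, Req request) {
        Function<Req, Resp> action = (Function<Req, Resp>) actions.get(key);
        if (action == null) {
            throw new IllegalStateException("no action registered for [" + key.name + "]");
        }
        return action.apply(request);
    }

    public static void main(String[] args) {
        ActionRegistrySketch client = new ActionRegistrySketch();
        client.register(GET, id -> "doc-" + id);
        System.out.println(client.executeLocally(GET, "42"));
    }
}
```

Because the key carries the request and response types, callers get a typed result without knowing which class implements the action.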
Note
The name TransportAction can be misleading, as it suggests they are all invocable and invoked via the TCP transport. In fact, a majority of transport actions are only ever invoked locally via the NodeClient. The two key features of a TransportAction are:
- Their constructor parameters are provided via dependency injection (Guice) at runtime rather than direct instantiation.
- They represent a security boundary; we check that the calling user is authorized to call the action they're calling using TransportInterceptors, which are described below.
Action invocation
The NodeClient executes all actions locally on the invoking node using the NodeClient#executeLocally method. This method invokes TaskManager#registerAndExecute to register a task, execute the action, then unregister the task once the action completes. There is more information about task management in the Distributed architecture guide.
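The register, execute, unregister sequence can be sketched as follows. This is a simplified toy under stated assumptions: TaskTrackingSketch and its methods are hypothetical names, and the action runs synchronously here, whereas the real task manager unregisters the task when an asynchronous listener completes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Toy model of register -> execute -> unregister. Not the Elasticsearch API. */
class TaskTrackingSketch {
    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, String> running = new ConcurrentHashMap<>();

    /** Tracks the action as a task for exactly as long as it is executing. */
    <T> T registerAndExecute(String actionName, Supplier<T> action) {
        long id = ids.incrementAndGet();
        running.put(id, actionName);
        try {
            return action.get();
        } finally {
            // Unregister on success and on failure, so no task is leaked.
            running.remove(id);
        }
    }

    int runningTasks() {
        return running.size();
    }

    public static void main(String[] args) {
        TaskTrackingSketch tasks = new TaskTrackingSketch();
        Integer during = tasks.registerAndExecute("demo", tasks::runningTasks);
        System.out.println("during=" + during + " after=" + tasks.runningTasks());
    }
}
```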
There are a few common patterns for TransportAction execution that are present in the codebase. Some prominent examples include:
- TransportMasterNodeAction: Executes an action on the master node. Typically used to perform cluster state updates, as these can only be performed on the master. The base class contains logic for locating the master node and delegating to it to execute the specified logic.
- TransportNodesAction: Executes an action on many nodes then collates the responses.
- TransportLocalClusterStateAction: Waits for a cluster state that optionally meets some criteria and performs a read action on it on the coordinating node.
- TransportReplicationAction: Executes an action on a primary shard followed by all replicas that exist for that shard. The base class implements logic for locating the primary and replica shards in the cluster and delegating to the relevant nodes. Often used for index updates in stateful Elasticsearch.
- TransportSingleShardAction: Executes a read operation on a specific shard. The base class contains logic for locating an available copy of the nominated shard and delegating to the relevant node to execute the action. On a failure, the action is retried on a different copy.
Transport interceptors
The transport action infrastructure allows the configuration of interceptors which can implement cross-cutting concerns like security around action invocations. Implementations of the TransportInterceptor interface are able to intercept action requests by wrapping TransportRequestHandlers, or by intercepting requests before they are sent. Plugins that implement the NetworkPlugin interface are able to register interceptors by implementing the getTransportInterceptors method.
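The handler-wrapping half of this mechanism can be sketched in a few lines. The types below are hypothetical simplifications, not the real interfaces: InterceptorSketch, RequestHandler and Interceptor are illustrative names, the string-based request is an assumption, and the "admin only" rule is invented purely to show a cross-cutting check.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of interceptors that wrap request handlers. Not the Elasticsearch API. */
class InterceptorSketch {

    /** Hypothetical, simplified stand-in for a request handler. */
    interface RequestHandler {
        String handle(String user, String request);
    }

    /** Hypothetical, simplified stand-in for an interceptor applied when a handler is registered. */
    interface Interceptor {
        RequestHandler interceptHandler(String action, RequestHandler actual);
    }

    final List<String> auditLog = new ArrayList<>();

    /** Audits every call, then rejects non-admin callers for cluster-admin actions. */
    Interceptor security() {
        return (action, actual) -> (user, request) -> {
            auditLog.add(user + " -> " + action);
            if (action.startsWith("cluster:admin/") && !user.equals("admin")) {
                throw new SecurityException(user + " is not authorized for " + action);
            }
            return actual.handle(user, request);
        };
    }

    public static void main(String[] args) {
        InterceptorSketch sketch = new InterceptorSketch();
        RequestHandler raw = (user, request) -> "ok:" + request;
        RequestHandler guarded = sketch.security().interceptHandler("cluster:admin/demo", raw);
        System.out.println(guarded.handle("admin", "ping"));
        System.out.println(sketch.auditLog);
    }
}
```

The wrapped handler has the same shape as the original, so the action logic stays unaware of the check that runs in front of it.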
Transport infrastructure
The transport infrastructure is pluggable and implementations can be provided by NetworkPlugin#getTransports. The role of the Transport is to establish connections between nodes over which TransportRequests can be sent, maintain a registry of TransportRequestHandlers for routing inbound requests and maintain state to correlate inbound responses with the original requests. There is a single Netty-based TCP transport used in production Elasticsearch, the Netty4Transport, but the security plugin extends that to add SSL and IP filtering capabilities.
Serializations
Settings
Elasticsearch supports cluster-level settings and index-level settings, configurable via node-level file settings (e.g. elasticsearch.yml file), command line arguments and REST APIs.
Declaring a Setting
The Setting class is the building block for Elasticsearch server settings. Each Setting can take multiple Property declarations to define setting characteristics. All setting values first come from the node-local elasticsearch.yml file, if they are set therein, before falling back to the default specified in their Setting declaration. A setting with Property.Dynamic can be updated during runtime, but must be paired with a local volatile variable and registered in the ClusterSettings via a utility like ClusterSettings#initializeAndWatch() to catch and immediately apply dynamic changes. Note that a common dynamic Setting bug is to always read the value directly from Metadata#settings(), which holds the default and dynamically updated values, but not the node-local elasticsearch.yml value. The scope of a Setting must also be declared, such as Property.IndexScope for a setting that applies to indexes, or Property.NodeScope for a cluster-level setting.
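The "volatile field plus update consumer" pattern can be illustrated without the real classes. The sketch below is a toy model: DynamicSettingSketch, Registry, Throttle and the "demo.max_concurrent" key are hypothetical, and the method named initializeAndWatch here only mirrors the idea of the real utility, not its signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

/** Toy model of the volatile-field + update-consumer pattern. Not the Elasticsearch API. */
class DynamicSettingSketch {

    /** Hypothetical registry holding node-local values and update consumers. */
    static final class Registry {
        private final Map<String, Integer> nodeLocal; // values from the node's config file
        private final Map<String, List<IntConsumer>> watchers = new HashMap<>();

        Registry(Map<String, Integer> nodeLocal) {
            this.nodeLocal = nodeLocal;
        }

        /** Applies the initial value (node-local, else default) and subscribes to later updates. */
        void initializeAndWatch(String key, int defaultValue, IntConsumer consumer) {
            consumer.accept(nodeLocal.getOrDefault(key, defaultValue));
            watchers.computeIfAbsent(key, k -> new ArrayList<>()).add(consumer);
        }

        /** Simulates a dynamic update being applied. */
        void applyUpdate(String key, int value) {
            for (IntConsumer consumer : watchers.getOrDefault(key, List.of())) {
                consumer.accept(value);
            }
        }
    }

    /** A component that caches the setting in a volatile field so every thread sees updates. */
    static final class Throttle {
        private volatile int maxConcurrent;

        Throttle(Registry registry) {
            registry.initializeAndWatch("demo.max_concurrent", 4, value -> this.maxConcurrent = value);
        }

        int maxConcurrent() {
            return maxConcurrent;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry(Map.of("demo.max_concurrent", 8));
        Throttle throttle = new Throttle(registry);
        System.out.println(throttle.maxConcurrent());
        registry.applyUpdate("demo.max_concurrent", 2);
        System.out.println(throttle.maxConcurrent());
    }
}
```

Reading the cached field, rather than re-reading a settings object on each use, is what keeps the node-local value in effect until a dynamic update actually arrives.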
ClusterSettings tracks the core Elasticsearch settings. Ultimately the ClusterSettings get loaded via the SettingsModule. Additional settings from the various plugins are collected during node construction and passed into the SettingsModule constructor. The Plugin interface has a getSettings() method via which each plugin can declare additional settings.
Dynamically updating a Setting
Externally, TransportClusterUpdateSettingsAction and TransportUpdateSettingsAction (and the corresponding REST endpoints) allow users to dynamically change cluster and index settings, respectively. Internally, AbstractScopedSettings (parent class of ClusterSettings) has various helper methods to track dynamic changes: it keeps a registry of SettingUpdater consumer lambdas to run updates when settings are changed in the cluster state. The ClusterApplierService sends setting updates through to the AbstractScopedSettings, invoking the consumers registered therein for each updated setting.
Index settings are always persisted. They can only be modified on an existing index, and setting values are persisted as part of the IndexMetadata. Cluster settings, however, can be either persisted or transient depending on how they are tied to Metadata. Changes to persisted cluster settings will survive a full cluster restart, whereas changes made to transient cluster settings will reset to their default values, or the elasticsearch.yml values, if the cluster state must ever be reloaded from persisted state.
Deprecations
Backwards Compatibility
Major releases are mostly about breaking compatibility and dropping deprecated functionality.
Elasticsearch versions are composed of three pieces of information: the major version, the minor version, and the patch version, in that order (major.minor.patch). Patch releases are typically bug fixes; minor releases contain improvements / new features; and major releases essentially break compatibility and enable removal of deprecated functionality. As an example, each of 8.0.0, 8.3.0 and 8.3.1 specifies an exact release version. They all have the same major version (8) and the last two have the same minor version (8.3). Multiversion compatibility within a cluster, or backwards compatibility with older version nodes, is guaranteed across specific versions.
Transport Layer Backwards Compatibility
Elasticsearch nodes can communicate over the network with all node versions within the same major release. All versions within one major version X are also compatible with the last minor version release of the previous major version, i.e. (X-1).last. More concretely, all 8.x.x version nodes can communicate with all 7.17.x version nodes.
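The rule above can be written as a small predicate. This is an illustrative sketch, not how Elasticsearch itself tracks wire compatibility: WireCompatibilitySketch and its Version type are hypothetical, and the last minor of the older major (17 for the 7.x line, per the example in the text) is passed in as a parameter rather than looked up.

```java
/** Toy model of the wire-compatibility rule. Not the Elasticsearch API. */
class WireCompatibilitySketch {

    /** Hypothetical minimal version triple (major.minor.patch). */
    static final class Version {
        final int major;
        final int minor;
        final int patch;

        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        static Version parse(String text) {
            String[] parts = text.split("\\.");
            return new Version(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        }
    }

    /**
     * Two nodes can talk if they share a major version, or if their majors differ by
     * exactly one and the older node is on the last minor of its major line.
     */
    static boolean canCommunicate(Version a, Version b, int lastMinorOfOlderMajor) {
        if (a.major == b.major) {
            return true;
        }
        Version older = a.major < b.major ? a : b;
        Version newer = older == a ? b : a;
        return newer.major - older.major == 1 && older.minor == lastMinorOfOlderMajor;
    }

    public static void main(String[] args) {
        System.out.println(canCommunicate(Version.parse("8.3.1"), Version.parse("7.17.5"), 17));
        System.out.println(canCommunicate(Version.parse("8.0.0"), Version.parse("7.10.2"), 17));
    }
}
```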
Index Format Backwards Compatibility
Index data format backwards compatibility is guaranteed with all versions of the previous major release. All 8.x.x version nodes, for example, can read index data written by any 7.x.x version node. 9.x.x versions, however, will not be able to read 7.x.x format data files.
Elasticsearch does not have an upgrade process to convert from older to newer index data formats. The user is expected to run reindex on any remaining untouched data from a previous version upgrade before upgrading to the next version. There is a good chance that older version index data will age out and be deleted before the user does the next upgrade, but reindex can be used if that is not the case.
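As a rough illustration of the "previous major only" rule, the sketch below decides whether an index must be reindexed before an upgrade. It is a simplification under stated assumptions: IndexFormatSketch and its methods are hypothetical, and compatibility is modeled at major-version granularity only.

```java
/** Toy model of index-format compatibility at major-version granularity. Not the Elasticsearch API. */
class IndexFormatSketch {

    /** A node can read indexes written by its own major version or the previous one. */
    static boolean canRead(int nodeMajor, int indexCreatedMajor) {
        return indexCreatedMajor == nodeMajor || indexCreatedMajor == nodeMajor - 1;
    }

    /** An index older than the major before the target must be reindexed before upgrading. */
    static boolean needsReindexBeforeUpgrade(int indexCreatedMajor, int targetMajor) {
        return indexCreatedMajor < targetMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println(canRead(8, 7));                   // 8.x reading 7.x data
        System.out.println(canRead(9, 7));                   // 9.x reading 7.x data
        System.out.println(needsReindexBeforeUpgrade(7, 9)); // 7.x index before moving to 9.x
    }
}
```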
Snapshot Backwards Compatibility
Snapshots taken by a cluster of version X cannot be read by a cluster running older version nodes. However, snapshots taken by an older version cluster can continue to be read from and written to by newer version clusters: this compatibility goes back many major versions. If a newer version cluster writes to a snapshot repository containing snapshots from an older version, then it will do so in a way that leaves the repository format (metadata and file layout) readable by those older versions.
Restoring indexes that have different and no longer supported data formats can be tricky: see the public snapshot compatibility docs for details.
Upgrade
See the public upgrade docs for the upgrade process.
Plugins
(what warrants a plugin?)
(what plugins do we have?)
Testing
(Overview of our testing frameworks. Discuss base test classes.)