mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 09:28:55 -04:00
308 lines
13 KiB
Text
308 lines
13 KiB
Text
[[repository-azure]]
|
|
=== Azure repository
|
|
|
|
You can use
|
|
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction[Azure
|
|
Blob storage] as a repository for <<snapshot-restore,Snapshot and restore>>.
|
|
|
|
[[repository-azure-usage]]
|
|
==== Setup
|
|
|
|
To enable Azure repositories, first configure an Azure repository client by
|
|
specifying one or more settings of the form
|
|
`azure.client.CLIENT_NAME.SETTING_NAME`. By default, `azure` repositories use a
|
|
client named `default`, but you may specify a different client name when
|
|
registering each repository.
|
|
|
|
The only mandatory Azure repository client setting is `account`, which is a
|
|
{ref}/secure-settings.html[secure setting] defined in the <<secure-settings,{es}
|
|
keystore>>. To provide this setting, use the `elasticsearch-keystore` tool on
|
|
each node:
|
|
|
|
[source,sh]
|
|
----------------------------------------------------------------
|
|
bin/elasticsearch-keystore add azure.client.default.account
|
|
----------------------------------------------------------------
|
|
|
|
If you adjust this setting after a node has started, call the
|
|
<<cluster-nodes-reload-secure-settings,Nodes reload secure settings API>> to
|
|
reload the new value.
|
|
|
|
You may define more than one client by setting their `account` values. For
|
|
instance, to set the `default` client and another client called `secondary`, run
|
|
the following commands on each node:
|
|
|
|
[source,sh]
|
|
----------------------------------------------------------------
|
|
bin/elasticsearch-keystore add azure.client.default.account
|
|
bin/elasticsearch-keystore add azure.client.secondary.account
|
|
----------------------------------------------------------------
|
|
|
|
The `key` and `sas_token` settings are also secure settings and can be set using
|
|
commands like the following:
|
|
|
|
[source,sh]
|
|
----------------------------------------------------------------
|
|
bin/elasticsearch-keystore add azure.client.default.key
|
|
bin/elasticsearch-keystore add azure.client.secondary.sas_token
|
|
----------------------------------------------------------------
|
|
|
|
Other Azure repository client settings must be set in `elasticsearch.yml` before
|
|
the node starts. For example:
|
|
|
|
[source,yaml]
|
|
----
|
|
azure.client.default.timeout: 10s
|
|
azure.client.default.max_retries: 7
|
|
azure.client.default.endpoint_suffix: core.chinacloudapi.cn
|
|
azure.client.secondary.timeout: 30s
|
|
----
|
|
|
|
In this example, the client side timeout is `10s` per try for repositories which
|
|
use the `default` client, with `7` retries before failing and an endpoint
|
|
suffix of `core.chinacloudapi.cn`. Repositories which use the `secondary` client
|
|
will have a timeout of `30s` per try, but will use the default endpoint and will
|
|
fail after the default number of retries.
|
|
|
|
Once an Azure repository client is configured correctly, register an Azure
|
|
repository as follows, providing the client name using the `client`
|
|
<<repository-azure-repository-settings,repository setting>>:
|
|
|
|
[source,console]
|
|
----
|
|
PUT _snapshot/my_backup
|
|
{
|
|
"type": "azure",
|
|
"settings": {
|
|
"client": "secondary"
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip:we don't have azure setup while testing this]
|
|
|
|
If you are using the `default` client, you may omit the `client` repository
|
|
setting:
|
|
|
|
[source,console]
|
|
----
|
|
PUT _snapshot/my_backup
|
|
{
|
|
"type": "azure"
|
|
}
|
|
----
|
|
// TEST[skip:we don't have azure setup while testing this]
|
|
|
|
NOTE: In progress snapshot or restore jobs will not be preempted by a *reload*
|
|
of the storage secure settings. They will complete using the client as it was
|
|
built when the operation started.
|
|
|
|
[[repository-azure-client-settings]]
|
|
==== Client settings
|
|
|
|
The following list describes the available client settings. Those that must be
|
|
stored in the keystore are marked as ({ref}/secure-settings.html[Secure],
|
|
{ref}/secure-settings.html#reloadable-secure-settings[reloadable]); the other
|
|
settings must be stored in the `elasticsearch.yml` file. The default
|
|
`CLIENT_NAME` is `default` but you may configure a client with a different name
|
|
and specify that client by name when registering a repository.
|
|
|
|
`azure.client.CLIENT_NAME.account` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
|
|
The Azure account name, which is used by the repository's internal Azure
|
|
client. This setting is required for all clients.
|
|
|
|
`azure.client.CLIENT_NAME.endpoint_suffix`::
|
|
The Azure endpoint suffix to connect to. The default value is
|
|
`core.windows.net`.
|
|
|
|
`azure.client.CLIENT_NAME.key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
|
|
The Azure secret key, which is used by the repository's internal Azure client.
|
|
Alternatively, use `sas_token`.
|
|
|
|
`azure.client.CLIENT_NAME.max_retries`::
|
|
The number of retries to use when an Azure request fails. This setting helps
|
|
control the exponential backoff policy. It specifies the number of retries
|
|
that must occur before the snapshot fails. The default value is `3`. The
|
|
initial backoff period is defined by Azure SDK as `30s`. Thus there is `30s`
|
|
of wait time before retrying after a first timeout or failure. The maximum
|
|
backoff period is defined by Azure SDK as `90s`.
|
|
|
|
`azure.client.CLIENT_NAME.proxy.host`::
|
|
The host name of a proxy to connect to Azure through. By default, no proxy is
|
|
used.
|
|
|
|
`azure.client.CLIENT_NAME.proxy.port`::
|
|
The port of a proxy to connect to Azure through. By default, no proxy is used.
|
|
|
|
`azure.client.CLIENT_NAME.proxy.type`::
|
|
Register a proxy type for the client. Supported values are `direct`, `http`,
|
|
and `socks`. For example: `azure.client.default.proxy.type: http`. When
|
|
`proxy.type` is set to `http` or `socks`, `proxy.host` and `proxy.port` must
|
|
also be provided. The default value is `direct`.
|
|
|
|
`azure.client.CLIENT_NAME.sas_token` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
|
|
A shared access signatures (SAS) token, which the repository's internal Azure
|
|
client uses for authentication. The SAS token must have read (r), write (w),
|
|
list (l), and delete (d) permissions for the repository base path and all its
|
|
contents. These permissions must be granted for the blob service (b) and apply
|
|
to resource types service (s), container (c), and object (o). Alternatively,
|
|
use `key`.
|
|
|
|
`azure.client.CLIENT_NAME.timeout`::
|
|
The client side timeout for any single request to Azure, as a
|
|
<<time-units,time unit>>. For example, a value of `5s` specifies a 5 second
|
|
timeout. There is no default value, which means that {es} uses the
|
|
https://azure.github.io/azure-storage-java/com/microsoft/azure/storage/RequestOptions.html#setTimeoutIntervalInMs(java.lang.Integer)[default
|
|
value] set by the Azure client.
|
|
|
|
`azure.client.CLIENT_NAME.endpoint`::
|
|
The Azure endpoint to connect to. It must include the protocol used to connect
|
|
to Azure.
|
|
|
|
`azure.client.CLIENT_NAME.secondary_endpoint`::
|
|
The Azure secondary endpoint to connect to. It must include the protocol used
|
|
to connect to Azure.
|
|
|
|
[[repository-azure-default-credentials]]
|
|
[NOTE]
|
|
.Obtaining credentials from the environment
|
|
======================================================
|
|
If you specify neither the `key` nor the `sas_token` settings for a client then
|
|
{es} will attempt to automatically obtain credentials from the environment in
|
|
which it is running using mechanisms built into the Azure SDK. This is ideal
|
|
for when running {es} on the Azure platform.
|
|
|
|
When running {es} on an
|
|
https://azure.microsoft.com/en-gb/products/virtual-machines[Azure Virtual
|
|
Machine], you should use
|
|
https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview[Azure
|
|
Managed Identity] to provide credentials to {es}. To use Azure Managed Identity,
|
|
assign a suitably authorized identity to the Azure Virtual Machine on which {es}
|
|
is running.
|
|
|
|
When running {es} in
|
|
https://azure.microsoft.com/en-gb/products/kubernetes-service[Azure Kubernetes
|
|
Service], for instance using {eck-ref}/k8s-snapshots.html#k8s-azure-workload-identity[{eck}], you should use
|
|
https://azure.github.io/azure-workload-identity/docs/introduction.html[Azure
|
|
Workload Identity] to provide credentials to {es}. To use Azure Workload
|
|
Identity, mount the `azure-identity-token` volume as a subdirectory of the
|
|
<<config-files-location,{es} config directory>> and set the
|
|
`AZURE_FEDERATED_TOKEN_FILE` environment variable to point to a file called
|
|
`azure-identity-token` within the mounted volume.
|
|
|
|
The Azure SDK has several other mechanisms to automatically obtain credentials
|
|
from its environment, but the two methods described above are the only ones
|
|
that are tested and supported for use in {es}.
|
|
======================================================
|
|
|
|
|
|
|
|
[[repository-azure-repository-settings]]
|
|
==== Repository settings
|
|
|
|
The Azure repository supports the following settings, which may be specified
|
|
when registering an Azure repository as follows:
|
|
|
|
[source,console]
|
|
----
|
|
PUT _snapshot/my_backup
|
|
{
|
|
"type": "azure",
|
|
"settings": {
|
|
"client": "secondary",
|
|
"container": "my_container",
|
|
"base_path": "snapshots_prefix"
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip:we don't have azure setup while testing this]
|
|
|
|
`client`::
|
|
|
|
The name of the Azure repository client to use. Defaults to `default`.
|
|
|
|
`container`::
|
|
|
|
Container name. You must create the azure container before creating the repository.
|
|
Defaults to `elasticsearch-snapshots`.
|
|
|
|
`base_path`::
|
|
|
|
Specifies the path within container to repository data. Defaults to empty
|
|
(root directory).
|
|
+
|
|
NOTE: Don't set `base_path` when configuring a snapshot repository for {ECE}.
|
|
{ECE} automatically generates the `base_path` for each deployment so that
|
|
multiple deployments may share the same bucket.
|
|
|
|
`chunk_size`::
|
|
|
|
Big files can be broken down into multiple smaller blobs in the blob store
|
|
during snapshotting. It is not recommended to change this value from its
|
|
default unless there is an explicit reason for limiting the size of blobs in
|
|
the repository. Setting a value lower than the default can result in an
|
|
increased number of API calls to the Azure blob store during snapshot create
|
|
as well as restore operations compared to using the default value and thus
|
|
make both operations slower as well as more costly. Specify the chunk size
|
|
as a <<byte-units,byte unit>>, for example: `10MB`, `5KB`, `500B`. Defaults
|
|
to the maximum size of a blob in the Azure blob store which is `5TB`.
|
|
|
|
`compress`::
|
|
|
|
When set to `true` metadata files are stored in compressed format. This
|
|
setting doesn't affect index files that are already compressed by default.
|
|
Defaults to `true`.
|
|
|
|
include::repository-shared-settings.asciidoc[]
|
|
|
|
`location_mode`::
|
|
|
|
`primary_only` or `secondary_only`. Defaults to `primary_only`. Note that if you set it
|
|
to `secondary_only`, it will force `readonly` to true.
|
|
|
|
`delete_objects_max_size`::
|
|
|
|
(integer) Sets the maxmimum batch size, betewen 1 and 256, used for `BlobBatch` requests. Defaults to 256 which is the maximum
|
|
number supported by the https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks[Azure blob batch API].
|
|
|
|
`max_concurrent_batch_deletes`::
|
|
|
|
(integer) Sets the maximum number of concurrent batch delete requests that will be submitted for any individual bulk delete with `BlobBatch`. Note that the effective number of concurrent deletes is further limited by the Azure client connection and event loop thread limits. Defaults to 10, minimum is 1, maximum is 100.
|
|
|
|
[[repository-azure-validation]]
|
|
==== Repository validation rules
|
|
|
|
According to the
|
|
https://docs.microsoft.com/en-us/rest/api/storageservices/Naming-and-Referencing-Containers--Blobs--and-Metadata[containers
|
|
naming guide], a container name must be a valid DNS name, conforming to the
|
|
following naming rules:
|
|
|
|
* Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
|
|
* Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not
|
|
permitted in container names.
|
|
* All letters in a container name must be lowercase.
|
|
* Container names must be from 3 through 63 characters long.
|
|
|
|
[IMPORTANT]
|
|
.Supported Azure Storage Account types
|
|
===============================================
|
|
The Azure repository type works with all Standard storage accounts
|
|
|
|
* Standard Locally Redundant Storage - `Standard_LRS`
|
|
* Standard Zone-Redundant Storage - `Standard_ZRS`
|
|
* Standard Geo-Redundant Storage - `Standard_GRS`
|
|
* Standard Read Access Geo-Redundant Storage - `Standard_RAGRS`
|
|
|
|
https://azure.microsoft.com/en-gb/documentation/articles/storage-premium-storage[Premium Locally Redundant Storage] (`Premium_LRS`) is **not supported** as it is only usable as VM disk storage, not as general storage.
|
|
===============================================
|
|
|
|
[[repository-azure-linearizable-registers]]
|
|
==== Linearizable register implementation
|
|
|
|
The linearizable register implementation for Azure repositories is based on
|
|
Azure's support for strongly consistent leases. Each lease may only be held by
|
|
a single node at any time. The node presents its lease when performing a read
|
|
or write operation on a protected blob. Lease-protected operations fail if the
|
|
lease is invalid or expired. To perform a compare-and-exchange operation on a
|
|
register, {es} first obtains a lease on the blob, then reads the blob contents
|
|
under the lease, and finally uploads the updated blob under the same lease.
|
|
This process ensures that the read and write operations happen atomically.
|