Add peer recovery planners that take into account available snapshots (#75840)

This commit adds a new set of classes that compute a peer recovery
plan based on the source files, the target files, and the available
snapshots. When possible, the plan maximizes the number of files
recovered from a snapshot. It uses snapshots from repositories that
have the `use_for_peer_recovery` setting set to `true`.
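
The planning idea described above can be illustrated with a minimal sketch. This is not the code added by this commit: the class `RecoveryPlanSketch`, its `Plan` type, and the representation of files as a map from file name to checksum are all hypothetical, chosen only to show how preferring snapshot files might work. It assumes a file can be reused when its name and checksum match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these names do not mirror the classes in this commit.
final class RecoveryPlanSketch {

    /** Outcome of planning: where each source file will come from. */
    static final class Plan {
        final List<String> alreadyOnTarget = new ArrayList<>();
        final List<String> fromSnapshot = new ArrayList<>();
        final List<String> fromSource = new ArrayList<>();
    }

    /**
     * Each map goes from file name to a non-null content checksum. This is
     * an assumption of the sketch; a real comparison may use more metadata.
     */
    static Plan plan(Map<String, String> source,
                     Map<String, String> target,
                     Map<String, String> snapshot) {
        Plan plan = new Plan();
        // Iterate in sorted order so the resulting lists are deterministic.
        for (Map.Entry<String, String> file : new TreeMap<>(source).entrySet()) {
            String name = file.getKey();
            String checksum = file.getValue();
            if (checksum.equals(target.get(name))) {
                // The target already holds an identical copy: nothing to transfer.
                plan.alreadyOnTarget.add(name);
            } else if (checksum.equals(snapshot.get(name))) {
                // Prefer the snapshot whenever it holds an identical copy.
                plan.fromSnapshot.add(name);
            } else {
                // Fall back to copying the file from the source node.
                plan.fromSource.add(name);
            }
        }
        return plan;
    }
}
```

In this sketch a file is requested from the source node only when neither the target nor the snapshot holds an identical copy, which is one simple way to maximize the number of files taken from a snapshot.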

It also adds a new recovery setting, `indices.recovery.use_snapshots`.

Relates #73496
Francisco Fernández Castaño 2021-08-09 14:03:12 +02:00 committed by GitHub
parent 997441c83a
commit 3c8b9a6f2e
23 changed files with 1803 additions and 114 deletions

@@ -73,3 +73,18 @@ and may interfere with indexing, search, and other activities in your cluster.
Do not increase this setting without carefully verifying that your cluster has
the resources available to handle the extra load that will result.

`indices.recovery.use_snapshots`::
(<<cluster-update-settings,Dynamic>>, Expert) Enables snapshot-based peer recoveries.
+
{es} recovers replicas and relocates primary shards using the _peer recovery_
process, which involves constructing a new copy of a shard on the target node.
When `indices.recovery.use_snapshots` is `false`, {es} constructs this new
copy by transferring the index data from the current primary. When this setting
is `true`, {es} first attempts to copy the index data from a recent snapshot,
and copies data from the primary only if it cannot identify a suitable
snapshot.
+
Setting this option to `true` reduces your operating costs if your cluster runs
in an environment where the node-to-node data transfer costs are higher than
the costs of recovering data from a snapshot. It also reduces the amount of
work that the primary must do during a recovery.