mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 15:17:30 -04:00
[DOCS] CCR disaster recovery (#91491)
* Add bi-directional disaster recovery
* add ccr bi-directional disaster recovery image
* add link to bi-directional disaster recovery
* add image
* add [source]
* fix language
* Update bi-directional-disaster-recovery.asciidoc
* Update bi-directional-disaster-recovery.asciidoc
* Update bi-directional-disaster-recovery.asciidoc
* Apply suggestions from code review Remove immutable restrictions and add update/delete by query instructions.
* Apply suggestions from code review
* Apply suggestions from code review Fixing reference
* Apply suggestions from code review fix list format
* Apply suggestions from code review remove space to fix format
* add test
* Update docs/reference/ccr/bi-directional-disaster-recovery.asciidoc
* Add test
* Add uni-directional DR doc
* Add uni-directional image
* add uni-directional doc reference
* Update docs/reference/ccr/uni-directional-disaster-recovery.asciidoc Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Apply suggestions from code review Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Apply suggestions from code review Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Pushing up minor edits to restart build. Previous build failure 'Could not determine the dependencies of task ':x-pack:plugin:ml:explodedBundlePlugin'
* Apply suggestions from code review
* Tip formatting and renaming follwer index to _copy in uni-direction
* Fix failing CI doc check
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
parent
8e15a1a7ad
commit
8c06aea3dd
5 changed files with 471 additions and 0 deletions
275
docs/reference/ccr/bi-directional-disaster-recovery.asciidoc
Normal file
|
@ -0,0 +1,275 @@
|
|||
[role="xpack"]
|
||||
[[ccr-disaster-recovery-bi-directional-tutorial]]
|
||||
=== Tutorial: Disaster recovery based on bi-directional {ccr}
|
||||
++++
|
||||
<titleabbrev>Bi-directional disaster recovery</titleabbrev>
|
||||
++++
|
||||
|
||||
////
|
||||
[source,console]
|
||||
----
|
||||
PUT _data_stream/logs-generic-default
|
||||
----
|
||||
// TESTSETUP
|
||||
|
||||
[source,console]
|
||||
----
|
||||
DELETE /_data_stream/*
|
||||
----
|
||||
// TEARDOWN
|
||||
////
|
||||
|
||||
Learn how to set up disaster recovery between two clusters based on
bi-directional {ccr}. This tutorial is designed for data streams, which support
<<update-docs-in-a-data-stream-by-query,update by query>> and <<delete-docs-in-a-data-stream-by-query,delete by query>>. You can perform these actions only on the leader index.
|
||||
|
||||
This tutorial uses {ls} as the source of ingestion. It takes advantage of a {ls} feature where {logstash-ref}/plugins-outputs-elasticsearch.html[the {ls} output to {es}] can be load balanced across an array of specified hosts. {beats} and {agents} currently do not
support multiple outputs. It should also be possible to set up a proxy
(load balancer) to redirect traffic without {ls}.
|
||||
|
||||
This tutorial covers the following steps:

* Setting up a remote cluster on `clusterA` and `clusterB`.
* Setting up bi-directional cross-cluster replication with exclusion patterns.
* Setting up {ls} with multiple hosts to allow automatic load balancing and switching during disasters.
|
||||
|
||||
image::images/ccr-bi-directional-disaster-recovery.png[Bi-directional cross cluster replication failover and failback]
|
||||
|
||||
[[ccr-tutorial-initial-setup]]
|
||||
==== Initial setup
|
||||
. Set up a remote cluster on both clusters.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On cluster A ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterB": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
### On cluster B ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterA": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
|
||||
----
|
||||
// TEST[setup:host]
|
||||
// TEST[s/"server_name": "clustera.es.region-a.gcp.elastic-cloud.com",//]
|
||||
// TEST[s/"server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",//]
|
||||
// TEST[s/"proxy_socket_connections": 18,//]
|
||||
// TEST[s/clustera.es.region-a.gcp.elastic-cloud.com:9400/\${transport_host}/]
|
||||
// TEST[s/clusterb.es.region-b.gcp.elastic-cloud.com:9400/\${transport_host}/]
|
||||
|
||||
. Set up bi-directional cross-cluster replication.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On cluster A ###
PUT /_ccr/auto_follow/logs-generic-default
{
  "remote_cluster": "clusterB",
  "leader_index_patterns": [
    ".ds-logs-generic-default-20*"
  ],
  "leader_index_exclusion_patterns":"*-replicated_from_clustera",
  "follow_index_pattern": "{{leader_index}}-replicated_from_clusterb"
}

### On cluster B ###
PUT /_ccr/auto_follow/logs-generic-default
{
  "remote_cluster": "clusterA",
  "leader_index_patterns": [
    ".ds-logs-generic-default-20*"
  ],
  "leader_index_exclusion_patterns":"*-replicated_from_clusterb",
  "follow_index_pattern": "{{leader_index}}-replicated_from_clustera"
}
|
||||
----
|
||||
// TEST[setup:remote_cluster]
|
||||
// TEST[s/clusterA/remote_cluster/]
|
||||
// TEST[s/clusterB/remote_cluster/]
|
||||
+
|
||||
IMPORTANT: Existing data on the cluster will not be replicated by
|
||||
`_ccr/auto_follow` even though the patterns may match. This function will only
|
||||
replicate newly created backing indices (as part of the data stream).
|
||||
+
|
||||
IMPORTANT: Use `leader_index_exclusion_patterns` to avoid recursion.
|
||||
+
|
||||
TIP: `follow_index_pattern` allows lowercase characters only.
|
||||
+
|
||||
TIP: This step cannot be executed via the {kib} UI due to the lack of an exclusion
|
||||
pattern in the UI. Use the API in this step.
|
||||
|
||||
. Set up the {ls} configuration file.
|
||||
+
|
||||
This example uses the input generator to demonstrate the document
|
||||
count in the clusters. Reconfigure this section
|
||||
to suit your own use case.
|
||||
+
|
||||
[source,logstash]
|
||||
----
|
||||
### On Logstash server ###
### This is a logstash config file ###
input {
  generator{
    message => 'Hello World'
    count => 100
  }
}
output {
  elasticsearch {
    hosts => ["https://clustera.es.region-a.gcp.elastic-cloud.com:9243","https://clusterb.es.region-b.gcp.elastic-cloud.com:9243"]
    user => "logstash-user"
    password => "same_password_for_both_clusters"
  }
}
|
||||
----
|
||||
+
|
||||
IMPORTANT: The key point is that when `cluster A` is down, all traffic is
automatically redirected to `cluster B`. Once `cluster A` comes back, traffic
is automatically redirected back to `cluster A`. This is achieved by the
`hosts` option, where multiple {es} cluster endpoints are specified in the
array `[clusterA, clusterB]`.
|
||||
+
|
||||
TIP: Set up the same password for the same user on both clusters to use this load-balancing feature.
|
||||
|
||||
. Start {ls} with the earlier configuration file.
|
||||
+
|
||||
[source,sh]
|
||||
----
|
||||
### On Logstash server ###
|
||||
bin/logstash -f multiple_hosts.conf
|
||||
----
|
||||
|
||||
. Observe document counts in data streams.
|
||||
+
|
||||
The setup creates a data stream named `logs-generic-default` on each of the clusters. {ls} will write 50% of the documents to `cluster A` and 50% of the documents to `cluster B` when both clusters are up.
|
||||
+
|
||||
Bi-directional {ccr} will create one more data stream on each of the clusters
with the `-replicated_from_cluster{a|b}` suffix. At the end of this step:
|
||||
+
|
||||
* data streams on cluster A contain:
|
||||
** 50 documents in `logs-generic-default-replicated_from_clusterb`
|
||||
** 50 documents in `logs-generic-default`
|
||||
* data streams on cluster B contain:
|
||||
** 50 documents in `logs-generic-default-replicated_from_clustera`
|
||||
** 50 documents in `logs-generic-default`
|
||||
|
||||
. Set up queries to search across both data streams.
A query on `logs*`, on either of the clusters, returns 100
hits in total.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
GET logs*/_search?size=0
|
||||
----
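
The two remote cluster definitions in the first step differ only in the remote alias and the host name. As an illustration outside the tutorial itself, the following Python sketch builds both request bodies from a single hypothetical helper, `remote_cluster_settings`. The helper name and its default arguments are assumptions made for this example; the setting names and values are copied from the requests above.

```python
import json


def remote_cluster_settings(alias, host, port=9400, connections=18):
    """Build a PUT _cluster/settings body that registers one proxy-mode remote.

    `alias` is the name the local cluster uses for the remote cluster.
    This helper is hypothetical; it only mirrors the JSON shown in the tutorial.
    """
    return {
        "persistent": {
            "cluster": {
                "remote": {
                    alias: {
                        "mode": "proxy",
                        "skip_unavailable": True,
                        "server_name": host,
                        "proxy_socket_connections": connections,
                        "proxy_address": f"{host}:{port}",
                    }
                }
            }
        }
    }


# Body to send to cluster A (it registers cluster B), and the reverse.
on_cluster_a = remote_cluster_settings(
    "clusterB", "clusterb.es.region-b.gcp.elastic-cloud.com"
)
on_cluster_b = remote_cluster_settings(
    "clusterA", "clustera.es.region-a.gcp.elastic-cloud.com"
)

print(json.dumps(on_cluster_a, indent=2))
print(json.dumps(on_cluster_b, indent=2))
```

Generating both bodies from one function keeps the two definitions symmetric, which matters because each cluster must be able to reach the other for bi-directional replication.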
|
||||
|
||||
|
||||
==== Failover when `clusterA` is down
|
||||
. You can simulate this by shutting down either of the clusters. Let's shut down
|
||||
`cluster A` in this tutorial.
|
||||
. Start {ls} with the same configuration file. (This step is not required in real
|
||||
use cases where {ls} ingests continuously.)
|
||||
+
|
||||
[source,sh]
|
||||
----
|
||||
### On Logstash server ###
|
||||
bin/logstash -f multiple_hosts.conf
|
||||
----
|
||||
|
||||
. Observe that all {ls} traffic is redirected to `cluster B` automatically.
|
||||
+
|
||||
TIP: You should also redirect all search traffic to the `clusterB` cluster during this time.
|
||||
|
||||
. The two data streams on `cluster B` now contain a different number of documents.
|
||||
+
|
||||
* data streams on cluster A (down)
|
||||
** 50 documents in `logs-generic-default-replicated_from_clusterb`
|
||||
** 50 documents in `logs-generic-default`
|
||||
* data streams on cluster B (up)
|
||||
** 50 documents in `logs-generic-default-replicated_from_clustera`
|
||||
** 150 documents in `logs-generic-default`
|
||||
|
||||
|
||||
==== Failback when `clusterA` comes back
|
||||
. You can simulate this by turning `cluster A` back on.
|
||||
. Data ingested to `cluster B` during `cluster A`'s downtime is
automatically replicated.
|
||||
+
|
||||
* data streams on cluster A
|
||||
** 150 documents in `logs-generic-default-replicated_from_clusterb`
|
||||
** 50 documents in `logs-generic-default`
|
||||
* data streams on cluster B
|
||||
** 50 documents in `logs-generic-default-replicated_from_clustera`
|
||||
** 150 documents in `logs-generic-default`
|
||||
|
||||
. If you have {ls} running at this time, you will also observe traffic is
|
||||
sent to both clusters.
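
The document counts listed for the initial setup, failover, and failback phases can be reproduced with a small model. The following Python sketch is only an illustration: the `Cluster` class and the `ingest`, `replicate`, and `snapshot` functions are hypothetical names, and the model assumes that {ls} splits documents evenly across the clusters that are up and that replication fully catches up whenever both clusters are up.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    up: bool = True
    local: int = 0       # documents in logs-generic-default
    replicated: int = 0  # documents in logs-generic-default-replicated_from_<peer>


def ingest(clusters, count):
    """Split `count` documents evenly across the clusters that are up (assumption)."""
    targets = [c for c in clusters if c.up]
    if not targets:
        raise RuntimeError("no cluster is available for ingestion")
    share, remainder = divmod(count, len(targets))
    for i, cluster in enumerate(targets):
        cluster.local += share + (1 if i < remainder else 0)


def replicate(a, b):
    """Bring each follower data stream level with its leader; needs both clusters up."""
    if a.up and b.up:
        a.replicated = b.local
        b.replicated = a.local


def snapshot(a, b):
    """Return (local, replicated) document counts per cluster."""
    return {c.name: (c.local, c.replicated) for c in (a, b)}


a, b = Cluster("clusterA"), Cluster("clusterB")
history = {}

# Initial setup: Logstash sends 100 documents while both clusters are up.
ingest([a, b], 100)
replicate(a, b)
history["initial"] = snapshot(a, b)

# Failover: cluster A is down and Logstash sends another 100 documents.
a.up = False
ingest([a, b], 100)
replicate(a, b)
history["failover"] = snapshot(a, b)

# Failback: cluster A returns and replication catches up.
a.up = True
replicate(a, b)
history["failback"] = snapshot(a, b)

for phase, counts in history.items():
    print(phase, counts)
```

In this model, every phase leaves 100 searchable documents per cluster after the initial run and 200 after failback, which is why searches should target both the local and the replicated data stream.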
|
||||
|
||||
==== Perform update or delete by query
|
||||
You can update or delete documents, but you can perform these actions only on the leader index.
|
||||
|
||||
. First identify which backing index contains the document you want to update.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On either cluster ###
|
||||
GET logs-generic-default*/_search?filter_path=hits.hits._index
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"event.sequence": "97"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
+
|
||||
* If the hits return `"_index": ".ds-logs-generic-default-replicated_from_clustera-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on `cluster A`.
* If the hits return `"_index": ".ds-logs-generic-default-replicated_from_clusterb-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on `cluster B`.
* If the hits return `"_index": ".ds-logs-generic-default-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on the same cluster where you performed the search query.
|
||||
|
||||
. Perform the update (or delete) by query:
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On the cluster identified from the previous step ###
|
||||
POST logs-generic-default/_update_by_query
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"event.sequence": "97"
|
||||
}
|
||||
},
|
||||
"script": {
|
||||
"source": "ctx._source.event.original = params.new_event",
|
||||
"lang": "painless",
|
||||
"params": {
|
||||
"new_event": "FOOBAR"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
+
|
||||
TIP: If a soft delete is merged away before it can be replicated to a follower, index following fails due to incomplete history on the leader. See <<ccr-index-soft-deletes-retention-period, index.soft_deletes.retention_lease.period>> for more details.
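
The rule for choosing where to run the update or delete by query depends only on the `_index` value returned by the search. As a minimal sketch, the following Python function encodes the three cases listed above. The function name `cluster_for_write` and the dated index names in the usage lines are hypothetical; only the `-replicated_from_cluster{a|b}` naming comes from this tutorial.

```python
def cluster_for_write(index_name, searched_on):
    """Return the cluster that holds the leader copy of a backing index.

    index_name:  the `_index` value from a search hit.
    searched_on: the cluster where the search was run ("clusterA" or "clusterB").
    """
    if "-replicated_from_clustera" in index_name:
        # Follower copy of data that was ingested on cluster A.
        return "clusterA"
    if "-replicated_from_clusterb" in index_name:
        # Follower copy of data that was ingested on cluster B.
        return "clusterB"
    # No suffix: this is a local leader index on the searched cluster.
    return searched_on


# Hypothetical backing index names, following the patterns in this tutorial.
print(cluster_for_write(
    ".ds-logs-generic-default-replicated_from_clustera-2022.11.16-000001",
    searched_on="clusterB",
))
print(cluster_for_write(
    ".ds-logs-generic-default-2022.11.16-000001",
    searched_on="clusterB",
))
```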
|
Binary file not shown.
After Width: | Height: | Size: 63 KiB |
Binary file not shown.
After Width: | Height: | Size: 60 KiB |
|
@ -343,3 +343,5 @@ include::getting-started.asciidoc[]
|
|||
include::managing.asciidoc[]
|
||||
include::auto-follow.asciidoc[]
|
||||
include::upgrading.asciidoc[]
|
||||
include::uni-directional-disaster-recovery.asciidoc[]
|
||||
include::bi-directional-disaster-recovery.asciidoc[]
|
||||
|
|
194
docs/reference/ccr/uni-directional-disaster-recovery.asciidoc
Normal file
|
@ -0,0 +1,194 @@
|
|||
[role="xpack"]
|
||||
[[ccr-disaster-recovery-uni-directional-tutorial]]
|
||||
=== Tutorial: Disaster recovery based on uni-directional {ccr}
|
||||
++++
|
||||
<titleabbrev>Uni-directional disaster recovery</titleabbrev>
|
||||
++++
|
||||
|
||||
////
|
||||
[source,console]
|
||||
----
|
||||
PUT kibana_sample_data_ecommerce
|
||||
----
|
||||
// TESTSETUP
|
||||
|
||||
[source,console]
|
||||
----
|
||||
DELETE kibana_sample_data_ecommerce
|
||||
----
|
||||
// TEARDOWN
|
||||
////
|
||||
|
||||
|
||||
Learn how to fail over and fail back between two clusters based on uni-directional {ccr}. You can also visit <<ccr-disaster-recovery-bi-directional-tutorial>> to set up replicating data streams that automatically fail over and fail back without human intervention.
|
||||
|
||||
* Setting up uni-directional {ccr} replicated from `clusterA`
|
||||
to `clusterB`.
|
||||
* Failover - If `clusterA` goes offline, `clusterB` needs to "promote" follower
indices to regular indices to allow write operations. All ingestion will need to
be redirected to `clusterB`. This is controlled by the clients ({ls}, {beats},
{agents}, etc).
|
||||
* Failback - When `clusterA` is back online, it assumes the role of a follower
|
||||
and replicates the leader indices from `clusterB`.
|
||||
|
||||
image::images/ccr-uni-directional-disaster-recovery.png[Uni-directional cross cluster replication failover and failback]
|
||||
|
||||
NOTE: {ccr-cap} provides functionality to replicate user-generated indices only.
|
||||
{ccr-cap} isn't designed for replicating system-generated indices or snapshot
|
||||
settings, and can't replicate {ilm-init} or {slm-init} policies across clusters.
|
||||
Learn more in {ccr} <<ccr-limitations,limitations>>.
|
||||
|
||||
==== Prerequisites
|
||||
Before completing this tutorial,
|
||||
<<ccr-getting-started-tutorial,set up cross-cluster replication>> to connect two
|
||||
clusters and configure a follower index.
|
||||
|
||||
In this tutorial, `kibana_sample_data_ecommerce` is replicated from `clusterA` to `clusterB`.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
### On clusterB ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterA": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
|
||||
----
|
||||
// TEST[setup:host]
|
||||
// TEST[s/"server_name": "clustera.es.region-a.gcp.elastic-cloud.com",//]
|
||||
// TEST[s/"proxy_socket_connections": 18,//]
|
||||
// TEST[s/clustera.es.region-a.gcp.elastic-cloud.com:9400/\${transport_host}/]
|
||||
// TEST[s/clusterA/remote_cluster/]
|
||||
|
||||
[source,console]
|
||||
----
|
||||
### On clusterB ###
|
||||
PUT /kibana_sample_data_ecommerce2/_ccr/follow?wait_for_active_shards=1
|
||||
{
|
||||
"remote_cluster": "clusterA",
|
||||
"leader_index": "kibana_sample_data_ecommerce"
|
||||
}
|
||||
----
|
||||
// TEST[continued]
|
||||
// TEST[s/clusterA/remote_cluster/]
|
||||
|
||||
IMPORTANT: Writes (such as ingestion or updates) should occur only on the leader
|
||||
index. Follower indices are read-only and will reject any writes.
|
||||
|
||||
|
||||
==== Failover when `clusterA` is down
|
||||
|
||||
. Promote the follower indices in `clusterB` into regular indices so
that they accept writes. To do so:
* First, pause index following for the follower index.
* Next, close the follower index.
* Then, unfollow the leader index.
* Finally, open the follower index (which at this point is a regular index).
|
||||
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On clusterB ###
|
||||
POST /kibana_sample_data_ecommerce2/_ccr/pause_follow
|
||||
POST /kibana_sample_data_ecommerce2/_close
|
||||
POST /kibana_sample_data_ecommerce2/_ccr/unfollow
|
||||
POST /kibana_sample_data_ecommerce2/_open
|
||||
----
|
||||
// TEST[continued]
|
||||
|
||||
. On the client side ({ls}, {beats}, {agent}), manually re-enable ingestion of
`kibana_sample_data_ecommerce2` and redirect traffic to `clusterB`. You should
also redirect all search traffic to `clusterB` during
this time. You can simulate this by ingesting documents into this index. You should
notice this index is now writable.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On clusterB ###
|
||||
POST kibana_sample_data_ecommerce2/_doc/
|
||||
{
|
||||
"user": "kimchy"
|
||||
}
|
||||
----
|
||||
// TEST[continued]
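
The promotion in the first failover step is an ordered sequence, and the order matters: following is paused before the index is closed, and the index is reopened only after it has been unfollowed. The following Python sketch lists the four requests for any follower index. The function name `promote_follower_requests` is hypothetical, and the sketch only builds the method and path pairs; sending them to `clusterB` is left to whichever client you use.

```python
def promote_follower_requests(follower_index):
    """Return the ordered (method, path) pairs that turn a follower into a regular index."""
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop replicating from the leader
        ("POST", f"/{follower_index}/_close"),             # unfollow requires a closed index
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # convert to a regular index
        ("POST", f"/{follower_index}/_open"),              # reopen; the index now accepts writes
    ]


for method, path in promote_follower_requests("kibana_sample_data_ecommerce2"):
    print(method, path)
```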
|
||||
|
||||
==== Failback when `clusterA` comes back
|
||||
|
||||
When `clusterA` comes back, `clusterB` becomes the new leader and `clusterA` becomes the follower.
|
||||
|
||||
. Set up remote cluster `clusterB` on `clusterA`.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On clusterA ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterB": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
|
||||
----
|
||||
// TEST[setup:host]
|
||||
// TEST[s/"server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",//]
|
||||
// TEST[s/"proxy_socket_connections": 18,//]
|
||||
// TEST[s/clusterb.es.region-b.gcp.elastic-cloud.com:9400/\${transport_host}/]
|
||||
// TEST[s/clusterB/remote_cluster/]
|
||||
|
||||
. Existing data needs to be discarded before you can turn any index into a
|
||||
follower. Ensure the most up-to-date data is available on `clusterB` prior to
|
||||
deleting any indices on `clusterA`.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On clusterA ###
|
||||
DELETE kibana_sample_data_ecommerce
|
||||
----
|
||||
// TEST[skip:need dual cluster setup]
|
||||
|
||||
|
||||
. Create a follower index on `clusterA`, now following the leader index in
|
||||
`clusterB`.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On clusterA ###
|
||||
PUT /kibana_sample_data_ecommerce/_ccr/follow?wait_for_active_shards=1
|
||||
{
|
||||
"remote_cluster": "clusterB",
|
||||
"leader_index": "kibana_sample_data_ecommerce2"
|
||||
}
|
||||
----
|
||||
// TEST[continued]
|
||||
// TEST[s/clusterB/remote_cluster/]
|
||||
|
||||
. The index on the follower cluster now contains the updated documents.
|
||||
+
|
||||
[source,console]
|
||||
----
|
||||
### On clusterA ###
|
||||
GET kibana_sample_data_ecommerce/_search?q=kimchy
|
||||
----
|
||||
// TEST[continued]
|
||||
+
|
||||
TIP: If a soft delete is merged away before it can be replicated to a follower, index following fails due to incomplete history on the leader. See <<ccr-index-soft-deletes-retention-period, index.soft_deletes.retention_lease.period>> for more details.
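
During failback the roles are swapped: the stale index on `clusterA` is deleted and then recreated as a follower of the promoted index on `clusterB`. As a minimal sketch, the following Python function returns those two requests in order. The function name `failback_requests` and its parameter names are hypothetical; the paths and body fields are the ones used in the steps above. Confirm that `clusterB` holds the most up-to-date data before sending the `DELETE`.

```python
def failback_requests(stale_index, new_leader_index, remote_alias):
    """Return ordered (method, path, body) tuples to run on the recovered cluster."""
    return [
        # Existing data must be discarded before the index can become a follower.
        ("DELETE", f"/{stale_index}", None),
        # Recreate the index as a follower of the promoted index on the other cluster.
        (
            "PUT",
            f"/{stale_index}/_ccr/follow?wait_for_active_shards=1",
            {"remote_cluster": remote_alias, "leader_index": new_leader_index},
        ),
    ]


for method, path, body in failback_requests(
    stale_index="kibana_sample_data_ecommerce",
    new_leader_index="kibana_sample_data_ecommerce2",
    remote_alias="clusterB",
):
    print(method, path, body)
```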
|