Mirror of https://github.com/elastic/kibana.git, synced 2025-04-25 02:09:32 -04:00
RFC Improve saved object migrations algorithm (#84333)
* Instead of cloning, reindex legacy index
* Reindex for every v2 migration
* Use _reindex?require_alias=true and a write block toggle to prevent lost deletes
* Use a ..._reindex_in_progress alias so that waiting for and preventing other reindex operations is idempotent

  The first version of the reindex block had only the instance which was able to mark the migration as complete set and remove the write block. This means other instances couldn't know if any reindex operations were in progress if the migration was already marked as complete. It also meant that a failure in this critical step could result in a permanent write block.
* Revert "Use a ..._reindex_in_progress alias so that waiting for and preventing other reindex operations is idempotent"

  This reverts commit 8baf9b13db.
* Revert "Use _reindex?require_alias=true and a write block toggle to prevent lost deletes"

  This reverts commit d7237ca42c.
* Use reindex + clone as a way to prevent lost deletes
* Fix numbering and ignore index_not_found_exception for temporary index
* Apply suggestions from code review

Co-authored-by: Josh Dover <me@joshdover.com>
Co-authored-by: Josh Dover <me@joshdover.com>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Parent: df3e14dbbd
Commit: ce441bdc32
1 changed file with 27 additions and 23 deletions
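The approach this commit settles on ("use reindex + clone as a way to prevent lost deletes") can be illustrated with a small in-memory simulation. It is a sketch under stated assumptions, not Kibana's implementation: `MemIndex`, `reindexCreate` and `cloneIndex` are hypothetical stand-ins that only model three behaviours the RFC relies on: `op_type=create` never overwrites an existing document, a write block rejects writes, and a clone copies a write-blocked index into a new writable index. The source index name `.kibana_7.9.0_001` is also made up.

```typescript
// Hypothetical in-memory model of the semantics the RFC relies on; NOT an Elasticsearch client.
type Source = Record<string, unknown>;

class MemIndex {
  readonly docs = new Map<string, Source>();
  writeBlock = false;
  constructor(readonly name: string) {}

  // Models `op_type=create`: returns false (a version conflict) if the id already exists.
  create(id: string, source: Source): boolean {
    if (this.writeBlock) throw new Error(`write block is set on [${this.name}]`);
    if (this.docs.has(id)) return false;
    this.docs.set(id, { ...source });
    return true;
  }

  delete(id: string): void {
    if (this.writeBlock) throw new Error(`write block is set on [${this.name}]`);
    this.docs.delete(id);
  }
}

// Models `_reindex` with `dest.op_type: "create"` and `conflicts: "proceed"`:
// documents that already exist in `dest` are counted as conflicts, not errors.
function reindexCreate(source: MemIndex, dest: MemIndex): { created: number; conflicts: number } {
  let created = 0;
  let conflicts = 0;
  source.docs.forEach((doc, id) => {
    if (dest.create(id, doc)) created++;
    else conflicts++;
  });
  return { created, conflicts };
}

// Models `_clone`: the source must be write-blocked; the clone has writes enabled.
function cloneIndex(source: MemIndex, targetName: string): MemIndex {
  if (!source.writeBlock) throw new Error("source index must be write-blocked before cloning");
  const target = new MemIndex(targetName);
  source.docs.forEach((doc, id) => target.docs.set(id, { ...doc }));
  return target;
}

const legacy = new MemIndex(".kibana_7.9.0_001"); // hypothetical source index
legacy.docs.set("dashboard:1", { type: "dashboard" });
legacy.docs.set("dashboard:2", { type: "dashboard" });

// Reindexing directly into the target: a late reindex re-creates a deleted document.
const direct = new MemIndex(".kibana_7.10.0_001");
reindexCreate(legacy, direct); // instance A
direct.delete("dashboard:1"); // a user deletes an object after the migration
reindexCreate(legacy, direct); // instance B's slower reindex
console.log(direct.docs.has("dashboard:1")); // true: the delete was lost

// Reindexing into a temporary index, write-blocking it, then cloning into the target.
const temp = new MemIndex(".kibana_7.10.0_reindex_temp");
reindexCreate(legacy, temp); // instance A
temp.writeBlock = true;
const target = cloneIndex(temp, ".kibana_7.10.0_001");
target.delete("dashboard:1");
let lateReindexError = "";
try {
  reindexCreate(legacy, temp); // instance B only ever writes to the temporary index
} catch (e) {
  lateReindexError = (e as Error).message;
}
console.log(target.docs.has("dashboard:1")); // false: the delete is preserved
```

The contrast shows the design choice: once the target is a clone of a write-blocked temporary index, a slow reindex from another instance can only fail against the temporary index (or produce `op_type=create` conflicts there), so it cannot resurrect a document deleted from the target.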
|
@@ -248,45 +248,49 @@ Note:
|
||||||
6. Use the reindexed legacy `.kibana_pre6.5.0_001` as the source for the rest of the migration algorithm.
|
6. Use the reindexed legacy `.kibana_pre6.5.0_001` as the source for the rest of the migration algorithm.
|
||||||
3. If `.kibana` and `.kibana_7.10.0` both exist and point to the same index, this version's migration has already been completed.
|
3. If `.kibana` and `.kibana_7.10.0` both exist and point to the same index, this version's migration has already been completed.
|
||||||
1. Because the same version can have plugins enabled at any point in time,
|
1. Because the same version can have plugins enabled at any point in time,
|
||||||
perform the mappings update in step (7) and migrate outdated documents
|
migrate outdated documents with step (9) and perform the mappings update in step (10).
|
||||||
with step (8).
|
2. Skip to step (12) to start serving traffic.
|
||||||
2. Skip to step (10) to start serving traffic.
|
|
||||||
4. Fail the migration if:
|
4. Fail the migration if:
|
||||||
1. `.kibana` is pointing to an index that belongs to a later version of Kibana, e.g. `.kibana_7.12.0_001`.
|
1. `.kibana` is pointing to an index that belongs to a later version of Kibana, e.g. `.kibana_7.12.0_001`.
|
||||||
2. (Only in 8.x) The source index contains documents that belong to an unknown Saved Object type (from a disabled plugin). Log an error explaining that the plugin that created these documents needs to be enabled again or that these objects should be deleted. See section (4.2.1.4).
|
2. (Only in 8.x) The source index contains documents that belong to an unknown Saved Object type (from a disabled plugin). Log an error explaining that the plugin that created these documents needs to be enabled again or that these objects should be deleted. See section (4.2.1.4).
|
||||||
5. Mark the source index as read-only and wait for all in-flight operations to drain (requires https://github.com/elastic/elasticsearch/pull/58094). This prevents any further writes from outdated nodes. Assuming this API is similar to the existing `/<index>/_close` API, we expect to receive `"acknowledged" : true` and `"shards_acknowledged" : true`. If not all shards acknowledge within the timeout, retry the operation until it succeeds.
|
5. Set a write block on the source index. This prevents any further writes from outdated nodes.
|
||||||
6. Clone the source index into a new target index which has writes enabled. All nodes on the same version will use the same fixed index name e.g. `.kibana_7.10.0_001`. The `001` postfix isn't used by Kibana, but allows for re-indexing an index should this be required by an Elasticsearch upgrade. E.g. re-index `.kibana_7.10.0_001` into `.kibana_7.10.0_002` and point the `.kibana_7.10.0` alias to `.kibana_7.10.0_002`.
|
6. Create a new temporary index `.kibana_7.10.0_reindex_temp` with `dynamic: false` on the top-level mappings so that any kind of document can be written to the index. This allows us to write untransformed documents that might contain fields that have been removed from the latest mappings defined by the plugin. Define minimal mappings for the `migrationVersion` and `type` fields so that we're still able to search for outdated documents that need to be transformed.
|
||||||
1. `POST /.kibana_n/_clone/.kibana_7.10.0_001?wait_for_active_shards=all {"settings": {"index.blocks.write": false}}`. Ignore errors if the clone already exists.
|
1. Ignore errors if the target index already exists.
|
||||||
2. Wait for the cloning to complete with `GET /_cluster/health/.kibana_7.10.0_001?wait_for_status=green&timeout=60s`. If cloning doesn’t complete within the 60s timeout, log a warning for visibility and poll again.
|
7. Reindex the source index into the new temporary index.
|
||||||
7. Update the mappings of the target index
|
1. Use `op_type=create`, `conflicts=proceed` and `wait_for_completion=false` so that multiple instances can perform the reindex in parallel but only one write per document will succeed.
|
||||||
1. Retrieve the existing mappings including the `migrationMappingPropertyHashes` metadata.
|
2. Wait for the reindex task to complete. If reindexing doesn’t complete within the 60s timeout, log a warning for visibility and poll again.
|
||||||
2. Update the mappings with `PUT /.kibana_7.10.0_001/_mapping`. The API deeply merges any updates so this won't remove the mappings of any plugins that were enabled in a previous version but are now disabled.
|
8. Clone the temporary index into the target index `.kibana_7.10.0_001`. Since all further writes only happen against the cloned target index, this prevents a lost delete, where one instance finishes the migration and deletes a document while another instance's reindex operation re-creates the deleted document.
|
||||||
3. Ensure that fields are correctly indexed using the target index's latest mappings `POST /.kibana_7.10.0_001/_update_by_query?conflicts=proceed`. In the future we could optimize this query by only targeting documents:
|
1. Set a write block on the temporary index
|
||||||
1. That belong to a known saved object type.
|
2. Clone the temporary index into the target index while specifying that the target index should have writes enabled.
|
||||||
2. Which don't have outdated migrationVersion numbers since these will be transformed anyway.
|
3. If the clone operation fails because the target index already exists, ignore the error and wait for the target index to become green before proceeding.
|
||||||
3. That belong to a type whose mappings were changed by comparing the `migrationMappingPropertyHashes`. (Metadata, unlike the mappings, isn't commutative, so there is a small chance that the metadata hashes do not accurately reflect the latest mappings; however, this will just result in a less efficient query.)
|
4. (The `001` postfix in the target index name isn't used by Kibana, but allows for re-indexing an index should this be required by an Elasticsearch upgrade. E.g. re-index `.kibana_7.10.0_001` into `.kibana_7.10.0_002` and point the `.kibana_7.10.0` alias to `.kibana_7.10.0_002`.)
|
||||||
8. Transform documents by reading batches of outdated documents from the target index then transforming and updating them with optimistic concurrency control.
|
9. Transform documents by reading batches of outdated documents from the target index then transforming and updating them with optimistic concurrency control.
|
||||||
1. Ignore any version conflict errors.
|
1. Ignore any version conflict errors.
|
||||||
2. If a document transform throws an exception, add the document to a failure list and continue trying to transform all other documents. If any failures occurred, log the complete list of documents that failed to transform. Fail the migration.
|
2. If a document transform throws an exception, add the document to a failure list and continue trying to transform all other documents. If any failures occurred, log the complete list of documents that failed to transform. Fail the migration.
|
||||||
9. Mark the migration as complete. This is done as a single atomic
|
10. Update the mappings of the target index
|
||||||
|
1. Retrieve the existing mappings including the `migrationMappingPropertyHashes` metadata.
|
||||||
|
2. Update the mappings with `PUT /.kibana_7.10.0_001/_mapping`. The API deeply merges any updates, so this won't remove the mappings of any plugins that are disabled on this instance but have been enabled on another instance that also migrated this index.
|
||||||
|
3. Ensure that fields are correctly indexed using the target index's latest mappings `POST /.kibana_7.10.0_001/_update_by_query?conflicts=proceed`. In the future we could optimize this query by only targeting documents:
|
||||||
|
1. That belong to a known saved object type.
|
||||||
|
11. Mark the migration as complete. This is done as a single atomic
|
||||||
operation (requires https://github.com/elastic/elasticsearch/pull/58100)
|
operation (requires https://github.com/elastic/elasticsearch/pull/58100)
|
||||||
to guarantees when multiple versions of Kibana are performing the
|
to guarantee that when multiple versions of Kibana are performing the
|
||||||
migration in parallel, only one version will win. E.g. if 7.11 and 7.12
|
migration in parallel, only one version will win. E.g. if 7.11 and 7.12
|
||||||
are started in parallel and migrate from a 7.9 index, either 7.11 or 7.12
|
are started in parallel and migrate from a 7.9 index, either 7.11 or 7.12
|
||||||
should succeed and accept writes, but not both.
|
should succeed and accept writes, but not both.
|
||||||
3. Checks that `.kibana` alias is still pointing to the source index
|
1. Check that the `.kibana` alias is still pointing to the source index
|
||||||
4. Points the `.kibana_7.10.0` and `.kibana` aliases to the target index.
|
2. Point the `.kibana_7.10.0` and `.kibana` aliases to the target index.
|
||||||
5. If this fails with a "required alias [.kibana] does not exist" error, fetch `.kibana` again:
|
3. Remove the temporary index `.kibana_7.10.0_reindex_temp`
|
||||||
|
4. If this fails with a "required alias [.kibana] does not exist" error or "index_not_found_exception" for the temporary index, fetch `.kibana` again:
|
||||||
1. If `.kibana` is _not_ pointing to our target index, fail the migration.
|
1. If `.kibana` is _not_ pointing to our target index, fail the migration.
|
||||||
2. If `.kibana` is pointing to our target index, the migration has succeeded and we can proceed to step (10).
|
2. If `.kibana` is pointing to our target index, the migration has succeeded and we can proceed to step (12).
|
||||||
10. Start serving traffic. All saved object reads/writes happen through the
|
12. Start serving traffic. All saved object reads/writes happen through the
|
||||||
version-specific alias `.kibana_7.10.0`.
|
version-specific alias `.kibana_7.10.0`.
|
||||||
|
|
||||||
Together with the limitations, this algorithm ensures that migrations are
|
Together with the limitations, this algorithm ensures that migrations are
|
||||||
idempotent. If two nodes are started simultaneously, both of them will start
|
idempotent. If two nodes are started simultaneously, both of them will start
|
||||||
transforming documents in that version's target index, but because migrations
|
transforming documents in that version's target index, but because migrations
|
||||||
are idempotent, it doesn’t matter which node’s writes win.
|
are idempotent, it doesn’t matter which node’s writes win.
|
||||||
|
|
||||||
#### Known weaknesses:
|
#### Known weaknesses:
|
||||||
(Also present in our existing migration algorithm since v7.4)
|
(Also present in our existing migration algorithm since v7.4)
|
||||||
When the task manager index gets reindexed, a reindex script is applied.
|
When the task manager index gets reindexed, a reindex script is applied.
|
||||||
|
|
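Steps (3) and (4.1), which this commit leaves unchanged, compare the indices that the `.kibana` and `.kibana_<version>` aliases point to. A minimal sketch of that decision, assuming the alias targets have already been fetched and that post-migration indices follow the RFC's `.kibana_<major>.<minor>.<patch>_<nnn>` naming; `decide`, `parseIndexVersion` and `compareVersions` are hypothetical helpers, and the 8.x unknown-type check (4.2) is not modelled.

```typescript
// Hypothetical helpers; alias lookups (e.g. `GET /_alias/.kibana`) are assumed to happen elsewhere.
type Version = [number, number, number];
type Decision = { kind: "already_migrated" } | { kind: "migrate" } | { kind: "fail"; reason: string };

// ".kibana_7.12.0_001" -> [7, 12, 0]; undefined for names such as ".kibana_1" or ".kibana_pre6.5.0_001".
function parseIndexVersion(index: string): Version | undefined {
  const m = /^\.kibana_(\d+)\.(\d+)\.(\d+)_\d+$/.exec(index);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : undefined;
}

// Negative if a < b, zero if equal, positive if a > b (numeric, not lexicographic).
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function decide(current: Version, kibanaAlias?: string, versionAlias?: string): Decision {
  // Step (3): both aliases point to the same index, so this version has already migrated.
  if (kibanaAlias !== undefined && kibanaAlias === versionAlias) return { kind: "already_migrated" };
  // Step (4.1): `.kibana` points to an index created by a later Kibana version.
  const sourceVersion = kibanaAlias === undefined ? undefined : parseIndexVersion(kibanaAlias);
  if (sourceVersion !== undefined && compareVersions(sourceVersion, current) > 0) {
    return { kind: "fail", reason: `.kibana points to ${kibanaAlias}, which belongs to a later version` };
  }
  return { kind: "migrate" };
}

console.log(decide([7, 10, 0], ".kibana_7.10.0_001", ".kibana_7.10.0_001").kind); // already_migrated
console.log(decide([7, 10, 0], ".kibana_7.12.0_001").kind); // fail
console.log(decide([7, 10, 0], ".kibana_7.9.0_001").kind); // migrate
```

Comparing parsed numbers rather than index-name strings matters here: a plain string comparison would order `7.9.0` after `7.10.0`.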
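Steps (5) to (8) of the updated algorithm map onto standard Elasticsearch endpoints: the add index block API (`PUT /<index>/_block/write`), create index, `_reindex` with `conflicts` and `dest.op_type`, `_clone` and `_cluster/health`. The sketch below only builds request descriptors and sends nothing; the `Req` shape and `migrationRequests` are hypothetical, the source index `.kibana_7.9.0_001` is made up, and the `type`/`migrationVersion` field mappings are an assumption about what "minimal mappings" could look like rather than Kibana's actual definitions.

```typescript
// Hypothetical request-descriptor shape; these could be sent with any HTTP/Elasticsearch client.
interface Req {
  method: "GET" | "POST" | "PUT";
  path: string;
  body?: unknown;
  // Error types that the RFC says may be ignored for this step.
  ignorableErrors?: string[];
}

function migrationRequests(sourceIndex: string, version: string): Req[] {
  const temp = `.kibana_${version}_reindex_temp`;
  const target = `.kibana_${version}_001`;
  return [
    // (5) Block writes from outdated nodes on the source index.
    { method: "PUT", path: `/${sourceIndex}/_block/write` },
    // (6) Temporary index that accepts any document shape (dynamic: false),
    //     plus minimal mappings so outdated documents can still be searched.
    {
      method: "PUT",
      path: `/${temp}`,
      body: {
        mappings: {
          dynamic: false,
          properties: {
            type: { type: "keyword" }, // assumed mapping
            migrationVersion: { type: "object", dynamic: true }, // assumed mapping
          },
        },
      },
      ignorableErrors: ["resource_already_exists_exception"],
    },
    // (7) op_type=create + conflicts=proceed lets several instances reindex in parallel.
    //     The response holds a task id to poll: GET /_tasks/<id>?wait_for_completion=true&timeout=60s
    {
      method: "POST",
      path: "/_reindex?wait_for_completion=false",
      body: { conflicts: "proceed", source: { index: sourceIndex }, dest: { index: temp, op_type: "create" } },
    },
    // (8.1) Block writes on the temporary index.
    { method: "PUT", path: `/${temp}/_block/write` },
    // (8.2) Clone into the target; the clone would otherwise inherit the write block.
    {
      method: "POST",
      path: `/${temp}/_clone/${target}`,
      body: { settings: { "index.blocks.write": false } },
      ignorableErrors: ["resource_already_exists_exception"],
    },
    // (8.3) Wait for the target to become green (needed when the clone already existed).
    { method: "GET", path: `/_cluster/health/${target}?wait_for_status=green&timeout=60s` },
  ];
}

for (const r of migrationRequests(".kibana_7.9.0_001", "7.10.0")) {
  console.log(r.method, r.path);
}
```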
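Step (9) of the updated algorithm writes transformed documents back with optimistic concurrency control, which in Elasticsearch means passing the `if_seq_no` and `if_primary_term` parameters on the index request. The sketch models only the control flow the RFC specifies (ignore version conflicts, collect every transform failure, then fail the migration) against a hypothetical in-memory `Store`; `migrationVersion` is simplified to a single number, and primary terms are omitted.

```typescript
// Hypothetical in-memory stand-in for the target index. `migrationVersion` is simplified
// to a single number; in Kibana it is a per-type map of versions.
type Source = { type: string; migrationVersion: number; [field: string]: unknown };
type Stored = { source: Source; seqNo: number };

class Store {
  private nextSeqNo = 0;
  readonly docs = new Map<string, Stored>();

  put(id: string, source: Source): void {
    this.docs.set(id, { source, seqNo: this.nextSeqNo++ });
  }

  // Models an index request with `if_seq_no`: returns false on a version conflict.
  putIfSeqNo(id: string, source: Source, ifSeqNo: number): boolean {
    const current = this.docs.get(id);
    if (current === undefined || current.seqNo !== ifSeqNo) return false;
    this.put(id, source);
    return true;
  }
}

function transformOutdated(store: Store, latest: number, transform: (doc: Source) => Source): void {
  // Stand-in for searching a batch of outdated documents.
  const batch: Array<[string, Stored]> = [];
  store.docs.forEach((doc, id) => {
    if (doc.source.migrationVersion < latest) batch.push([id, doc]);
  });

  const failures: string[] = [];
  for (const [id, doc] of batch) {
    try {
      const transformed = transform(doc.source);
      // (9.1) A false result is a version conflict: another instance already
      // wrote this document, so it is ignored.
      store.putIfSeqNo(id, transformed, doc.seqNo);
    } catch {
      failures.push(id); // (9.2) record the failure and keep transforming the rest
    }
  }
  if (failures.length > 0) {
    throw new Error(`Failed to transform documents: ${failures.join(", ")}`);
  }
}

const store = new Store();
store.put("dashboard:1", { type: "dashboard", migrationVersion: 1 });
store.put("broken:1", { type: "broken", migrationVersion: 1 });
store.put("dashboard:2", { type: "dashboard", migrationVersion: 2 });

try {
  transformOutdated(store, 2, (doc) => {
    if (doc.type === "broken") throw new Error("cannot transform");
    return { ...doc, migrationVersion: 2 };
  });
} catch (e) {
  console.log((e as Error).message); // Failed to transform documents: broken:1
}
```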
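Step (10.2) relies on `PUT /<index>/_mapping` merging an update into the existing mappings instead of replacing them. The following is a simplified model of that merge for nested `properties`, not Elasticsearch's implementation; it shows why mappings contributed by a plugin that is disabled on this instance survive the update. The `mergeProperties` helper, the type names and the field names are made up for illustration.

```typescript
// Simplified model of how a `PUT /<index>/_mapping` update is merged into existing mappings.
// Real Elasticsearch also rejects incompatible changes (e.g. changing an existing
// field's type); this sketch just overwrites parameters and does not model that.
interface FieldMapping {
  type?: string;
  properties?: Properties;
  [param: string]: unknown;
}
type Properties = { [name: string]: FieldMapping };

function mergeProperties(existing: Properties, update: Properties): Properties {
  const merged: Properties = { ...existing }; // fields absent from the update are kept
  for (const name of Object.keys(update)) {
    const current = merged[name];
    const incoming = update[name];
    merged[name] =
      current !== undefined && current.properties !== undefined && incoming.properties !== undefined
        ? { ...current, ...incoming, properties: mergeProperties(current.properties, incoming.properties) }
        : { ...current, ...incoming };
  }
  return merged;
}

const existing: Properties = {
  dashboard: { properties: { title: { type: "text" } } },
  // Added by a plugin that is disabled on this instance (hypothetical type name).
  "disabled-plugin-type": { properties: { name: { type: "keyword" } } },
};
const update: Properties = { dashboard: { properties: { description: { type: "text" } } } };
const merged = mergeProperties(existing, update);
console.log(Object.keys(merged)); // [ 'dashboard', 'disabled-plugin-type' ]
console.log(Object.keys(merged.dashboard.properties ?? {})); // [ 'title', 'description' ]
```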
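Step (11) of the updated algorithm can be sent as a single `POST /_aliases` request, whose actions Elasticsearch applies atomically. A sketch of the request body, assuming an Elasticsearch version that supports `must_exist` on the `remove` action (the RFC expects a `required alias [.kibana] does not exist` error when that check fails) and the `remove_index` action; `markCompleteActions`, `resolveAfterConflict` and the source index `.kibana_7.9.0_001` are hypothetical.

```typescript
// Sketch of the body for `POST /_aliases`; all actions in one request succeed or fail together.
type AliasAction =
  | { remove: { index: string; alias: string; must_exist: boolean } }
  | { add: { index: string; alias: string } }
  | { remove_index: { index: string } };

function markCompleteActions(sourceIndex: string, version: string): { actions: AliasAction[] } {
  const target = `.kibana_${version}_001`;
  return {
    actions: [
      // (11.1) Fails the whole request if `.kibana` no longer points to the source index.
      { remove: { index: sourceIndex, alias: ".kibana", must_exist: true } },
      // (11.2) Point both aliases at the target index.
      { add: { index: target, alias: `.kibana_${version}` } },
      { add: { index: target, alias: ".kibana" } },
      // (11.3) Delete the temporary index in the same atomic operation.
      { remove_index: { index: `.kibana_${version}_reindex_temp` } },
    ],
  };
}

// (11.4) After a "required alias [.kibana] does not exist" or index_not_found_exception
// error, re-fetch the indices `.kibana` points to. Treating an alias that points to
// several indices as a failure is an assumption of this sketch.
function resolveAfterConflict(kibanaIndices: string[], targetIndex: string): "succeeded" | "failed" {
  return kibanaIndices.length === 1 && kibanaIndices[0] === targetIndex ? "succeeded" : "failed";
}

console.log(JSON.stringify(markCompleteActions(".kibana_7.9.0_001", "7.10.0"), null, 2));
console.log(resolveAfterConflict([".kibana_7.10.0_001"], ".kibana_7.10.0_001"));
```

Putting the `must_exist` check first in the same request means an instance whose source index is no longer behind `.kibana` changes nothing; it then falls back to the re-fetch in (11.4), which reports success only if another instance of the same version already pointed `.kibana` at our target index.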