mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 23:27:25 -04:00
* Allow data stream reindex tasks to be re-run after completion * Docs update * Update docs/reference/migration/apis/data-stream-reindex.asciidoc --------- Co-authored-by: Keith Massey <keith.massey@elastic.co>
359 lines
14 KiB
Text
359 lines
14 KiB
Text
[role="xpack"]
|
|
[[data-stream-reindex-api]]
|
|
=== Reindex data stream API
|
|
++++
|
|
<titleabbrev>Reindex data stream</titleabbrev>
|
|
++++
|
|
|
|
.New API reference
|
|
[sidebar]
|
|
--
|
|
For the most up-to-date API details, refer to {api-es}/group/endpoint-migration[Migration APIs].
|
|
--
|
|
|
|
include::{es-ref-dir}/migration/apis/shared-migration-apis-tip.asciidoc[]
|
|
|
|
The reindex data stream API is used to upgrade the backing indices of a data stream to the most
|
|
recent major version. It works by reindexing each backing index into a new index, then replacing the original
|
|
backing index with its replacement and deleting the original backing index. The settings and mappings
|
|
from the original backing indices are copied to the resulting backing indices.
|
|
|
|
This api runs in the background because reindexing all indices in a large data stream
|
|
is expected to take a large amount of time and resources. The endpoint will return immediately and a persistent
|
|
task will be created to run in the background. The current status of the task can be checked with
|
|
the <<data-stream-reindex-status-api,reindex status API>>. This status will be available for 24 hours after the task
|
|
completes, whether it finished successfully or failed. However, only the last status is retained so re-running a reindex
|
|
will overwrite the previous status for that data stream. A running or recently completed data stream reindex task can be
|
|
cancelled using the <<data-stream-reindex-cancel-api,reindex cancel API>>.
|
|
|
|
///////////////////////////////////////////////////////////
|
|
[source,console]
|
|
------------------------------------------------------
|
|
POST /_migration/reindex/my-data-stream/_cancel
|
|
DELETE _data_stream/my-data-stream
|
|
DELETE _index_template/my-data-stream-template
|
|
------------------------------------------------------
|
|
// TEARDOWN
|
|
///////////////////////////////////////////////////////////
|
|
|
|
|
|
[[data-stream-reindex-api-request]]
|
|
==== {api-request-title}
|
|
|
|
`POST /_migration/reindex`
|
|
|
|
|
|
[[data-stream-reindex-api-prereqs]]
|
|
==== {api-prereq-title}
|
|
|
|
* If the {es} {security-features} are enabled, you must have the `manage`
|
|
<<privileges-list-indices,index privilege>> for the data stream.
|
|
|
|
[[data-stream-reindex-body]]
|
|
==== {api-request-body-title}
|
|
|
|
`source`::
|
|
`index`:::
|
|
(Required, string) The name of the data stream to upgrade.
|
|
|
|
`mode`::
|
|
(Required, enum) Set to `upgrade` to upgrade the data stream in-place, using the same source and destination
|
|
data stream. Each out-of-date backing index will be reindexed. Then the new backing index is swapped into the data stream and the old index is deleted.
|
|
Currently, the only allowed value for this parameter is `upgrade`.
|
|
|
|
[[reindex-data-stream-api-settings]]
|
|
==== Settings
|
|
|
|
You can use the following settings to control the behavior of the reindex data stream API:
|
|
|
|
[[migrate_max_concurrent_indices_reindexed_per_data_stream-setting]]
|
|
// tag::migrate_max_concurrent_indices_reindexed_per_data_stream-setting-tag[]
|
|
`migrate.max_concurrent_indices_reindexed_per_data_stream`
|
|
(<<dynamic-cluster-setting,Dynamic>>)
|
|
The number of backing indices within a given data stream which will be reindexed concurrently. Defaults to `1`.
|
|
// end::migrate_max_concurrent_indices_reindexed_per_data_stream-tag[]
|
|
|
|
[[migrate_data_stream_reindex_max_request_per_second-setting]]
|
|
// tag::migrate_data_stream_reindex_max_request_per_second-setting-tag[]
|
|
`migrate.data_stream_reindex_max_request_per_second`
|
|
(<<dynamic-cluster-setting,Dynamic>>)
|
|
The average maximum number of documents within a given backing index to reindex per second.
|
|
Defaults to `1000`, though can be any decimal number greater than `0`.
|
|
To remove throttling, set to `-1`.
|
|
This setting can be used to throttle the reindex process and manage resource usage.
|
|
Consult the <<docs-reindex-throttle,reindex throttle docs>> for more information.
|
|
// end::migrate_data_stream_reindex_max_request_per_second-tag[]
|
|
|
|
|
|
[[reindex-data-stream-api-example]]
|
|
==== {api-examples-title}
|
|
|
|
Assume we have a data stream `my-data-stream` with the following backing indices, all of which have index major version 7.x.
|
|
|
|
* .ds-my-data-stream-2025.01.23-000001
|
|
* .ds-my-data-stream-2025.01.23-000002
|
|
* .ds-my-data-stream-2025.01.23-000003
|
|
|
|
Let's also assume that `.ds-my-data-stream-2025.01.23-000003` is the write index.
|
|
If {es} is version 8.x and we wish to upgrade to major version 9.x, the version 7.x indices must be upgraded in preparation.
|
|
We can use this API to reindex a data stream with version 7.x backing indices and make them version 8 backing indices.
|
|
|
|
Start by calling the API:
|
|
|
|
[[reindex-data-stream-start]]
|
|
[source,console]
|
|
----
|
|
POST _migration/reindex
|
|
{
|
|
"source": {
|
|
"index": "my-data-stream"
|
|
},
|
|
"mode": "upgrade"
|
|
}
|
|
----
|
|
// TEST[setup:my_data_stream]
|
|
|
|
|
|
As this task runs in the background this API will return immediately.
|
|
The task will do the following.
|
|
|
|
First, the data stream is rolled over. So that no documents are lost during the reindex, we add <<index-block-settings,write blocks>>
|
|
to the existing backing indices before reindexing them. Since a data stream's write index cannot have a write block,
|
|
the data stream is must be rolled over. This will produce a new write index, `.ds-my-data-stream-2025.01.23-000004`; which
|
|
has an 8.x version and thus does not need to be upgraded.
|
|
|
|
Once the data stream has a write index with an 8.x version we can proceed with reindexing the old indices.
|
|
For each of the version 7.x indices, we now do the following:
|
|
|
|
* Add a write block to the source index to guarantee that no writes are lost.
|
|
* Open the source index if it is closed.
|
|
* Delete the destination index if one exists. This is done in case we are retrying after a failure, so that we start with a fresh index.
|
|
* Create the destination index using the <<indices-create-index-from-source, create from source API>>.
|
|
This copies the settings and mappings from the old backing index to the new backing index.
|
|
* Use the <<docs-reindex, reindex API>> to copy the contents of the old backing index to the new backing index.
|
|
* Close the destination index if the source index was originally closed.
|
|
* Replace the old index in the data stream with the new index, using the <<modify-data-streams-api,modify data streams API>>.
|
|
* Finally, the old backing index is deleted.
|
|
|
|
By default only one backing index will be processed at a time.
|
|
This can be modified using the <<migrate_max_concurrent_indices_reindexed_per_data_stream-setting,`migrate_max_concurrent_indices_reindexed_per_data_stream-setting` setting>>.
|
|
|
|
While the reindex data stream task is running, we can inspect the current status using the <<data-stream-reindex-status-api,reindex status API>>:
|
|
[source,console]
|
|
----
|
|
GET /_migration/reindex/my-data-stream/_status
|
|
----
|
|
// TEST[continued]
|
|
|
|
For the above example, the following would be a possible status:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"start_time_millis": 1737676174349,
|
|
"complete": false,
|
|
"total_indices_in_data_stream": 4,
|
|
"total_indices_requiring_upgrade": 3,
|
|
"successes": 0,
|
|
"in_progress": [
|
|
{
|
|
"index": ".ds-my-data-stream-2025.01.23-000001",
|
|
"total_doc_count": 10000000,
|
|
"reindexed_doc_count": 999999
|
|
}
|
|
],
|
|
"pending": 2,
|
|
"errors": []
|
|
}
|
|
----
|
|
// TEST[skip:specific value is part of explanation]
|
|
|
|
This output means that the first backing index, `.ds-my-data-stream-2025.01.23-000001`, is currently being processed,
|
|
and none of the backing indices have yet completed. Notice that `total_indices_in_data_stream` has a value of `4`,
|
|
because after the rollover, there are 4 indices in the data stream. But the new write index has an 8.x version, and
|
|
thus doesn't need to be reindexed, so `total_indices_requiring_upgrade` is only 3.
|
|
|
|
|
|
|
|
[[reindex-data-stream-cancel-restart]]
|
|
===== Cancelling and Restarting
|
|
The <<reindex-data-stream-api-settings, reindex datastream settings>> provide a few ways to control the performance and
|
|
resource usage of a reindex task. This example shows how we can stop a running reindex task, modify the settings,
|
|
and restart the task.
|
|
|
|
Continuing with the above example, assume the reindexing task has not yet completed, and the <<data-stream-reindex-status-api,reindex status API>>
|
|
returns the following:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"start_time_millis": 1737676174349,
|
|
"complete": false,
|
|
"total_indices_in_data_stream": 4,
|
|
"total_indices_requiring_upgrade": 3,
|
|
"successes": 1,
|
|
"in_progress": [
|
|
{
|
|
"index": ".ds-my-data-stream-2025.01.23-000002",
|
|
"total_doc_count": 10000000,
|
|
"reindexed_doc_count": 1000
|
|
}
|
|
],
|
|
"pending": 1,
|
|
"errors": []
|
|
}
|
|
----
|
|
// TEST[skip:specific value is part of explanation]
|
|
|
|
Let's assume the task has been running for a long time. By default, we throttle how many requests the reindex operation
|
|
can execute per second. This keeps the reindex process from consuming too many resources.
|
|
But the default value of `1000` request per second will not be correct for all use cases.
|
|
The <<migrate_data_stream_reindex_max_request_per_second-setting,`migrate.data_stream_reindex_max_request_per_second` setting>>
|
|
can be used to increase or decrease the number of requests per second, or to remove the throttle entirely.
|
|
|
|
Changing this setting won't have an effect on the backing index that is currently being reindexed.
|
|
For example, changing the setting won't have an effect on `.ds-my-data-stream-2025.01.23-000002`, but would have an
|
|
effect on the next backing index.
|
|
|
|
But in the above status, `.ds-my-data-stream-2025.01.23-000002` has values of 1000 and 10M for the
|
|
`reindexed_doc_count` and `total_doc_count`, respectively. This means it has only reindexed 0.01% of the documents in the index.
|
|
It might be a good time to cancel the run and optimize some settings without losing much work.
|
|
So we call the <<data-stream-reindex-cancel-api,cancel API>>:
|
|
|
|
[source,console]
|
|
----
|
|
POST /_migration/reindex/my-data-stream/_cancel
|
|
----
|
|
// TEST[skip:task will not be present]
|
|
|
|
Now we can use the <<cluster-update-settings, update cluster settings API>> to increase the throttle:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /_cluster/settings
|
|
{
|
|
"persistent" : {
|
|
"migrate.data_stream_reindex_max_request_per_second" : 10000
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
The <<reindex-data-stream-start,original reindex command>> can now be used to restart reindexing.
|
|
Because the first backing index, `.ds-my-data-stream-2025.01.23-000001`, has already been reindexed and thus is already version 8.x,
|
|
it will be skipped. The task will start by reindexing `.ds-my-data-stream-2025.01.23-000002` again from the beginning.
|
|
|
|
Later, once all the backing indices have finished, the <<data-stream-reindex-status-api,reindex status API>> will return something like the following:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"start_time_millis": 1737676174349,
|
|
"complete": true,
|
|
"total_indices_in_data_stream": 4,
|
|
"total_indices_requiring_upgrade": 2,
|
|
"successes": 2,
|
|
"in_progress": [],
|
|
"pending": 0,
|
|
"errors": []
|
|
}
|
|
----
|
|
// TEST[skip:specific value is part of explanation]
|
|
|
|
Notice that the value of `total_indices_requiring_upgrade` is `2`, unlike the previous status, which had a value of `3`.
|
|
This is because `.ds-my-data-stream-2025.01.23-000001` was upgraded before the task cancellation.
|
|
After the restart, the API sees that it does not need to be upgraded, thus does not include it in `total_indices_requiring_upgrade` or `successes`,
|
|
despite the fact that it upgraded successfully.
|
|
|
|
The completed status will be accessible from the status API for 24 hours after completion of the task.
|
|
|
|
We can now check the data stream to verify that indices were upgraded:
|
|
|
|
[source,console]
|
|
----
|
|
GET _data_stream/my-data-stream?filter_path=data_streams.indices.index_name
|
|
----
|
|
// TEST[continued]
|
|
|
|
|
|
which returns:
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"data_streams": [
|
|
{
|
|
"indices": [
|
|
{
|
|
"index_name": ".migrated-ds-my-data-stream-2025.01.23-000003"
|
|
},
|
|
{
|
|
"index_name": ".migrated-ds-my-data-stream-2025.01.23-000002"
|
|
},
|
|
{
|
|
"index_name": ".migrated-ds-my-data-stream-2025.01.23-000001"
|
|
},
|
|
{
|
|
"index_name": ".ds-my-data-stream-2025.01.23-000004"
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// TEST[skip:did not actually run reindex]
|
|
|
|
Index `.ds-my-data-stream-2025.01.23-000004` is the write index and didn't need to be upgraded because it was created with version 8.x.
|
|
The other three backing indices are now prefixed with `.migrated` because they have been upgraded.
|
|
|
|
We can now check the indices and verify that they have version 8.x:
|
|
[source,console]
|
|
----
|
|
GET .migrated-ds-my-data-stream-2025.01.23-000001?human&filter_path=*.settings.index.version.created_string
|
|
----
|
|
// TEST[skip:migrated index does not exist]
|
|
|
|
which returns:
|
|
[source,console-result]
|
|
----
|
|
{
|
|
".migrated-ds-my-data-stream-2025.01.23-000001": {
|
|
"settings": {
|
|
"index": {
|
|
"version": {
|
|
"created_string": "8.18.0"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[skip:migrated index does not exist]
|
|
|
|
[[reindex-data-stream-handling-failure]]
|
|
===== Handling Failures
|
|
Since the reindex data stream API runs in the background, failure information can be obtained through the <<data-stream-reindex-status-api,reindex status API>>.
|
|
For example, if the backing index `.ds-my-data-stream-2025.01.23-000002` was accidentally deleted by a user, we would see a status like the following:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"start_time_millis": 1737676174349,
|
|
"complete": false,
|
|
"total_indices_in_data_stream": 4,
|
|
"total_indices_requiring_upgrade": 3,
|
|
"successes": 1,
|
|
"in_progress": [],
|
|
"pending": 1,
|
|
"errors": [
|
|
{
|
|
"index": ".ds-my-data-stream-2025.01.23-000002",
|
|
"message": "index [.ds-my-data-stream-2025.01.23-000002] does not exist"
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// TEST[skip:result just part of explanation]
|
|
|
|
Once the issue has been fixed, the failed reindex task can be re-run. First, the failed run's status must be cleared
|
|
using the <<data-stream-reindex-cancel-api,reindex cancel API>>. Then the
|
|
<<reindex-data-stream-start,original reindex command>> can be called to pick up where it left off.
|