kibana/docs/setup/upgrade/resolving-migration-failures.asciidoc

[[resolve-migrations-failures]]
=== Resolve migration failures

Migrating {kib} primarily involves migrating saved object documents to be compatible
with the new version.

[float]
==== Saved object migration failures

If {kib} unexpectedly terminates while migrating a saved object index, {kib} automatically attempts to
perform the migration again when the process restarts. Do not delete any saved object indices to
fix a failed migration. Unlike previous versions, {kib} 7.12.0 and later do not require deleting
indices to release a failed migration lock.

If upgrade migrations fail repeatedly, refer to
<<preventing-migration-failures, preparing for migration>>.
After you address the root cause of the migration failure,
{kib} automatically retries the migration.
If you're unable to resolve a failed migration, contact Support.

[float]
==== Corrupt saved objects
To find and remedy problems caused by corrupt documents, we highly recommend testing your {kib} upgrade in a development cluster,
especially when there are custom integrations that create saved objects in your environment.

Saved objects that are corrupted through manual editing or integrations cause migration
failures with a log message, such as `Unable to migrate the corrupt Saved Object document ...`.
For a successful upgrade migration, you must fix or delete corrupt documents.

For example, you might receive the following error message:

[source,sh]
--------------------------------------------
Unable to migrate the corrupt saved object document with _id: 'marketing_space:dashboard:e3c5fc71-ac71-4805-bcab-2bcc9cc93275'. To allow migrations to proceed, please delete this document from the [.kibana_7.12.0_001] index.
--------------------------------------------

To delete the documents that cause migrations to fail, take the following steps:

. Create a role as follows:
+
[source,console]
--------------------------------------------
PUT _security/role/grant_kibana_system_indices
{
  "indices": [
    {
      "names": [
        ".kibana*"
      ],
      "privileges": [
        "all"
      ],
      "allow_restricted_indices": true
    }
  ]
}
--------------------------------------------

. Create a user with the role above and the built-in `superuser` role:
+
[source,console]
--------------------------------------------
POST /_security/user/temporarykibanasuperuser
{
  "password" : "l0ng-r4nd0m-p@ssw0rd",
  "roles" : [ "superuser", "grant_kibana_system_indices" ]
}
--------------------------------------------

. Remove the write block that the migration system placed on the previous index:
+
[source,console]
--------------------------------------------
PUT .kibana_7.12.0_001/_settings
{
  "index": {
    "blocks.write": false
  }
}
--------------------------------------------

. Delete the corrupt document:
+
[source,console]
--------------------------------------------
DELETE .kibana_7.12.0_001/_doc/marketing_space:dashboard:e3c5fc71-ac71-4805-bcab-2bcc9cc93275
--------------------------------------------

. Restart {kib}.
+
The dashboard with the `e3c5fc71-ac71-4805-bcab-2bcc9cc93275` ID that belongs to the `marketing_space` space **is no longer available**.
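
Because the user created for this procedure has broad privileges, you might want to remove it, together with the temporary role, after {kib} starts successfully. The following is a minimal sketch, assuming the user and role names from the steps above:

[source,console]
--------------------------------------------
DELETE /_security/user/temporarykibanasuperuser

DELETE /_security/role/grant_kibana_system_indices
--------------------------------------------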

You can configure {kib} to automatically discard all corrupt objects, and all objects that cause transform errors, during a migration. When you set the configuration option `migrations.discardCorruptObjects` to the version you are upgrading to, {kib} deletes the conflicting objects and proceeds with the migration.
For instance, for an upgrade to 8.4.0, you can define the following setting in `kibana.yml`:

[source,yaml]
--------------------------------------------
migrations.discardCorruptObjects: "8.4.0"
--------------------------------------------

WARNING: Enabling the `migrations.discardCorruptObjects` setting causes {kib} to discard all corrupt objects, as well as those causing transform errors. We highly recommend that you carefully review the list of conflicting objects in the logs.

[float]
[[unknown-saved-object-types]]
==== Documents for unknown saved objects
Migrations fail if any saved objects belong to an unknown
saved object type. Unknown saved objects are typically caused by manual modifications
to the {es} index (no longer allowed in 8.x), or by disabling a plugin that previously created a saved object.

We recommend using the <<upgrade-assistant,Upgrade Assistant>>
to discover and remedy any unknown saved object types. {kib} version 7.17.0 deployments containing unknown saved
object types will also log the following warning message:

[source,sh]
--------------------------------------------
CHECK_UNKNOWN_DOCUMENTS Upgrades will fail for 8.0+ because documents were found for unknown saved object types. To ensure that future upgrades will succeed, either re-enable plugins or delete these documents from the ".kibana_7.17.0_001" index after the current upgrade completes.
--------------------------------------------

If you do not remedy the issue, your upgrade to 8.0+ fails with a message similar to:

[source,sh]
--------------------------------------------
Unable to complete saved object migrations for the [.kibana] index: Migration failed because some documents were found which use unknown saved object types: someType,someOtherType

To proceed with the migration you can configure Kibana to discard unknown saved objects for this migration.
--------------------------------------------

To proceed with the migration, re-enable any plugins that previously created these saved objects. Alternatively, carefully review the list of unknown saved objects in the {kib} log entry. If the corresponding disabled plugins and their associated saved objects will no longer be used, you can delete the saved objects by setting the configuration option `migrations.discardUnknownObjects` to the version you are upgrading to.
For instance, for an upgrade to 8.4.0, you can define the following setting in `kibana.yml`:

[source,yaml]
--------------------------------------------
migrations.discardUnknownObjects: "8.4.0"
--------------------------------------------
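
Before you discard unknown objects, it can help to look at the affected documents directly. The following is a sketch only: it assumes the index name from the 7.17.0 warning above, uses the placeholder type names from the example error message, and relies on saved object documents storing their type in a top-level `type` field. It also requires a user with access to restricted indices.

[source,console]
--------------------------------------------
GET .kibana_7.17.0_001/_search
{
  "query": {
    "terms": {
      "type": ["someType", "someOtherType"]
    }
  }
}
--------------------------------------------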

[float]
==== Incompatible settings or mappings
Matching index templates that specify `settings.refresh_interval` or
`mappings` are known to interfere with {kib} upgrades.
This can happen when index templates are defined manually.

To make sure the index templates won't apply to new `.kibana*` indices, narrow down the {data-sources} of any user-defined index templates.
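
To find templates whose patterns could match `.kibana*` indices, you can list the index patterns of the templates in the cluster. The following is a sketch using the cat templates API; the column selection is only a suggestion:

[source,console]
--------------------------------------------
GET _cat/templates?v=true&h=name,index_patterns,order&s=name
--------------------------------------------

Look for broad patterns such as `*` or `.*`. To check whether a suspect template defines `settings.refresh_interval` or `mappings`, retrieve its full definition with `GET _index_template/<name>` or, for legacy templates, `GET _template/<name>`.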

[float]
==== Incompatible `xpack.tasks.index` configuration setting
In {kib} 7.5.0 and earlier, when the task manager index is set to `.tasks`
with the configuration setting `xpack.tasks.index: ".tasks"`,
upgrade migrations fail. In {kib} 7.5.1 and later, the incompatible configuration
setting prevents upgrade migrations from starting.

[float]
==== Repeated time-out requests that eventually fail
Migrations can get stuck in a loop of retry attempts while waiting for a yellow index status that is never reached.
In the `CLONE_TEMP_TO_TARGET` or `CREATE_REINDEX_TEMP` steps, you might see a log entry similar to:

[source,sh]
--------------------------------------------
"Action failed with [index_not_yellow_timeout] Timeout waiting for the status of the [.kibana_8.1.0_001] index to become "yellow". Retrying attempt 1 in 2 seconds."
--------------------------------------------
The process is waiting for a yellow index status. There are two known causes:

* The cluster has hit the low watermark for disk usage
* The cluster has <<routing-allocation-disabled,routing allocation disabled>>

Before retrying the migration, use the `_cluster/allocation/explain` API with the target index to identify why the index isn't yellow:

[source,console]
--------------------------------------------
GET _cluster/allocation/explain
{
  "index": ".kibana_8.1.0_001",
  "shard": 0,
  "primary": true
}
--------------------------------------------
If the cluster exceeded the low watermark for disk usage, the output should contain a message similar to this:

[source,sh]
--------------------------------------------
"The node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [11.692661332965082%]"
--------------------------------------------
Refer to the {es} guide for how to {ref}/disk-usage-exceeded.html[fix common cluster issues].
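
To see which nodes are close to the watermark, one option is the cat allocation API. The following is a sketch; the column selection is only a suggestion:

[source,console]
--------------------------------------------
GET _cat/allocation?v=true&h=node,shards,disk.percent,disk.used,disk.avail,disk.total
--------------------------------------------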

If routing allocation is the issue, the `_cluster/allocation/explain` API will return an entry similar to this:

[source,sh]
--------------------------------------------
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"
--------------------------------------------

[float]
[[routing-allocation-disabled]]
==== Routing allocation disabled or restricted
Upgrade migrations fail when routing allocation is disabled or restricted (`cluster.routing.allocation.enable` is set to `none`, `primaries`, or `new_primaries`), which causes {kib} to log errors such as:

[source,sh]
--------------------------------------------
Unable to complete saved object migrations for the [.kibana] index: [incompatible_cluster_routing_allocation] The elasticsearch cluster has cluster routing allocation incorrectly set for migrations to continue. To proceed, please remove the cluster routing allocation settings with PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": null}, "persistent": {"cluster.routing.allocation.enable": null}}
--------------------------------------------

To work around the issue, remove the transient and persistent routing allocation settings:

[source,console]
--------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": null
  },
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
--------------------------------------------
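
After you apply the change, you can confirm that no override remains. The following is a minimal sketch using the cluster get settings API; the `flat_settings` parameter is optional and only makes the output easier to scan:

[source,console]
--------------------------------------------
GET _cluster/settings?flat_settings=true
--------------------------------------------

If `cluster.routing.allocation.enable` is absent from both the `transient` and `persistent` sections of the response, the default value (`all`) is in effect.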

[float]
[[cluster-shard-limit-exceeded]]
==== {es} cluster shard limit exceeded
When upgrading, {kib} creates new indices that require a small number of new shards. If the number of open {es} shards approaches or exceeds the limit set by the {es} `cluster.max_shards_per_node` setting, {kib} is unable to complete the upgrade. Ensure that {kib} is able to add at least 10 more shards by removing indices to free up resources, or by increasing the `cluster.max_shards_per_node` setting.

For more information, refer to the documentation on {ref}/allocation-total-shards.html[total shards per node].
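
To estimate the remaining headroom, you can compare the number of active shards with the limit multiplied by the number of data nodes. The following is a sketch that assumes the cluster health API gives a close enough count for this purpose:

[source,console]
--------------------------------------------
GET _cluster/health?filter_path=status,active_shards,number_of_data_nodes
--------------------------------------------

If removing indices is not an option, you can raise the limit. The value `1200` below is purely illustrative; choose a value that suits the capacity of your cluster:

[source,console]
--------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1200
  }
}
--------------------------------------------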