Commit graph

80 commits

Author SHA1 Message Date
Joe Gallo
326359f022
Remove ILM docs redirect (#85628) 2022-04-04 13:43:50 -04:00
Andrei Dan
3087f164f7
Migrate legacy/v2/component templates away from custom attributes routing (#82472)
This enhances the migrate to data tiers routing API to also iterate over
the existing legacy, composable, and component templates and look if
they define a custom node attribute routing in their settings for either
`index.routing.allocation.require.{nodeAttrName}` or
`index.routing.allocation.include.{nodeAttrName}`. If any does, we
update them to remove all the routings settings for the provided
`nodeAttrName`.

eg. any template with the following setting configuration:
```
"settings": {
  index.routing.allocation.require.data: "warm",
  index.routing.allocation.include.data: "rack1",
  index.routing.allocation.exclude.data: "rack2,rack3"
}
```
will have its settings updated to:
```
"settings": {}
```
2022-01-13 10:45:10 +00:00
Andrei Dan
f18c9c503e
Migrate to data tiers API dry run on any ILM status (#82226)
The migrate to data tiers routing API required ILM to be stopped. This
is fine for "live" runs, but for dry runs this isn't a requirement.

This changes the dry_run to allow the API to run irrespective of the ILM
status.
2022-01-05 13:40:27 +00:00
Mary Gouseti
175c4793f9
Expose the index age in ILM explain output. (#81273)
* Expose the index age in ILM explain output.

 ILM already exposes the `age` that ILM will use to transition to the next phase, based on that phase's `min_age`. The `index_age` is based only on the index creation date and it's used to trigger a rollover.

 Resolves #64429
2021-12-13 15:38:25 +01:00
Joe Gallo
27186ceda7
_tier_preference docs tweaks (#81453) 2021-12-08 10:34:30 -05:00
James Rodewig
f56a0f4b66
[DOCS] Remove testenv annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00
Lee Hinman
3d5843a236
Allow ILM move-to-step without action or name (#75435)
* Allow ILM move-to-step without `action` or `name`

This commit enhances ILM's move-to-step API to allow dropping the `name`, or dropping both the
`action` and `name`. For example:

```json
POST /_ilm/move/foo-1
{
  "current_step": {
    "phase": "hot",
    "action": "rollover",
    "name": "check-rollover-ready"
  },
  "next_step": {
    "phase": "warm",
    "action": "forcemerge"
  }
}
```

Will move to the first step in the `forcemerge` action in the `warm` phase (without having to know
the specific step name).

Another example:

```json
POST /_ilm/move/foo-1
{
  "current_step": {
    "phase": "hot",
    "action": "rollover",
    "name": "check-rollover-ready"
  },
  "next_step": {
    "phase": "warm"
  }
}
```

Will move to the first step in the `warm` phase (without having to know the specific action name).

Bear in mind that the execution order is still entirely an implementation detail, so "first" in the
above sentences means the first step that ILM would execute.

Resolves #58128

* Apply Andrei's wording change (thanks!)

Co-authored-by: Andrei Dan <andrei.dan@elastic.co>

* Log index and policy name when the concrete step key can't be resolved

Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2021-07-19 12:10:25 -04:00
Andrei Dan
5ca240eabd
Add dry_run support to the migrate to data tiers API (#74639)
This adds support for a `dry_run` parameter for the
`_ilm/migrate_to_data_tiers` API. This defaults to `false`, but when
configured to `true` it will simulate the migration of elasticsearch
entities to data tiers based routing, returning the entites that need to
be updated (indices, ILM policies and the legacy index template that'd
be deleted, if any was configured in the request).
2021-06-29 10:37:16 +01:00
Andrei Dan
636aa7c0da
Add migrate to data tiers API (#74264)
This adds the _ilm/migrate_to_data_tiers API to expose the service for
migrating the elasticsearch abstractions (indices, ILM policies and an 
optional legacy template to delete) to data tiers routing allocation 
(away from custom node attributes)
2021-06-28 12:07:39 +01:00
Lee Hinman
997db17852
Add usage to get ILM policy response (#74518)
This commit adds the "in_use_by" object to the response for ILM policies. This map shows the
indices, data streams, and composable templates that use the ILM policy.

An example output may look like:

```json
{
  "logs" : {
    "version" : 1,
    "modified_date" : "2021-06-23T18:42:08.381Z",
    "policy" : {
      ...
    },
    "in_use_by" : {
      "indices" : [".ds-logs-foo-barbaz-2021.06.23-000001", ".ds-logs-foo-other-2021.06.23-000001"],
      "data_streams" : ["logs-foo-barbaz", "logs-foo-other"],
      "composable_templates" : ["logs"]
    }
  }
}
```

Resolves #73869
2021-06-23 16:01:19 -06:00
Stef Nestor
fc1ea9c317
[DOCS] Fix operation_mode response property def (#73976) 2021-06-10 13:31:15 -04:00
bellengao
b6fd1bbb06
Add _meta field to ilm policy (#73515)
Relates to #70755.

The main changes of this PR are:

    Add an optional _meta field to ILM policy.
    Add some test code about the change.
    Update the doc of Create or update lifecycle policy API.
2021-06-01 11:17:53 -06:00
James Rodewig
5729bb8d49
[DOCS] Update alias references (#73427)
Updates several `index aliases` references to `aliases`.
2021-05-27 16:00:57 -04:00
James Rodewig
9ab1a6caa3
[DOCS] Fix put lifecycle policy API title (#71124) 2021-03-31 11:37:45 -04:00
James Rodewig
6f02735f86 [DOCS] Fix ILM user note 2021-01-14 10:08:24 -05:00
James Rodewig
e795ab965a
[DOCS] Fix API titles (#67475) 2021-01-13 15:15:37 -05:00
James Rodewig
9099daef7b
[DOCS] Note ILM uses snapshot of user privileges (#67393) 2021-01-12 16:35:01 -05:00
James Rodewig
1ea83359bb
[DOCS] Fix case for 'Boolean' (#64299) 2020-10-29 09:04:43 -04:00
Andrei Dan
c1746afffd
ILM migrate data between tiers (#61377)
This adds ILM support for automatically migrating the managed
indices between data tiers.

This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless if it's
enabled or disabled).
2020-09-17 10:56:49 +01:00
Mikołaj Przybysz
9e8d8ee38a
[DOCS] Add line break to get ILM lifecycle API docs (#61892) 2020-09-04 09:00:11 -04:00
James Rodewig
441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
James Rodewig
55b6c1ab82
[DOCS] Add data streams to ILM explain API (#59343) 2020-07-13 08:49:10 -04:00
James Rodewig
b99eb6d988
[DOCS] Add data streams to remove lifecycle policy API (#58777) 2020-07-02 09:40:12 -04:00
debadair
e0e5572f34
[DOCS] Editorial ILM cleanup (#57565)
* [DOCS] Editorial cleanup

* Moved example of applying a template to multiple indices.

* Combine existing indices topics

* Fixed test

* Add skip rollover file.

* Revert rename.

* Update include.

* Revert rename

* Apply suggestions from code review

Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>

* Apply suggestions from code review

* Fixed callout

* Update docs/reference/ilm/ilm-with-existing-indices.asciidoc

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>

* Update docs/reference/ilm/ilm-with-existing-indices.asciidoc

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>

* Apply suggestions from code review

* Restored policy to template example.

* Fixed JSON parse error

Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2020-06-05 16:24:03 -07:00
James Rodewig
6a769ea699
[DOCS] Replace docdir attribute with es-repo-dir (#57564) 2020-06-02 17:52:41 -04:00
Lisa Cawley
8b9293b3bf
[DOCS] Replace docdir attribute with es-repo-dir (#57489) 2020-06-01 15:55:05 -07:00
debadair
52a494cea7
[DOCS] Add request body param descriptions for move to step (#56560) 2020-05-12 14:30:12 -07:00
debadair
cb5fdc5226
[DOCS] Rework conceptual info for ILM. (#52181)
* [DOCS] Rework conceptual info for ILM.

* Split the actions out of concepts.

* Added xpack role to actions.

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Apply suggestions from code review
2020-04-22 13:53:30 -07:00
Tanguy Leroux
f6feb6c2c8
Merge feature/searchable-snapshots branch into master (#54803)
This commit merges the searchable-snapshots feature branch into master.
See #54803 for the complete list of squashed commits.

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-06 15:51:05 +02:00
debadair
0fed96eebc [DOCS] Align with ILM API docs (#48705)
* [DOCS] Reconciled with Snapshot/Restore reorg
2020-01-22 20:44:19 -08:00
Deb Adair
6f3581173b Revert "[DOCS] Align with ILM API docs (#48705)"
This reverts commit ec9437832d.
2020-01-21 22:32:40 -08:00
debadair
ec9437832d
[DOCS] Align with ILM API docs (#48705)
* [DOCS] Reconciled with Snapshot/Restore reorg
2020-01-21 19:58:17 -08:00
Xiang Dai
432bd0e92c Fix docs typos (#50365)
Fixes a few typos in the docs.

Signed-off-by: Xiang Dai 764524258@qq.com
2019-12-23 10:35:14 -05:00
Lisa Cawley
ff2072e698
[DOCS] Reformat ILM API docs (#49348) 2019-11-20 08:19:33 -08:00
Mathew Davis
a70a72b811 Fixing a typo in the stop SLM api request header. 2019-11-19 23:05:15 -07:00
Andrei Dan
454020ac8a
ILM Make the check-rollover-ready step retryable (#48256)
This adds the infrastructure to be able to retry the execution of retryable
steps and makes the `check-rollover-ready` retryable as an initial step to
make the rollover action more resilient to transient errors.
2019-10-31 09:16:20 +00:00
Gordon Brown
ef7fdf1800
SLM Start/Stop HLRC and docs (#47966)
This commit adds HLRC support and documentation for the SLM Start and
Stop APIs, as well as updating existing documentation where appropriate.

This commit also ensures that the SLM APIs are properly included in the
HLRC documentation.
2019-10-14 15:19:49 -06:00
James Rodewig
488a258453
[DOCS] Reformat docs for several snapshot lifecycle policy APIs (#47998) 2019-10-14 12:29:11 -04:00
James Rodewig
879c2cb06b
[DOCS] Reformat get snapshot lifecycle policy API docs (#47827) 2019-10-11 12:01:20 -04:00
James Rodewig
c14de778f5
[DOCS] Reformat put snapshot lifecycle policy API docs (#47811) 2019-10-11 10:56:03 -04:00
Lee Hinman
71fab46950
Add Snapshot Lifecycle Retention documentation (#47545)
* Add Snapshot Lifecycle Retention documentation

This commits adds API and general purpose documentation for SLM
retention.

Relates to #43663

* Fix docs tests

* Update default now that #47604 has been merged

* Update docs/reference/ilm/apis/slm-api.asciidoc

Co-Authored-By: Gordon Brown <gordon.brown@elastic.co>

* Update docs/reference/ilm/apis/slm-api.asciidoc

Co-Authored-By: Gordon Brown <gordon.brown@elastic.co>

* Update docs with feedback
2019-10-08 08:32:15 -06:00
Lisa Cawley
4e4990c6a0
[DOCS] Cleans up links to security content (#47610) 2019-10-04 16:10:26 -07:00
Lisa Cawley
91992a805f
[DOCS] Moves Watcher content into Elasticsearch book (#47147)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-27 16:05:44 -07:00
Gordon Brown
b42923f339
Add support for POST requests to SLM Execute API (#47061)
This commit adds support for POST requests to the SLM `_execute` API,
because POST is a more appropriate HTTP verb for this action as it is
not idempotent. The docs are also changed to favor POST over PUT,
although PUT is not removed or officially deprecated.
2019-09-25 11:32:44 -06:00
Gordon Brown
3035cd35a4
Change SLM stats format (#46991)
Using arrays of objects with embedded IDs is preferred for new APIs over
using entity IDs as JSON keys.  This commit changes the SLM stats API to
use the preferred format.
2019-09-24 15:57:52 -06:00
James Rodewig
370e434986
[DOCS] Correct several [source,console-result] snippets (#46930) 2019-09-20 11:23:15 -04:00
Lee Hinman
1a71ebb2fb
Add node setting for disabling SLM (#46794)
This adds the `xpack.slm.enabled` setting to allow disabling of SLM
functionality as well as its HTTP API endpoints.

Relates to #38461
2019-09-17 15:10:03 -06:00
James Rodewig
e355759086
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46679) 2019-09-13 11:23:53 -04:00
James Rodewig
5772c1c7dd
[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
Lee Hinman
56aabcdd69
Add retention to Snapshot Lifecycle Management (#46407)
This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a scheduled configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshot taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deletion snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

*  Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown
2019-09-09 09:55:34 -06:00