452 commits

Author SHA1 Message Date
Tal Levy
4d3f6816a7 Merge remote-tracking branch 'elastic/master' into enrich 2019-10-04 13:30:57 -07:00
Lee Hinman
17dc095606
Add API to execute SLM retention on-demand (#47405)
* Add API to execute SLM retention on-demand

This commit adds the `/_slm/_execute_retention` API endpoint. This
endpoint kicks off SLM retention and then returns immediately.

This in particular allows us to run retention without scheduling it
(for entirely manual invocation) or perform a one-off cleanup.

This commit also includes HLRC for the new API, and fixes an issue
in SLMSnapshotBlockingIntegTests where retention invoked prior to the
test completing could resurrect an index the internal test cluster
cleanup had already deleted.

Resolves #46508
Relates to #43663
2019-10-02 10:28:39 -06:00
James Rodewig
7583c07fa8
[DOCS] Reorder index APIs alphabetically (#46981) 2019-10-01 15:13:27 -04:00
István Zoltán Szabó
a6c517a96e
[DOCS] Changes wording to move away from data frame terminology in the ES repo (#47093)
* [DOCS] Changes wording to move away from data frame terminology in the ES repo.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-10-01 08:04:06 +02:00
Martijn van Groningen
197c1d59d4
Merge remote-tracking branch 'es/master' into enrich 2019-09-30 08:12:07 +02:00
Lisa Cawley
91992a805f
[DOCS] Moves Watcher content into Elasticsearch book (#47147)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-27 16:05:44 -07:00
Martijn van Groningen
f676d9730d
Merge remote-tracking branch 'es/master' into enrich 2019-09-27 13:51:17 +02:00
Hendrik Muhs
fd3dc4da77
[Transform] rename data frame transform to transform for hlrc client (#46933)
rename data frame transform to transform for hlrc
2019-09-25 07:38:17 +02:00
Martijn van Groningen
afc16ba518
Merge remote-tracking branch 'es/master' into enrich 2019-09-23 09:34:53 +02:00
Lisa Cawley
4da98c9e46
[DOCS] Update data frame transform URLs (#46940) 2019-09-20 13:26:57 -07:00
Lisa Cawley
b1bbed84eb
[DOCS] Fixes data frame analytics job terminology in HLRC (#46758) 2019-09-16 10:00:44 -07:00
Lisa Cawley
b3dfd6e6d0
[DOCS] Updates dataframe transform terminology (#46642) 2019-09-16 08:28:19 -07:00
Lisa Cawley
7f1e500512
[DOCS] Adds missing icons to Watcher HLRC APIs (#46626) 2019-09-11 16:32:47 -07:00
Lisa Cawley
5105db4eed
[DOCS] Adds missing icons to ILM HLRC APIs (#46633) 2019-09-11 15:45:07 -07:00
Lisa Cawley
e3945231d3
[DOCS] Adds missing icons to CCR HLRC APIs (#46631) 2019-09-11 15:35:42 -07:00
Lisa Cawley
7f5353bdcb
[DOCS] Adds missing icons to Graph HLRC APIs (#46630) 2019-09-11 15:20:36 -07:00
Lisa Cawley
e304815af2
[DOCS] Add missing icons to security HLRC APIs (#46619) 2019-09-11 13:19:13 -07:00
Lisa Cawley
03116c0f8c
[DOCS] Add missing icons to rollup HLRC APIs (#46617) 2019-09-11 11:48:32 -07:00
Lisa Cawley
7d0beb0b53
[DOCS] Add missing icons to transform HLRC APIs (#46616) 2019-09-11 11:21:45 -07:00
Martijn van Groningen
5d76d2d1e5
Add HLRC support for enrich get policy API. (#45970)
Changed the signature of AbstractResponseTestCase#createServerTestInstance(...)
to include the randomly selected xcontent type. This is needed for
creating a server response instance with a query which is represented as BytesReference.
Maybe this should go into a different change?

This PR also includes HLRC docs for the get policy api.

Relates to #32789
2019-09-11 14:26:42 +02:00
Lisa Cawley
1e63105e30
[DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:26:56 -07:00
Lee Hinman
56aabcdd69
Add retention to Snapshot Lifecycle Management (#46407)
This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a schedule configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshots taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets these criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.
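
As a rough illustration of how the three options could interact, here is a minimal Python sketch. The function name, the `(name, start_time)` tuple layout, and the precedence (`min_count` protects the newest snapshots before the other two options are considered) are assumptions made for this example; the actual SLM logic may differ.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, expire_after=None, min_count=None, max_count=None):
    """Return the names of snapshots eligible for deletion (hypothetical helper).

    `snapshots` is a list of (name, start_time) tuples; every option is optional.
    """
    # Newest first, so the loop index equals the number of newer snapshots.
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    eligible = []
    for newer, (name, started) in enumerate(ordered):
        # min_count: always keep the N most recent snapshots.
        if min_count is not None and newer < min_count:
            continue
        # max_count: anything beyond the N most recent is eligible.
        if max_count is not None and newer >= max_count:
            eligible.append(name)
            continue
        # expire_after: eligible once older than the configured age.
        if expire_after is not None and now - started > expire_after:
            eligible.append(name)
    return eligible
```

With no options set the function returns an empty list, which mirrors the idea that an empty retention configuration deletes nothing.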

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deleting snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.
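
The best-effort behaviour described above can be sketched in a few lines of Python. All names here (`delete_with_time_bound`, `delete_fn`, `now_fn`) are hypothetical; the injectable clock is only meant to echo the idea of supplying the time source instead of actually sleeping in tests.

```python
import time


def delete_with_time_bound(snapshots, delete_fn, max_seconds, now_fn=time.monotonic):
    """Delete snapshots serially and defer the rest once the time budget is spent.

    Returns a (deleted, deferred) pair of lists.
    """
    started = now_fn()
    deleted, deferred = [], []
    for i, snap in enumerate(snapshots):
        # The budget is checked only between deletions, so a delete that is
        # already running is never interrupted (best-effort, not a hard stop).
        if now_fn() - started >= max_seconds:
            deferred = list(snapshots[i:])
            break
        delete_fn(snap)
        deleted.append(snap)
    return deleted, deferred
```

Because the check happens before each deletion, a slow delete can overrun the budget; everything after it is left for the next retention cycle.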

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and persisted in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```
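
The `total_*` fields in the sample output are consistent with summing the per-policy counters. A minimal Python sketch of that aggregation follows; `total_stats` is a hypothetical helper, and the real implementation may track totals independently rather than deriving them.

```python
def total_stats(policy_metrics):
    """Sum per-policy counters into cluster-wide `total_*` counters."""
    totals = {}
    for counters in policy_metrics.values():
        for name, value in counters.items():
            key = "total_" + name
            totals[key] = totals.get(key, 0) + value
    return totals
```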

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

* Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.
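
A minimal Python sketch of the four-way check listed above. The dictionary keys are hypothetical stand-ins for the relevant parts of cluster state, not the real data structures.

```python
def okay_to_delete_snapshots(state):
    """Return True only if nothing that blocks snapshot deletion is in progress.

    `state` is a dict whose (hypothetical) keys map to lists of in-progress operations.
    """
    blockers = (
        "snapshots_in_progress",         # a snapshot currently running
        "deletions_in_progress",         # a deletion already in progress
        "repository_cleanups_in_progress",  # a repo cleanup in progress
        "restores_in_progress",          # a restore currently running
    )
    # Missing keys and empty lists both count as "nothing in progress".
    return not any(state.get(key) for key in blockers)
```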

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as an ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as an ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown
2019-09-09 09:55:34 -06:00
Martijn van Groningen
f97cc7f355
Merge remote-tracking branch 'es/master' into enrich 2019-09-09 08:38:37 +02:00
Lisa Cawley
210b592f62
[DOCS] Synchs Watcher API titles with better HLRC titles (#46328) 2019-09-04 17:03:05 -07:00
Martijn van Groningen
63fe69fea4
Merge remote-tracking branch 'es/master' into enrich 2019-09-02 08:45:43 +02:00
Jilles van Gurp
e40be722cc Add a few notes on Cancellable to the LLRC and HLRC docs. (#45912)
Add a section to both the low level and high level client documentation on asynchronous usage and the `Cancellable` added for #44802.

Co-Authored-By: Lee Hinman <dakrone@users.noreply.github.com>
2019-08-28 10:59:33 +02:00
Martijn van Groningen
c8436a7a36
Merge remote-tracking branch 'es/master' into enrich 2019-08-28 10:05:14 +02:00
Dimitris Athanasiou
eab64250eb
[ML][HLRC] Add data frame analytics regression analysis (#46024) 2019-08-28 08:12:10 +03:00
Yogesh Gaikwad
5761b0a79c
Add manage_own_api_key cluster privilege (#45897)
The existing privilege model for API keys, with privileges like
`manage_api_key`, `manage_security` etc., is too permissive and
we want finer-grained control over the cluster privileges
for API keys. Previously, created API keys would also need these
privileges to get their own information.

This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.

The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.
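
The constraint on the `owner` flag can be illustrated with a small validation sketch in Python. The function name and the dictionary it returns are assumptions for this example only; they do not reflect the actual request classes.

```python
def validate_api_key_request(owner=False, realm=None, username=None):
    """Reject requests that combine owner=True with an explicit realm or username."""
    if owner and (realm is not None or username is not None):
        raise ValueError("realm and username cannot be set when owner is true")
    # Only include the fields that were actually provided.
    request = {"owner": owner}
    if realm is not None:
        request["realm"] = realm
    if username is not None:
        request["username"] = username
    return request
```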

The changes cover HLRC changes and documentation for the API changes.

Closes #40031
2019-08-27 19:48:21 +10:00
Albert Zaharovits
715f7e9e01
PKI realm authentication delegation (#45906)
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.

In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.

Closes #34396
2019-08-26 18:53:10 +03:00
Martijn van Groningen
a1e8194a57
Add HLRC support for delete policy api (#45833)
This PR also adds HLRC docs.

Relates to #32789
2019-08-26 09:54:25 +02:00
Martijn van Groningen
b48784f5c1
Merge remote-tracking branch 'es/master' into enrich 2019-08-23 11:11:57 +02:00
markharwood
d1e00e3cf5
Search - added HLRC support for PinnedQueryBuilder (#45779)
* Added HLRC support for PinnedQueryBuilder

Related #44074
2019-08-22 16:32:42 +01:00
Przemysław Witek
31f6e78acd
Allow the user to specify 'query' in Evaluate Data Frame request (#45775) 2019-08-22 08:27:38 +02:00
Dimitris Athanasiou
8af319481e
[ML] Add description to DF analytics (#45774) 2019-08-21 19:58:09 +03:00
Martijn van Groningen
a6917a1572
Merge remote-tracking branch 'es/master' into enrich 2019-08-21 14:17:16 +02:00
Przemysław Witek
c6a25a818d
Add docs for HLRC for Estimate memory usage API (#45538) 2019-08-21 12:52:17 +02:00
Martijn van Groningen
5707bc7f5d
Merge remote-tracking branch 'es/master' into enrich 2019-08-16 09:42:36 +02:00
Jim Ferenczi
38f9e52c3e
Add mapper-extras and the RankFeatureQuery in the hlrc (#43713)
This change adds the support for the RankFeatureQuery in the HLRC by
providing an extra dependency on mapper-extras-client. It also removes
the dependency on lang-painless in mapper-extras which is not needed
anymore since the move of the vector field into a dedicated module.

Closes #43634
2019-08-14 09:52:49 +02:00
Luca Cavanna
4baab594aa
Add support for cancelling async requests in low-level REST client (#45379)
The low-level REST client exposes a `performRequestAsync` method that
allows sending async requests, but today it does not expose the ability
to cancel such requests. That is something that the underlying apache
async http client supports, and it makes sense for us to expose.

This commit adds a return value to the `performRequestAsync` method,
which is backwards compatible. A `Cancellable` object gets returned,
which exposes a `cancel` public method. When calling `cancel`, the
on-going request associated with the returned `Cancellable` instance
will be cancelled by calling its `abort` method. This works throughout
multiple retries, though some special care was needed for the case where
`cancel` is called between different attempts (when one attempt has
failed and the consecutive one has not been sent yet).
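
The "cancel between attempts" case can be sketched as follows. This is a Python illustration of the idea only, not the Java client code: the class layout, `begin_attempt`, `end_attempt`, and the `FakeAttempt` stand-in are all hypothetical.

```python
import threading


class Cancellable:
    """Tracks the in-flight attempt of a retried request so it can be aborted."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancelled = False
        self._current = None  # object with an abort() method, or None between attempts

    def begin_attempt(self, attempt):
        """Register a new attempt; returns False if cancel() was already called."""
        with self._lock:
            if self._cancelled:
                return False  # cancelled between attempts: do not send the retry
            self._current = attempt
            return True

    def end_attempt(self):
        """Mark the current attempt as finished (for example, after it failed)."""
        with self._lock:
            self._current = None

    def cancel(self):
        """Abort the in-flight attempt, if any, and block all future attempts."""
        with self._lock:
            self._cancelled = True
            if self._current is not None:
                self._current.abort()


class FakeAttempt:
    """Stand-in for an HTTP request object, used only to exercise the sketch."""

    def __init__(self):
        self.aborted = False

    def abort(self):
        self.aborted = True
```

The lock makes the cancelled flag and the current attempt change together, so a retry cannot slip through after `cancel` has returned.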

Note that cancelling a request on the client side does not automatically
translate to cancelling the server side execution of it. That needs to be
specifically implemented, which is in the works for the search API (see #43332).

Relates to #44802
2019-08-13 16:48:06 +02:00
Martijn van Groningen
43b23aa505
Added HLRC support for enrich put policy API. (#45183)
This PR also adds HLRC docs.

Relates to #32789
2019-08-09 09:12:03 +02:00
David Roberts
65b502079a
[ML-DataFrame] Combine task_state and indexer_state in _stats (#45276)
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:

- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state
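
The three rules above translate directly into a small function. This is a Python sketch with hypothetical parameter names, not the server-side code.

```python
def combined_state(task_state, indexer_state, has_persistent_task):
    """Combine task_state and indexer_state into the single top-level state."""
    if task_state == "failed":
        return "failed"
    if not has_persistent_task:
        return "stopped"
    return indexer_state
```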

Closes #45201
2019-08-07 16:39:56 +01:00
Benjamin Trent
1da7c591c5
[ML][Data Frame] Add update transform api endpoint (#45154)
This adds the ability to `_update` stored data frame transforms. All mutable fields are applied when the next checkpoint starts, with the exception of `description`.

This PR contains all that is necessary for this addition:
* HLRC
* Docs
* Server side
2019-08-07 07:28:09 -05:00
Lisa Cawley
46912c8f3d
[DOCS] Reformats ML update APIs (#45253) 2019-08-06 11:05:01 -07:00
James Rodewig
8b152d6d79
Rename "indices APIs" to "index APIs" (#44863) 2019-08-02 14:09:46 -04:00
Lisa Cawley
285f2e0625
[DOCS] Updates terms in machine learning get APIs (#44986) 2019-07-30 10:52:23 -07:00
Lisa Cawley
3f31859669
[DOCS] Updates terms in machine learning datafeed APIs (#44883) 2019-07-26 10:47:03 -07:00
Lisa Cawley
aefb72040c
[DOCS] Updates terms in machine learning calendar APIs (#44866) 2019-07-25 11:20:42 -07:00
Yannick Welsch
76fcc81275
Add Clone Index API (#44267)
Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the
difference that the number of primary shards is kept the same. In cases where the filesystem
provides hard-linking capabilities, this is a very cheap operation.

Index cloning can be done by running `POST my_source_index/_clone/my_target_index` and it
supports the same options as the split and shrink APIs.
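
For illustration, the request line above can be built with the Python standard library. The helper name and the `http://localhost:9200` base URL are assumptions; the sketch only constructs the request and does not send it.

```python
from urllib.parse import quote
from urllib.request import Request


def clone_index_request(source, target, base_url="http://localhost:9200"):
    """Build (but do not send) a POST <source>/_clone/<target> request."""
    # Percent-encode index names so unusual characters cannot alter the path.
    path = f"/{quote(source, safe='')}/_clone/{quote(target, safe='')}"
    return Request(base_url + path, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it to a cluster reachable at the assumed base URL.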

Closes #44128
2019-07-25 20:17:51 +02:00
Lisa Cawley
9b16486615
[DOCS] Minor edits to HLRC ML APIs (#44865) 2019-07-25 10:00:06 -07:00