Commit graph

817 commits

Author SHA1 Message Date
James Rodewig
ea1adb61c2
[DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:16:35 -04:00
James Rodewig
506de3ba83
[DOCS] Replace _meta with metadata for snapshot APIs. (#44596)
elastic/elasticsearch#41281 added custom metadata parameter to
snapshots. During review, the parameter name was changed from '_meta' to
'metadata,' but the documentation wasn't updated. This corrects the
documentation to use the 'metadata' name.
2019-07-19 08:40:34 -04:00
Lee Hinman
a2e0db7783
Add Snapshot Lifecycle Management (#43934)
* Add SnapshotLifecycleService and related CRUD APIs

This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling based on
the policies defined schedule.

This also includes the get, put, and delete APIs for these policies

Relates to #38461

* Make scheduledJobIds return an immutable set

* Use Object.equals for SnapshotLifecyclePolicy

* Remove unneeded TODO

* Implement ToXContentFragment on SnapshotLifecyclePolicyItem

* Copy contents of the scheduledJobIds

* Handle snapshot lifecycle policy updates and deletions (#40062)

(Note this is a PR against the `snapshot-lifecycle-management` feature branch)

This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.

Relates to #38461

* Take a snapshot for the policy when the SLM policy is triggered (#40383)

(This is a PR for the `snapshot-lifecycle-management` branch)

This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.

This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.

Relates to #38461

* Record most recent snapshot policy success/failure (#40619)

Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.

This is the first step toward writing history in a more permanent
fashion.

* Validate snapshot lifecycle policies (#40654)

(This is a PR against the `snapshot-lifecycle-management` branch)

With the commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.

Part of #38461

* Hook SLM into ILM's start and stop APIs (#40871)

(This pull request is for the `snapshot-lifecycle-management` branch)

This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.

Relates to #38461

* Add tests for SnapshotLifecyclePolicyItem (#40912)

Adds serialization tests for SnapshotLifecyclePolicyItem.

* Fix improper import in build.gradle after master merge

* Add human readable version of modified date for snapshot lifecycle policy (#41035)

* Add human readable version of modified date for snapshot lifecycle policy

This small change changes it from:

```
...
"modified_date": 1554843903242,
...
```

To

```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```

Including the `"modified_date"` field when the `?human` field is used.

Relates to #38461

* Fix test

* Add API to execute SLM policy on demand (#41038)

This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.

```json
PUT /_ilm/snapshot/<policy>/_execute
```

And it returns the response with the generated snapshot name:

```json
{
  "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```

Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly trigged SLM job.

Relates to #38461

* Add next_execution to SLM policy metadata (#41221)

* Add next_execution to SLM policy metadata

This adds the next time a snapshot lifecycle policy will be executed when
retriving a policy's metadata, for example:

```json
GET /_ilm/snapshot?human
{
  "production" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:16:21.865Z",
    "modified_date_millis" : 1555362981865,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "*/30 * * * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      }
    },
    "next_execution" : "2019-04-15T21:16:30.000Z",
    "next_execution_millis" : 1555362990000
  },
  "other" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:12:19.959Z",
    "modified_date_millis" : 1555362739959,
    "policy" : {
      "name" : "<other-snap-{now/d}>",
      "schedule" : "0 30 2 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "other"
        ],
        "ignore_unavailable" : false,
        "include_global_state" : true
      }
    },
    "next_execution" : "2019-04-16T02:30:00.000Z",
    "next_execution_millis" : 1555381800000
  }
}
```

Relates to #38461

* Fix and enhance tests

* Figured out how to Cron

* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)

This commit changes the endpoint for snapshot lifecycle management from:

```
GET /_ilm/snapshot/<policy>
```

to:

```
GET /_slm/policy/<policy>
```

It mimics the ILM path only using `slm` instead of `ilm`.

Relates to #38461

* Add initial documentation for SLM (#41510)

* Add initial documentation for SLM

This adds the initial documentation for snapshot lifecycle management.

It also includes the REST spec API json files since they're sort of
documentation.

Relates to #38461

* Add `manage_slm` and `read_slm` roles (#41607)

* Add `manage_slm` and `read_slm` roles

This adds two more built in roles -

`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.

`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.

Relates to #38461

* Add execute to the test

* Fix ilm -> slm typo in test

* Record SLM history into an index (#41707)

It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.

This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.

Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s.  Watcher would cause the same problem, but it
is already disabled (for the same reason).

* High Level Rest Client support for SLM (#41767)

* High Level Rest Client support for SLM

This commit add HLRC support for SLM.

Relates to #38461

* Fill out documentation tests with tags

* Add more callouts and asciidoc for HLRC

* Update javadoc links to real locations

* Add security test testing SLM cluster privileges (#42678)

* Add security test testing SLM cluster privileges

This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.

Relates to #38461

* Don't redefine vars

*  Add Getting Started Guide for SLM  (#42878)

This commit adds a basic Getting Started Guide for SLM.

* Include SLM policy name in Snapshot metadata (#43132)

Keep track of which SLM policy in the metadata field of the Snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.

* Fix compilation after master merge

* [TEST] Move exception wrapping for devious exception throwing

Fixes an issue where an exception was created from one line and thrown in another.

* Fix SLM for the change to AcknowledgedResponse

* Add Snapshot Lifecycle Management Package Docs (#43535)

* Fix compilation for transport actions now that task is required

* Add a note mentioning the privileges needed for SLM (#43708)

* Add a note mentioning the privileges needed for SLM

This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.

Relates to #38461

* Mention that you can create snapshots for indices you can't read

* Fix REST tests for new number of cluster privileges

* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)

* Fix SnapshotHistoryStoreTests after merge

* Remove overridden newResponse functions that have been removed
2019-07-15 12:04:50 -06:00
Albert Zaharovits
3538cff422
[DOC] Backup & Restore Security Configuration (#42970)
This commit documents the backup and restore of a cluster's
security configuration.

It is not possible to only backup (or only restore) security
configuration, independent to the rest of the cluster's conf,
so this describes how a full configuration backup&restore
will include security as well. Moreover, it explains how part
of the security conf data resides on the special .security
index and how to backup that using regular data snapshot API.

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
Co-Authored-By: Tim Vernum <tim@adjective.org>
2019-07-10 14:05:01 +03:00
Akshesh Doshi
778e47f21f Draw attention to transport layer in remote cluster docs (#43883)
Closes #43858
2019-07-05 13:42:56 +02:00
Yannick Welsch
5ecf669c38
Clarify voting-only master node docs (#43857)
Clarifies the roles of a dedicated voting-only master-eligible node.

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Co-Authored-By: David Turner <david.turner@elastic.co>
2019-07-02 18:48:29 +02:00
Yannick Welsch
e689b20eba
Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <david.turner@elastic.co>
2019-06-25 17:29:30 +02:00
Lisa Cawley
23ff9d4011
[DOCS] Adds administering section (#43493) 2019-06-24 10:14:12 -07:00
Andrey Ershov
680d6edc0b
Get snapshots support for multiple repositories (#42090)
This commit adds multiple repositories support to get snapshots
request.
If some repository throws an exception this method does not fail fast
instead, it returns results for all repositories.
This PR is opened in favour of #41799, because we decided to change
the response format in a non-BwC manner. It makes sense to read a
discussion of the aforementioned PR.
This is the continuation of work done here #15151.
2019-06-19 16:04:13 +03:00
Colin Goodheart-Smithe
aeb2110dd0
Fixes formatting of CCS compatibility table (#43231) 2019-06-18 13:27:29 +01:00
Brandon Morelli
3ba3861e7b
Remove unneeded backticks (#43256) 2019-06-17 08:58:47 -07:00
Lisa Cawley
0140d512f9
[DOCS] Update node descriptions for default distribution (#42812) 2019-06-13 09:46:55 -07:00
Luca Cavanna
98ca0d3972
Add 6.8 to the remote clusters compatibility table (#42389)
The table does not include 6.8 as it was written before we knew we were releasing it. This commit adds it.
2019-06-13 11:18:07 +02:00
Mirek Svoboda
eaf76d2a32 Document wildcard for network interfaces (#28839)
With this commit we mention how Elasticsearch behaves when
either `0` or `0.0.0.0` is used for `network.host`.
2019-06-13 10:19:18 +02:00
Sam Mingo
0ce3a28ebb Update search-settings.asciidoc (#43016)
Grammar and spelling fixes
2019-06-10 10:14:27 +01:00
James Rodewig
fb079e527c
[DOCS] Move 'Scripting' section to top-level navigation. (#42939) 2019-06-06 10:45:04 -04:00
Lisa Cawley
60c8fc153a
[DOCS] Adds discovery.type (#42823)
Co-Authored-By: David Turner <david.turner@elastic.co>
2019-06-05 12:29:40 -07:00
Gordon Brown
eaa3f874b6
Add custom metadata to snapshots (#41281)
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
2019-06-05 10:55:07 -06:00
David Turner
ec427ff55e
More improvements to cluster coordination docs (#42799)
This commit addresses a few more frequently-asked questions:

* clarifies that bootstrapping doesn't happen even after a full cluster
  restart.

* removes the example that uses IP addresses, to try and further encourage the
  use of node names for bootstrapping.

* clarifies that auto-bootstrapping might form different clusters on different
  hosts, and gives a process for starting again if this wasn't what you wanted.

* adds the "do not stop half-or-more of the master-eligible nodes" slogan that
  was notably absent.

* reformats one of the console examples to a narrower width
2019-06-03 17:20:47 +01:00
Ryan Ernst
be8020ae99
Remove leftover transport module docs (#42734)
This commit removes docs for alternate transport implementations which
were removed years ago. These were missed because they have redirects
masking their existsence.
2019-05-30 22:29:58 -07:00
Ryan Ernst
9ffed17694
Remove transport client docs (#42483)
This commit removes the transport client documentation.
2019-05-30 15:03:48 -07:00
Yannick Welsch
c459ea828f
Remove node.max_local_storage_nodes (#42428)
This setting, which prior to Elasticsearch 5 was enabled by default and caused all kinds of
confusion, has since been disabled by default and is not recommended for production use. The
preferred way going forward is for users to explicitly specify separate data folders for each started
node to ensure that each node is consistently assigned to the same data path.

Relates to #42426
2019-05-23 16:02:12 +02:00
Jack Conradson
c59fbb3358
Reorganize Painless doc structure (#42303) 2019-05-21 13:47:47 -04:00
David Turner
ed3230b3eb
Minor cluster coordination docs fixes (#42111)
Fixes a typo and a badly-formatted warning.
2019-05-15 09:26:04 -04:00
James Rodewig
45e1e59371
[DOCS] Rewrite 'rewrite' parameter docs (#42018) 2019-05-13 08:42:26 -04:00
David Turner
0dd6b985c1
Remove mention of bulk threadpool in examples (#41935)
The `bulk` threadpool is now called `write`, but `bulk` is still
used in some examples. This commit fixes that.

Also, the only way `threadpool.bulk.write: 30` is a valid increase in the size
of this threadpool is if you have 29 processors, which is an odd number of
processors to have. This commit removes the "more threads" bit.
2019-05-08 12:08:47 +01:00
David Turner
1e762a137e
Node names in bootstrap config have no ports (#41569)
In cases where node names and transport addresses can be muddled, it is unclear
that `cluster.initial_master_nodes: master-a:9300` means to look for a node
called `master-a:9300` rather than a node called `master-a` with transport port
`9300`. This commit adds docs to that effect.
2019-05-08 10:23:55 +01:00
James Rodewig
adf67053f4
[DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:19:09 -04:00
James Rodewig
0225af44a0
[DOCS] Clarify Recovery Settings for Shard Relocation (#40329)
* Clarify that peer recovery settings apply to shard relocation

* Fix awkward wording of 1st sentence

* [DOCS] Remove snapshot recovery reference.
Call out link to [[cat-recovery]].
Separate expert settings.
2019-04-26 10:23:30 -04:00
Melori Arellano
553b9ff082
[DOCS] Add missing setting skip_unavailable to example
The example to delete a remote cluster is missing the `skip_unavailable` setting which results in an error:
```
        "type": "illegal_argument_exception",
        "reason": "missing required setting [cluster.remote.tiny-test.seeds] for setting [cluster.remote.tiny-test.skip_unavailable]"
```
2019-04-22 14:38:53 -06:00
David Turner
a4dff365fa
Add 'DO NOT TOUCH' warnings to disco settings docs (#41211) 2019-04-15 19:22:10 +01:00
David Turner
f0fac9f56b
Further clarify cluster.initial_master_nodes (#41179)
The following phrase causes confusion:

> Alternatively the IP addresses or hostnames (if node name defaults to the
> host name) can be used.

This change clarifies the conditions under which you can use a hostname, and
adds an anchor to the note introduced in (#41137) so we can link directly to it
in conversations with users.
2019-04-14 10:39:50 +01:00
David Turner
cae6276811
Clarify initial_master_nodes must match node.name (#41137)
... and emphasize that this includes any trailing qualifiers.
2019-04-12 10:45:09 +01:00
Henning Andersen
0783efda73
Node repurpose tool docs (#40525)
Added documentation for node repurpose tool and included documentation on how to repurpose nodes safely. Adjusted order of tools in `elasticsearch-node` tool since the repurpose tool is most likely to be used.

Co-Authored-By: David Turner <david.turner@elastic.co>
2019-04-09 15:02:03 +02:00
David Turner
d696e57e5d
Add docs for cluster.remote.*.proxy setting (#40281)
In #33062 we introduced the `cluster.remote.*.proxy` setting for proxied
connections to remote clusters, but left it deliberately undocumented since it
needed followup work so that it could work with SNI. However, since #32517 is
now closed we can add this documentation and remove the comment about its lack
of documentation.
2019-03-28 12:09:31 +00:00
Luca Cavanna
27aee54d35 Fix bad cross-link
Relates to #39329
2019-03-18 11:54:46 +01:00
Luca Cavanna
87f4d3f851
[DOCS] add details on version compatibility and remote gateway selection (#40056)
This commit clarifies how the gateway selection works when configuring
remote clusters for CCR or CCS. Specifically, it clarifies compatibility
between different versions which is a very common question.
2019-03-18 11:37:56 +01:00
Alex Doerr
0f40658d09 Clarify version compatibility in snapshot/restore docs (#39329) 2019-03-18 11:29:36 +01:00
Lisa Cawley
7a6021ca98
[DOCS] Replaces CCS terms with attributes (#40076) 2019-03-15 07:54:45 -07:00
Lisa Cawley
43065ea536
[DOCS] Replaces CCR terms with attributes (#39516) 2019-03-12 14:27:17 -07:00
Andrey Ershov
09425d5a51
Add elasticsearch-node tool docs (#37812)
This commit, mostly authored by @DaveCTurner, 
adds documentation for elasticsearch-node tool #37696.
2019-03-12 12:43:29 +01:00
Yannick Welsch
28a14e3e04
Add note about cluster state diffs (#39847)
Mentions cluster state diffs in CS publishing docs.
2019-03-11 15:36:41 +01:00
Yannick Welsch
3b71a31557
Remove Zen1 (#39466)
Removes all traces of Zen1 from the code base. Some of these commits will also be backported to
7.0/7.x (#39470) as the cluster.coordination package was making use of some things in
discovery.zen and we want to keep 7.x as close as possible to master.
2019-03-04 15:51:12 +01:00
Lisa Cawley
a3c44c0270
[DOCS] Edits the remote clusters documentation (#38996) 2019-02-19 11:53:35 -08:00
Tim Brooks
a5cbef9d1b
Rebuild remote connections on profile changes (#37678)
Currently remote compression and ping schedule settings are dynamic.
However, we do not listen for changes. This commit adds listeners for
changes to those two settings. Additionally, when those settings change
we now close existing connections and open new ones with the settings
applied.

Fixes #37201.
2019-02-19 11:32:21 -07:00
Luca Cavanna
b64ad1a62c
Tie break search shard iterator comparisons on cluster alias (#38853)
`SearchShardIterator` inherits its `compareTo` implementation from `PlainShardIterator`. That is good in most of the cases, as such comparisons are based on the shard id which is unique, even when searching against indices with same names across multiple clusters (thanks to the index uuid being different). In case though the same cluster is registered multiple times with different aliases, the shard id is exactly the same, hence remote results will be returned before local ones with same shard id objects. That is because remote iterators are added before local ones, and we use a stable sorting method in `GroupShardIterators` constructor.

This PR enhances `compareTo` for `SearchShardIterator` to tie break on cluster alias and introduces consistent `equals` and `hashcode` methods. This allows to remove a TODO in `SearchResponseMerger` which otherwise has to handle this special case specifically. Also, while at it I added missing tests around equals/hashcode and compareTo and expanded existing ones.
2019-02-15 13:44:55 +01:00
David Turner
5a3c452480
Align docs etc with new discovery setting names (#38492)
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
2019-02-06 11:34:38 +00:00
David Turner
3b2a0d7959
Rename no-master-block setting (#38350)
Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any
value set for the old setting is now ignored.
2019-02-05 08:47:56 +00:00
David Turner
2d114a02ff
Rename static Zen1 settings (#38333)
Renames the following settings to remove the mention of `zen` in their names:

- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
2019-02-05 08:46:52 +00:00
Yannick Welsch
ece8c659c5
Decrease leader and follower check timeout (#38298)
Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still
being a very long time for a node to be completely unresponsive.
2019-02-04 15:11:12 +01:00