Commit graph

817 commits

Author SHA1 Message Date
Tim Brooks
abd8a36f9b
Add documentation for remote cluster proxy mode (#52779)
This is related to #49067.
2020-03-09 10:49:41 -06:00
David Turner
e2cda1a279
"Adding nodes" instructions only work on localhost (#52677)
The introductory sections of the reference manual contain some simplified
instructions for adding a node to the cluster. Unfortunately they are a little
too simplified and only really work for clusters running on `localhost`. If you
try to follow these instructions for a distributed cluster then the new node
will, confusingly, auto-bootstrap itself into a distinct one-node cluster.

Multiple nodes running on localhost is a valid config, of course, but we should
spell out that these instructions are really only for experimentation and that
it takes a bit more work to add nodes to a distributed cluster. This commit
does so.

Also, the "important config" instructions for discovery say that you MUST set
`discovery.seed_hosts` whereas in fact it is fine to ignore this setting and
use a dynamic discovery mechanism instead. This commit weakens this statement
and links to the docs for dynamic discovery mechanisms.

Finally, this section is also overloaded with some technical details that are
not important for this context and are adequately covered elsewhere, and
completely fails to note that the default discovery port is 9300. This commit
addresses this.
2020-02-27 08:51:17 +00:00
James Rodewig
841d961b58
[DOCS] Document CCS-supported APIs (#52708)
Explicitly notes the Elasticsearch API endpoints that support CCS.

This should deter users from attempting to use CCS with other API
endpoints, such as `GET <index>/_doc/<_id>`.
2020-02-24 09:54:33 -05:00
James Rodewig
67f6840846
[DOCS] Document how CCS handles cluster-level settings (#49941)
Updates the cross-cluster search (CCS) documentation to note how
cluster-level settings are applied.

When `ccs_minimize_roundtrips` is `true`, each cluster applies its own
cluster-level settings to the request.

When `ccs_minimize_roundtrips` is `false`, the local cluster's
cluster-level settings are used. This includes shard limit settings, such as
`action.search.shard_count.limit`, `pre_filter_shard_size`, and
`max_concurrent_shard_requests`. If these limits are set too low, the
request could be rejected.
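
A simplified Python model of the two behaviours described above may help. It is a sketch under stated assumptions, not the Elasticsearch implementation: `ClusterSettings`, `settings_applied` and `rejected_clusters` are hypothetical names, only a single shard-count limit is modelled, and when `ccs_minimize_roundtrips` is `false` the local cluster is assumed to check the total number of shards across all clusters against its own limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSettings:
    # Hypothetical stand-in for one cluster's shard-limit settings.
    name: str
    shard_count_limit: int


def settings_applied(local, remotes, ccs_minimize_roundtrips):
    """Map each cluster name to the settings that govern its part of the search."""
    clusters = [local, *remotes]
    if ccs_minimize_roundtrips:
        # Each cluster runs its own search and applies its own settings.
        return {c.name: c for c in clusters}
    # The local cluster coordinates every shard, so its settings apply everywhere.
    return {c.name: local for c in clusters}


def rejected_clusters(local, remotes, shard_counts, ccs_minimize_roundtrips):
    """Return the names of clusters whose limit rejects the request (simplified)."""
    if ccs_minimize_roundtrips:
        # Each cluster checks only its own shards against its own limit.
        return [
            c.name
            for c in [local, *remotes]
            if shard_counts[c.name] > c.shard_count_limit
        ]
    # Assumption: the local limit is checked against the total across clusters.
    total = sum(shard_counts.values())
    return [local.name] if total > local.shard_count_limit else []


local = ClusterSettings("local", shard_count_limit=10)
remote = ClusterSettings("remote", shard_count_limit=100)
counts = {"local": 5, "remote": 50}
print(rejected_clusters(local, [remote], counts, ccs_minimize_roundtrips=True))
print(rejected_clusters(local, [remote], counts, ccs_minimize_roundtrips=False))
```

With a low local limit of 10, the same 55-shard search passes when each cluster applies its own limit, but is rejected in this model when the local cluster's limit governs the whole request.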
2020-02-19 09:14:22 -05:00
Yannick Welsch
a9afdd7611
Remove fixed_auto_queue_size threadpool type (#52280)
* Remove fixed_auto_queue_size threadpool type

* Remove less

* compilation fix

* weaken assertion to accommodate tests that mock threadpool
2020-02-14 16:20:40 +01:00
David Turner
a304d9a656
Ignore timeouts with single-node discovery (#52159)
Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
if joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If these timeouts
occur then the node restarts the discovery process in an attempt to find a
healthier master.

In the special case of `discovery.type: single-node` there is no point in
looking for another healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
2020-02-11 14:00:06 +00:00
Armin Braun
26b9cf787d
Add Trace Logging of REST Requests (#51684)
Being able to trace log all REST requests to a node would make debugging
a number of issues a lot easier.
2020-02-06 20:05:03 +01:00
István Zoltán Szabó
850278c69a
[DOCS] Adds recommendation on dedicated master-eligible nodes (#51674)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2020-01-31 12:51:46 +01:00
István Zoltán Szabó
451eb1fa1f
[DOCS] Expands the documentation of Node Query Cache (#51105)
Co-authored-by: debadair <debadair@elastic.co>
2020-01-20 11:11:57 +01:00
Nhat Nguyen
09b46c8646
Goodbye and thank you synced flush! (#50882)
Synced flush was a brilliant idea. It supports instant recoveries with a 
quite small implementation. However, with the presence of sequence
numbers and retention leases, it is no longer needed. This change
removes it from 8.0.

Relates #5077
2020-01-16 09:43:07 -05:00
debadair
a3b851e9b9
[DOCS] Move snapshot-restore out of modules. (#49618)
* [DOCS] Move snapshot-restore docs out of modules.

* [DOCS] Incorporates comments from @jrodewig.

* [DOCS] Fix snippet tests
2020-01-09 16:12:02 -08:00
Stuart Tettemer
fb6ef69c6b
[DOCS] Deterministic scripted queries are cached (#50408)
Refs: #49321
2019-12-19 16:16:57 -07:00
Lisa Cawley
362ce41eaf
[DOCS] Updates ML links (#50387)
2019-12-19 14:47:28 -08:00
Patryk Krawaczyński
de4f701a19
[DOCS] Document index.queries.cache.enabled as a static setting (#49886)
2019-12-10 14:23:14 -05:00
James Rodewig
1a574115c1
[DOCS] Document CCR compatibility requirements (#49776)
* Creates a prerequisites section in the cross-cluster replication (CCR)
  overview.
* Adds concise definitions for local and remote cluster in a CCR context.
* Documents that the ES version of the local cluster must be the same
  version as the remote cluster, or a newer compatible version.
2019-12-02 15:52:13 -05:00
David Turner
69e0b1a0f4
Drop snapshot instructions for autobootstrap fix (#49755)
The "Restore any snapshots as required" step is a trap: it's somewhere between
tricky and impossible to restore multiple clusters into a single one.

Also add a note about configuring discovery during a rolling upgrade to
prevent any rare cases where you might accidentally auto-bootstrap during the
upgrade.
2019-12-02 12:43:18 +00:00
István Zoltán Szabó
56888ff194
[DOCS] Removes the default size definition of thread pool types (#49442)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-11-22 11:15:35 +01:00
James Rodewig
eca600326f
[DOCS] Document several missing thread pools (#48543)
Adds documentation for the following thread pools:
    - fetch_shard_started
    - fetch_shard_store
    - flush
    - force_merge
    - management

Closes #48524

Co-Authored-By: Jay Modi <jaymode@users.noreply.github.com>
2019-11-21 13:05:53 -05:00
James Rodewig
4db330d9e9
[DOCS] Replace cross-cluster search PNG images with SVGs (#49395)
2019-11-21 09:05:33 -05:00
weizijun
22042cc199
Document all shard allocation filtering attributes (#46992)
This commit adds coverage to the docs for some missing built-in shard
allocation attributes.
2019-11-21 08:29:45 -05:00
SylvainJuge
7072941577
[DOCS] minor fix to documentation: http.host can't default to itself (#48135)
fix minor typos on http.host and transport.host default values.
2019-11-14 15:56:13 +01:00
glerb
dd47cf4560
[DOCS] Correct typo in Discovery docs (#48494)
2019-11-05 08:48:20 -05:00
Jason Tedor
db015555e1
Fix specification for cluster.remote.connect (#48690)
The docs specify that `cluster.remote.connect` disables cross-cluster
search. This is correct but incomplete, since it disables any
functionality that relies on remote cluster connections: cross-cluster
search, remote data feeds, and cross-cluster replication. This commit
updates the docs to reflect this.
2019-10-30 11:25:27 -04:00
Ian Danforth
aa0eb006d2
[Doc] Fix typo in indices module docs (#48598)
2019-10-28 21:40:52 +01:00
James Rodewig
f4ac711d17
[DOCS] Add 'Selecting gateway and seed nodes' section to CCS docs (#48297)
2019-10-21 12:13:44 -04:00
François-Clément Brossard
0b107a0a09
Clarify low watermark documentation (#48112)
Today the docs say that the low watermark has no effect on any shards that have
never been allocated, but this is confusing. Here "shard" means "replication
group" not "shard copy" but this conflicts with the "never been allocated"
qualifier since one allocates shard copies and not replication groups.

This commit removes the misleading words. A newly-created replication group
remains newly-created until one of its copies is assigned, which might be quite
some time later, but it seems better to leave this implicit.
2019-10-16 12:27:39 +01:00
David Turner
9e30a57ca5
More bootstrap docs tweaks (#47809)
Clarifies not to set `cluster.initial_master_nodes` on nodes that are joining
an existing cluster.

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-10-10 10:53:27 +02:00
David Turner
7b652adfbf
Remove include_relocations setting (#47717)
Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a
bad idea since it will lead to the kinds of overshoot that were otherwise fixed
in #46079. This setting was deprecated in #47443. This commit removes it.
2019-10-08 13:33:49 +02:00
David Turner
9d67a02a56
Deprecate include_relocations setting (#47443)
Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a
bad idea since it will lead to the kinds of overshoot that were otherwise fixed
in #46079. This commit deprecates this setting so it can be removed in the next
major release.
2019-10-08 09:15:13 +02:00
Lisa Cawley
4e4990c6a0
[DOCS] Cleans up links to security content (#47610)
2019-10-04 16:10:26 -07:00
James Rodewig
7583c07fa8
[DOCS] Reorder index APIs alphabetically (#46981)
2019-10-01 15:13:27 -04:00
Alan Woodward
c1f99e2d75
Remove _type from SearchHit (#46942)
This commit removes the `_type` field from all search hit responses.

Relates to #41059
2019-09-23 19:14:54 +01:00
David Turner
c01f58aac9
Remove docs for proxy mode (#46677)
We added docs for proxy mode in #40281 but on reflection we should not be
documenting this setting since it does not play well with all proxies and we
can't recommend its use. This commit removes those docs and expands its Javadoc
instead.
2019-09-13 22:17:03 +01:00
Peter Dyson
43719c6c6a
[DOCS] Add missing mention of current version to snapshot docs (#46516)
2019-09-12 08:30:29 -04:00
David Turner
a84908cebd
Clarify that discovery ignores master-ineligibles (#44835)
The changes in #32006 mean that the discovery process can no longer use
master-ineligible nodes as a stepping-stone between master-eligible nodes.
This was normally an indication of a strange and possibly-fragile configuration
and was not recommended. This commit clarifies that only master-eligible nodes
are now involved with discovery.
2019-09-12 11:12:35 +01:00
James Rodewig
5c78f606c2
[DOCS] Change // CONSOLE comments to [source,console] (#46440)
2019-09-09 10:45:37 -04:00
James Rodewig
e43be90e6c
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449)
2019-09-06 14:05:36 -04:00
James Rodewig
466c59a4a7
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295)
2019-09-05 16:47:18 -04:00
Jim Ferenczi
a4ed7b1ca1
Decouple shard allocation awareness from search and get requests (#45735)
With this commit, Elasticsearch will no longer prefer using shards in the same location
(with the same awareness attribute values) to process `_search` and `_get` requests.
Instead, adaptive replica selection (the default since 7.0) should route requests more efficiently
using the service time of prior inter-node communications. Clusters with big latencies between
nodes should switch to cross cluster replication to isolate nodes within the same zone.
Note that this change only targets 8.0 since it is considered as breaking. However a follow up
pr should add an option to activate this behavior in 7.x in order to allow users to opt-in early.

Closes #43453
2019-09-04 21:48:03 +02:00
Armin Braun
df01766c15
Repository Cleanup Endpoint (#43900)
* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
  Kept it similar to the delete and in progress classes and gave it
  some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
  of CS entry and increment in repository state ID provide some
  concurrency safety (the initial approach of just an entry in the CS
  was not enough, we must increment the repository state ID to be safe
  against concurrent modifications, otherwise we run the risk of "cleaning up"
  blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
  It's not ideal, but we don't need to keep any state and do the same
  for other repository operations
  (like getting the detailed snapshot shard status)
2019-08-21 12:02:44 +02:00
James Rodewig
28107b2221
Retitle and relocate cross-cluster search docs (#45608)
2019-08-15 16:11:04 -04:00
James Rodewig
f1661ab058
[DOCS] Rewrite cross-cluster search docs (#45583)
2019-08-15 13:23:25 -04:00
James Rodewig
66b8261e1b
[DOCS] Add diagrams to cross-cluster search documentation (#45569)
2019-08-15 10:59:58 -04:00
Chris Dean
96a234e461
[DOCS] - Updating chunk_size values to fix size value notation. Chunksize41591 (#45552)
* changes to chunk_size #41591

* update to chunk size to include ` `

* Update docs/plugins/repository-azure.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/modules/snapshots.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/plugins/repository-azure.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/plugins/repository-s3.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* edits to fix passive voice
2019-08-14 13:47:07 -05:00
Chris Dean
7b21ee75a3
[DOCS] Added cross-link to snapshot lifecycle management. Closes #44588. (#45408)
2019-08-09 16:33:40 -05:00
David Turner
bc31ea752e
Always auto-release the flood-stage block (#45274)
Removes support for using a system property to disable the automatic release of
the write block applied when a node exceeds the flood-stage watermark.

Relates #42559
2019-08-08 11:47:14 +01:00
Bukhtawar
c592d24300
Auto-release flood-stage write block (#42559)
If a node exceeds the flood-stage disk watermark then we add a block to all of
its indices to prevent further writes as a last-ditch attempt to prevent the
node completely exhausting its disk space. However today this block remains in
place until manually removed, and this block is a source of confusion for users
who currently have ample disk space and did not even realise they nearly ran out
at some point in the past.

This commit changes our behaviour to automatically remove this block when a
node drops below the high watermark again. The expectation is that the high
watermark is some distance below the flood-stage watermark and therefore the
disk space problem is truly resolved.
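
The release rule is a form of hysteresis: the block is applied above one threshold and released only below a lower one. The Python sketch below models it for a single node. `update_write_block` is a hypothetical helper, disk usage is expressed as a percentage, and the thresholds of 90% (high) and 95% (flood stage) are illustrative values assumed for this example.

```python
def update_write_block(blocked, disk_used_pct, high_pct=90.0, flood_pct=95.0):
    """Return whether the node's indices should carry the write block after this check."""
    if disk_used_pct > flood_pct:
        # Above the flood-stage watermark: apply (or keep) the block.
        return True
    if blocked and disk_used_pct < high_pct:
        # Back below the high watermark: release the block automatically.
        return False
    # Between the two watermarks nothing changes, so the block does not flap.
    return blocked


blocked = False
history = []
for usage in [80.0, 96.0, 92.0, 89.0, 92.0]:
    blocked = update_write_block(blocked, usage)
    history.append(blocked)
print(history)  # [False, True, True, False, False]
```

The gap between the two watermarks is what keeps the block from being released and reapplied repeatedly while usage hovers near a single threshold.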

Fixes #39334
2019-08-07 10:53:17 +01:00
Yannick Welsch
245cb348d3
Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
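
For readers unfamiliar with per-socket keepalive, the Python sketch below shows the same idea at the socket layer using the standard `socket` module, not the JDK API the commit relies on. `configure_keepalive` is a hypothetical helper. The idle, interval and count options are set only when the platform exposes them (Linux names are used, with `TCP_KEEPALIVE` as the idle-time option on macOS); otherwise the OS-wide defaults still apply.

```python
import socket


def configure_keepalive(sock, idle_s, interval_s, count):
    """Enable TCP keepalive on one socket, overriding OS defaults where supported.

    Returns the names of the per-socket options that were actually set.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]
    # Linux calls the idle-time option TCP_KEEPIDLE; macOS calls it TCP_KEEPALIVE.
    idle_name = "TCP_KEEPIDLE" if hasattr(socket, "TCP_KEEPIDLE") else "TCP_KEEPALIVE"
    for name, value in [
        (idle_name, idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", count),
    ]:
        option = getattr(socket, name, None)
        if option is not None:
            # Only this socket is affected; other sockets keep the OS settings.
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
    return applied


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(configure_keepalive(s, idle_s=300, interval_s=60, count=5))
```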
2019-08-05 16:09:11 +02:00
David Turner
7776f755ee
More logging for slow cluster state application (#45007)
Today the lag detector may remove nodes from the cluster if they fail to apply
a cluster state within a reasonable timeframe, but it is rather unclear from
the default logging that this has occurred and there is very little extra
information beyond the fact that the removed node was lagging. Moreover the
only forewarning that the lag detector might be invoked is a message indicating
that cluster state publication took unreasonably long, which does not contain
enough information to investigate the problem further.

This commit adds a good deal more detail to make the issues of slow nodes more
prominent:

- after 10 seconds (by default) we log an INFO message indicating that a
  publication is still waiting for responses from some nodes, including the
  identities of the problematic nodes.

- when the publication times out after 30 seconds (by default) we log a WARN
  message identifying the nodes that are still pending.

- the lag detector logs a more detailed warning when a fatally-lagging node is
  detected.

- if applying a cluster state takes too long then the cluster applier service
  logs a breakdown of all the tasks it ran as part of that process.
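
The two publication thresholds described in the first two bullets can be summarised as a small Python sketch. `publication_log_record` is a hypothetical helper, not the cluster coordination code: it only models which message would be logged for a given elapsed time, using the default 10-second and 30-second values quoted above, and it represents the pending nodes as plain strings.

```python
import logging


def publication_log_record(elapsed_s, pending_nodes, info_after_s=10.0, timeout_s=30.0):
    """Return (level, message) for a publication still awaiting responses, or None."""
    if not pending_nodes or elapsed_s < info_after_s:
        # Nothing is logged while the publication is still within the quiet period.
        return None
    nodes = ", ".join(sorted(pending_nodes))
    if elapsed_s >= timeout_s:
        # The publication has timed out: warn about the nodes that never responded.
        return logging.WARNING, f"publication timed out after {elapsed_s:.0f}s; pending nodes: [{nodes}]"
    # Still waiting, and slow enough to be worth an INFO message.
    return logging.INFO, f"publication still waiting after {elapsed_s:.0f}s for nodes: [{nodes}]"


print(publication_log_record(12.0, {"node-2"}))
```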
2019-08-01 08:21:40 +01:00
Daniel Mitterdorfer
1f23fc704a
Clarify which circuit breaker settings are static (#44992)
Most of the circuit breaker settings are dynamically configurable.
However, `indices.breaker.total.use_real_memory` is not. With this
commit we add a clarifying note that this specific setting is static.

Closes #44974
2019-07-31 13:13:39 +02:00