elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-25 07:37:19 -04:00

Author	SHA1	Message	Date
David Turner	c9d4892929	Weaken language about "low-latency" networks (#89198) Today we say that voting-only nodes require a "low-latency" network. This term has a specific meaning in some operating environments which is different from our intended meaning. To avoid this confusion this commit removes the absolute term "low-latency" in favour of describing the requirements relative to the user's own performance goals.	2022-08-09 13:15:37 +01:00
Leaf-Lin	945cb27782	[DOCS] Adding discovery troubleshooting link in the master get help page (#87344) * Adding discovery troubleshooting link * Add tags to pull in discovery troubleshooting content * Move discovery troubleshooting to separate page and add redirects Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-07-06 15:51:43 -04:00
Iraklis Psaroudakis	50d2cf31b8	Periodic warning for 1-node cluster w/ seed hosts (#88013) For fully-formed single-node clusters, emit a periodic warning if seed_hosts has been set to a non-empty list. Closes #85222	2022-06-30 16:35:15 +03:00
David Turner	80f7af58f8	More detail in discovery troubleshooting docs (#86930) In #85074 we added docs on discovery troubleshooting that really only talked about troubleshooting master elections. There's also the case where the master is elected fine but some other node can't join it. This commit adds troubleshooting docs about that too. Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-06-06 08:33:45 +01:00
David Turner	79f181d208	Reduce resource needs of join validation (#85380) Fixes a few scalability issues around join validation: - compresses the cluster state sent over the wire - shares the serialized cluster state across multiple nodes - forks the decompression/deserialization work off the transport thread Relates #77466 Closes #83204	2022-04-26 12:15:54 +01:00
David Turner	33a553f61f	Fix up whitespace error introduced in #85948	2022-04-19 07:58:10 +01:00
David Turner	ce004d49e7	More docs re. removing cluster.initial_master_nodes (#85948) Ensures that every page of the docs that mentions `cluster.initial_master_nodes` also mentions that this setting must be removed after bootstrapping completes.	2022-04-19 07:54:43 +01:00
David Turner	6a273886e9	Add technical docs on diagnosing instability etc (#85074) Copies some internal troubleshooting docs to the reference manual for wider use. Co-authored-by: James Rodewig <james.rodewig@gmail.com>	2022-03-31 09:01:10 +01:00
David Turner	fd76f9c5d1	Fix auto-bootstrap docs (#85215) Today it's no longer true that by default nodes will auto-discover other nodes on the same host and bootstrap them all into a cluster. This commit fixes the docs on auto-bootstrapping to recognise this.	2022-03-22 16:35:48 +00:00
Tobias Stadler	e3deacf547	[DOCS] Fix typos (#83895)	2022-02-15 12:42:17 -05:00
James Rodewig	2f03112b5b	[DOCS] Synced with 8.0 stack upgrade changes (#83489) (#83596) This moves the bulk of the upgrade information into the consolidated upgrade guide, but leaves the primary upgrade topic in place as a cross reference. Relates to: https://github.com/elastic/stack-docs/pull/1970 Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com> (cherry picked from commit `f6473d71f9`) Co-authored-by: debadair <debadair@elastic.co>	2022-02-07 11:01:42 -05:00
David Turner	7dd32fb027	Reduce verbosity-increase timeout to 3m (#81118) Today we increase the verbosity of discovery failures after 5 minutes without a master. Unfortunately 5 minutes is a common orchestration timeout, so if discovery is broken then we see nodes being shut down just before they start to emit useful logs. This commit reduces the default timeout to 3 minutes to address that.	2021-11-30 09:52:39 +00:00
David Turner	8cf4c7b6fb	Remove last few mentions of Zen discovery (#80410) We have a few leftover mentions of `zen` discovery, mostly for historical/BwC reasons, which this commit removes. Prior to this commit the default value for `discovery.type` was `zen` but this was not written down anywhere or officially supported: the two options were to set it to `single-node` or to omit it entirely. This commit changes the default to `multi-node` and documents this. Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-11-09 09:52:06 +01:00
Adam Locke	529986e9b1	A typo error (#78987) (#79203) * A typo error a space between 'E' and 'cluster...' * Update example, fix headings, change notes Co-authored-by: Adam Locke <adam.locke@elastic.co> Co-authored-by: Marwane Chahoud <marwane.chahoud@gmail.com>	2021-10-15 08:52:03 -04:00
Martijn van Groningen	8a1deff75a	Improve fault-detection.asciidoc (#76821) Add section to fault-detection.asciidoc about nodes being removed from cluster due to slow cluster state applying.	2021-08-23 14:31:06 +02:00
David Turner	eabe2d1b34	Increase PeerFinder verbosity on persistent failure (#73128) If a node is partitioned away from the rest of the cluster then the `ClusterFormationFailureHelper` periodically reports that it cannot discover the expected collection of nodes, but does not indicate why. To prove it's a connectivity problem, users must today restart the node with `DEBUG` logging on `org.elasticsearch.discovery.PeerFinder` to see further details. With this commit we log messages at `WARN` level if the node remains disconnected for longer than a configurable timeout, which defaults to 5 minutes. Relates #72968	2021-05-17 10:52:18 +01:00
James Rodewig	693807a6d3	[DOCS] Fix double spaces (#71082)	2021-03-31 09:57:47 -04:00
James Rodewig	5c75d004fa	[DOCS] Replace `put` with `create or update` in API names (#70330) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2021-03-15 14:49:44 -04:00
David Turner	2adeb4a666	Expand and consolidate networking docs (#68051) Today's network config docs are split into "Network", "HTTP" and "Transport" pages, with unclear relationships between them. We often encounter users with weird configs that indicate they don't really understand how these settings all relate. In fact these pages are all very interrelated, and the HTTP and Transport pages are almost all only for advanced users. This commit brings these docs into a single page and rewords some things to try and guide users away from the advanced settings unless their configuration needs all the extra complexity. It also adds a section entitled "Binding and publishing" which clarifies the meanings of the `bind_host` and `publish_host` parameters. This is also a common source of confusion amongst users. It also clarifies that many of these settings accept a list of addresses, and warns that this may not be what you want. Closes #67956. Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-02-01 13:06:20 +00:00
David Turner	9c100cdeae	Extend default probe connect/handshake timeouts (#68059) Today the discovery phase has a short 1-second timeout for handshaking with a remote node after connecting, which allows it to quickly move on and retry in the case of connecting to something that doesn't respond straight away (e.g. it isn't an Elasticsearch node). This short timeout was necessary when the component was first developed because each connection attempt would block a thread. Since #42636 the connection attempt is now nonblocking so we can apply a more relaxed timeout. If transport security is enabled then our handshake timeout applies to the TLS handshake followed by the Elasticsearch handshake. If the TLS handshake alone takes over a second then the whole handshake times out with a `ConnectTransportException`, but this does not tell us which of the two individual handshakes took so long. TLS handshakes have their own 10-second timeout, which if reached yields a `SslHandshakeTimeoutException` that allows us to distinguish a problem at the TLS level from one at the Elasticsearch level. Therefore this commit extends the discovery probe timeouts.	2021-01-27 16:41:44 +00:00
Adam Locke	789ee2d73e	[DOCS] Combining important config settings into a single page (#63849) * Combining important config settings into a single page. * Updating ids for two pages causing link errors and implementing redirects.	2020-10-19 10:02:22 -04:00
James Rodewig	dcf0c3062f	[DOCS] Document dynamic discovery settings (#61420)	2020-09-04 10:56:17 -04:00
Yannick Welsch	0b517ddca6	Provide option to allow writes when master is down (#60605) Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows a user to change this behavior to also block reads when a master is unavailable. This PR introduces a way to still allow writes when a master is offline. Writes will continue to work as long as routing table changes are not needed (as those require the master for consistency) and dynamic mapping updates are not required (as again, these require the master for consistency). Eventually we should switch the default of cluster.no_master_block to this new mode.	2020-08-12 16:37:32 +02:00
David Turner	19eb922d9f	Remove join timeout (#60873) There is no point in timing out a join attempt any more. Timing out and retrying with the same master is pointless, and an in-flight join attempt to one master no longer blocks attempts to join other masters. This commit removes this unnecessary setting. Relates #60872 in which this setting was deprecated.	2020-08-10 13:57:54 +01:00
James Rodewig	2774cd6938	[DOCS] Swap `[float]` for `[discrete]` (#60124) Changes instances of `[float]` in our docs for `[discrete]`. Asciidoctor prefers the `[discrete]` tag for floating headings: https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks	2020-07-23 11:48:22 -04:00
David Turner	c661a40083	Add docs for filesystem health checks (#59134) Documents the feature and settings introduced in #52680. Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-07 14:14:35 +01:00
James Rodewig	70cb519aa7	[DOCS] Relocate discovery module content (#56611) * Moves `Discovery and cluster formation` content from `Modules` to `Set up Elasticsearch`. * Combines `Adding and removing nodes` with `Adding nodes to your cluster`. Adds related redirect. * Removes and redirects the `Modules` page. * Rewrites parts of `Discovery and cluster formation` to remove `module` references and meta references to the section.	2020-05-12 17:39:06 -04:00
David Turner	10ab397d7f	Adjust docs for voting config exclusions API (#55006) In #50836 we deprecated the existing voting config exclusions API and added a new one. This commit adjusts the docs to match.	2020-04-20 19:47:09 +01:00
David Turner	e2cda1a279	"Adding nodes" instructions only work on localhost (#52677) The introductory sections of the reference manual contain some simplified instructions for adding a node to the cluster. Unfortunately they are a little too simplified and only really work for clusters running on `localhost`. If you try and follow these instructions for a distributed cluster then the new node will, confusingly, auto-bootstrap itself into a distinct one-node cluster. Multiple nodes running on localhost is a valid config, of course, but we should spell out that these instructions are really only for experimentation and that it takes a bit more work to add nodes to a distributed cluster. This commit does so. Also, the "important config" instructions for discovery say that you MUST set `discovery.seed_hosts` whereas in fact it is fine to ignore this setting and use a dynamic discovery mechanism instead. This commit weakens this statement and links to the docs for dynamic discovery mechanisms. Finally, this section is also overloaded with some technical details that are not important for this context and are adequately covered elsewhere, and completely fails to note that the default discovery port is 9300. This commit addresses this.	2020-02-27 08:51:17 +00:00
David Turner	a304d9a656	Ignore timeouts with single-node discovery (#52159) Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely if joining a faulty master that is too slow to respond, and `cluster.publish.timeout` to allow a faulty master to detect that it is unable to publish its cluster state updates in a timely fashion. If these timeouts occur then the node restarts the discovery process in an attempt to find a healthier master. In the special case of `discovery.type: single-node` there is no point in looking for another healthier master since the single node in the cluster is all we've got. This commit suppresses these timeouts and instead lets the node wait for joins and publications to succeed no matter how long this might take.	2020-02-11 14:00:06 +00:00
David Turner	69e0b1a0f4	Drop snapshot instructions for autobootstrap fix (#49755) The "Restore any snapshots as required" step is a trap: it's somewhere between tricky and impossible to restore multiple clusters into a single one. Also add a note about configuring discovery during a rolling upgrade to proscribe any rare cases where you might accidentally autobootstrap during the upgrade.	2019-12-02 12:43:18 +00:00
glerb	dd47cf4560	[DOCS] Correct typo in Discovery docs (#48494)	2019-11-05 08:48:20 -05:00
David Turner	9e30a57ca5	More bootstrap docs tweaks (#47809) Clarifies not to set `cluster.initial_master_nodes` on nodes that are joining an existing cluster. Co-Authored-By: James Rodewig <james.rodewig@elastic.co>	2019-10-10 10:53:27 +02:00
David Turner	a84908cebd	Clarify that discovery ignores master-ineligibles (#44835) The changes in #32006 mean that the discovery process can no longer use master-ineligible nodes as a stepping-stone between master-eligible nodes. This was normally an indication of a strange and possibly-fragile configuration and was not recommended. This commit clarifies that only master-eligible nodes are now involved with discovery.	2019-09-12 11:12:35 +01:00
James Rodewig	5c78f606c2	[DOCS] Change // CONSOLE comments to [source,console] (#46440)	2019-09-09 10:45:37 -04:00
David Turner	7776f755ee	More logging for slow cluster state application (#45007) Today the lag detector may remove nodes from the cluster if they fail to apply a cluster state within a reasonable timeframe, but it is rather unclear from the default logging that this has occurred and there is very little extra information beyond the fact that the removed node was lagging. Moreover the only forewarning that the lag detector might be invoked is a message indicating that cluster state publication took unreasonably long, which does not contain enough information to investigate the problem further. This commit adds a good deal more detail to make the issues of slow nodes more prominent: - after 10 seconds (by default) we log an INFO message indicating that a publication is still waiting for responses from some nodes, including the identities of the problematic nodes. - when the publication times out after 30 seconds (by default) we log a WARN message identifying the nodes that are still pending. - the lag detector logs a more detailed warning when a fatally-lagging node is detected. - if applying a cluster state takes too long then the cluster applier service logs a breakdown of all the tasks it ran as part of that process.	2019-08-01 08:21:40 +01:00
Lisa Cawley	60c8fc153a	[DOCS] Adds discovery.type (#42823) Co-Authored-By: David Turner <david.turner@elastic.co>	2019-06-05 12:29:40 -07:00
David Turner	ec427ff55e	More improvements to cluster coordination docs (#42799) This commit addresses a few more frequently-asked questions: * clarifies that bootstrapping doesn't happen even after a full cluster restart. * removes the example that uses IP addresses, to try and further encourage the use of node names for bootstrapping. * clarifies that auto-bootstrapping might form different clusters on different hosts, and gives a process for starting again if this wasn't what you wanted. * adds the "do not stop half-or-more of the master-eligible nodes" slogan that was notably absent. * reformats one of the console examples to a narrower width	2019-06-03 17:20:47 +01:00
David Turner	ed3230b3eb	Minor cluster coordination docs fixes (#42111) Fixes a typo and a badly-formatted warning.	2019-05-15 09:26:04 -04:00
David Turner	1e762a137e	Node names in bootstrap config have no ports (#41569) In cases where node names and transport addresses can be muddled, it is unclear that `cluster.initial_master_nodes: master-a:9300` means to look for a node called `master-a:9300` rather than a node called `master-a` with transport port `9300`. This commit adds docs to that effect.	2019-05-08 10:23:55 +01:00
David Turner	a4dff365fa	Add 'DO NOT TOUCH' warnings to disco settings docs (#41211)	2019-04-15 19:22:10 +01:00
David Turner	f0fac9f56b	Further clarify cluster.initial_master_nodes (#41179) The following phrase causes confusion: > Alternatively the IP addresses or hostnames (if node name defaults to the > host name) can be used. This change clarifies the conditions under which you can use a hostname, and adds an anchor to the note introduced in (#41137) so we can link directly to it in conversations with users.	2019-04-14 10:39:50 +01:00
David Turner	cae6276811	Clarify initial_master_nodes must match node.name (#41137) ... and emphasize that this includes any trailing qualifiers.	2019-04-12 10:45:09 +01:00
Yannick Welsch	28a14e3e04	Add note about cluster state diffs (#39847) Mentions cluster state diffs in CS publishing docs.	2019-03-11 15:36:41 +01:00
Yannick Welsch	3b71a31557	Remove Zen1 (#39466) Removes all traces of Zen1 from the code base. Some of these commits will also be backported to 7.0/7.x (#39470) as the cluster.coordination package was making use of some things in discovery.zen and we want to keep 7.x as close as possible to master.	2019-03-04 15:51:12 +01:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_hosts`	2019-02-05 08:46:52 +00:00
Yannick Welsch	ece8c659c5	Decrease leader and follower check timeout (#38298) Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still being a very long time for a node to be completely unresponsive.	2019-02-04 15:11:12 +01:00
Yannick Welsch	504a89feaf	Step down as master when configured out of voting configuration (#37802) Abdicates to another master-eligible node once the active master is reconfigured out of the voting configuration, for example through the use of voting configuration exclusions. Follow-up to #37712	2019-01-29 12:43:04 +01:00

104 commits