elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-30 02:13:33 -04:00

Author	SHA1	Message	Date
shainaraskas	d37e1bd14d	Fix broken anchors (#119802 )	2025-01-09 09:15:00 -05:00
shainaraskas	17111e1258	[DOCS] Concept cleanup 2 - ES settings (#119373 )	2025-01-06 12:07:15 -05:00
David Turner	c5166ccf6f	Revert "(+Doc) Link split-brain wiki (#108914 )" This reverts commit `12aab08330`.	2024-12-02 08:11:34 +00:00
David Turner	33af77bcb3	Mention full-cluster restart in `initial_master_node` docs (#112986 ) Apparently some users consider "node is restarting" not to apply to a full-cluster restart. This commit further clarifies that you must not set `cluster.initial_master_nodes` in a full cluster restart.	2024-09-19 10:41:39 +01:00
David Turner	9387ce3357	Deduplicate unstable-cluster troubleshooting docs (#112333 ) We duplicated these docs in order to avoid breaking older links, but this makes it confusing and hard to link to the right copy of the information. This commit removes the duplication by replacing the docs at the old locations with stubs that link to the new locations.	2024-08-29 13:16:37 +01:00
David Turner	59a42ed41b	Include network disconnect info in troubleshooting docs (#112323 ) A misplaced `//end::` tag meant that the docs added in #112271 are only included in the page on fault detection and not the equivalent troubleshooting docs. This commit fixes the problem.	2024-08-29 15:03:13 +10:00
David Turner	42d650b9bb	Add docs for troubleshooting network disconnects (#112271 ) Basically the same as for nodes that leave the cluster with reason `disconnected`, except that these disconnects don't involve the master so don't cause any nodes to leave the cluster.	2024-08-28 18:59:11 +10:00
David Turner	e5fd63bbb8	More detail around packet captures (#111835 ) Clarify that it's best to analyse the captures alongside the node logs, and spell out in a bit more detail how to use packet captures and logs to pin down the cause of a `disconnected` node.	2024-08-13 21:55:38 +01:00
David Turner	0131e80624	Revert "(+Doc) link split-brain wiki from quorom decision making (#108915 )" This reverts commit `4d3ca2d029`.	2024-06-16 08:54:44 +01:00
Stef Nestor	4d3ca2d029	(+Doc) link split-brain wiki from quorom decision making (#108915 ) Mini change to link the [wiki page about "split-brain"](https://en.wikipedia.org/wiki/Split-brain_(computing)) as an industry-not-Elastic term under [Quorum-based decision making](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-quorums.html)	2024-05-22 13:22:03 -06:00
Stef Nestor	12aab08330	(+Doc) Link split-brain wiki (#108914 ) Mini change to link the wiki page about "split-brain" as an industry-not-Elastic term under [Voting configurations](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-voting.html).	2024-05-22 13:21:54 -06:00
Jake Landis	bb9566a57e	Update discovery.asciidoc (#106541 ) (#106695 ) Fix typo (cherry picked from commit `96a46b9c5b`) Co-authored-by: Boen <13752080613@163.com>	2024-03-22 15:43:48 -04:00
David Turner	61191b880c	Link to troubleshooting docs from other disco pages (#102509 ) I have several times struggled to find the docs about restoring from a snapshot if a quorum cannot be found. That info is on the discovery troubleshooting page, but it seems I expect it to be on somewhere like the quorums or voting docs pages instead. This commit adds links from those pages to the troubleshooting page.	2023-11-23 09:45:21 +00:00
David Turner	9b51d9972d	More specific `cluster.initial_master_nodes` instructions (#101493 ) In the note on forming a single cluster we describe what to do if inadvertently forming extra clusters, but we can be more explicit about what to do with `cluster.initial_master_nodes` in these instructions. This commit adds the missing details.	2023-10-30 08:25:40 +00:00
Abdon Pijpelink	af76a3a436	[DOCS] Add 'Troubleshooting an unstable cluster' to nav (#99287 ) * [DOCS] Add 'Troubleshooting an unstable cluster' to nav * Adjust docs links in code * Revert "Adjust docs links in code" This reverts commit `f3846b1d78`. --------- Co-authored-by: David Turner <david.turner@elastic.co>	2023-09-08 13:42:50 +02:00
David Turner	0f6a217ed8	Fix admonition about initial_master_nodes (#98242 ) Admonition paragraphs cannot be combined with a `+` continuation mark. This commit fixes the formatting by using an admonition block instead.	2023-08-08 11:50:36 +01:00
David Turner	09e53f9ad9	Enhance docs around network troubleshooting (#97305 ) Discovery, like cluster membership, can also be affected by network-like issues (e.g. GC/VM pauses, dropped packets and blocked threads) so this commit duplicates the troubleshooting info across both places.	2023-07-10 10:57:44 +01:00
debadair	777598d602	[DOCS] Remove redirect pages (#88738 ) * [DOCS] Remove manual redirects * [DOCS] Removed refs to modules-discovery-hosts-providers * [DOCS] Fixed broken internal refs * Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc. * Update docs/reference/search/point-in-time-api.asciidoc Co-authored-by: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/setup/restart-cluster.asciidoc Co-authored-by: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/sql/endpoints/translate.asciidoc Co-authored-by: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/snapshot-restore/restore-snapshot.asciidoc Co-authored-by: James Rodewig <james.rodewig@elastic.co> * Update repository-azure.asciidoc * Update node-tool.asciidoc * Update repository-azure.asciidoc --------- Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co> Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2023-05-24 12:32:46 +01:00
David Turner	7a517cb4a0	Add note on jstack frequency for troubleshooting (#95764 ) Suggest calling `jstack` every 15s to ensure that at least one capture shows a stuck thread. Also adds a link to this guide to the list on the troubleshooting overview page.	2023-05-03 10:04:13 +01:00
David Turner	f0989404ab	Bootstrapping docs clarifications (#94977 ) Explains why you should remove `cluster.initial_master_nodes`, and rewords some of the other sections a little for (subjectively) improved readability.	2023-04-03 14:43:12 +01:00
David Kilfoyle	7cd484ac95	Revert "Cross-reference disclaimer" (#94829 ) * Revert "Cross-reference disclaimer (#94801)" This reverts commit `902649be31`. * Highlight sentences about removing `cluster.initial_master_nodes` setting	2023-03-28 11:17:53 -04:00
Stef Nestor	902649be31	Cross-reference disclaimer (#94801 ) 👋🏼 howdy, team! Can we cross pollinate the [important banner](https://www.elastic.co/guide/en/elasticsearch/reference/master/important-settings.html#initial_master_nodes) from the `cluster.initial_master_nodes` setting page to the related [bootstrap doc](https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-discovery-bootstrap-cluster.html#bootstrap-cluster-name) to avoid user's misunderstanding the latter's "This is only required the first time a cluster starts up" as saying they don't need to comment-out these settings?	2023-03-28 00:16:40 -04:00
David Turner	4c68382065	Capture thread dump on ShardLockObtainFailedException (#93458 ) We sometimes see a `ShardLockObtainFailedException` when a shard failed to shut down as fast as we expected, often because a node left and rejoined the cluster. Sometimes this is because it was held open by ongoing scrolls or PITs, but other times it may be because the shutdown process itself is too slow. With this commit we add the ability to capture and log a thread dump at the time of the failure to give us more information about where the shutdown process might be running slowly. Relates #93226	2023-02-02 11:17:40 -05:00
David Turner	dfab580976	Limit length of lag detector hot threads log lines (#92851 ) If debug logging is enabled then the lag detector will capture and report the hot threads of a lagging node. In some cases the resulting log message can be very large, exceeding 10kiB, which means it is truncated in most logging setups. The relevant thread(s) may be waiting on I/O, which is not considered "hot" and therefore may not appear in the first 10kiB. This commit adjusts this logging mechanism to split the message into chunks of size at most 2kiB (after compression and base64-encoding) to ensure that the entire hot threads output can be faithfully reconstructed from these logs. Closes #88126	2023-01-13 13:11:26 +00:00
David Turner	6203560983	Fix docs for fault detection troubleshooting (#92749 ) In #92742 we changed the logging around cluster membership changes but the docs don't quite match the final version. This commit addresses that.	2023-01-09 10:17:06 +00:00
David Turner	5182748318	Improve node-{join,left} logging for troubleshooting (#92742 ) Today to troubleshoot an unstable cluster we ask the users to parse the rather complex `node-join` and `node-left` messages emitted by the `MasterService`. These messages may refer to many nodes, may be truncated, and are generally pretty hard to work with. With this commit we start to emit a simplified log message about each node added and removed. It also renames the respective executor classes: - `JoinTaskExecutor` -> `NodeJoinExecutor` - `NodeRemovalClusterStateTaskExecutor` -> `NodeLeftExecutor` This brings their names in line with each other, and the messages that they emit, whilst preserving the older `node-join` and `node-left` terminology as reported by the `MasterService`. Finally, it updates the troubleshooting logs to reflect these new and simplified logs. Relates #92741	2023-01-09 04:34:41 -05:00
Luiz Guilherme Pais dos Santos	9eec322424	Fix format for cluster.discovery_configuration_check.interval (#90452 )	2022-12-22 16:11:33 +01:00
David Turner	c9d4892929	Weaken language about "low-latency" networks (#89198 ) Today we say that voting-only nodes require a "low-latency" network. This term has a specific meaning in some operating environments which is different from our intended meaning. To avoid this confusion this commit removes the absolute term "low-latency" in favour of describing the requirements relative to the user's own performance goals.	2022-08-09 13:15:37 +01:00
Leaf-Lin	945cb27782	[DOCS] Adding discovery troubleshooting link in the master get help page (#87344 ) * Adding discovery troubleshooting link * Add tags to pull in discovery troubleshooting content * Move discovery troubleshooting to separate page and add redirects Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-07-06 15:51:43 -04:00
Iraklis Psaroudakis	50d2cf31b8	Periodic warning for 1-node cluster w/ seed hosts (#88013 ) For fully-formed single-node clusters, emit a periodic warning if seed_hosts has been set to a non-empty list. Closes #85222	2022-06-30 16:35:15 +03:00
David Turner	80f7af58f8	More detail in discovery troubleshooting docs (#86930 ) In #85074 we added docs on discovery troubleshooting that really only talked about troubleshooting master elections. There's also the case where the master is elected fine but some other node can't join it. This commit adds troubleshooting docs about that too. Co-authored-by: Adam Locke <adam.locke@elastic.co>	2022-06-06 08:33:45 +01:00
David Turner	79f181d208	Reduce resource needs of join validation (#85380 ) Fixes a few scalability issues around join validation: - compresses the cluster state sent over the wire - shares the serialized cluster state across multiple nodes - forks the decompression/deserialization work off the transport thread Relates #77466 Closes #83204	2022-04-26 12:15:54 +01:00
David Turner	33a553f61f	Fix up whitespace error introduced in #85948	2022-04-19 07:58:10 +01:00
David Turner	ce004d49e7	More docs re. removing cluster.initial_master_nodes (#85948 ) Ensures that on every page of the docs that mentions `cluster.initial_master_nodes` also mentions that this setting must be removed after bootstrapping completes.	2022-04-19 07:54:43 +01:00
David Turner	6a273886e9	Add technical docs on diagnosing instability etc (#85074 ) Copies some internal troubleshooting docs to the reference manual for wider use. Co-authored-by: James Rodewig <james.rodewig@gmail.com>	2022-03-31 09:01:10 +01:00
David Turner	fd76f9c5d1	Fix auto-bootstrap docs (#85215 ) Today it's no longer true that by default nodes will auto-discover other nodes on the same host and bootstrap them all into a cluster. This commit fixes the docs on auto-bootstrapping to recognise this.	2022-03-22 16:35:48 +00:00
Tobias Stadler	e3deacf547	[DOCS] Fix typos (#83895 )	2022-02-15 12:42:17 -05:00
James Rodewig	2f03112b5b	[DOCS] Synced with 8.0 stack upgrade changes (#83489 ) (#83596 ) This moves the bulk of the upgrade information into the consolidated upgrade guide, but leaves the primary upgrade topic in place as a cross reference. Relates to: https://github.com/elastic/stack-docs/pull/1970 Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com> (cherry picked from commit `f6473d71f9`) Co-authored-by: debadair <debadair@elastic.co>	2022-02-07 11:01:42 -05:00
David Turner	7dd32fb027	Reduce verbosity-increase timeout to 3m (#81118 ) Today we increase the verbosity of discovery failures after 5 minutes without a master. Unfortunately 5 minutes is a common orchestration timeout, so if discovery is broken then we see nodes being shut down just before they start to emit useful logs. This commit reduces the default timeout to 3 minutes to address that.	2021-11-30 09:52:39 +00:00
David Turner	8cf4c7b6fb	Remove last few mentions of Zen discovery (#80410 ) We have a few leftover mentions of `zen` discovery, mostly for historical/BwC reasons, which this commit removes. Prior to this commit the default value for `discovery.type` was `zen` but this was not written down anywhere or officially supported: the two options were to set it to `single-node` or to omit it entirely. This commit changes the default to `multi-node` and documents this. Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-11-09 09:52:06 +01:00
Adam Locke	529986e9b1	A typo error (#78987 ) (#79203 ) * A typo error a space between 'E' and 'cluster...' * Update example, fix headings, change notes Co-authored-by: Adam Locke <adam.locke@elastic.co> Co-authored-by: Marwane Chahoud <marwane.chahoud@gmail.com>	2021-10-15 08:52:03 -04:00
Martijn van Groningen	8a1deff75a	Improve fault-detection.asciidoc (#76821 ) Add section to fault-detection.asciidoc about nodes being removed from cluster due to slow cluster state applying.	2021-08-23 14:31:06 +02:00
David Turner	eabe2d1b34	Increase PeerFinder verbosity on persistent failure (#73128 ) If a node is partitioned away from the rest of the cluster then the `ClusterFormationFailureHelper` periodically reports that it cannot discover the expected collection of nodes, but does not indicate why. To prove it's a connectivity problem, users must today restart the node with `DEBUG` logging on `org.elasticsearch.discovery.PeerFinder` to see further details. With this commit we log messages at `WARN` level if the node remains disconnected for longer than a configurable timeout, which defaults to 5 minutes. Relates #72968	2021-05-17 10:52:18 +01:00
James Rodewig	693807a6d3	[DOCS] Fix double spaces (#71082 )	2021-03-31 09:57:47 -04:00
James Rodewig	5c75d004fa	[DOCS] Replace `put` with `create or update` in API names (#70330 ) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2021-03-15 14:49:44 -04:00
David Turner	2adeb4a666	Expand and consolidate networking docs (#68051 ) Today's network config docs are split into "Network", "HTTP" and "Transport" pages, with unclear relationships between them. We often encounter users with weird configs that indicate they don't really understand how these settings all relate. In fact these pages are all very interrelated, and the HTTP and Transport pages are almost all only for advanced users. This commit brings these docs into a single page and rewords some things to try and guide users away from the advanced settings unless their configuration needs all the extra complexity. It also adds a section entitled "Binding and publishing" which clarifies the meanings of the `bind_host` and `publish_host` parameters. This is also a common source of confusion amongst users. It also clarifies that many of these settings accept a list of addresses, and warns that this may not be what you want. Closes #67956. Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-02-01 13:06:20 +00:00
David Turner	9c100cdeae	Extend default probe connect/handshake timeouts (#68059 ) Today the discovery phase has a short 1-second timeout for handshaking with a remote node after connecting, which allows it to quickly move on and retry in the case of connecting to something that doesn't respond straight away (e.g. it isn't an Elasticsearch node). This short timeout was necessary when the component was first developed because each connection attempt would block a thread. Since #42636 the connection attempt is now nonblocking so we can apply a more relaxed timeout. If transport security is enabled then our handshake timeout applies to the TLS handshake followed by the Elasticsearch handshake. If the TLS handshake alone takes over a second then the whole handshake times out with a `ConnectTransportException`, but this does not tell us which of the two individual handshakes took so long. TLS handshakes have their own 10-second timeout, which if reached yields a `SslHandshakeTimeoutException` that allows us to distinguish a problem at the TLS level from one at the Elasticsearch level. Therefore this commit extends the discovery probe timeouts.	2021-01-27 16:41:44 +00:00
Adam Locke	789ee2d73e	[DOCS] Combining important config settings into a single page (#63849 ) * Combining important config settings into a single page. * Updating ids for two pages causing link errors and implementing redirects.	2020-10-19 10:02:22 -04:00
James Rodewig	dcf0c3062f	[DOCS] Document dynamic discovery settings (#61420 )	2020-09-04 10:56:17 -04:00
Yannick Welsch	0b517ddca6	Provide option to allow writes when master is down (#60605 ) Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows a user to change this behavior to also block reads when a master is unavailable. This PR introduces a way to now also still allow writes when a master is offline. Writes will continue to work as long as routing table changes are not needed (as those require the master for consistency), or if dynamic mapping updates are not required (as again, these require the master for consistency). Eventually we should switch the default of cluster.no_master_block to this new mode.	2020-08-12 16:37:32 +02:00

131 commits