This PR adds a count of currently unassigned primary shards to both the
`/_cat/health` and `/_cluster/health` endpoints. This is to aid cluster
administrators in estimating the time remaining for a cluster to go from
RED to YELLOW status, as per enhancement request #111727.
Tests and doc updates are in place with this PR and manual testing with
`./gradlew run` has been conducted on the endpoints to ensure correct
output.
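For illustration, a rough sketch of what the new output might look like on a recovering cluster (the `unassigned_primary_shards` field name and the surrounding values here are assumptions based on the description above, not the definitive response shape):
```
GET /_cluster/health
{
  "cluster_name" : "my-cluster",
  "status" : "red",
  "active_primary_shards" : 10,
  "unassigned_shards" : 7,
  "unassigned_primary_shards" : 2,
  ...
}
```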
## Known Limitations
* Testing
  * Due to limitations in the skip functionality of the YAML REST test framework, the YAML REST tests for this endpoint are disabled when running a mixed-version cluster: a synthetic cluster-version feature is used to skip them when any member of the cluster is on a version older than the one in which this change is due to be introduced.
In this PR we expose the global retention via the `GET
_data_stream/{target}/_lifecycle` API.
Since the global retention is a core feature of the data stream lifecycle, we chose to expose it by default.
```
GET /_data_stream/my-data-stream/_lifecycle
{
  "global_retention": {
    "default_retention": "7d",
    "max_retention": "365d"
  },
  "data_streams": [...]
}
```
This commit adds support for the `verbose` querystring parameter to the
get data stream API (`GET /_data_stream/{name}`).
The flag defaults to `false`. When set to `true`, the `maximum_timestamp` for each data stream retrieved will also be returned. This is the same information available from the data stream stats API (and internally the same action is used to retrieve it).
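A rough sketch of the behaviour (the response is abbreviated and the exact placement of the new field is an assumption):
```
GET /_data_stream/my-data-stream?verbose=true
{
  "data_streams": [
    {
      "name": "my-data-stream",
      "maximum_timestamp": 1723578358159,
      ...
    }
  ]
}
```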
We duplicated these docs in order to avoid breaking older links, but
this makes it confusing and hard to link to the right copy of the
information. This commit removes the duplication by replacing the docs
at the old locations with stubs that link to the new locations.
A misplaced `//end::` tag meant that the docs added in #112271 are only
included in the page on fault detection and not the equivalent
troubleshooting docs. This commit fixes the problem.
This is basically the same as for nodes that leave the cluster with reason `disconnected`, except that these disconnects don't involve the master and so don't cause any nodes to leave the cluster.
With #111972 we enable users to set up global retention for data streams that are managed by the data stream lifecycle. This gives Elasticsearch users more control over their data retention and, consequently, better resource management of their clusters.
However, there is a small number of data streams that are necessary for the proper operation of Elasticsearch and should not follow user-defined retention, to avoid surprises.
For this reason, we put forth the following definition of internal data streams.
A data stream is internal if it is either a system index (the system flag is true) or its name starts with a dot.
This PR adds the `isInternalDataStream` param to the effective retention calculation, making it explicit that this is also used to determine the effective retention.
Adds to the `GET _cluster/stats` endpoint information about the snapshot
repositories in use, including their types, whether they are read-only
or read-write, and for Azure repositories the kind of credentials in
use.
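For illustration only, a sketch of the kind of section this adds to the response (the field names and nesting shown here are assumptions, not the definitive shape):
```
GET /_cluster/stats
{
  ...
  "repositories" : {
    "fs" : { "count" : 1 },
    "azure" : { "count" : 2, "read_only" : 0, "read_write" : 2 },
    ...
  }
}
```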
The max enrich cache size setting now also supports an absolute max size in bytes (of used heap space) and a percentage of the max heap space, in addition to the existing flat document count. The default is 1% of the max heap space.
This should prevent issues where the enrich cache takes up a lot of memory when there are large documents in the cache.
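Assuming the existing `enrich.cache.size` node setting is the one being extended, the accepted forms would look something like this (pick one; the values are illustrative):
```
# elasticsearch.yml -- illustrative values, assuming the enrich.cache.size setting
enrich.cache.size: 1000     # flat document count (existing behaviour)
enrich.cache.size: 256mb    # absolute max size of used heap space
enrich.cache.size: 1%       # percentage of the max heap space (the new default is 1%)
```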
Add the ability to schedule SLM policies with a time-unit interval schedule rather than a cron schedule. For example, an SLM policy can be created with the argument `"schedule": "30m"`. This creates a policy that runs 30 minutes after the policy's modification_date, and then runs again every time another 30 minutes has passed. Every time the policy is changed, the next snapshot is re-scheduled to run one interval after the new modification date.
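For example, a minimal policy using an interval schedule might look like this (the repository name and snapshot config are illustrative; only the `"schedule": "30m"` part is the new behaviour):
```
PUT /_slm/policy/every-30-minutes
{
  "schedule": "30m",
  "name": "<snapshot-{now/d}>",
  "repository": "my_repository",
  "config": { "indices": "*" },
  "retention": { "expire_after": "7d" }
}
```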
The info about remote cluster connection modes is a little disjointed.
This commit adds some cross-links between the sections to help users
find more relevant information.
It's not obvious from the docs that transport connections (including
connections to remote clusters) use a custom binary protocol and require
a _layer 4_ proxy. This commit clarifies this point.
## Description
Update the searchable snapshot docs about the timing of noticing data loss: because searchable snapshot data is sometimes cached on disk, users may only notice the data loss later, during a node restart (or, on Elastic Cloud, during host maintenance), after they delete their snapshots.
* (Doc+) Link API to parent Doc part1
---------
Co-authored-by: shainaraskas <shaina.raskas@elastic.co>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
This changes the generated types tables in the docs to say `date`
instead of `datetime`. That's the name of the type in Elasticsearch, so it's a lot less confusing to call it that.
Closes #111650
- Added the `mv_percentile(values, percentile)` function
- Used as a surrogate in the `percentile(column, percentile)` aggregation
- Updated docs to specify that the surrogate _should_ be implemented if possible
In the same way as `mv_median` does, this yields exact results (ignoring floating-point error in double operations).
For that, some decisions were made, especially in the long evaluator (check the comments in context in `MvPercentile.java`).
Closes https://github.com/elastic/elasticsearch/issues/111591
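A quick usage sketch of `MV_PERCENTILE` through the ES|QL query endpoint (the row values here are made up):
```
POST /_query
{
  "query": "ROW values = [2, 5, 8, 13] | EVAL p50 = MV_PERCENTILE(values, 50)"
}
```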
Support the Version, Keyword, and Text types in the Max and Min aggregations.
The current implementation of both max and min does:
For non-grouping:
- Store a BytesRef
- When there's a new max/min, copy it to the internal array, growing it if needed
For grouping:
- Keep an array of BytesRef (null by default: there's no "initial/default value" here, as there's no "MAX" value for a string)
- Each BytesRef stores their own array, which will be grown as needed to copy the new max/min
Some notes:
- The arrays are not shrunk, so as to avoid having to copy them and potentially grow them again
- Raw arrays are used for now, but maybe it should use BigArrays so the memory is accounted for in the circuit breaker?
Part of https://github.com/elastic/elasticsearch/issues/110346
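As a usage sketch of what this enables in ES|QL (the index pattern and field name are illustrative), a keyword field can now be aggregated with `MIN`/`MAX`, which compare lexicographically:
```
POST /_query
{
  "query": "FROM logs-* | STATS min_host = MIN(host.name), max_host = MAX(host.name)"
}
```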
This relaxes the check on numerical spans to allow them to be specified as whole numbers. Previously they had to be provided as doubles.
This also expands the tests for date ranges to include string types.
Resolves #109340, resolves #104646, resolves #105375.
In this PR we introduce cluster settings to manage the global data stream retention.
We introduce two settings `data_streams.lifecycle.retention.max` & `data_streams.lifecycle.retention.default` that configure the respective retentions. The settings are loaded and monitored by the `DataStreamGlobalRetentionSettings`. The validation has also moved there.
We preserved the `DataStreamGlobalRetention` record to reduce the impact of this change. Its purpose is simply to be a wrapper record that groups the retention settings together.
Temporarily, `DataStreamGlobalRetentionSettings` uses the `DataStreamFactoryRetention`, which is marked as deprecated, for migration purposes.
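Assuming these are dynamically updatable cluster settings, they could then be configured along these lines (the values are illustrative):
```
PUT /_cluster/settings
{
  "persistent": {
    "data_streams.lifecycle.retention.default": "7d",
    "data_streams.lifecycle.retention.max": "365d"
  }
}
```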
This adds additional timing information for each individual driver to the profile. To the results from `profile` it adds the start and stop time for each driver, which was already in the task status. To both the profile and the task status it also adds the number of times the driver slept and some more detailed history about a few of those sleeps.
Explanation time! The compute engine splits work into some number of `Driver`s per node. Each `Driver` is a single-threaded entity: it runs on a thread for a while and then does one of three things:
1. Finishes
2. Goes async because one of its `Operator`s has gone async
3. Yields the thread pool because it has run for too long
This PR measures the last two. At this point only three operators can go async:
* ENRICH
* Reading from an empty exchange
* Writing to a full exchange
We're quite interested in these sleeps at the moment because we think they may be slowing things down. Here's what it looks like when a driver goes async because it wants to read from an empty exchange:
```
... the rest of the profile ...
"sleeps" : {
  "counts" : {
    "exchange empty" : 2
  },
  "first" : [
    {
      "reason" : "exchange empty",
      "sleep" : "2024-08-13T19:45:57.943Z",
      "sleep_millis" : 1723578357943,
      "wake" : "2024-08-13T19:45:58.159Z",
      "wake_millis" : 1723578358159
    },
    {
      "reason" : "exchange empty",
      "sleep" : "2024-08-13T19:45:58.164Z",
      "sleep_millis" : 1723578358164,
      "wake" : "2024-08-13T19:45:58.165Z",
      "wake_millis" : 1723578358165
    }
  ],
  "last": [same as above]
}
```
Every time the driver goes async we count it in the `counts` map -
grouped by the reason the driver slept. We also record the sleep and
wake times for the first and last ten times the driver sleeps. In this
case it only slept twice, so the `first` and `last` ten times is the
same array.
This should give us a good sense about why drivers sleep while using a
limited amount of memory per driver.
If Elasticsearch fails part-way through a multipart upload to S3 it will
generally try and abort the upload, but it's possible that the abort
attempt also fails. In this case the upload becomes _dangling_. Dangling
uploads consume storage space, and therefore cost money, until they are
eventually aborted.
Earlier versions of Elasticsearch require users to check for dangling
multipart uploads, and to manually abort any that they find. This commit
introduces a cleanup process which aborts all dangling uploads on each
snapshot delete instead.
Closes #44971, closes #101169
The `known-issue-8.15.0` anchor appears twice which breaks the docs
build. Also the existing message suggests incorrectly that
`bootstrap.memory_lock: true` is recommended.
Today there are a couple of assertions that can trip if the contents of
a snapshot repository are corrupted. It makes sense to assert the
integrity of snapshots in most tests, but we must also (a) protect
against these corruptions in production and (b) allow some tests to
verify the behaviour of the system when the repository is corrupted.
This commit introduces a flag to disable certain assertions, converts
the relevant assertions into production failures too, and introduces a
high-level test to verify that we do detect all relevant corruptions
without tripping any other assertions.
Extracted from #93735 as this change makes sense in its own right.
Relates #52622.