This test was difficult to write in the first place. We had to come up
with a threshold for the maximum number of tasks that would be created, but that is
not easy to calculate, as it depends on how quickly such tasks can be created
and executed.
We should rather have started with a higher threshold; the important part
is that we create a total number of tasks that is no longer dependent on the
number of segments, given there are far fewer threads available to execute them.
Closes #116048
Split the test in two: one to verify behaviour with a threshold greater than 1, and a specific test for the edge case of the threshold set to 1. Added a comment that explains the nuance around the behaviour and what influences it.
Closes #106647
This fixes a bug when concurrently executing index requests that have different types for the same field.
(cherry picked from commit 9658940a51)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Resolve pipelines from template if lazy rollover write (#116031)
If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index.
Fixes: #112781
* Remute tests block merge
* Remute tests block merge
Until now if `store.cleanupAndVerify` was called on a store with no
commits, it would throw `IndexNotFoundException`. Based on variable
naming (`metadataOrEmpty`), this appears to be unintentional, though the
issue has been present since the `cleanupAndVerify` method was
introduced.
This change is motivated by #104473 - I would like to be able to use
this method to clean up a store prior to recovery regardless of how
far along a previous recovery attempt got.
This PR adds telemetry for logsdb. However, this change only tracks the
count of indices using logsdb and those that use synthetic source.
Additional stats, such as shard, indexing, and search stats, will be
added in a follow-up, as they require reaching out to data nodes.
* Enable _tier based coordinator rewrites for all indices (not just mounted indices) (#115797)
As part of https://github.com/elastic/elasticsearch/pull/114990 we
enabled using the `_tier` field as part of the coordinator rewrite in
order to skip shards that do not match a `_tier` filter, but only for
fully/partially mounted indices.
This PR enhances the previous work by allowing a coordinator rewrite to
skip shards that will not match the `_tier` query for all indices,
irrespective of their lifecycle state (i.e. hot and warm indices can
now skip shards based on the `_tier` query).
Note, however, that hot/warm indices will not automatically take
advantage of the `can_match` coordinator rewrite (like read-only
indices do); only search requests that surpass the
`pre_filter_shard_size` threshold will.
Relates to
[#114910](https://github.com/elastic/elasticsearch/issues/114910)
(cherry picked from commit 71dfb0689b)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* Fix test compilation
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, even when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:
1. **`index.mapping.total_fields.limit`**: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.
2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing will fail once the limit is surpassed. However, if set to `true`, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.
To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.
This adjustment is important for LogsDB, where dynamically mapped fields may be common, and we want to avoid documents being dropped.
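A minimal sketch of the resulting settings (the index name and the limit value are illustrative; `true` is the new LogsDB default described above):

```
PUT logs-example
{
  "settings": {
    "index.mapping.total_fields.limit": 1000,
    "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
  }
}
```
With these settings, a document whose dynamic fields would push the mapping past 1000 fields is still indexed; the overflow fields are ignored rather than failing the request.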
* Check index setting for source mode in SourceOnlySnapshotRepository
* update
* Revert "update"
This reverts commit 9bbf0490f7.
(cherry picked from commit 37a4ee3102)
* Use directory name as project name for libs (#115720)
The libs projects are configured to all begin with `elasticsearch-`.
While this is desirable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.
This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
* fixes
Since we removed the search workers thread pool in #111099, we execute many
more tasks in the search thread pool, given that each shard search request
parallelizes across slices or even segments (e.g. knn query rewrite). There are
also rare situations where segment-level tasks may parallelize further
(e.g. createWeight), causing the creation of very many tasks for a single
top-level request. These are rather small tasks that previously queued up in
the unbounded search workers queue. With recent improvements in Lucene,
these tasks queue up in the search queue, yet they get executed by the caller
thread while they are still in the queue, and remain in the queue as no-ops
until they are pulled out. We have protection against rejections
based on turning off search concurrency when we have more than maxPoolSize
items in the queue, yet that is not enough if enough parallel requests see
an empty queue and manage to submit enough tasks to fill the queue at once.
That causes rejections for top-level searches that should not be rejected.
This commit introduces wrapping for the executor to limit the number of tasks
that a single search instance can submit to the executor, to prevent the situation
where a single search submits way more tasks than threads available.
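The wrapping can be sketched as follows. This is a hypothetical, simplified illustration (not the actual Elasticsearch wrapper): an `Executor` that caps how many tasks one search may hand to the shared pool, running any tasks over the cap on the caller thread instead of queueing them.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap the number of tasks a single search instance may
// submit to the shared pool; beyond the cap, run the task inline so the queue
// cannot be flooded by one request.
final class ThrottlingExecutor implements Executor {
    private final Executor delegate;
    private final int maxTasks;
    private final AtomicInteger submitted = new AtomicInteger();

    ThrottlingExecutor(Executor delegate, int maxTasks) {
        this.delegate = delegate;
        this.maxTasks = maxTasks;
    }

    @Override
    public void execute(Runnable task) {
        if (submitted.incrementAndGet() <= maxTasks) {
            delegate.execute(task); // under the cap: hand off to the pool
        } else {
            task.run();             // over the cap: execute on the caller thread
        }
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ThrottlingExecutor limited = new ThrottlingExecutor(pool, 3);
        AtomicInteger ran = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            limited.execute(ran::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // All 10 tasks ran, but only 3 were ever queued on the shared pool.
        System.out.println(ran.get());
    }
}
```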
Co-authored-by: Adrien Grand <jpountz@gmail.com>
The index.mode, source.mode, and index.sort.* settings cannot be
modified during restore, as this may lead to data corruption or issues
retrieving _source. This change enforces a restriction on modifying
these settings during restore. While a fine-grained check could permit
equivalent settings, it seems simpler and safer to reject restore
requests if any of these settings are specified.
Relates to #115811, but applies to resize requests.
The index.mode, source.mode, and index.sort.* settings cannot be
modified during resize, as this may lead to data corruption or issues
retrieving _source. This change enforces a restriction on modifying
these settings during resize. While a fine-grained check could allow
equivalent settings, it seems simpler and safer to reject resize
requests if any of these settings are specified.
Currently the thread context is lost between streaming context switches.
This commit ensures that the thread context is properly set each time
before new data is provided to the stream.
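The general pattern can be sketched like this (a hypothetical illustration, not Elasticsearch's actual `ThreadContext` API): capture the caller's context at hand-off time, and restore it around the handler invocation so the context survives the switch to another thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ContextPreserving {
    // Stand-in for a request-scoped thread context.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Wrap a task so it runs with the context captured at hand-off time.
    static Runnable preserving(Runnable task) {
        String captured = CONTEXT.get();   // capture on the submitting thread
        return () -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);         // restore before running the handler
            try {
                task.run();
            } finally {
                CONTEXT.set(previous);     // leave the worker thread as found
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CONTEXT.set("request-context");
        // Without the wrapper, the worker thread would see a null context.
        worker.execute(preserving(() -> System.out.println(CONTEXT.get())));
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```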
This commit fixes and unmutes org.elasticsearch.script.StatsSummaryTests:testEqualsAndHashCode.
Previously, there was no guarantee that the doubles added to stats1 and stats2 would be different. In fact, the count could even be zero, which we saw in one particular failure. The simplest fix, to avoid this potential situation, is to ensure that there is at least one value, and that the values added to each stats instance are different.
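The fix described above can be sketched as follows (a hypothetical illustration, not the actual test code): force a non-zero count and offset the second value set so the two instances are guaranteed to differ.

```java
import java.util.Random;

// Hypothetical sketch of the fix: guarantee at least one value per stats
// instance, and make the second set differ from the first, so equality
// between the two instances can never hold by accident.
public class DistinctValues {
    public static void main(String[] args) {
        Random random = new Random();
        int count = 1 + random.nextInt(10);   // at least one value, never zero
        double[] values1 = new double[count];
        double[] values2 = new double[count];
        for (int i = 0; i < count; i++) {
            values1[i] = random.nextDouble();
            values2[i] = values1[i] + 1.0;    // guaranteed different
        }
        // The first pair always differs, so the two stats instances cannot
        // be equal.
        System.out.println(values1[0] != values2[0]);
    }
}
```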
* Backport
* Version fix
* Another
* Fix
* Fix again
* Skip
* One more
* Formatting fix
---------
Co-authored-by: Johannes Fredén <109296772+jfreden@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
As part of ES|QL planning of a cross-cluster search, a field-caps call is done to each cluster and,
if an ENRICH command is present, the enrich policy-resolve API is called on each remote. If a
remote cluster cannot be connected to in these calls, the outcome depends on the
skip_unavailable setting.
For skip_unavailable=false clusters, the error is fatal and will immediately be propagated
back to the client with a top-level error message and a 500 HTTP status code.
For skip_unavailable=true clusters, the error is not fatal. It will be trapped and recorded in the
EsqlExecutionInfo object for the query, marking the cluster as SKIPPED. If the user requested
CCS metadata to be included, the cluster status and connection failure will be present in the
_clusters/details section of the response.
If no clusters can be contacted and they are all marked as skip_unavailable=true, no error will be
returned. Instead, a 200 HTTP status will be returned with no columns and no values. If the
include_ccs_metadata: true setting was included in the query, the errors will be listed in the
_clusters metadata section. (Note: this is also how the _search endpoint works for CCS.)
Partially addresses https://github.com/elastic/elasticsearch/issues/114531
If a stream handler throws an uncaught exception, we should close the
channel and release associated resources to avoid leaving the channel in a
limbo state. This PR does that.
Resolves: ES-9537
Co-authored-by: Yang Wang <yang.wang@elastic.co>
When a numeric setting is too large or too small such that it can't be
parsed at all, the error message is the same as for garbage values. This
commit improves the error message in these cases to be the same as for
normal bounds checks.
Closes #115080
The version randomization has been changed recently with the unintended effect
that now randomized "old" and "new" versions can be the same, and new versions
can even be lower than old versions. This change corrects this by going back to
the previous version randomization logic.
Closes #114593
* fix: correctly update search status for a nonexistent local index
* Check for cluster existence before updating
* Remove unnecessary `println`
* Address review comment: add an explanatory code comment
* Further clarify code comment
(cherry picked from commit ad9c5a0a06)
Forking when an action completes on the current thread is a needlessly heavy-handed
way of preventing stack overflows. Also, we don't need locking/synchronization
to deal with a worker-count + queue-length problem. Both of these allow for
non-trivial optimization even in the current execution model, and this change
also helps with moving to a more efficient execution model by avoiding needless
forking to the search pool in particular.
-> refactored the code to never fork but instead avoid stack-depth issues through use
of a `SubscribableListener`
-> replaced our home-brew queue and semaphore combination with JDK primitives, which
saves blocking synchronization on task start and completion.
We use about 1M of memory for the route stats tracker instances per ES instance.
Making this lazily initialized should come at trivial overhead and in fact
makes the computation of node stats cheaper by saving spurious sums
over 0-valued long adders.
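The lazy-init idea can be sketched like this (a hypothetical, simplified illustration, not the actual Elasticsearch tracker): the `LongAdder` is allocated only when a route is first hit, and reading stats for an idle route costs nothing.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: allocate the per-route tracker only on first use, so
// idle routes allocate nothing and node-stats computation can skip them.
public class LazyStats {
    private volatile LongAdder requests; // null until the route is first hit

    void track() {
        LongAdder adder = requests;
        if (adder == null) {
            synchronized (this) {
                if (requests == null) {
                    requests = new LongAdder(); // allocate lazily, once
                }
                adder = requests;
            }
        }
        adder.increment();
    }

    long count() {
        LongAdder adder = requests;
        return adder == null ? 0 : adder.sum(); // no allocation for idle routes
    }

    public static void main(String[] args) {
        LazyStats stats = new LazyStats();
        System.out.println(stats.count()); // 0, without ever allocating
        stats.track();
        stats.track();
        System.out.println(stats.count());
    }
}
```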
* Add lookup index mode (#115143)
This change introduces a new index mode, lookup, for indices intended
for lookup operations in ES|QL. Lookup indices must have a single shard
and be replicated to all data nodes by default. Aside from these
requirements, they function as standard indices. Documentation will be
added later when the lookup operator in ES|QL is implemented.
* default shard
* minimal
* compile
The approach taken by `ExpressionList` becomes very expensive for large
numbers of indices/datastreams. It implies that large lists of concrete
names (as they are passed down from the transport layer via e.g. security)
are copied at least twice during iteration.
Removing the intermediary list and inlining the logic brings down the latency of searches
targeting many shards/indices at once and allows for subsequent
optimizations.
The removed tests appear redundant as they tested an implementation
detail of the IndexNameExpressionResolver which itself is well covered
by its own tests.
The blob store may be triggered to create a local directory while in a
reduced privilege context. This commit guards the creation of
directories with doPrivileged.
* Allow for queries on _tier to skip shards during coordinator rewrite (#114990)
The `_tier` metadata field was not used on the coordinator when
rewriting queries in order to exclude shards that don't match. This led
to queries of the following form continuing to report failures even
though the only unavailable shards were in the tier that was excluded
from the search (the frozen tier in this example):
```
POST testing/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"_tier": "data_frozen"
}
}
]
}
}
}
```
This PR addresses this by having the queries that can execute on `_tier`
(term, match, query string, simple query string, prefix, wildcard)
execute a coordinator rewrite to exclude the indices that don't match
the `_tier` query **before** attempting to reach the shards (shards
that might not be available and would raise errors).
Fixes #114910
* Don't use getFirst
* test compilation
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
**Introduction**
> In order to make adoption of failure stores simpler for all users, we
> are introducing a new syntactical feature to index expression
> resolution: the selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will
> allow users to specify which component of an index abstraction they
> would like to operate on within an API call. In this case, an index
> abstraction is a concrete index, data stream, or alias; any abstraction
> that can be resolved to a set of indices/shards. We define a component
> of an index abstraction to be some searchable unit of the index
> abstraction.
>
> To start, we will support two components: data and failures. Concrete
> indices are their own data components, while the data component for
> index aliases is all of the indices contained therein. For data
> streams, the data component corresponds to their backing indices. Data
> stream aliases mirror this, treating all backing indices of the data
> streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data
> stream aliases. The failure component of these abstractions refers to
> the data streams' failure stores. Indices and index aliases do not have
> a failure component.
For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.
**Purpose of this PR**
This PR introduces `::*` as another selector option and not as a
combination of `::data` and `::failure`. The reason for this change is
that we need to differentiate between:
- `my-index::*`, which should resolve to `my-index::data` only and not to `my-index::failures`, and
- a user explicitly requesting `my-index::data, my-index::failures`, which should potentially result in an error.
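The resolution rules above can be illustrated with a few expressions (the names are illustrative):

```
my-data-stream::data      -> the stream's backing indices
my-data-stream::failures  -> the stream's failure store
my-index::data            -> the index itself
my-index::*               -> my-index::data only (indices have no failure component)
```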
Long GC disruption relies on Thread.resume, which is removed in JDK 23.
Tests that use it predate more modern disruption tests. This commit
removes GC disruption and the master disruption tests. Note that tests
relying on this scheme have not been running since JDK 20, which first
deprecated Thread.resume.