In LogsDB we would like to use a default value of `8191` for the index-level setting
`index.mapping.ignore_above`. The value for `ignore_above` is the _character count_,
but Lucene counts bytes (its term length limit is `32766` bytes). Here we set the limit to
`32766 / 4 = 8191` since a UTF-8 character may occupy at most 4 bytes.
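A minimal sketch of how such an index-scoped setting with this default could be declared (the class and constant names below are illustrative, not the actual ones introduced by this change):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;

public final class IgnoreAboveDefaults {
    // Lucene rejects terms longer than 32766 bytes; with at most 4 bytes per
    // UTF-8 character, 32766 / 4 = 8191 characters is always safe.
    static final int LUCENE_MAX_TERM_BYTES = 32766;
    static final int DEFAULT_IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES / 4; // 8191

    // Hypothetical index-scoped setting declaration using this default.
    public static final Setting<Integer> IGNORE_ABOVE_SETTING = Setting.intSetting(
        "index.mapping.ignore_above",
        DEFAULT_IGNORE_ABOVE,
        0,
        Property.IndexScope
    );
}
```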
(cherry picked from commit 521e4341d7)
# Conflicts:
# server/src/main/java/org/elasticsearch/common/settings/Setting.java
Co-authored-by: Salvatore Campagna <93581129+salvatore-campagna@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If source is required by a block loader, then the StoredFieldsSpec that gets populated in ValuesSourceReaderOperator should be enhanced with SourceLoader#requiredStoredFields(...). Otherwise, in the case of synthetic source, many stored fields aren't loaded, which causes only a subset of _source to be synthesized. For example, unmapped fields or field values that exceed the configured ignore_above limit will not appear in _source.
This happens when field types fall back to a block loader implementation that uses _source. The required field values are then extracted from the source once it is loaded.
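A hedged sketch of the idea (illustrative only, not the exact production code): when the spec already requires _source, widen it with whatever stored fields the SourceLoader needs to synthesize it.

```java
import java.util.Set;

import org.elasticsearch.index.mapper.SourceLoader;
import org.elasticsearch.search.fetch.StoredFieldsSpec;

final class StoredFieldsSpecs {
    // If _source is needed, make sure the stored fields required to synthesize
    // it (e.g. _ignored_source) are loaded as well.
    static StoredFieldsSpec withSourceLoaderFields(StoredFieldsSpec spec, SourceLoader sourceLoader) {
        if (spec.requiresSource() == false) {
            return spec;
        }
        Set<String> required = sourceLoader.requiredStoredFields();
        return spec.merge(new StoredFieldsSpec(true, false, required));
    }
}
```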
This change also reverts the production code changes introduced via #114903. That change only ensured that the _ignored_source field was added to the required list of stored fields, while in reality more fields could be required. This change is a better fix, since it also handles other cases and lets the SourceLoader implementation indicate which stored fields are needed.
Closes #115076
This reverts commit 4c15cc0778.
The reverted commit introduced an orders-of-magnitude regression when searching many shards.
(cherry picked from commit d9baf6f9db)
Co-authored-by: Armin Braun <me@obrown.io>
* Add prefilters only once in the compound and text similarity retrievers (#114983)
This change ensures that the prefilters are propagated to the downstream retrievers only once.
It also removes the ability to extend `explainQuery` in the compound retriever. This is no longer
needed, as the rank docs are now responsible for the explanation.
* Trigger Build
In this PR we add a test and we fix the issues we encountered when we
enabled the failure store for TSDS and logsdb.
**Logsdb** Logsdb worked out of the box, so we just added a test that
indexes a couple of documents with a bulk request and verifies how they
are ingested.
**TSDS** Here it was a bit trickier. We encountered the following
issues:
- TSDS requires a timestamp to determine the write index of the data stream, meaning the failure happens earlier than we had anticipated so far. We added a special exception to detect this case and treat it accordingly.
- The template of a TSDS data stream sets certain settings that we do not want to have in the failure store index. We added an allowlist that gets applied before we add the necessary index settings (see the sketch after this list).
Furthermore, we added a test case to capture this.
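The allowlist idea could look roughly like the following; the allowed keys and names here are hypothetical, not the actual list used by the failure store code.

```java
import java.util.Set;

import org.elasticsearch.common.settings.Settings;

final class FailureStoreIndexSettings {
    // Hypothetical allowlist of template settings that may carry over to the
    // failure store index; TSDS-specific settings (routing paths, time bounds,
    // index mode) are intentionally absent.
    private static final Set<String> ALLOWED_KEYS = Set.of(
        "index.number_of_shards",
        "index.number_of_replicas"
    );

    // Apply the allowlist before the failure-store-specific settings are added.
    static Settings filterTemplateSettings(Settings templateSettings) {
        return templateSettings.filter(ALLOWED_KEYS::contains);
    }
}
```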
* Reprocess operator file settings on service start (#114295)
Changes `FileSettingsService` to reprocess file settings on every
restart or master node change, even if versions match between file and
cluster-state metadata. If the file version is lower than the metadata
version, processing is still skipped to avoid applying stale settings.
This makes it easier for consumers of file settings to change their
behavior w.r.t. file settings contents. For instance, an update of how
role mappings are stored will automatically apply on the next restart,
without the need to manually increment the file settings version to
force reprocessing.
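A hedged sketch of the reprocessing decision described above (the class and method names are illustrative, not the actual FileSettingsService code):

```java
final class FileSettingsReprocessing {
    // Reprocess on every service start or master change, unless the file is
    // strictly older than what is already applied in cluster-state metadata.
    static boolean shouldProcess(long fileVersion, long appliedMetadataVersion) {
        if (fileVersion < appliedMetadataVersion) {
            return false; // stale file: skip to avoid applying old settings
        }
        return true; // equal or newer: reprocess even if the versions match
    }
}
```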
Relates: ES-9628
* Backport 114295
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This adds cancellation checks to the rescore phase. These checks cover
both cancellation of the parent task and search timeouts.
The assumption is that rescore is always significantly more expensive
than a regular query, so we check for timeout as frequently as the most
frequent check in ExitableDirectoryReader.
For LTR, we check on hit inference. Maybe we should also check for per
feature extraction?
For QueryRescorer, we check in the combine method.
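A hedged sketch of the kind of check described above (illustrative; the actual hook points live in the rescorers themselves):

```java
import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.tasks.TaskCancelledException;

final class RescoreCancellationChecks {
    // Called between rescored hits: fail fast if the parent task was cancelled.
    // Timeout handling would follow the same pattern, at the same frequency as
    // the most frequent ExitableDirectoryReader check.
    static void checkCancelled(SearchContext context) {
        if (context.getTask() != null && context.getTask().isCancelled()) {
            throw new TaskCancelledException("rescore was cancelled");
        }
    }
}
```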
closes: https://github.com/elastic/elasticsearch/issues/114955
Currently, when the compute engine loads source and the source mode is synthetic, the synthetic source loader is already used. But the _ignored_source field isn't always marked as a required stored field, causing the synthesized source to potentially miss a lot of fields.
This change includes the _ignored_source field as a required stored field and allows keyword fields without doc values or stored fields to be used in the case of synthetic source.
Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. With synthetic source we already keep ignored fields/values in a special place, named ignored source. Long term, with synthetic source we should only load ignored source when a field has no doc values or stored fields, as is being explored in #114886, thereby avoiding synthesizing the complete _source in order to get only one field.
This removes all recovery-source-specific SFM singletons. Whether recovery source is enabled can be checked via `DocumentParserContext`. This reduces the number of SFM instances by half.
The main users of this class take as input latitudes and longitudes read from doc values. These coordinates are always
within bounds, so there is no point in trying to normalise them, especially since this piece of code is on the hot path
for aggregations.
When I added the query/fetch metrics, I overlooked that non-primary
shards were being skipped during metrics collection, and the stateful
tests didn't catch it. This change ensures that search metrics are now
collected from every shard copy.
Currently the incremental and non-incremental bulk variations will
return different error codes when the json body provided is invalid.
This commit ensures both versions return status code 400. Additionally,
this renames the incremental rest tests to bulk tests and ensures that
all tests work with both bulk api versions. We set these tests to
randomize which version of the api we test each run.
With recent changes in Lucene 9.12 around not forking execution when not necessary
(see https://github.com/apache/lucene/pull/13472), we have removed the search
worker thread pool in #111099. The worker thread pool had an unlimited queue, and we
feared that we could have much more queueing on the search thread pool if we execute
segment level searches on the same thread pool as the shard level searches, because
every shard search would take up to a thread per slice when executing the query phase.
We have then introduced an additional conditional to stop parallelizing when there
is a queue. That is perhaps a bit extreme, as it's a decision made when creating the
searcher, while a queue may no longer be there once the search is executing.
This has caused some benchmark regressions, given that having a queue may be a transient
scenario, especially with short-lived segment searches being queued up. We may end
up disabling inter-segment concurrency more aggressively than we would want, penalizing
requests that do benefit from concurrency. At the same time, we do want to have some kind
of protection against rejections of shard searches that would be caused by excessive slicing.
When the queue is above a certain size, we can turn off the slicing and effectively disable
inter-segment concurrency. With this commit we set that threshold to be the number of
threads in the search pool.
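A hedged sketch of the decision described above (illustrative; the real check happens when the searcher computes its slices):

```java
import java.util.concurrent.ThreadPoolExecutor;

final class QueryPhaseSlicingDecision {
    // Keep inter-segment concurrency unless the search pool's queue has grown
    // beyond the number of threads in the pool; a small, transient queue no
    // longer disables slicing.
    static boolean enableSlicing(ThreadPoolExecutor searchExecutor) {
        int threads = searchExecutor.getMaximumPoolSize();
        int queued = searchExecutor.getQueue().size();
        return queued <= threads;
    }
}
```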
Here we check for the existence of a `host.name` field in the index sort settings
when the index mode is `logsdb`, and decide whether to inject the field into the
mapping depending on whether it is present. By default `host.name` is required for
sorting in LogsDB. Injecting the `host.name` field only when strictly required
reduces the chances of errors at mapping or template composition time. A user who
wants to override the index sort settings without including a `host.name` field
can now do so without finding an automatically injected `host.name` field in the
mappings. If users override the sort settings and `host.name` is not included,
we don't need to inject the field, since sorting no longer requires it.
As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings
Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.
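A minimal sketch of the decision captured by the two bullets above (the class and method names are illustrative, not the actual mapping code):

```java
import java.util.List;

import org.elasticsearch.common.settings.Settings;

final class LogsdbHostNameSortInjection {
    // Inject the default sort fields (and their mappings, e.g. host.name) only
    // when the user has not provided any index sorting configuration; if the
    // user configures index.sort.* explicitly, mappings are left untouched.
    static boolean shouldInjectDefaultSortFields(Settings indexSettings) {
        List<String> sortFields = indexSettings.getAsList("index.sort.field");
        return sortFields.isEmpty();
    }
}
```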
(cherry picked from commit 9bf6e3b0ba)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
**Introduction**
> In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: the selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction.
>
> To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for index aliases is all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refers to the data streams' failure stores. Indices and index aliases do not have a failure component.
For more details and examples see
https://github.com/elastic/elasticsearch/pull/113144. All this work has
been cherry picked from there.
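As a rough illustration of the selector syntax described above, here is a hedged sketch of parsing an index expression into its abstraction and component; the record and the defaults are hypothetical, not the actual resolver code.

```java
// Hedged sketch: split "logs-app::failures" into ("logs-app", "failures");
// expressions without a selector default to the data component.
record ParsedExpression(String indexAbstraction, String component) {

    static ParsedExpression parse(String expression) {
        int idx = expression.lastIndexOf("::");
        if (idx < 0) {
            return new ParsedExpression(expression, "data");
        }
        String suffix = expression.substring(idx + 2);
        if (suffix.equals("data") == false && suffix.equals("failures") == false) {
            throw new IllegalArgumentException("unrecognized selector [" + suffix + "]");
        }
        return new ParsedExpression(expression.substring(0, idx), suffix);
    }
}
```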
**Purpose of this PR**
This PR replaces the `FailureStoreOptions` with the `SelectorOptions`. There
shouldn't be any perceivable change for the user, since we kept the query
parameter "failure_store" for now. It will be removed in the next PR, which
will introduce the parsing of the expressions.
_The current PR is just a refactoring and does not and should not change
any existing behaviour._
* Allow synthetic source and disabled source for standard indices (#114817)
When using the index.mapping.source.mode setting we need to make sure
that it takes precedence and is applied also when the standard index mode
is used. Without this patch we always return stored source when
`_source.mode` is not used in the mapping but the setting is.
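A hedged sketch of the precedence described above (the enum and method are illustrative, not the actual SourceFieldMapper logic; the stored-source default is shown only for illustration):

```java
final class SourceModePrecedence {
    enum Mode { STORED, SYNTHETIC, DISABLED }

    // The explicit _source.mode in the mapping wins; otherwise the
    // index.mapping.source.mode setting applies, even for standard indices.
    static Mode effectiveMode(Mode fromMapping, Mode fromIndexSetting) {
        if (fromMapping != null) {
            return fromMapping;
        }
        if (fromIndexSetting != null) {
            return fromIndexSetting;
        }
        return Mode.STORED;
    }
}
```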
Relates #114433
(cherry picked from commit 3af4d67fac)
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
* fix: conflict resolution mistake
* fix: error message
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
We actually don't need a cluster feature; a capability that is added when
the feature flag is enabled is enough for testing.
closes https://github.com/elastic/elasticsearch/issues/114787
(cherry picked from commit e87b894f68)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
`ThreadContext#stashContext` does not yield a completely fresh context:
it preserves headers related to tracing the original request. That may
be appropriate in many situations, but sometimes we really do want to
detach processing entirely from the original task. This commit
introduces new utilities to do that.
* ESQL: Introduce per agg filter (#113735)
Add support for aggregation scoped filters that work dynamically on the
data in each group.
| STATS
success = COUNT(*) WHERE 200 <= code AND code < 300,
redirect = COUNT(*) WHERE 300 <= code AND code < 400,
client_err = COUNT(*) WHERE 400 <= code AND code < 500,
server_err = COUNT(*) WHERE 500 <= code AND code < 600,
total_count = COUNT(*)
Implementation-wise, the base AggregateFunction has been extended to
allow a filter to be passed on. This is required to incorporate the
filter as part of the aggregate equality/identity, which would fail with
the filter as an external component.
As part of the process, the serialization for the existing aggregations
had to be fixed so that AggregateFunction implementations delegate to
their parent first.
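A hedged sketch of the serialization fix mentioned above (illustrative classes, not the actual ESQL aggregate functions): the parent serializes the shared state, including the filter, and subclasses delegate to it before writing their own fields.

```java
import java.io.IOException;

import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;

class BaseAgg implements Writeable {
    final String field;
    final String filter; // the new per-aggregation filter, may be null

    BaseAgg(String field, String filter) {
        this.field = field;
        this.filter = filter;
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeString(field);
        out.writeOptionalString(filter);
    }
}

class PercentileAgg extends BaseAgg {
    final double percentile;

    PercentileAgg(String field, String filter, double percentile) {
        super(field, filter);
        this.percentile = percentile;
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        super.writeTo(out);          // delegate to the parent first
        out.writeDouble(percentile); // then write subclass-specific state
    }
}
```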
(cherry picked from commit d102659dce)
* Update docs/changelog/114842.yaml
* Delete docs/changelog/114842.yaml