In #105359 we changed the bucket ordering for partial reduces, which causes issues when the output is shared with a
node running on an older version. This commit reorders the output to the order that previous versions expect.
* Fix merging component templates with a mix of dotted and nested object mapper definitions (#106077)
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
(cherry picked from commit ab52ef1f06)
* Remove unnecessary subobjects flag in buildMergedMappers
This sneaked in while resolving merge conflicts.
A lot of places in the code use a `DataStream` constructor that sets the
`rolloverOnWrite` flag to `false`. For some places, this was
intentional, but for others, this was erroneous (and for most tests, it
didn't matter much).
This PR fixes the erroneous spots and avoids similar unintentional
behavior in the future by removing the constructor in question
altogether. Most use cases just want to copy the flag over, and if you
_do_ want to set the flag to `false`, it makes more sense to do so
explicitly yourself rather than letting the constructor do it for you.
An additional small bonus is that we have one less constructor for the
`DataStream` class :).
Follow-up to
[this](https://github.com/elastic/elasticsearch/pull/107035#discussion_r1549299287)
discussion.
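A minimal self-contained sketch of the pattern (the real `DataStream` constructor takes many more arguments; the names here are simplified stand-ins):

```java
// Simplified analogue of the change, not the real DataStream class: there is
// no longer a constructor that silently defaults rolloverOnWrite to false.
record DataStreamSketch(String name, boolean rolloverOnWrite) {

    // Most call sites just want to carry the existing flag over...
    DataStreamSketch renamed(String newName) {
        return new DataStreamSketch(newName, this.rolloverOnWrite);
    }

    // ...and the few that really want it off now have to say so explicitly.
    static DataStreamSketch createNew(String name) {
        return new DataStreamSketch(name, false /* explicit, not a hidden default */);
    }
}
```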
Back when we introduced queries against runtime fields, Elasticsearch did not support
inter-segment concurrency yet. At the time, it was fine to assume that segments would be
searched sequentially. AbstractStringScriptFieldAutomatonQuery used to have a BytesRefBuilder
instance shared across the segments, which got re-initialized whenever a segment started its work.
This is no longer possible with inter-segment concurrency.
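A minimal sketch of the shape of the problem (not necessarily the exact fix in this PR): with concurrent segment execution, a builder stored in a shared field gets clobbered, so each segment visit needs its own scratch buffer.

```java
import org.apache.lucene.util.BytesRefBuilder;

final class SegmentScratchExample {
    // Unsafe under inter-segment concurrency: every segment's thread would
    // re-initialize and mutate this one shared buffer.
    // private final BytesRefBuilder shared = new BytesRefBuilder();

    // Safe: allocate a fresh builder whenever a segment starts its work.
    BytesRefBuilder scratchForSegment() {
        return new BytesRefBuilder();
    }
}
```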
Closes #105911
We have instances where BWC tests configure old ES version nodes with
the integTest distribution. This isn't a valid configuration, and while
in practice we resolve the default distribution artifact, we have other
configuration logic that behaves differently based on whether the
integTest distro was _requested_: specifically, what to set `ES_JAVA_HOME` to.
This bug resulted in us attempting to run old nodes using the
current bundled JDK version, which may be incompatible with that older
version of Elasticsearch.
Closes #104858
In #104846, we see an inconsistency in the state of the cluster between
the test and the clean-up at the end of the test. The failure is as follows:

- The test, using ILM, at some point creates a fully mounted searchable snapshot index named `restore-my-index-xxxx`.
- ILM then moves to the frozen phase and creates a partially mounted searchable snapshot with the alias `restore-my-index-xxxx`.
- The test confirms that `restore-my-index-xxxx` is an alias and ends.
- During tear-down, the retrieved cluster state appears to contain `restore-my-index-xxxx` as an index, so a request is issued to delete it.
- The deletion fails because `restore-my-index-xxxx` is an alias.
I do not think the test itself has an issue; most of the clues show that
the partial searchable snapshot has been processed correctly. Only this
cluster state retrieval seems a bit off. In order to reduce this
flakiness we introduce a `GET _cluster/health?wait_for_events=languid`
to ensure we get the latest cluster state.
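For reference, this is roughly what that await looks like when issued through the low-level REST client (`client` is an assumed `RestClient` instance):

```java
// Wait for all pending cluster state tasks, down to "languid" priority,
// so that the subsequently retrieved cluster state is up to date.
Request health = new Request("GET", "/_cluster/health");
health.addParameter("wait_for_events", "languid");
Response ignored = client.performRequest(health);
```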
Fixes #104846
Today this test suite relies on being able to cancel an in-flight
publication after it's reached a committed state. This is questionable,
and also a little flaky in the presence of the desired balance allocator,
which may introduce a short delay before enqueuing the cluster state
update that performs the reconciliation step.
This commit removes the questionable meddling with the internals of
`Coordinator` and instead just blocks the cluster state updates at the
transport layer to achieve the same effect.
Closes #102947
* Use single-char variant of String.indexOf() where possible
indexOf(char) is more efficient than searching for the same one-character String.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Use String.replace() instead of replaceAll() for non-regexp replacements
When the arguments do not make use of regex features, replace() is a more efficient option, especially the char variant.
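For illustration, the two replacements look like this:

```java
String path = "a/b/c";
int slash = path.indexOf('/');              // preferred over path.indexOf("/")

String snake = "a.b.c".replace('.', '_');   // preferred over "a.b.c".replaceAll("\\.", "_")
```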
We want to report that the observation of document parsing has finished only upon successful indexing.
To achieve this, we need to perform the reporting in a single place (not, as previously, in both IngestService and the bulk action).

This commit splits the DocumentParsingObserver in two: a DocumentSizeObserver that wraps an XContentParser and returns the observed state, and a DocumentSizeReporter that performs an action once parsing has completed and indexing has succeeded.

To perform the reporting in one place we need to pass the state from IngestService to the bulk action. The state is currently represented as a long: normalisedBytesParsed.

In TransportShardBulkAction we fetch the normalisedBytesParsed information, and in the serverless plugin we check whether the value indicates that parsing already happened in IngestService (value != -1); if so, we create a DocumentSizeObserver with the fixed normalisedBytesParsed and do not increment it.

When the indexing has completed successfully, we report the observed state for the index with the DocumentSizeReporter.

Small nit: by passing the observer via SourceToParse, we no longer have to inject it via a complex hierarchy for DocumentParser, hence some constructor changes.
The OutboundHandler should handle errors that occur during sending
responses. Specifically, if sending a response encounters an exception,
it should attempt to send an error response. If sending the error
response fails, the OutboundHandler should close the channel and log a
warning. This change reduces the burden on callers, as they no
longer need to handle errors in these cases.
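A minimal sketch of the described behaviour (stand-in types, not the real transport classes):

```java
import java.util.function.Consumer;

interface SketchChannel {
    void send(Object message) throws Exception;
    void close();
}

final class SendWithFallback {
    static void send(SketchChannel channel, Object response, Object errorResponse,
                     Consumer<Exception> warnLogger) {
        try {
            channel.send(response);
        } catch (Exception e) {
            try {
                channel.send(errorResponse); // attempt an error response instead
            } catch (Exception inner) {
                warnLogger.accept(inner);    // log a warning...
                channel.close();             // ...and close the channel
            }
        }
    }
}
```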
There is a lot of code here that is only used in tests, plus one duplicate,
unused class that only served as an indirection for parsing the
`AsyncSearchResponse`. I moved what I could easily move via automated
refactoring to `SearchResponseUtils` in tests and removed the now-unused
duplicate class from the client codebase.
This is mainly added as a prerequisite to slicing doc sources out of
bulk index requests. The few usages of it added in this PR have limited
performance impact but demonstrate correct functioning of the
implementations.
Co-authored-by: David Turner <david.turner@elastic.co>
We are adding a query parameter to the field_caps API in order to filter out
fields with no values. The parameter is called `include_empty_fields` and
defaults to true; if set to false, it filters out of the field_caps
response all the fields that have no value in the index.

We keep track of FieldInfos during refresh in order to know which fields have
values in an index. We also added a system property,
`es.field_caps_empty_fields_filter`, to disable this feature if needed.
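An example request using the low-level REST client (the index name is illustrative, `client` an assumed `RestClient`):

```java
// Ask for field capabilities but drop fields that have no value in the index.
Request request = new Request("GET", "/my-index/_field_caps");
request.addParameter("fields", "*");
request.addParameter("include_empty_fields", "false");
Response response = client.performRequest(request);
```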
---------
Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>
Today the various `ESTestCase#safeAwait` variants do not include a
descriptive message with their failures, which means you have to dig
through the stack trace to work out the reason for a test failure. This
commit adds the missing messages to make it a little easier on the
reader.
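The shape of the change, sketched (the message text is illustrative, not the exact wording in `ESTestCase`):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import static org.junit.Assert.assertTrue;

static void safeAwait(CountDownLatch latch) {
    try {
        // Fail with a descriptive message rather than a bare timeout.
        assertTrue("safeAwait: latch did not reach zero within the timeout",
                latch.await(10, TimeUnit.SECONDS));
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("safeAwait: interrupted while waiting for latch", e);
    }
}
```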
Today, we're counting all mappers towards the field limit, including
mappers for subfields that aren't explicitly added to the mapping.
This means that some field types, such as `search_as_you_type` or
`percolator`, count as more than one field, even though that's not
apparent to users, as they're just defining them as a single field in the
mapping.
This change makes it so that each field mapper only counts as one. We're
still counting multi-fields.
This makes it easier to understand for users why the field limit is hit.
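A simplified sketch of the new counting rule (not the exact `Mapper` API):

```java
import java.util.List;

interface CountedMapper {
    int totalFieldsCount();
}

// A field mapper counts as one, no matter how many internal subfields a type
// like search_as_you_type or percolator creates; declared multi-fields still count.
record FieldMapperSketch(List<CountedMapper> multiFields) implements CountedMapper {
    @Override
    public int totalFieldsCount() {
        return 1 + multiFields.stream().mapToInt(CountedMapper::totalFieldsCount).sum();
    }
}
```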
~In addition to that, it also simplifies
https://github.com/elastic/elasticsearch/pull/96235 as it makes the
implementation of `Mapper.Builder#getTotalFieldsCount` much simpler and
easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk
of over- or under-estimating the field count of a `Mapper.Builder` in
`DocumentParserContext#addDynamicMapper`, which in turn reduces the risk
of data loss due to the issue described here:
https://github.com/elastic/elasticsearch/pull/96235#discussion_r1402495749.~
*Edit: due to https://github.com/elastic/elasticsearch/pull/103865, we
don't need an implementation of `getTotalFieldsCount` or `mapperSize` in
`Mapper.Builder`. Still, this PR more closely aligns
`Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`,
which `DocumentParserContext#addDynamicMapper` uses to determine
whether the field limit is hit*
A potential risk of this is that we're now effectively allowing more
fields in the mapping. It may be surprising to users that more fields
can be added to a mapping, although I would not expect negative
consequences from that. Generally, I'd expect users to be happy about
any change that reduces the risk of data loss.
We could also think about whether to apply the new counting logic only
to new indices (depending on the `IndexVersion`). However, that would
add more complexity and I'm not convinced about the value. We'd then
need to maintain two different ways of counting fields and also require
passing in the `IndexVersion` to `MappingLookup` which previously didn't
require the `IndexVersion`.
This PR is meant as a conversation starter. It would also simplify
https://github.com/elastic/elasticsearch/pull/96235 but I don't think
this blocks that PR in any way.
I'm curious about the opinion of @javanna and @jpountz on this.
This PR updates the ingest service to detect if a failed ingest document was bound for a data stream configured
with a failure store, and in that event, restores the document to its original state, transforms it with its failure
information, and redirects it to the failure store for the data stream it was originally targeting.
* Transform: return results in order
Currently, when Transform searches over aggregations, it stores the
results in an unordered HashMap. This potentially rearranges search
results.
For example, if a user specifies an order in a search request, the
search response is in that order. But if the search request is
embedded in a Transform request, the Transform response will not
preserve the order, and the result will look different.
With this change, Transform will always preserve the order of the
search response. A search embedded in a Transform should behave like
an unembedded search.
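The essence of the fix, illustrated (not the actual Transform code):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// A plain HashMap has no defined iteration order, so buckets could come back
// rearranged; an insertion-ordered map preserves the search response order.
Map<String, Object> unordered = new HashMap<>();
Map<String, Object> ordered = new LinkedHashMap<>();
```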
Closes #104847
We've seen cases of OOM errors in the test runner process, which occur
when we convert a response to a JSON string and then parse it. We can
directly parse from its input stream to avoid these OOM errors.
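A sketch of the stream-based parsing, assuming an Apache HTTP response as used by the REST test client:

```java
import java.io.InputStream;
import java.util.Map;
import org.elasticsearch.xcontent.XContentParser;
import org.elasticsearch.xcontent.XContentParserConfiguration;
import org.elasticsearch.xcontent.XContentType;

// Parse the entity's InputStream directly instead of materialising the whole
// response body as one potentially huge JSON String first.
try (InputStream in = response.getEntity().getContent();
        XContentParser parser = XContentType.JSON.xContent()
                .createParser(XContentParserConfiguration.EMPTY, in)) {
    Map<String, Object> body = parser.map();
}
```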
This changes all of our ingest-related builders other than BulkRequestBuilder
(which was already changed in #104927) to inherit from ActionRequestLazyBuilder
(added in #104927) rather than ActionRequestBuilder. This means that the
requests will not be created until the builder's request() method is called, making
upcoming ref counting work much more feasible.
A predicate to check whether the cluster supports a feature is available
to REST handlers defined in server. This commit adds that predicate to
plugins defining REST handlers as well.
* Introduce passthrough field type
`PassthroughObjectMapper` extends `ObjectMapper` to create a container
for fields that also need to be referenced as if they were at the root
level. This is done by creating aliases for all its subfields.
It also supports an option of annotating all its subfields as
dimensions. This will be leveraged in TSDB, where dimension fields can
be dynamically defined as nested under a passthrough object - and still
referenced directly (i.e. without prefixes) in aggregation queries.
Related to #103567
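A hedged sketch of what such a mapping could look like, sent via the low-level REST client (the exact mapping options and the `attributes` field are illustrative):

```java
Request request = new Request("PUT", "/tsdb-index");
request.setJsonEntity("""
    {
      "mappings": {
        "properties": {
          "attributes": {
            "type": "passthrough",
            "time_series_dimension": true,
            "properties": {
              "host.name": { "type": "keyword" }
            }
          }
        }
      }
    }
    """);
```

Subfields such as `attributes.host.name` would then also resolve as plain `host.name` in queries and aggregations, via the generated aliases.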
* Update docs/changelog/103648.yaml
* no subobjects
* create dimensions dynamically
* remove unused method
* restore ignoreAbove incompatibility with dimension
* fix test
* refactor, skip aliases on conflict
* fix branch
* fix branch
* add tests
* update test
* remove unused variable
* add yaml test for subobject
* minor refactoring
* add unittest for PassThroughObjectMapper
* suggested fixes
* suggested fixes
* update yaml with warning for duplicate alias
* updates from review
* add withoutMappers()
A Lucene limitation on doc values for UTF-8 fields does not allow us to
write keyword fields whose size is larger than 32K. This limits our
ability to map more than a certain number of dimension fields for time
series indices. Before this change, the tsid was created as a
concatenation of dimension field names and values into a keyword field.
To overcome this limitation we hash the tsid. This PR is intended to be
used as a draft to test different options.
Note that, as a side effect, this reduces the size of the tsid field, as
far less data is stored when the tsid is hashed. However, we
expect tsid hashing to affect the compression of doc values, resulting in
a larger storage footprint. The effect on query latency needs to be
evaluated too.
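A sketch of the idea only (the hash function and the actual `_tsid` encoding are implementation details of this PR):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

static byte[] hashedTsid(String concatenatedDimensions) throws NoSuchAlgorithmException {
    // Fixed-size output regardless of how many dimension fields there are,
    // comfortably below Lucene's 32K doc-values limit for keyword fields.
    return MessageDigest.getInstance("SHA-256")
            .digest(concatenatedDimensions.getBytes(StandardCharsets.UTF_8));
}
```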
Resolves #93564
Today, we allow ESQL to execute against an unlimited number of shards
concurrently on each node. This can lead to cases where we open and hold
too many shards, equivalent to opening too many file descriptors or
using too much memory for FieldInfos in ValuesSourceReaderOperator.
This change limits the number of concurrent shards to 10 per node. This
number was chosen based on the _search API, which limits it to 5.
Besides the primary reason stated above, this change has other
implications:

- We might execute fewer shards for queries with LIMIT only, leading to
scenarios where we execute only some high-priority shards and then stop.
- For now, we don't have a partial reduce at the node level, but if we
introduce one in the future, it might not be as efficient as executing
all shards at the same time.
- There are pauses between batches because batches are executed
sequentially, one by one.

However, I believe the performance of queries executing against many
shards (after can_match) is less important than resiliency.
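A minimal sketch of the throttling idea (names hypothetical, not the ESQL internals):

```java
import java.util.concurrent.Semaphore;

final class ShardThrottle {
    private static final int MAX_CONCURRENT_SHARDS_PER_NODE = 10;
    private final Semaphore permits = new Semaphore(MAX_CONCURRENT_SHARDS_PER_NODE);

    void run(Runnable shardTask) throws InterruptedException {
        permits.acquire();      // blocks while 10 shards are already executing
        try {
            shardTask.run();
        } finally {
            permits.release();  // let the next queued shard start
        }
    }
}
```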
Closes #103666
Today the test utilities for extracting data logged by a
`ChunkedLoggingStream` are in the `:server` test suite which renders
them inaccessible to the test suites of other modules. This commit moves
them to `:test:framework`.
`ActionType` represents an action which runs on the local node, there's
no need for implementations to define a `Reader<Response>`. This commit
removes the unused constructor argument.
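Illustrative before/after (the action name and response type are hypothetical):

```java
// Before: a Reader was required even though responses are never deserialized
// for purely local actions.
// ActionType<ExampleResponse> TYPE = new ActionType<>("cluster:admin/example", ExampleResponse::new);

// After: just the action name.
ActionType<ExampleResponse> TYPE = new ActionType<>("cluster:admin/example");
```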
- Make `Netty4HttpResponse` sealed, we need to know all its impls
- Rename `Netty4FullHttpResponse` to contrast with chunked response
- Rename `doWrite` overloads
- Reorder `finishChunkedWrite()`, we're ready to handle the next
response before completing the promise
- Add missing test for splitting large responses
- Add a few extra assertions
- Clean up IDE warnings
* Add initial structure for ST_CENTROID
* Revert "Revert stab at implementing forStats for doc-values vs source"
This reverts commit cfc4341bf4.
* Refined csv-spec tests with st_centroid
* Spotless disagrees with intellij
* Fixes after reverting fieldmapper code to test GeoPointFieldMapper
* Get GeoPointFieldMapperTests working again after enabling doc-values reading
* Simplify after rebase on main
In particular, field-mappers that do not need to know about fields can have simpler calls.
* Support local physical planning of forStats attributes for spatial aggregations
* Get st_centroid aggregation working on doc-values
We changed it to produce BytesRef, so we don't (yet) need any doc-values types.
* Create both DocValues and SourceValues versions of st_centroid
* Support physical planning of DocValues and SourceValues SpatialCentroid
* Improve test for physical planning of DocValues in SpatialCentroid
* Fixed show functions for st_centroid
* More st_centroid tests with mv_expand
To test single and multi-value centroids
* Fix st_centroid from point literals
The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so we were decoding the first value over and over instead of the subsequent values.
* Teach CsvTests to handle spatial types alternative loading from doc-values
Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock.
* Fixed failing NodeSubclassTests
Required making the new constructor public and enabling Set as a valid parameter in the test framework.
* More complex st_centroid tests and fixed bug with multiple aggs
When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node.
* Update docs/changelog/104218.yaml
* Fix automatically generated changelog file
* Fixed failing test
The nodes can now sometimes be a Set, which is also a Collection but not a List, and therefore can never be a subset of the children.
* More tests covering more combinations including MV_EXPAND and grouping
* Added cartesian st_centroid with grouping test
We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code.
* Reduce flaky tests by sorting results
* Reduce flaky tests by sorting results
* Added tests for stats on stats to ensure planner coped
* Add unit tests to ensure doc-values in query planning complex cases
* Some minor updates from code review
* Fixes after rebase on main
* Get correct error message on unsupported geo_shape for st_centroid
* Refined point vs shape differences after merging main
* Added basic docs
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 4bc596a442.
* Fixed broken docs tag link
* Simplify BlockReaderSupport in MapperTestCase from code review
* Moved spatial aggregations into a sub-package
* Added some more code review updates, including nested tests
* Get nested functions working, if only from source values for now
* Code review update
* Code review update
* Added second location column to airports for wider testing
* Use second location in tests, including nulls
Includes a test fix for loading and converting nulls to encoded longs.
* Fixed bug supporting multi spatial aggregations in the local node
The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data.
* Renamed forStats to fieldExtractPreference for clarity
Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an enum with the default value NONE, making it clear we're leaving the choice up to the field type in all non-spatial cases.
* EsqlSpecIT was failing on very high precision centroids on different computers
This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests.
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 12c6980881.
* Fixed changelog entry