elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-22 14:17:51 -04:00

Author	SHA1	Message	Date
Yang Wang	50c71bcfdb	[Test] Ranged read should read non-empty content (#106000) (#106525) An empty read is [short-circuited](`e8039b9ecb/modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3BlobContainer.java (L115-L116)`) without going to the blob store. To test the S3 blob store, a ranged read should read at least one byte. This PR ensures that. Resolves: #105958	2024-03-20 04:56:14 -04:00
Martijn van Groningen	0d1e7a4a0e	Small time series agg improvement (#106288) (#106307) After tsid hashing was introduced (#98023), the time series aggregator generates the tsid (from all dimension fields) instead of using the value from the _tsid field directly. This generation of the tsid happens for every time series, parent bucket and segment combination. This change alters that by generating the tsid only once per time series and segment. This is done by locally recording the current tsid.	2024-03-13 13:02:58 -04:00
Matteo Piergiovanni	4cc9f5cf64	Field-caps field has value lookup use map instead of looping array (#105770 ) (#106131 ) (cherry picked from commit `35b2dbee2a`)	2024-03-11 08:40:27 +01:00
Jim Ferenczi	34fe40b5b0	Fix performance bug in `SourceConfirmedTextQuery#matches` (#105930 ) (#105983 ) This change ensures that the matches implementation of the `SourceConfirmedTextQuery` only checks the current document instead of calling advance on the two phase iterator. The latter tries to find the first doc that matches the query instead of restricting the search to the current doc. This can lead to abnormally slow highlighting if the query is very restrictive and the highlight is done on a non-matching document. Closes #103298	2024-03-06 08:48:28 +00:00
David Turner	203f549e14	`URLRepository` should not block shutdown (#105588 ) (#105614 ) Today a node with a registered `URLRepository` will not shut down cleanly because it never releases the last of the `activityRefs`. This commit fixes that.	2024-02-19 06:11:55 -05:00
David Turner	127da57578	Fix use-after-free at event-loop shutdown (#105486 ) (#105575 ) We could still be manipulating a network message when the event loop shuts down, causing us to close the message while it's still in use. This is at best going to be a little surprising to the caller, and at worst could be an outright use-after-free bug. This commit moves the double-check for a leaked promise to happen strictly after the event loop has fully terminated, so that we can be sure we've finished using it by this point. Relates #105306, #97301	2024-02-15 15:24:11 -05:00
David Turner	c0e931af06	Detach persistent task execution from `ThreadPool` (#105460 ) Similar to #99392, #97879 etc, no need to have the `NodePersistentTasksExecutor` look up the executor to use each time, nor does it necessarily need to use a named executor from the `ThreadPool`. This commit pulls the lookup earlier in initialization so we can just use a bare `Executor` instead.	2024-02-14 08:55:05 +00:00
Martijn van Groningen	67bf5f3d28	Improve test coverage for index shrinking a tsdb index. (#105459 )	2024-02-14 08:10:25 +01:00
Keith Massey	f0ec294382	Limiting the number of nested pipelines that can be executed (#105428 ) Limiting the number of nested pipelines that can be executed within a single pipeline to 100	2024-02-13 16:28:31 -06:00
Keith Massey	c884945a93	Adding executedPipelines to the IngestDocument copy constructor (#105427 )	2024-02-13 15:11:47 -06:00
Jack Conradson	b5828fbb67	Add plumbing to check cluster features in SearchSourceBuilder (#105417) This change adds additional plumbing to pipe through the available cluster features into SearchSourceBuilder. A number of different APIs use SearchSourceBuilder, so they had to make this available through their parsers as well, often through ParserContext. This change is largely mechanical, passing a Predicate into existing REST actions to check for feature availability. Note that this change was pulled mostly from this PR (#105040).	2024-02-13 08:30:04 -08:00
Keith Massey	e2b2232569	Improving the performance of the ingest simulate verbose API (#105265 ) This updates the simulate verbose API to run in O(N) (for number of pipelines) time and memory like the simulate and ingest APIs rather than O(N^2).	2024-02-12 16:04:21 -06:00
Dmitry Cherniachenko	a50e58d99a	Use single-char variant of String.indexOf() where possible (#105205 ) * Use single-char variant of String.indexOf() where possible indexOf(char) is more efficient than searching for the same one-character String. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-02-12 14:14:32 -05:00
Keith Massey	5ccadaee7b	Adding a custom exception for problems with the graph of pipelines to be applied to a document (#105196 ) This PR removes the need to parse the exception message to detect if a cycle has been detected in the ingest pipelines to be run on a document.	2024-02-12 13:11:00 -06:00
Dmitry Cherniachenko	e21a4874ab	Use String.replace() instead of replaceAll() for non-regexp replacements (#105127 ) * Use String.replace() instead of replaceAll() for non-regexp replacements When arguments do not make use of regexp features replace() is a more efficient option, especially the char-variant.	2024-02-12 13:11:15 -05:00
Przemyslaw Gomulka	11f3c29089	DocumentSizeObserver infrastructure to allow not reporting upon failures (#104859) We want to report that observation of document parsing has finished only upon successful indexing. To achieve this, we need to perform reporting in only one place (not, as previously, in both IngestService and 'bulk action'). This commit splits the DocumentParsingObserver in two: a DocumentSizeObserver, which wraps an XContentParser and returns the observed state, and a DocumentSizeReporter, which performs an action when parsing has completed and indexing was successful. To perform reporting in one place we need to pass the state from IngestService to 'bulk action'. The state is currently represented as a long, normalisedBytesParsed. In TransportShardBulkAction we get the normalisedBytesParsed information, and in the serverless plugin we check whether the value indicates that parsing already happened in IngestService (value != -1); if so, we create a DocumentSizeObserver with the fixed normalisedBytesParsed and won't increment it. When indexing has completed successfully, we report the observed state for an index with DocumentSizeReporter. Small nit: by passing the documentParsingObserve via SourceToParse we no longer have to inject it via a complex hierarchy for DocumentParser, hence some constructor changes.	2024-02-12 17:16:24 +01:00
Alexander Spies	a241b96b8f	Upgrade ANTLR4 to 4.13.1 (#105334 )	2024-02-12 15:24:32 +01:00
Kostas Krikellas	510c8515ab	Refactor getAliases, add more yaml tests (#105370 ) * Refactor getAliases, add more yaml tests * more test updates	2024-02-12 09:10:45 +02:00
Martijn van Groningen	47304a1f04	Added a more randomized passthrough indexing test. (#105344 ) That also asserts routing aspects of indexing, searching and getting by id. Relates to #103567	2024-02-09 11:34:15 -05:00
Keith Massey	45885fdb91	Adding some tests for many nested pipeline processors (#105291 )	2024-02-09 08:09:50 -06:00
Mary Gouseti	f1aae380d9	[IndicesOptions] Group indices options based on what they are applied on. (#104655) In this PR we are refactoring the internals of the `IndicesOptions` class. Because this class is widely used, the refactoring is strictly internal; we do not change the existing serialisation. This allows us to better test this and to preserve performance over the wire. The improvements we are bringing forth with this PR are: - New internal structure of the flags, based on what the flags influence. - Every flag is a boolean instead of using the presence of an enum option in a set. - We provide builders to allow easier construction of the object and easier overriding of the defaults. - This will enable easier extension that might be useful for other projects.	2024-02-09 13:01:34 +02:00
Armin Braun	5c8006499a	Move test-only search response x-content-parsing code to test codebase (#105308 ) Loads of code here that is only used in tests and one duplicate unused class that was only used as an indirection to parsing the `AsyncSearchResponse`. Moved what I could easily move via automated refactoring to `SearchResponseUtils` in tests and removed the duplicate now unused class from the client codebase.	2024-02-09 11:56:39 +01:00
David Turner	2615aa00b8	Fix race in HTTP response shutdown handling (#105306 ) Similar to #97301, the fix in #105293 was still not quite correct: we could in principle shut down the transport after checking `isOpen()` but before sending the message. Applying the same fix as for the transport layer here.	2024-02-09 08:43:13 +00:00
Yang Wang	5cf80496e5	Add s3 HeadObject request to request stats (#105105 ) The HeadObject request should be included in requests stats and metrics for completeness. This PR does that. Relates: #98083 Resolves: ES-7810	2024-02-08 21:29:14 -05:00
Yang Wang	a3b3083d2c	Do not record s3 http request time when it is not available (#105103 ) The metrics of HTTP request time can be unavailable (null) when the request fails on the client side. This PR makes sure we do not attempt to record it when it happens to avoid NPE.	2024-02-09 12:32:55 +11:00
David Turner	97dbb2a27e	Fix leaked HTTP response sent after close (#105293 ) Today a `HttpResponse` is always released via a `ChannelPromise` which means the release happens on a network thread. However, it's possible we try and send a `HttpResponse` after the node has got far enough through shutdown that it doesn't have any running network threads left, which means the response just leaks. This is no big deal in production, it becomes irrelevant when the process exits, but in tests we start and stop many nodes within the same process so mustn't leak anything. At this point in shutdown, all HTTP channels are now closed, so it's sufficient to check whether the channel is open first, and to fail the listener on the calling thread if not. That's what this commit does. Closes #104651	2024-02-08 14:57:02 -05:00
Matteo Piergiovanni	54cfce4379	Flag in _field_caps to return only fields with values in index (#103651) We are adding a query parameter to the field_caps API in order to filter out fields with no values. The parameter is called `include_empty_fields` and defaults to true; if set to false, it filters out of the field_caps response all the fields that have no value in the index. We keep track of FieldInfos during refresh in order to know which fields have values in an index. We also added a system property `es.field_caps_empty_fields_filter` in order to disable this feature if needed. --------- Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>	2024-02-08 17:52:21 +01:00
Michael Peterson	ac36aa7795	Resolve Cluster API (#102726) To improve cross-cluster search user experience, Kibana needs an endpoint that is accessible by arbitrary Kibana dashboard search users and provides: 1. a listing of clusters in scope for a CCS query (based on the index expression and whether there are any indices on each cluster that the Kibana user has access to query). 2. whether that cluster is currently connected to the querying cluster (will it come back as skipped or failed in a CCS search) 3. the skip_unavailable setting for those clusters (so you can know whether it will return skipped or failed in a CCS search) 4. the ES version of the cluster. Since no single Elasticsearch endpoint provides all of these features, this PR creates a new endpoint `_resolve/cluster` that works alongside the existing `_resolve/index` endpoint (and leverages some of its features). Example usage against a cluster with 2 remote clusters configured: GET /_resolve/cluster/,remote:bl* Response: { "(local)": { "connected": true, "skip_unavailable": false, "matching_indices": true, "version": { "number": "8.12.0-SNAPSHOT", "build_flavor": "default", "minimum_wire_compatibility_version": "7.17.0", "minimum_index_compatibility_version": "7.0.0" } }, "remote2": { "connected": true, "skip_unavailable": true, "matching_indices": true, "version": { "number": "8.12.0-SNAPSHOT", "build_flavor": "default", "minimum_wire_compatibility_version": "7.17.0", "minimum_index_compatibility_version": "7.0.0" } }, "remote1": { "connected": true, "skip_unavailable": false, "matching_indices": false, "version": { "number": "8.12.0-SNAPSHOT", "build_flavor": "default", "minimum_wire_compatibility_version": "7.17.0", "minimum_index_compatibility_version": "7.0.0" } } } Almost all errors show up as "error" entries in the response. Only the local SecurityException returns a 403 since that happens before the ResolveCluster Transport code kicks in.	2024-02-08 10:50:05 -05:00
Felix Barnsteiner	f426b68a82	Unmute LogsDataStreamIT.testIgnoreDynamicBeyondLimit (#105282 )	2024-02-08 13:26:42 +01:00
David Turner	e489951d84	Close `currentChunkedWrite` on client cancel (#105258 ) If the client closes the channel while we're in the middle of a chunked write then today we don't complete the corresponding listener. This commit fixes the problem.	2024-02-08 07:07:04 -05:00
Felix Barnsteiner	9dfd5dbd8f	Mute LogsDataStreamIT.testIgnoreDynamicBeyondLimit (#105280 )	2024-02-08 12:25:45 +01:00
Ignacio Vera	8f37ef977f	Remove abstract method InternalMultiBucketAggregation#reduceBucket (#105275 )	2024-02-08 11:24:02 +01:00
Felix Barnsteiner	50902e15a6	Use new `ignore_dynamic_beyond_limit` setting in logs and metrics data streams (#105180 ) This reduces the risk of document loss if too many fields are added. As these component templates are imported by Fleet, this also affects integrations.	2024-02-08 04:23:50 -05:00
Ignacio Vera	609e8059eb	Introduce an AggregatorReducer to reduce the footprint of aggregations in the coordinating node (#105207 ) This commit adds an abstraction that performs reduction of InternalAggregations in a streaming fashion.	2024-02-08 09:30:54 +01:00
Mary Gouseti	65d1d3d47d	Change the rest client configuration in the LazyRolloverDataStreamIT (#105243 )	2024-02-07 17:44:40 +02:00
Niels Bauman	64891011d3	Extend `repository_integrity` health indicator for unknown and invalid repos (#104614 ) This PR extends the repository integrity health indicator to cover also unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report the changes in their health regarding the unknown or invalid status. To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks. Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.	2024-02-07 15:18:55 +01:00
Mary Gouseti	011876367a	Execute lazy rollover with an internal dedicated user #104732 (#104905 ) The unconditional rollover that is a consequence of a lazy rollover command is triggered by the creation of a document. In many cases, the user triggering this rollover won't have sufficient privileges to ensure the successful execution of this rollover. For this reason, we introduce a dedicated rollover action and a dedicated internal user to cover this case and enable this functionality.	2024-02-07 13:01:01 +02:00
Ignacio Vera	4d5416912b	Use an AbstractList to build the AggregationList for reduction (#105200 ) We are building a list of InternalAggregations from a list of Buckets, therefore we can use an AbstractList to create the actual list and save some allocations.	2024-02-06 17:53:41 +01:00
Joe Gallo	341f845832	Ingest geoip: tidy up logging code (#105086 )	2024-02-06 10:44:48 -05:00
Joe Gallo	d392cd7d56	Tidy up collections code (#105085 )	2024-02-06 10:44:20 -05:00
Felix Barnsteiner	ff0f83f59d	Make field limit more predictable (#102885) Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit. This means that some field types, such as `search_as_you_type` or `percolator` count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping. This change makes it so that each field mapper only counts as one. We're still counting multi-fields. This makes it easier to understand for users why the field limit is hit. ~In addition to that, it also simplifies https://github.com/elastic/elasticsearch/pull/96235 as it makes the implementation of `Mapper.Builder#getTotalFieldsCount` much easier and easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk of over- or under-estimating the field count of a `Mapper.Builder` in `DocumentParserContext#addDynamicMapper`, which in turn reduces the risk of data loss due to the issue described here: https://github.com/elastic/elasticsearch/pull/96235#discussion_r1402495749.~ Edit: due to https://github.com/elastic/elasticsearch/pull/103865, we don't need an implementation of `getTotalFieldsCount` or `mapperSize` in `Mapper.Builder`. Still, this PR more closely aligns `Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`, which `DocumentParserContext#addDynamicMapper` uses to determine whether the field limit is hit. A potential risk of this is that we're now effectively allowing more fields in the mapping. It may be surprising to users that more fields can be added to a mapping. Although, I'd not expect negative consequences from that. Generally, I'd expect users to be happy about any change that reduces the risk of data loss. We could also think about whether to apply the new counting logic only to new indices (depending on the `IndexVersion`). However, that would add more complexity and I'm not convinced about the value. We'd then need to maintain two different ways of counting fields and also require passing in the `IndexVersion` to `MappingLookup` which previously didn't require the `IndexVersion`. This PR is meant as a conversation starter. It would also simplify https://github.com/elastic/elasticsearch/pull/96235 but I don't think this blocks that PR in any way. I'm curious about the opinion of @javanna and @jpountz on this.	2024-02-06 06:58:42 -05:00
James Baiera	9d3a645d59	Redirect failed ingest node operations to a failure store when available (#103481 ) This PR updates the ingest service to detect if a failed ingest document was bound for a data stream configured with a failure store, and in that event, restores the document to its original state, transforms it with its failure information, and redirects it to the failure store for the data stream it was originally targeting.	2024-02-05 14:37:30 -05:00
Armin Braun	f879508834	Avoid building large CompositeByteBuf when sending transport messages (#105137 ) We can avoid building composite byte buf instances on the transport layer (they have quite a bit of overhead and make heap dumps more complicated to read). There's no need to add another round of references to the BytesReference components here. Just write these out as they come in. This would allow for some efficiency improving follow-ups where we can essentially release the pages that have passed the write pipeline. To avoid having this explode the size of the queue for writes per channel, I moved that to a linked list. The slowdown from a linked list is irrelevant I believe. Mostly the queue is empty so it doesn't matter or if it isn't empty, operations other than dequeuing are much more important to performance in this logic anyway (+ Netty internally uses a LL down the line anyway). I would regard this as step-1 in making the serialisation here more lazy like on the REST layer to avoid copying bytes to the outbound buffer that we already have as `byte[]`.	2024-02-05 14:35:15 -05:00
Martijn van Groningen	39eefb3197	Unmute TimeSeriesTsidHashCardinalityIT (#105121 ) and reduce the number of time series in order to fix test related OOME. Relates to #105104	2024-02-05 17:20:30 +01:00
Kostas Krikellas	e85bb5afc3	Nest pass-through objects within objects (#105062 ) * Fix test failure https://gradle-enterprise.elastic.co/s/icg66i6mwnjoi * Fix test failure https://gradle-enterprise.elastic.co/s/icg66i6mwnjoi * Nest pass-through objects within objects * Update docs/changelog/105062.yaml * improve test	2024-02-05 09:31:13 +02:00
Yang Wang	552d2f563b	Expose OperationPurpose via CustomQueryParameter to s3 logs (#105044 ) This PR adds the OperationPurpose as a custom query parameter for each S3 request so that they are available in s3 access logs. Resolves: ES-7750	2024-02-04 03:21:50 -05:00
Nhat Nguyen	40a61abb95	Awaits fix #105104	2024-02-03 18:34:03 -08:00
Moritz Mack	54088839b4	Do not enable APM agent 'instrument', it's not required for manual tracing. (#105055 )	2024-02-02 18:13:00 +01:00
Mary Gouseti	55cf726a80	Fix typo in test (#104744) (#105052) Easy fix: there was a typo in the warning. Instead of checking for the correct index pattern `other-`, it was checking for `ds-`. Closes #104774	2024-02-02 06:30:54 -05:00
John Verwolf	98a37c7b6b	Enhancement: Metrics for Search Took Times using Action Listeners (#104996 ) * Instrument search took times * Update assertion helper method to use client param * Update docs/changelog/104996.yaml * spotless * Fix test	2024-02-01 12:51:12 -08:00
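
The two string-handling commits listed above (a50e58d99a and e21a4874ab) rely on standard `java.lang.String` behaviour: `indexOf(char)` returns the same index as `indexOf` with a one-character `String`, and `replace` does literal replacement while `replaceAll` interprets its first argument as a regular expression. The following is a minimal sketch of that difference, not code from the repository; the class name `StringCallSketch` and both helper methods are hypothetical and exist only for illustration.

```java
public class StringCallSketch {

    // Hypothetical helper: locate a separator with the single-char overload.
    // It returns the same index as path.indexOf("/").
    static int separatorIndex(String path) {
        return path.indexOf('/');
    }

    // Hypothetical helper: literal, char-based replacement; no regex is compiled.
    static String dotsToUnderscores(String name) {
        return name.replace('.', '_');
    }

    public static void main(String[] args) {
        System.out.println(separatorIndex("logs/2024"));   // 4
        System.out.println(dotsToUnderscores("a.b.c"));    // a_b_c

        // replaceAll treats "." as a regex that matches any character,
        // so every character is replaced, not just the dots.
        System.out.println("a.b.c".replaceAll(".", "_"));  // _____
    }
}
```

The last call shows why swapping `replaceAll` for `replace` is only a safe change when the argument uses no regular-expression features, which is the condition the commit message states.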


7747 commits