elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 17:34:17 -04:00

Author	SHA1	Message	Date
Tim Brooks	c5caf84e2d	Move raw path into HttpPreRequest (#113231 ) Currently, the raw path is only available from the RestRequest. This makes the logic to determine if a handler supports streaming more challenging to evaluate. This commit moves the raw path into pre request to allow easier streaming support logic.	2024-09-21 05:32:45 +10:00
Oleksandr Kolomiiets	b9855b8e4e	Correctly identify parent of copy_to destination field for synthetic source purposes (#113153 )	2024-09-20 12:08:55 -07:00
David Turner	01cb679858	Make `CreateDataStreamClusterStateUpdateRequest` a record (#113282 ) No need to extend `ClusterStateUpdateRequest` here.	2024-09-21 02:45:34 +10:00
David Turner	33a73a8111	Trigger merges after recovery (#113102 ) We may have shut a shard down while merges were still pending (or adjusted the merge policy while the shard was down) meaning that after recovery its segments do not reflect the desired state according to the merge policy. With this commit we invoke `IndexWriter#maybeMerge()` at the end of recovery to check for, and execute, any such lost merges.	2024-09-20 17:16:03 +01:00
Ryan Ernst	2ecfb397ad	Remove plugin classloader indirection (#113154 ) Extensible plugins use a custom classloader for other plugin jars. When extensible plugins were first added, the transport client still existed, and elasticsearch plugins did not exist in the transport client (at least not the ones that create classloaders). Yet the transport client still created a PluginsService. An indirection was used to avoid creating separate classloaders when the transport client had created the PluginsService. The transport client was removed in 8.0, but the indirection still exists. This commit removes that indirection layer.	2024-09-20 07:45:40 -07:00
Ryan Ernst	77bf9740f7	Add test for dense transport versions (#113213 ) Now that 8.x is branched from main, all transport version changes must be backported until 9.0 is ready to diverge. This commit adds a test which ensures transport versions are densely packed, ie there are no gaps at the granularity the version id is bumped (multiples of 1000).	2024-09-20 06:56:36 -07:00
David Turner	6ff138f558	Drop useless `AckedRequest` interface (#113255 ) Almost every implementation of `AckedRequest` is an `AcknowledgedRequest` too, and the distinction is rather confusing. Moreover the other implementations of `AckedRequest` are a potential source of `null` timeouts that we'd like to get rid of. This commit simplifies the situation by dropping the unnecessary `AckedRequest` interface entirely.	2024-09-20 12:33:07 +01:00
Armin Braun	8a179ff69e	Speedup RecyclerBytesStreamOutput.writeString (#113241 ) This method is quite hot in some use-cases because it's used by most string writing to transport messages. Overriding the default implementation for cases where we can write straight to the page instead of going through an intermediary buffer speeds up the method by more than 2x, saving lots of cycles, especially on transport threads.	2024-09-20 11:50:36 +02:00
Armin Braun	8a94037421	Fix unnecessary context switch in RankFeaturePhase (#113232 ) If we don't actually execute this phase we shouldn't fork the phase unnecessarily. We can compute the RankFeaturePhaseRankCoordinatorContext on the transport thread and move on to fetch without forking. Fetch itself will then fork and we can run the reduce as part of fetch instead of in a separate search pool task (this is the way it worked up until the recent introduction of RankFeaturePhase; this fixes that regression).	2024-09-20 11:38:05 +02:00
Luca Cavanna	ae7cd5e008	Replace MappedFieldType#extractTerm with local query visitor (#113163 ) The only usage of `MappedFieldType#extractTerm` comes from `SpanTermQueryBuilder` which attempts to extract a single term from a generic Query obtained from calling `MappedFieldType#termQuery`. We can move this logic directly within its only caller, and instead of using instanceof checks, we can rely on the query visitor API. This additionally allows us to remove one of the leftover usages of TermInSetQuery#getTermData which is deprecated in Lucene	2024-09-20 09:55:36 +02:00
Mary Gouseti	f4f075a2cc	Add failure store status in index response of data streams (#112816 ) The failure store status is a flag that indicates how the failure store was used or could be used if enabled. The user can be informed about the usage of the failure store in the following way: When relevant we add the optional field `failure_store`. The field will be omitted when the use of the failure store is not relevant. For example, if a document was successfully indexed in a data stream, if a failure concerns an index or if the opType is not index or create. In more detail: - when we have a “success” create/index response, the field `failure_store` will not be present if the document was indexed in a backing index. Otherwise, if it got stored in the failure store it will have the value `used`. - when we have a “rejected” create/index response, meaning the document was not persisted in elasticsearch, we return the field `failure_store` which is either `not_enabled`, if the document could have ended up in the failure store if it was enabled, or `failed` if something went wrong and the document was not persisted in the failure store, for example, the cluster is out of space and in read-only mode. We chose to make it an optional field to reduce the impact of this field on a bulk response. The value will exist in the java object but it will not be returned to the user. The only values that will be displayed are: - `used`: meaning this document was indexed in the failure store - `not_enabled`: meaning this document was rejected but could have been stored in the failure store if it was applicable. - `failed`: meaning this failed document failed to be stored in the failure store.
Example: ``` "errors": true, "took": 202, "items": [ { "create": { "_index": ".fs-my-ds-2024.09.04-000002", "_id": "iRDDvJEB_J3Inuia2zgH", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 6, "_primary_term": 1, "status": 201, "failure_store": "used" } }, { "create": { "_index": "ds-no-fs", "_id": "hxDDvJEB_J3Inuia2jj3", "status": 400, "error": { "type": "document_parsing_exception", "reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'", "caused_by": { "type": "illegal_argument_exception", "reason": "For input string: \"bla\"" } } }, "failure_store": "not_enabled" }, { "create": { "_index": ".ds-my-ds-2024.09.04-000001", "_id": "iBDDvJEB_J3Inuia2jj3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 7, "_primary_term": 1, "status": 201 } } ] ```	2024-09-20 10:53:39 +03:00
Armin Braun	49183d6d3a	Some improvements to ES812PostingsReader (#107354 ) Removing some fields that need not exist from this class as well as cleaning up two redundant conditions and adding missing `final`.	2024-09-20 09:30:26 +02:00
Keith Massey	ec50aaa835	Making transport changes to enable component template substitutions in the simulate ingest API (#113063 )	2024-09-19 17:42:51 -05:00
David Turner	33a366a256	Add extra context to `TransportNodesAction` invocations (#113140 ) Several `TransportNodesAction` implementations do some kind of top-level computation in addition to fanning out requests to individual nodes. Today they all have to do this once the node-level fanout is complete, but in most cases the top-level computation can happen in parallel with the fanout. This commit adds support for an additional `ActionContext` object, created when starting to process the request and exposed to `newResponseAsync()` at the end, to allow this parallelization. All implementations use `(Void) null` for this param, except for `TransportClusterStatsAction` which now parallelizes the computation of the cluster-state-based stats with the node-level fanout.	2024-09-19 17:33:38 +01:00
Christoph Büscher	c56bd7e8e1	NestedHelper shouldn't use deprecated TermInSetQuery method (#113171 ) The getTermData method in TermInSetQuery is deprecated and not needed for what we do in NestedHelper. We can remove its use by using other provided methods.	2024-09-19 17:36:37 +02:00
Kostas Krikellas	e244216c0f	Configure keeping source in FieldMapper (#112706 ) Introduces per-field param `synthetic_source_keep` that overrides the behavior for keeping the field source in synthetic source mode: - `none` : no source is stored - `arrays`: the incoming source is recorded as-is for arrays of a given field - `all`: the incoming source is recorded as is for both singleton and array values of a given field Related to #112012	2024-09-19 23:29:09 +10:00
Kostas Krikellas	4ff4384550	Retrieve the source for objects and arrays within arrays in a separate parsing phase (#113027 ) In synthetic source, storing array elements to `_ignored_source` may hide other, regular elements from showing up during source synthesizing. This is due to contents from `_ignored_source` taking precedence over matching fields from regular source loading. To avoid this, arrays are pre-emptively tracked and marked for source storing, if any of their elements needs to store its source. A second doc parsing phase is introduced that checks for fields missing values and records their source, while skipping objects and arrays that don't contain any such fields. Fixes #112374	2024-09-19 20:07:31 +10:00
Armin Braun	65076f4525	Use InfoStream.NO_OUTPUT in PersistedClusterStateService (#112941 ) We can save some allocations and speed up tests some more by using `InfoStream.NO_OUTPUT` instead of the default `NullInfoStream`, which does not log but signals to its users that it is enabled and thus causes large strings to be set up needlessly.	2024-09-19 11:22:24 +02:00
Yang Wang	e7fea24387	No need for PostWriteRefresh to wait for commit durability (#113075 ) Nodes can communicate directly for new commits with RCO. It is no longer necessary for PostWriteRefresh to wait for the special commit durability. In fact, it is wrong to do that because durability is not triggered for every commit with RCO. This PR removes the wait. Relates: ES-8774	2024-09-19 13:34:33 +10:00
Tim Brooks	529d349a25	Fix spotless in netty stream class Spotless broke during a rebase. Fixing in this commit.	2024-09-18 13:59:12 -06:00
Tim Brooks	92daeeba11	Properly handle empty incremental bulk requests (#112974 ) This commit ensures we properly throw exceptions when an empty bulk request is received with the incremental handling enabled.	2024-09-18 13:52:10 -06:00
Mikhail Berezovskiy	dce8a0bfd3	merge main	2024-09-18 13:52:10 -06:00
Tim Brooks	58e3a39392	Ensure partial bulks released if channel closes (#112724 ) Currently, the entire close pipeline is not hooked up in case of a channel close while a request is being buffered or executed. This commit resolves the issue by adding a connection to a stream closure.	2024-09-18 13:52:09 -06:00
Tim Brooks	95b42a7129	Ensure incremental bulk setting is set atomically (#112479 ) Currently the rest.incremental_bulk is read in two different places. This means that it will be employed in two steps introducing unpredictable behavior. This commit ensures that it is only read in a single place.	2024-09-18 13:40:39 -06:00
Tim Brooks	a03fb12b09	Incremental bulk integration with rest layer (#112154 ) Integrate the incremental bulks into RestBulkAction	2024-09-18 13:40:39 -06:00
Tim Brooks	c00768a116	Split bulks based on memory usage (#112267 ) This commit splits bulks once memory usage for indexing pressure has passed a configurable threshold.	2024-09-18 13:40:39 -06:00
Mikhail Berezovskiy	cbcbc34863	release stream chunk queue on bad request (#112227 )	2024-09-18 13:40:39 -06:00
Tim Brooks	478baf1459	Allow incremental bulk request execution (#111865 ) Allow a single bulk request to be passed to Elasticsearch in multiple parts. Once a certain memory threshold or number of operations have been received, the request can be split and submitted for processing.	2024-09-18 13:40:37 -06:00
Mikhail Berezovskiy	5e1f6554a2	Add http request content stream support (#111438 )	2024-09-18 13:38:36 -06:00
David Turner	079d680319	Revert "Add extra context to `TransportNodesAction` invocations (#113086 )" This reverts commit `3fdc8ef554`.	2024-09-18 19:28:38 +01:00
David Turner	3fdc8ef554	Add extra context to `TransportNodesAction` invocations (#113086 ) Several `TransportNodesAction` implementations do some kind of top-level computation in addition to fanning out requests to individual nodes. Today they all have to do this once the node-level fanout is complete, but in most cases the top-level computation can happen in parallel with the fanout. This commit adds support for an additional `ActionContext` object, created when starting to process the request and exposed to `newResponseAsync()` at the end, to allow this parallelization. All implementations use `(Void) null` for this param, except for `TransportClusterStatsAction` which now parallelizes the computation of the cluster-state-based stats with the node-level fanout.	2024-09-18 19:07:26 +01:00
David Turner	f437f13cb1	Fix invalid CS in `DiskThresholdDeciderTests` (#113024 ) Adds `fooRouting` to the routing table because it's not valid to check it against an `AllocationDecider` otherwise. But then add another allocation decider to prevent this shard from being assigned to a node by the `reroute()` call, and set the shards' in-sync IDs up correctly to make the cluster state valid enough to reroute.	2024-09-18 17:28:58 +01:00
Luca Cavanna	73c40b9567	Remove legacy validation of search source in data nodes (#113081 ) We moved the validation of incoming search requests to data nodes with #105150. The legacy validation performed on the data nodes was left around for bw comp reasons, as there could still be coordinating nodes in the cluster not performing that validation. This is no longer the case in main. This commit removes the validation in favour of validation already performed while coordinating the search request. Relates to #105150	2024-09-18 17:05:19 +02:00
Saikat Sarkar	0d79a698b3	Integrate IBM watsonx to Inference API for text embeddings (#111770 ) * Resolve merge conflicts * Log the exception if Bearer token generation fails * Set rate limit * Add tests * Apply spotless * Add test for ServiceSettings * Add test for EmbeddingsRequestEntity * Add test for IbmWatsonxEmbeddingsRequestEntity * Apply spotless * Add tests for IbmWatsonxEmbeddingsResponseEntity * Fix the issue with long line * Fix tests for IbmWatsonxEmbeddingsActionTests * Apply spotless * Resolve merge conflicts * Move project_id from ServiceFields to IbmWatsonxServiceFields * Check 400 Bad Request * Avoid logging exception since this may contain the bearer token * Throw an exception if the creation of Bearer token fails * Throw exception based on the status code for generating Bearer token * Revert "Throw exception based on the status code for generating Bearer token" This reverts commit f3cd615b8eee1f39536175dee7fd4588a691f319. * Delete .java-version file * Fix test * Update docs/changelog/111770.yaml * Use IOException instead of Exception * Resolve merge conflicts * Fix the tests * Add end-to-end test and infer test	2024-09-18 08:32:47 -06:00
Ignacio Vera	01b45a5bbc	Reduce heap usage for AggregatorsReducer (#112874 ) This commit reduces heap usage by sizing properly the hashmap containing the aggregations by name.	2024-09-18 16:15:42 +02:00
Simon Cooper	c311515684	Create a fluent builder to help implement ChunkedToXContent (#112389 ) Rather than manually adding startObject/endObject, and having to line everything up manually, this handles the start/end for you. A few implementations are converted already. In the long run, I would like this to replace ChunkedXContentHelper.	2024-09-18 14:59:20 +01:00
Pooya Salehi	1e0ed081cf	Unmute and fix PrevalidateShardPathIT#testCheckShards (#113107 ) It seems that the hectic retrying with a new cluster state update just makes things worse. The initial deletion after the relocation might take a bit, I assume also more visible in this test because we've made shard close async in #108145. After that we just check once and if the shard dir is there we keep pushing new cluster states and checking again and this keeps failing with the check mentioned here. I've picked a simple solution since this is a test issue and just check a bit longer before triggering the new cluster state update. I've looked into a couple of other hooks (e.g. IndexFoldersDeletionListener#beforeIndexFoldersDeleted and org.elasticsearch.indices.cluster.IndicesClusterStateService#onClusterStateShardsClosed ) to see if we could rely on them rather than the assertBusy used here. None, unfortunately, seem to cleanly allow getting rid of the assertBusy. IMO, since the shard store deletion is retried and guarantees to eventually work, to avoid flaky tests, we should still keep relying on the retries initiated by the cluster state update. I'll keep the issue open for a while before removing the extra logging. Running it locally has not failed. Relates #111134	2024-09-18 15:47:12 +02:00
Armin Braun	f51fcd3b06	Introduce constant for all production use of Base64.getUrlEncoder().withoutPadding() (#112899 ) Mostly not a big win, but we use this in a couple spots where it shows up in profiling, no need to have this kind of allocation in e.g. `RandomBasedUUIDGenerator` where it's somewhat hot.	2024-09-18 15:03:06 +02:00
Armin Braun	e5bcb0c5b3	Remove duplication in settings code and some minor setting speedups (#112897 ) Some small speedups in here from pre-evaluating `isFiltered(properties)` in lots of spots and not creating an unused `SimpleKey` in `toConcreteKey` which runs a costly string interning at some rate. Other than that, obvious deduplication using existing utilities or adding obvious missing overloads for them.	2024-09-18 15:01:49 +02:00
Salvatore Campagna	91674f85a0	Fix `ignore_above` for flattened fields (#112944 ) Flattened fields do not actually use `ignore_above`. When retrieving source, `ignore_above` behaves as expected. However, it is ignored when fetching field values through the fields API. In that case values are returned no matter the `ignore_above` setting. Here we implement the filtering logic in the `ValueFetcher` implementation of `RootFlattenedFieldType`.	2024-09-18 14:53:06 +02:00
David Turner	d0a7d07fa3	Remove `es.transport.cname_in_publish_address` sysprop warning (#113087 ) This property has been a deprecated no-op since #45662 (8.0.0). This commit removes it entirely in 9.0.0.	2024-09-18 21:13:15 +10:00
Yang Wang	99b5ed87c2	Check latest repoData before applying workaround for missing shardGen file (#112979 ) It is expected that the old master may attempt to read a shardGen file that is deleted by the new master. This PR checks the latest repo data before applying the workaround (or throwing AssertionError) for missing shardGen files. Relates: #112337 Resolves: #112811	2024-09-18 19:32:16 +10:00
David Turner	1c0bab1b6c	Remove `PERMIT_HANDSHAKES_FROM_INCOMPATIBLE_BUILDS_KEY` block (#113041 ) This sysprop has been forbidden for all of 8.x, we don't need to even think about it in 9.x. This commit removes it.	2024-09-18 09:24:26 +01:00
Luca Cavanna	452a4d2b7d	Remove version dependent logic from CompletionFieldMapper (#113011 ) We have version based logic that applies the limit to number of completion contexts only to indices created from 8.0 on. Those are the only indices we can now have in a cluster, hence we can remove the version based conditional. Relates to #38675	2024-09-18 10:20:30 +02:00
Luca Cavanna	de5b00d012	Remove version based logic from IpFieldMapper and DateFieldMapper (#113023 ) We have logic that throws exception when parsing empty null_value for ip and date field mapper only for indices created from 8.0 and onwards. This conditional can now be removed.	2024-09-18 10:11:25 +02:00
Luca Cavanna	bb78a28c4a	Remove index.mapper.dynamic setting (#113000 ) This setting had been removed in the past, but it was reintroduced for bw comp with 7.x with #109341. It can now be removed from main as it no longer supports indices created with 7.x	2024-09-18 10:10:44 +02:00
Armin Braun	1f4262e5c5	Lazy load stats in NodeMetrics (#113014 ) No point in eager loading the metrics here, it just slows down node startup considerably, wasting lots of time in integTest runs in particular (as in e.g. ~200s of total CPU time in :server:internalClusterTest!!).	2024-09-18 10:04:53 +02:00
Luca Cavanna	3ab361ac83	Remove version dependent logic from ShingleTokenFilterFactory (#113019 ) This commit removes conditionals for logic that no longer needs to be version dependent, in that all indices in the cluster will match such condition. Relates to #27211,#34331	2024-09-18 09:10:13 +02:00
Luca Cavanna	84b78fae77	Remove version based logic from IndexSortConfig (#113015 ) IndexSortConfig has some special logic to prevent configuring index sorting that targets alias fields. This returns an error for indices created on or after 7.13. Such logic can now be removed as all indices will fall under that condition. Relates to #70879	2024-09-18 09:09:32 +02:00
Nhat Nguyen	af7ed9515f	Enable ignore_malformed in logsdb (#113072 ) This change enables ignore_malformed by default for newly created logsdb indices. Closes #106822	2024-09-17 22:41:41 -07:00
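
The density check described in commit 77bf9740f7 (no gaps between transport version ids at the granularity of 1000) can be illustrated with a minimal sketch. This is not the test added by that commit: the class name `DenseVersionCheck`, the method `isDense`, and the sample ids are hypothetical, and the real test may define density differently.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not the Elasticsearch test itself.
final class DenseVersionCheck {
    // Granularity at which the version id is bumped, per the commit message.
    static final int STEP = 1000;

    // Returns true when the ids, bucketed by STEP, leave no empty bucket
    // between the smallest and the largest id. Assumes non-negative ids.
    static boolean isDense(List<Integer> versionIds) {
        TreeSet<Integer> buckets = new TreeSet<>();
        for (int id : versionIds) {
            buckets.add(id / STEP);
        }
        Integer previous = null;
        for (int bucket : buckets) {
            // A jump of more than one bucket means a multiple of STEP was skipped.
            if (previous != null && bucket - previous > 1) {
                return false;
            }
            previous = bucket;
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up ids: 1000, 2000 and 3000 are consecutive buckets.
        System.out.println(isDense(List.of(1000, 2000, 2001, 3000)));
        // Made-up ids: bucket 2 is missing between 1000 and 3000.
        System.out.println(isDense(List.of(1000, 3000)));
    }
}
```

Bucketing by integer division lets patch-level ids within the same thousand (such as 2001 in the first sample) pass, while a skipped multiple of 1000 is reported as a gap.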

14572 commits