elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 17:34:17 -04:00

Author	SHA1	Message	Date
Luca Cavanna	905222613a	Disable concurrency when top_hits sorts on anything but _score (#123610 ) We already disable inter-segment concurrency in SearchSourceBuilder whenever the top-level sort provided is not _score. We should apply the same rules in top_hits. We recently stumbled upon non-deterministic behaviour caused by script sorting defined within top hits. That is to be expected given that script sorting does not support search concurrency. The sort script can be replaced with a runtime field, either defined in the mapping or in the search request, which does support concurrency and guarantees predictable behaviour.	2025-02-27 21:22:17 +01:00
Colleen McGinnis	b7e3a1e14b	[docs] Migrate docs from AsciiDoc to Markdown (#123507 ) * delete asciidoc files * add migrated files * fix errors * Disable docs tests * Clarify release notes page titles * Revert "Clarify release notes page titles" This reverts commit `8be688648d`. * Comment out external URI images * Clean up query languages landing pages, link to conceptual docs * Add .md to url * Fixes inference processor nesting. --------- Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> Co-authored-by: Liam Thompson <leemthompo@gmail.com> Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com> Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>	2025-02-27 17:56:14 +01:00
Yang Wang	c7e7dbe904	Abort pending deletion on IndicesService stop (#123569 ) When IndicesService is closed, the pending deletion may still be in progress due to indices removed before IndicesService gets closed. If the deletion gets stuck for some reason, it can stall the node shutdown. This PR aborts the pending deletion more promptly by not retrying after IndicesService is stopped. Resolves: #121717 Resolves: #121716 Resolves: #122119	2025-02-27 23:43:53 +11:00
Iván Cea Fontenla	ca5d251807	ESQL: Fix function registry concurrency issues on constructor (#123492 ) Fixes https://github.com/elastic/elasticsearch/issues/123430 There were 2 problems here: - We were filling a static field (used to auto-cast string literals) within a constructor, which is also called in multiple places - The field was only filled with non-snapshot functions, so snapshot function auto-casting wasn't possible Fixed both bugs by making the field non-static instead, and a fix to use the snapshot registry (if available) in the string casting rule.	2025-02-27 11:05:18 +01:00
Costin Leau	e4604a4432	ESQL: Reduce iteration complexity for plan traversal (#123427 )	2025-02-26 08:30:58 -08:00
David Turner	4be53f50f7	Small resiliency status update (#123497 )	2025-02-27 01:49:16 +11:00
Joe Gallo	af6014ecb5	Use ordered maps for PipelineConfiguration xcontent deserialization (#123403 )	2025-02-25 15:20:01 -05:00
Keith Massey	88cf2487e7	Fixing serialization of ScriptStats cache_evictions_history (#123384 )	2025-02-25 16:46:22 +00:00
Kathleen DeRusso	ae6474db63	Deprecate Behavioral Analytics CRUD apis (#122960 ) * Deprecate Behavioral Analytics CRUD APIs * Add allowed warning for REST Compatibility tests * Update docs/changelog/122960.yaml * Update changelog * Update docs to add deprecation flags and fix failing tests * Update changelog * Update changelog again * Update docs formatting Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> * Skip asciidoc test --------- Co-authored-by: Efe Gürkan YALAMAN <efeyalaman@gmail.com> Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> Co-authored-by: Efe Gürkan YALAMAN <efeguerkan.yalaman@elastic.co>	2025-02-25 16:02:50 +01:00
Ying Mao	e8438490ea	Updates to allow using Cohere binary embedding response in semantic search queries. (#121827 ) * wip * wip * [CI] Auto commit changes from spotless * updating tests * [CI] Auto commit changes from spotless * Update docs/changelog/121827.yaml * Updates after the refactor * [CI] Auto commit changes from spotless * Updating error message --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>	2025-02-25 09:14:20 -05:00
Martijn van Groningen	6c55099784	Store arrays offsets for ip fields natively with synthetic source (#122999 ) Follow-up to #113757 that adds support for natively storing array offsets for ip fields instead of falling back to ignored source.	2025-02-25 13:42:41 +00:00
David Turner	d0db4cd085	Reduce licence checks in `LicensedWriteLoadForecaster` (#123346 ) Rather than checking the license (updating the usage map) on every single shard, just do it once at the start of a computation that needs to forecast write loads. Closes #123247	2025-02-25 23:50:43 +11:00
Craig Taverner	ec82c24a87	Add support to VALUES aggregation for spatial types (#122886 ) The original work at https://github.com/elastic/elasticsearch/pull/106065 did not support geospatial types with this comment: > I made this work for everything but geo_point and cartesian_point because I'm not 100% sure how to integrate with those. We can grab those in a follow up. The geospatial types should be possible to collect using the VALUES aggregation with similar behavior to the `ST_COLLECT` OGC function, based on the Elasticsearch convention that treats multi-value geospatial fields as behaving similarly to any geometry collection. So this implementation is a trivial addition to the existing values types support.	2025-02-25 11:38:51 +01:00
Joe Gallo	6315b8a8aa	Register IngestGeoIpMetadata as a NamedXContent (#123079 )	2025-02-24 17:25:25 -05:00
Samiul Monir	5664f4f2ba	Improved error message when index field type is unknown (#122860 ) * Updating error message when index field type is unknown * Fix style issue * Add yaml test for invalid field type error message * Update docs/changelog/122860.yaml * Updating error message for runtime and multi field type parser * add and fix yaml tests * Fix code styles by running spotlessApply * Update changelog * Updating the test in yml * Updating error message for runtime * Fix failing yaml tests * Update error message to Fix unit tests * fix serverless qa test --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2025-02-24 13:16:22 -05:00
Pat Whelan	4c3ceae986	[ML] Set Connect Timeout to 5s (#123272 ) Reduced connection timeout from infinite to a system configurable setting that defaults to 5s. Increased EIS auth token timeout from 30s to 1m.	2025-02-24 18:03:15 +00:00
Nhat Nguyen	4d2b8dc4f2	Fix early termination in LuceneSourceOperator (#123197 ) The LuceneSourceOperator is supposed to terminate when it reaches the limit; unfortunately, we don't have a test to cover this. Due to this bug, we continue scanning all segments, even though we discard the results as the limit was reached. This can cause performance issues for simple queries like FROM .. | LIMIT 10, when Lucene indices are on the warm or cold tier. I will submit a follow-up PR to ensure we only collect up to the limit across multiple drivers.	2025-02-24 08:49:54 -08:00
Andrei Dan	760b2312ea	Periodically check the available memory when fetching search hits source (#121920 ) When fetching documents, sometimes we need to load the entire source of search hits. Document sources can be large, and with support for up to 10k hits per search request, this creates a significant untracked memory load on Elasticsearch that can potentially cause out-of-memory errors. This PR adds memory checking for hits source in the fetch phase. We check with the parent (the real memory) circuit breaker every 1MiB of loaded source and when fetching the last document of every segment. This gives the real memory breaker a chance to interrupt running operations when we're running low on memory, and prevent potential OOMs. The amount of local accounting to buffer is controlled by the `search.memory_accounting_buffer_size` dynamic setting and defaults to `1MiB`. Fixes #89656	2025-02-25 03:25:14 +11:00
jeffganmr	22103de150	fix stale data in synthetic source for string stored field (#123105 )	2025-02-24 07:22:48 -08:00
Iván Cea Fontenla	c40c5a6c0a	ESQL: Fix functions emitting warnings with no source (#122821 ) Fixes https://github.com/elastic/elasticsearch/issues/122588 - Replaced `Source.EMPTY.writeTo(out)` with `source().writeTo(out)` in functions emitting warnings - Did the same on all aggs, as Top emits an error on type resolution. This is not a bug, as type resolution errors should only happen in the coordinator. Another option would be changing Top to not generate that error there, and make it implement instead `PostAnalysisVerificationAware` - In some cases, we don't even serialize an empty source. So I had to add a new `TransportVersion` to do so - As a special case, `ToLower` and `ToUpper` weren't serializing a source, but they don't emit warnings. As they were the only remaining functions not serializing the source, I added it there too	2025-02-24 13:52:41 +00:00
David Turner	187b192dfe	Deduplicate allocation stats calls (#123246 ) These things can be quite expensive and there's no need to recompute them in parallel across all management threads as done today. This commit adds a deduplicator to avoid redundant work.	2025-02-25 00:21:10 +11:00
Nik Everett	67293ba8f4	ESQL: Speed up VALUES for many buckets (#123073 ) Speeds up the VALUES agg when collecting from many buckets. Specifically, this speeds up the algorithm used to `finish` the aggregation. Most specifically, this makes the algorithm more tolerant to large numbers of groups being collected. The old algorithm was `O(n^2)` with the number of groups. The new one is `O(n)` ``` (groups) 1 219.683 ± 1.069 -> 223.477 ± 1.990 ms/op 1000 426.323 ± 75.963 -> 463.670 ± 7.275 ms/op 100000 36690.871 ± 4656.350 -> 7800.332 ± 2775.869 ms/op 200000 89422.113 ± 2972.606 -> 21920.288 ± 3427.962 ms/op 400000 timed out at 10 minutes -> 40051.524 ± 2011.706 ms/op ``` The `1` group version was not changed at all. That's just noise in the measurement. The small bump in the `1000` case is almost certainly worth it and real. The huge drop in the `100000` case is quite real.	2025-02-23 18:29:55 +00:00
Sam Xiao	4233310846	Add health indicator impact to HealthPeriodicLogger (#122390 )	2025-02-21 17:06:25 -05:00
Alexey Ivanov	2bda4c1fa8	Converting an Existing Data Stream to a System DataStream is Broken (#121392 ) Adds support for converting an existing data stream to a system data stream as part of the existing system_index_metadata_upgrade_service task	2025-02-21 19:50:57 +00:00
Pat Whelan	bd52363bde	[ML] Add ElasticInferenceServiceCompletionServiceSettings (#123155 ) Adding the missing NamedWriteable to the registry.	2025-02-21 12:27:12 -05:00
Costin Leau	21845ad7a1	ESQL: Remove duplicated nested commands (#123085 ) The Fork grammar duplicated the nested command declaration, causing additional lexing to occur and resulting in an invalid field name declaration. Relates to #121948	2025-02-21 06:56:09 -08:00
Joe Gallo	a8958755a7	Fix geoip databases index access after system feature migration (again) (#122938 )	2025-02-21 08:00:10 -05:00
Martijn van Groningen	8d1f5d3223	Hold store reference in InternalEngine#performActionWithDirectoryReader(...) (#123010 ) This method gets called from `InternalEngine#resolveDocVersion(...)`, which gets called during indexing (via `InternalEngine.index(...)`). When `InternalEngine.index(...)` gets invoked, the InternalEngine only ensures that it holds a ref to the engine via Engine#acquireEnsureOpenRef(), but this doesn't ensure that it holds a reference to the store. Closes #122974 * Update docs/changelog/123010.yaml	2025-02-21 11:48:21 +01:00
Fang Xing	412e6c2b39	[ES|QL] Implicit numeric casting for CASE/GREATEST/LEAST (#122601 ) * implicit numeric casting for conditional functions	2025-02-20 22:20:49 -05:00
kanoshiou	de41d5704b	ESQL: Fix precision of `scaled_float` field values retrieved from stored source (#122586 )	2025-02-20 14:01:34 -08:00
fzowl	521f8554c3	feat: VoyageAI integration (#122134 ) * VoyageAI embeddings and rerank: - embeddings works, tested - initial rerank code What's missing: - unit and integration tests - rerank request/response mapping and verification * VoyageAI embeddings and rerank: - embeddings works, tested - rerank works, tested (https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank) What's missing: - unit and integration tests * VoyageAI embeddings and rerank: - embeddings works, tested - rerank works, tested (https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank) What's missing: - unit and integration tests * VoyageAI embeddings and rerank: - embeddings works, tested - rerank works, tested (https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank) What's missing: - unit and integration tests * Adding initial tests Moving dimensions to ServiceSettings * Correcting the TransportVersions.java * Correcting due to comments * Adding BIT support * Initial tests * More tests * More tests/corrections * Removing warnings * Further tests * Transport version correction * Adding changelog and correcting TransportVersions * Spotless tests * Changes due to the comments * Changes due to the comments * Correcting QA tests * Correcting QA tests --------- Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co> Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>	2025-02-20 16:11:58 -05:00
Ruben van Staden	171a3b93f9	apm-data: use representative count as event.success_count if available (#119995 )	2025-02-20 14:45:06 -05:00
Dan Rubinstein	99897b1b39	Add enterprise license check to inference action for semantic text fields (#122293 ) * Add enterprise license check to inference action for semantic text fields * Update docs/changelog/122293.yaml * Set license to trial in ShardBulkInferenceActionFilterIT * Move license check to only block semantic_text fields that require inference call * Cleaning up tests * Add parameterization on useLegacyFormat back in ShardBulkInferenceActionFilterBasicLicenseIT --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2025-02-20 14:06:40 -05:00
Kostas Krikellas	5c129786f1	Use min node version to guard injecting settings in logs provider (#123005 ) * Use min node version to guard injecting settings in logs provider * Update docs/changelog/123005.yaml * no random in cluster init	2025-02-20 18:31:16 +02:00
Keith Massey	41dae025e7	Updating TransportRolloverAction.checkBlock so that non-write-index blocks do not prevent data stream rollover (#122905 )	2025-02-20 17:20:44 +01:00
Nhat Nguyen	091ea9aa1d	Support partial results in CCS in ES|QL (#122708 ) A follow-up to #121942 that adds support for partial results in CCS in ES|QL. Relates #121942	2025-02-20 07:27:32 -08:00
Luke Whiting	e3792d19b5	Allow data stream reindex tasks to be re-run after completion (#122510 ) * Allow data stream reindex tasks to be re-run after completion * Docs update * Update docs/reference/migration/apis/data-stream-reindex.asciidoc Co-authored-by: Keith Massey <keith.massey@elastic.co> --------- Co-authored-by: Keith Massey <keith.massey@elastic.co>	2025-02-20 15:03:51 +00:00
Ioana Tagirta	a26b596cbd	ES|QL: Initial grammar and changes for FORK (snapshot) (#121948 ) * Grammar changes * Generate grammar changes * Fork planning * Fix field resolution * Cleanup * Add CSV tests * Update docs/changelog/121948.yaml * [CI] Auto commit changes from spotless * fix forbidden apis * javadoc * remove serialization of fork and Merge * fix equality * fix EsqlNodeSubclassTests * add statement parser tests * remove unnecessary serialization * automatic fork branch ids start at 1 * add analyzer test * more tests * more tests * minor itr * replace [] with () * move fork eval to initial logical plan * simplify MergeOperator finished state * enable CSV tests * rework Fork to use StubRelation and Merge to be Nary * reverts * fail hard if not LocalSourceExec * spotless * no fork in fork yet * itr * itr * itr * fix EsqlNodeSubclassTests * more tests and restrict NESTED_XX to snapshot * fix method name * check for fork cap before testing ForkIT * Move fork id alias logic to parser --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: ChrisHegarty <chegar999@gmail.com> Co-authored-by: Chris Hegarty <62058229+ChrisHegarty@users.noreply.github.com>	2025-02-20 13:25:08 +01:00
David Turner	cdaa5dd7ad	Clarify breaking change note for #112903 (#122998 ) Closes #122994	2025-02-20 12:11:56 +00:00
Larisa Motova	e4ee91a08a	[ES|QL] Render aggregate_metric_double (#122660 ) This commit allows users to read aggregate_metric_double fields from indices in ES|QL, with any subset of metrics.	2025-02-19 22:38:49 -10:00
Martijn van Groningen	43665f0a35	Store arrays offsets for keyword fields natively with synthetic source (#113757 ) The keyword doc values field gets an extra sorted doc values field that encodes the order in which array values were specified at index time. This also captures duplicate values. This is stored in an offset-to-ordinal array that gets zigzag vint encoded into a sorted doc values field. For example, in case of the following string array for a keyword field: ["c", "b", "a", "c"]. Sorted set doc values: ["a", "b", "c"] with ordinals: 0, 1 and 2. The offset array will be: [2, 1, 0, 2]. Null values are also supported. For example ["c", "b", null, "c"] results in sorted set doc values: ["b", "c"] with ordinals: 0 and 1. The offset array will be: [1, 0, -1, 1]. Empty arrays are also supported by encoding a zigzag vint array of zero elements. Limitations: (1) currently only doc values based array support for the keyword field mapper; (2) multi-level leaf arrays are flattened, for example: [[b], [c]] -> [b, c]; (3) arrays are always synthesized as one type: in case of a keyword field, [1, 2] gets synthesized as ["1", "2"]. These limitations can be addressed, but some require more complexity and/or additional storage. With this PR, keyword field arrays will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc value field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).	2025-02-20 09:20:49 +01:00
Keith Massey	463dc4a8a5	Updates the deprecation info API to not warn about system indices and data streams (#122951 )	2025-02-19 15:30:17 -06:00
Dan Rubinstein	bea8df3c8e	Adding endpoint creation validation to ElasticInferenceService (#117642 ) * Adding endpoint creation validation to ElasticInferenceService * Fix unit tests * Update docs/changelog/117642.yaml --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2025-02-19 12:24:21 -05:00
Dianna Hohensee	2bfc700683	DesiredBalanceReconciler always returns AllocationStats (#122458 ) Ensures that the DesiredBalanceReconciler always returns a non-empty AllocationStats object, eliminating edge cases where the stats available to DesiredBalanceMetrics may not be updated due to some kind of throttling or the balancer being disabled via cluster settings. Adds documentation around AllocationDecider#canRebalance(RoutingAllocation) Closes ES-10581	2025-02-19 10:34:24 -05:00
David Turner	cd15d09adf	Fork post-snapshot-delete cleanup off master thread (#122731 ) We shouldn't run the post-snapshot-delete cleanup work on the master thread, since it can be quite expensive and need not block subsequent cluster state updates. This commit forks it onto a `SNAPSHOT` thread.	2025-02-19 21:02:27 +11:00
Niels Bauman	c65596b62e	Run `TransportGetWatcherSettingsAction` on local node (#122857 ) This action solely needs the cluster state, so it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. Relates #101805	2025-02-19 08:15:00 +01:00
Lee Hinman	2ae80c799d	Allow setting the `type` in the reroute processor (#122409 ) * Allow setting the `type` in the reroute processor This allows configuring the `type` from within the ingest `reroute` processor. Similar to `dataset` and `namespace`, the type defaults to the value extracted from the index name. This means that documents sent to `logs-mysql.access.default` will have a default value of `logs` for the type. Resolves #121553 * Update docs/changelog/122409.yaml	2025-02-18 12:38:00 -07:00
Dianna Hohensee	befc6a03e3	Start the allocation architecture guide section (#121940 ) This is a high-level overview of the main rebalancing components and how they interact to move shards around the cluster, and decide where shards should go. Relates ES-10423	2025-02-18 13:33:39 -05:00
Felix Barnsteiner	5e8865deac	Add _metric_names_hash field to OTel metric mappings (#120952 ) If metrics that have the same timestamp and dimensions aren't grouped into the same document, ES will consider them to be duplicates. The _metric_names_hash field will be set by the OTel ES exporter. As it's mapped as a time_series_dimension, it creates a different _tsid for documents with different sets of metrics. The tradeoff is that if the composition of the metrics grouping changes over time, a different _tsid will be created. That has an impact on the rate aggregation for counters.	2025-02-18 18:30:37 +01:00
Oleksandr Kolomiiets	ba8c5764f8	Use FallbackSyntheticSourceBlockLoader for unsigned_long and scaled_float fields (#122637 )	2025-02-18 09:28:26 -08:00
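
The array-offset scheme described in commit 43665f0a35 (#113757) above can be illustrated with a small sketch. This is plain Python written for illustration only, not Elasticsearch's implementation: the function names `encode_offsets`, `decode_offsets` and `zigzag` are hypothetical, and the zigzag step only shows the usual signed-to-unsigned mapping that makes -1 cheap to store as a variable-length integer.

```python
def encode_offsets(values):
    """Split an array into sorted unique values plus per-position offsets.

    Hypothetical helper: mirrors the worked example in the commit message,
    where each offset is the ordinal of the value and null maps to -1.
    """
    sorted_values = sorted({v for v in values if v is not None})
    ordinal = {v: i for i, v in enumerate(sorted_values)}
    offsets = [-1 if v is None else ordinal[v] for v in values]
    return sorted_values, offsets


def decode_offsets(sorted_values, offsets):
    """Rebuild the original array order, including duplicates and nulls."""
    return [None if o == -1 else sorted_values[o] for o in offsets]


def zigzag(n):
    """Map a signed int to a non-negative one: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)


values, offsets = encode_offsets(["c", "b", None, "c"])
print(values, offsets)                  # sorted unique values and offsets
print(decode_offsets(values, offsets))  # original order restored
print([zigzag(o) for o in offsets])     # offsets as non-negative ints
```

The round trip shows why the extra offsets field is enough to synthesize the array in its original order: the sorted set alone loses both ordering and duplicates, while the offsets restore them.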


17719 commits