elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 09:28:55 -04:00

Author	SHA1	Message	Date
Niels Bauman	8028d5adde	Fix cat allocation YAML test (#126003 ) This test failed when the `disk.indices.forecast` value was a decimal number. We adjust the regex to allow decimal values and for consistency we also allow negative values. Fixes #125711 Fixes #125848 Fixes #125661	2025-04-01 11:25:13 +01:00
Benjamin Trent	505f21ba42	Simplify tests, bypassing raw score test (#125877 ) I was debating on having these tests in the original PR anyways. It ain't worth the flakiness. We know the oversampling setting gets updated given the other tests. closes: https://github.com/elastic/elasticsearch/issues/125851	2025-03-31 23:49:29 +11:00
Armin Braun	fd2cc97541	Introduce batched query execution and data-node side reduce (#121885 ) This change moves the query phase to a single roundtrip per node, just like can_match or field_caps work already. As a result of executing multiple shard queries from a single request we can also partially reduce each node's query results on the data node side before responding to the coordinating node. As a result this change significantly reduces the impact of network latencies on the end-to-end query performance, reduces the amount of work done (memory and cpu) on the coordinating node and the network traffic by factors of up to the number of shards per data node! Benchmarking shows up to orders of magnitude improvements in heap and network traffic dimensions in querying across a larger number of shards.	2025-03-29 16:53:18 +01:00
Carlos Delgado	968bddc462	Non existing synonyms sets do not fail shard recovery (#125659 )	2025-03-27 18:04:20 +02:00
Benjamin Trent	d84eb1f53f	Update bbq test data to better distinguish docs (#125705 ) Adjust the test data. I verified that the scores are now more distinguishable when: - each doc has its own segment - when 1 & 2 are in the same segment but 3 is alone - 2 & 3 in the same segment but 1 alone - 1 & 3 in the same segment but 2 alone - all three in the same segment closes: https://github.com/elastic/elasticsearch/issues/123727 closes: https://github.com/elastic/elasticsearch/issues/124848	2025-03-28 00:12:56 +11:00
Benjamin Trent	dd58b0b6fa	Return appropriate error on null dims update instead of npe (#125716 ) Calling `Object::toString` was trying to call `null.toString()`, really it should have been `Objects::toString`, which accepts `null`. closes: https://github.com/elastic/elasticsearch/issues/125713	2025-03-27 08:47:20 +11:00
Benjamin Trent	009a86a0e3	Allow zero for rescore_vector.oversample to indicate by-passing oversample and rescoring (#125599 ) This allows a `rescore_vector: {oversample: 0}` to indicate bypassing oversampling and rescoring. This is useful for: - Updating a quantized mapping to turn off automatic rescoring - Bypassing oversampling at query time in an ad-hoc manner if it's on by default in the mapping closes: https://github.com/elastic/elasticsearch/issues/125157	2025-03-27 06:56:51 +11:00
Stanislav Malyshev	07921a78a6	Handle long overflow in dates (#124048 ) * Handle long overflow in dates	2025-03-26 18:57:04 +02:00
Niels Bauman	fdd453734d	Fix NPE in rolling over unknown target and return 404 (#125352 ) Since #122905 we were throwing NPEs (i.e. 5xxs) when a rollover request has an unknown/non-existent target. Before that, we returned a 400 - illegal argument exception. We now return a 404 which matches "missing target" better. Additionally, to prevent this from happening again, we add a YAML test that asserts the correct exception behavior.	2025-03-22 12:59:13 +02:00
Lisa Cawley	97c5d4e149	Add more inference API REST specifications (#125187 )	2025-03-21 09:44:37 +02:00
Benjamin Trent	e9c4b267c2	Adjusting 41_knn_search_bbq_hnsw tests to have explicit refresh (#125255 )	2025-03-20 17:15:05 -04:00
Tommaso Teofili	6d3dac32c6	Let random_score yaml test explicitly fail on _id field (#125230 ) * constrain the no-field scenario to 9.x	2025-03-20 14:16:02 +01:00
István Zoltán Szabó	8a741bfd62	Adds VoyageAI PUT Inference API. (#125198 )	2025-03-19 13:29:14 +01:00
Quentin Pradet	7070f3fdbe	Add missing cause param to indices.put_template API (#125189 )	2025-03-19 14:57:30 +04:00
Lisa Cawley	b5bc681191	Add Mistral inference API (#125063 )	2025-03-18 22:11:12 -07:00
István Zoltán Szabó	c3222aba74	Adds EIS inference PUT API (#125082 )	2025-03-18 16:19:00 +01:00
Stanislav Malyshev	f0ee146f7f	Document allow_partial_results (#125044 )	2025-03-17 12:37:10 -06:00
Quentin Pradet	0bacede6cc	Add missing OpenAI and Watsonx inference APIs (#124989 )	2025-03-17 18:42:09 +04:00
Tommaso Teofili	51877bb33c	Add yaml test for random_score in function_score query (#124893 )	2025-03-17 10:59:01 +01:00
Niels Bauman	481d91c428	Run `TransportGetMappingsAction` on local node (#122921 ) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. Relates #101805	2025-03-15 07:59:28 +00:00
Luca Cavanna	05c8453b2b	Remove search throttled index setting and thread pool (#124519 ) Frozen indices, the freeze index API and the private index.frozen setting have been removed with #120539. There is also a search throttled thread pool that can now be removed, as well as a private search.throttled index setting that is no longer used as it could only be set internally by freezing an index. While the index setting is private and can be removed, as it should no longer be present in any index on 9.0+ indices, the thread pool settings associated with the removed pool are still accepted as no-op in case users have customized them and are upgrading without removing these. These will also trigger a deprecation warning. This change also removes the search.throttled related output from the thread pool section of the cluster info API.	2025-03-14 12:04:35 +01:00
Benjamin Trent	b2c1c4e0f0	New `rescore_vector` parameter as a quantized index type option (#124581 ) This adds a new parameter to the quantized index mapping that allows default oversampling and rescoring to occur. This doesn't adjust any of the defaults. It allows it to be configured. When the user provides `rescore_vector: {oversample: <number>}` in the query it will overwrite it. For example, here is how to use it with bbq: ``` PUT rescored_bbq { "mappings": { "properties": { "vector": { "type": "dense_vector", "index_options": { "type": "bbq_hnsw", "rescore_vector": {"oversample": 3.0} } } } } } ``` Then, when querying, it will auto oversample the `k` by `3x` and rerank with the raw vectors. ``` POST _search { "knn": { "query_vector": [...], "field": "vector" } } ```	2025-03-14 00:40:08 +11:00
Benjamin Trent	a1ee3c9291	Have create index return a bad request on poor formatting (#123761 ) closes: https://github.com/elastic/elasticsearch/issues/123661	2025-03-07 04:24:54 +11:00
Jonathan Buttner	2a006ec1f4	Updating description of stream API (#124209 )	2025-03-06 15:34:21 +01:00
Jonathan Buttner	3a472ebae9	[ML] Update inference api rest spec (#124151 ) * Pulling api spec changes * Fixing test and updating code javadoc	2025-03-06 08:26:49 -05:00
Benjamin Trent	a92b1d6892	Adjust exception thrown when unable to load hunspell dict (#123743 ) On index creation, it's possible to configure a hunspell analyzer, but reference a locale file that actually doesn't exist or isn't accessible. This error, like our other user dictionary errors, should be an IAE not an ISE. closes: https://github.com/elastic/elasticsearch/issues/123729	2025-03-06 06:19:21 +11:00
David Kyle	c3e7493d7a	[ML] Remove deprecated routes for ml trained models APIs (#124019 ) The 7.x routes for ml trained models _ml/inference/ have been deprecated since 8 and replaced with _ml/trained_models. Also removes query parameters that are no longer supported.	2025-03-05 16:09:37 +00:00
Martijn van Groningen	26de5343a2	Remove synthetic recovery source feature flag. (#122615 ) This feature flag controls whether synthetic recovery source is enabled by default when the source mode is synthetic. The synthetic recovery source feature itself is already available via the index.recovery.use_synthetic_source index setting and can be enabled by anyone using synthetic source. The index.recovery.use_synthetic_source setting defaults to true when index.mapping.source.mode is synthetic. The index.mapping.source.mode setting defaults to synthetic if index.mode is logsdb or time_series. In other words, with this change synthetic recovery source will be enabled by default for logsdb and tsdb. Closes #116726	2025-03-05 15:43:33 +01:00
Rene Groeschke	496c38e5a5	Reapply "Update Gradle wrapper to 8.13 (#122421 )" (#123889 ) (#123896 ) This reverts commit `36660f2e5f`.	2025-03-05 08:02:13 +01:00
Rene Groeschke	36660f2e5f	Revert "Update Gradle wrapper to 8.13 (#122421 )" (#123889 ) This reverts commit `e19b2264af`.	2025-03-03 15:51:07 +01:00
Rene Groeschke	e19b2264af	Update Gradle wrapper to 8.13 (#122421 ) * Fix Gradle Deprecation warning as declaring an is- property with a Boolean type has been deprecated. * Make use of new layout.settingsFolder api to address some cross project references * Fix buildParams snapshot check for multiproject projects	2025-03-03 14:10:00 +01:00
Kathleen DeRusso	ae6474db63	Deprecate Behavioral Analytics CRUD apis (#122960 ) * Deprecate Behavioral Analytics CRUD APIs * Add allowed warning for REST Compatibility tests * Update docs/changelog/122960.yaml * Update changelog * Update docs to add deprecation flags and fix failing tests * Update changelog * Update changelog again * Update docs formatting Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> * Skip asciidoc test --------- Co-authored-by: Efe Gürkan YALAMAN <efeyalaman@gmail.com> Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> Co-authored-by: Efe Gürkan YALAMAN <efeguerkan.yalaman@elastic.co>	2025-02-25 16:02:50 +01:00
Samiul Monir	5664f4f2ba	Improved error message when index field type is unknown (#122860 ) * Updating error message when index field type is unknown * Fix style issue * Add yaml test for invalid field type error message * Update docs/changelog/122860.yaml * Updating error message for runtime and multi field type parser * add and fix yaml tests * Fix code styles by running spotlessApply * Update changelog * Updating the test in yml * Updating error message for runtime * Fix failing yaml tests * Update error message to Fix unit tests * fix serverless qa test --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2025-02-24 13:16:22 -05:00
Quentin Pradet	17ed01471b	Add missing body to ML rest-api-spec API (#123235 )	2025-02-24 19:56:01 +04:00
Quentin Pradet	d8284fba1a	Fix cat APIs query parameters (#123020 )	2025-02-21 14:46:05 +04:00
Martijn van Groningen	43665f0a35	Store array offsets for keyword fields natively with synthetic source (#113757 ) The keyword doc values field gets an extra sorted doc values field that encodes the order in which array values were specified at index time. This also captures duplicate values. This is stored in an offset-to-ordinal array that gets zigzag vint encoded into a sorted doc values field. For example, in case of the following string array for a keyword field: ["c", "b", "a", "c"]. Sorted set doc values: ["a", "b", "c"] with ordinals: 0, 1 and 2. The offset array will be: [2, 1, 0, 2]. Null values are also supported. For example ["c", "b", null, "c"] results in sorted set doc values: ["b", "c"] with ordinals: 0 and 1. The offset array will be: [1, 0, -1, 1]. Empty arrays are also supported by encoding a zigzag vint array of zero elements. Limitations: currently only doc values based array support for keyword field mapper; multi-level leaf arrays are flattened, for example: [[b], [c]] -> [b, c]; arrays are always synthesized as one type, so in case of a keyword field, [1, 2] gets synthesized as ["1", "2"]. These limitations can be addressed, but some require more complexity and/or additional storage. With this PR, keyword field arrays will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc values field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).	2025-02-20 09:20:49 +01:00
Niels Bauman	618de4855d	Remove `local` param from field mapping API spec (#122922 ) The `local` param for the `GetFieldMapping` API was deprecated in #55014 and I think #57265 aimed to propagate that deprecation to the REST API spec, but it changed `get_mapping.json` instead of `get_field_mapping.json`. #55100 removed the `local` param for the _field_ mapping API so we can safely remove the field from the spec and remove the YAML test.	2025-02-19 22:51:11 +01:00
David Turner	cd15d09adf	Fork post-snapshot-delete cleanup off master thread (#122731 ) We shouldn't run the post-snapshot-delete cleanup work on the master thread, since it can be quite expensive and need not block subsequent cluster state updates. This commit forks it onto a `SNAPSHOT` thread.	2025-02-19 21:02:27 +11:00
Salvatore Campagna	780cac5a6d	Enable a sparse doc values index for `@timestamp` in LogsDB (#122161 ) This PR extends the work done in #121751 by enabling a sparse doc values index for the @timestamp field in LogsDB. Similar to the previous PR, the setting index.mapping.use_doc_values_skipper will override the index mapping parameter when all of the following conditions are met: * The index mode is LogsDB. * The field name is @timestamp. * Index sorting is configured on @timestamp (regardless of whether it is a primary sort field or not). * Doc values are enabled. This ensures that only one index structure is defined on the @timestamp field: * If the conditions above are met, the inverted index is replaced with a sparse doc values index. * This prevents both the inverted index and sparse doc values index from being enabled together, reducing unnecessary storage overhead. This change aligns with our goal of optimizing LogsDB for storage efficiency while possibly maintaining reasonable query latency performance. It will enable us to run benchmarks and evaluate the impact of sparse indexing on the @timestamp field as well.	2025-02-17 13:31:26 +01:00
Jordan Powers	53150881e9	Enable the use of nested field type with index.mode=time_series (#122224 ) This patch removes the check that fails requests that attempt to use fields of type: nested within indices with mode time_series. This patch also updates TimeSeriesIdFieldMapper#postParse to set the _id field on child documents once it's calculated. Closes #120874	2025-02-13 09:33:04 -08:00
Benjamin Trent	f5c901e68c	Fix synthetic source bug that would mishandle nested dense_vector fields (#122425 ) When utilizing synthetic source with nested fields, we attempt to rebuild the child values in addition to all the parent values. While this generally works well, it's possible that certain values might be missing from various child docs. Consequently, we will attempt to iterate the vector values strangely, resulting in seemingly missing values or potentially exceptions indicating EOFs. closes: #122383	2025-02-13 08:20:13 +11:00
Martijn van Groningen	d93f9c4d58	Address synthetic recovery source release test failures. (#122035 )	2025-02-07 13:02:27 -08:00
Artem Prigoda	885a5510e1	Don't return or accept `node_version` in the Desired Nodes API (#119049 ) Re-submission of #114580 > node_version was deprecated in #104209 (8.13) and shouldn't be set or returned in 9.0 Resolve ES-9443	2025-02-05 15:41:47 +01:00
Salvatore Campagna	6a526755de	Use synthetic recovery source by default if synthetic source is enabled (#119110 ) We experimented with using synthetic source for recovery and observed quite positive impact on indexing throughput by means of our nightly Rally benchmarks. As a result, here we enable it by default when synthetic source is used. To be more precise, if `index.mapping.source.mode` setting is `synthetic` we enable recovery source by means of synthetic source. Moreover, enabling synthetic source recovery is done behind a feature flag. That would allow us to enable it in snapshot builds which in turn will allow us to see performance results in Rally nightly benchmarks.	2025-02-05 13:55:51 +01:00
Benjamin Trent	2de1a3defe	Addressing int4 flat flakiness (#121437 ) This simplifies the setup and relaxes the similarity check. We can restrict the similarity check once we evolve the quantization algorithm in the future.	2025-02-05 09:34:26 +11:00
Niels Bauman	e27a50dead	Run `TransportEnrichStatsAction` on local node (#121256 ) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout.	2025-02-04 09:30:44 +10:00
Andrei Dan	44e5104af1	[TEST] wait for all active shards when indexing data (#121442 ) This attempts to fix a flaky test where the term_freq returned by the multiple terms vectors API was `null`. I was not able to reproduce this test failure but this proposes a fix based on the following running theory: - an Elasticsearch cluster comprised of at least 2 nodes - we create a couple of indices with 1 primary and 1 replica - we index a document that was acknowledged only by the primary (because `wait_for_active_shards` defaults to `1`) - the test executes the multiple terms vectors API and it hits the node hosting the replica shard, which hasn't yet received the document we ingested in the primary shard. This race condition between the document replication and the test running the terms vectors API on the replica shard could yield a `null` value for the term's `term_freq` (as the replica shard contains 0 documents). This PR proposes we change the `wait_for_active_shards` value to `all` so each write is acknowledged by all replicas before the client receives the response. Fixes #113325	2025-02-04 05:57:05 +11:00
Mayya Sharipova	f7901f0795	Support duplicate suggestions in completion field (#121324 ) Currently if a document has duplicate suggestions across different contexts, only the first gets indexed, and when a user tries to search using the second context, she will get 0 results. This PR addresses this by adding support for duplicate suggestions across different contexts, so documents like below with duplicate inputs can be searched across all provided contexts. ```json { "my_suggest": [ { "input": [ "foox", "boo" ], "weight" : 2, "contexts": { "color": [ "red" ] } }, { "input": [ "foox" ], "weight" : 3, "contexts": { "color": [ "blue" ] } } ] } ``` Closes #82432	2025-01-31 13:58:14 -05:00
Carlos Delgado	67c2f41724	Fix serverless test - wait for index green just after first insertion (#121180 )	2025-01-31 17:45:52 +01:00
Michael Peterson	3fafb5f161	Improve resolve/cluster yaml test (#121315 ) Updated indices.resolve_cluster.json to match new resolve/cluster spec. Added new test for the no-index-expression endpoint. Adjusted syntax in 10_basic_resolve_cluster.yml so that the elasticsearch-specification validation tests pass.	2025-01-31 10:20:04 -05:00

3785 commits
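
The array-offset scheme described in commit 43665f0a35 can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: the function names `encode_offsets`, `decode_offsets`, and `zigzag` are hypothetical, Python is used only for brevity, and the vint byte encoding is left out. It reproduces only the mapping the commit message spells out: sorted unique values, one ordinal per original array position, and -1 for a null.

```python
def encode_offsets(values):
    """Return (sorted set values, offset array) for one keyword array.

    Hypothetical helper: mirrors the examples in the commit message only.
    """
    # Sorted set doc values hold each distinct non-null value once, in sorted order.
    sorted_values = sorted({v for v in values if v is not None})
    ordinals = {v: i for i, v in enumerate(sorted_values)}
    # One entry per original position, so order and duplicates survive; -1 marks null.
    offsets = [-1 if v is None else ordinals[v] for v in values]
    return sorted_values, offsets


def decode_offsets(sorted_values, offsets):
    """Rebuild the original array, including nulls and duplicates."""
    return [None if o == -1 else sorted_values[o] for o in offsets]


def zigzag(n):
    """Standard zigzag mapping: small negative numbers become small unsigned ones."""
    return 2 * n if n >= 0 else -2 * n - 1


values, offsets = encode_offsets(["c", "b", None, "c"])
print(values, offsets)                  # sorted values and per-position ordinals
print([zigzag(o) for o in offsets])     # what would then be vint encoded
print(decode_offsets(values, offsets))  # round-trips to the input array
```

Zigzag matters here because the null marker -1 would otherwise need the maximum number of bytes in a variable-length integer encoding; after the mapping it becomes 1 and fits in a single byte.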