Related to: https://github.com/elastic/elasticsearch/issues/102261
In test failures, we are not receiving any information about the stacktrace
of the bulk indexing failure cause, just the message.
This adds debug logging and grabs the first stacktrace over all indices.
Additionally, the logger groups by the failure message in an effort to
find unique failures over all the indices.
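The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch code: `BulkItemFailure` and `summarize_failures` are hypothetical names, and failures are modeled as plain records carrying an index name, a message, and an optional stacktrace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BulkItemFailure:
    # Hypothetical record type; the real failure objects are richer.
    index: str
    message: str
    stacktrace: Optional[str]


def summarize_failures(failures):
    """Group failures by message, keeping the first stacktrace seen per message."""
    grouped = {}
    for failure in failures:
        entry = grouped.setdefault(
            failure.message, {"count": 0, "indices": [], "stacktrace": None}
        )
        entry["count"] += 1
        if failure.index not in entry["indices"]:
            entry["indices"].append(failure.index)
        # Only the first available stacktrace is retained for each message.
        if entry["stacktrace"] is None and failure.stacktrace:
            entry["stacktrace"] = failure.stacktrace
    return grouped
```

Grouping by message keeps the debug output short when many indices fail for the same reason.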
This commit fixes a bug in cluster stats when the dimensions of a dense_vector field type are not yet known (they are set automatically when the first document is indexed). The change here is to report a value of -1 for indexed_vector_dim_min and indexed_vector_dim_max when the dims are not yet known.
Additionally, there is a separate issue with indexed_vector_dim_min, which would previously report 1024 if all vectors had dimensions greater than 1024.
closes #102416
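A rough sketch of both behaviours, under stated assumptions: `dim_stats` and `buggy_dim_min` are hypothetical helpers, `None` stands for a field whose dims are not yet known, and the handling of a mix of known and unknown fields (unknown ones are ignored here) is an assumption rather than something the commit message states.

```python
UNKNOWN_DIMS = -1  # value reported when no dims are known yet


def dim_stats(field_dims):
    """Return (dim_min, dim_max) over fields; None marks not-yet-known dims."""
    known = [d for d in field_dims if d is not None]
    if not known:
        return UNKNOWN_DIMS, UNKNOWN_DIMS
    # Seeding from the observed values avoids any built-in cap on the minimum.
    return min(known), max(known)


def buggy_dim_min(field_dims, seed=1024):
    """Illustrates the old problem: a minimum seeded with 1024 never exceeds it."""
    result = seed
    for d in field_dims:
        result = min(result, d)
    return result
```

With dims of 2048 and 1536, `buggy_dim_min` yields 1024 while `dim_stats` yields a minimum of 1536.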
* Fix incorrect dynamic mapping for non-numeric-value arrays #101965 (#101967)
After https://github.com/elastic/elasticsearch/pull/98512 we incorrectly attempt to map an array of any single value type to dense_vector.
Instead, we should validate that ALL mappers are numeric and that ALL of them are `float`.
closes: https://github.com/elastic/elasticsearch/issues/101965
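The check described above might look roughly like this. It is a sketch over already-parsed JSON values, not the mapper code itself: `maps_to_dense_vector` is a hypothetical name, and the `min_dims`/`max_dims` defaults are assumptions for illustration.

```python
def maps_to_dense_vector(values, min_dims=128, max_dims=2048):
    """Return True only if every element is a float and the length is in range."""
    if not (min_dims <= len(values) <= max_dims):
        return False
    # type(v) is float rejects strings, ints, and bools (bool subclasses int).
    return all(type(v) is float for v in values)
```

The key point is that the check covers ALL elements, so an array of strings, or a mixed array, is never treated as a vector.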
* Update rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/60_dense_vector_dynamic_mapping.yml
8.10 added a new flag called `weight_matches` and we use it by default when highlighting. However, every hybrid search with kNN will fail with cryptic errors.
This PR disables weight_matches mode when kNN queries are present.
Supporting weight_matches & kNN will take more work.
closes: https://github.com/elastic/elasticsearch/issues/101667
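As a simplified illustration of the decision above: the sketch below models a search request as a plain dict and only looks for a top-level `knn` section. `highlighter_options` is a hypothetical helper, not the actual highlighter code.

```python
def highlighter_options(search_request):
    """Disable weight_matches mode whenever the request carries a kNN section."""
    has_knn = "knn" in search_request  # simplification: top-level section only
    return {"weight_matches": not has_knn}
```

A plain query keeps the default, while a hybrid request falls back to the older highlighting mode.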
* Fix painless execute api and tsdb issue. (#101212)
Today using painless execute api with tsdb index can fail with a `_id must be unset or set to [cn4exTOUtxytuLkQAAABeRnR_mY] but was [_id] because [test_index] is in time_series mode` error.
This change addresses this.
The painless execute api shouldn't use a static _id, but
let the TsidExtractingIdFieldMapper generate it.
Otherwise validation in TsidExtractingIdFieldMapper fails.
Closes #101072
* update skip version
Yet another test affected by the fix for showing the synthetic source,
#98808. This can trigger an assert in older versions as the mapping they
produce (without synthetic source) doesn't match the one they may get
from the master, if the latter is in version 8.10+.
Fixes #101121
Other tests rely on getting the mapping and dynamically updating it, not
just the ones created for vectors (specifically there are dynamic `float`
mappings in this yaml test suite too).
I noticed that sometimes we read from the `fieldType()` and other times
we read from the mapper directly. It seems to me that we should only
ever read and update values from one of those, for sanity's sake.
So, I removed all values that were part of the mapper directly and used
`fieldType().<value>` everywhere.
Additionally, David Turner suggested that we wait for cluster health
before verifying mappings in the yaml tests, so I added that as well.
related to: https://github.com/elastic/elasticsearch/issues/100502
* Don't print synthetic source in mapping for bwc tests
* Move comment.
* Don't print synthetic source in mapping for bwc tests #2
* Don't print synthetic source in mapping for bwc tests #2
* Revert "Don't print synthetic source in mapping for bwc tests #2"
This reverts commit 034262c5d2.
* Revert "Don't print synthetic source in mapping for bwc tests #2"
This reverts commit 44e815635e.
* Revert "Don't print synthetic source in mapping for bwc tests (#100572)"
This reverts commit 9322ab9b91.
* Exclude synthetic source test from mixedClusterTests
* Update comment.
* Mute all tsdb tests in mixedClusterTests
This is an interim step to stop sporadic test failures, while we try to
fix version skip for mixed cluster tests.
* Remove old exclusion
* Add aggregations too
* Mute tests for versions between 8.7-8.10
* Remove mute
* Restore version skipping for position fields
* Restore version skip for synthetic source
This updates the visibility field in ESQL's REST spec to public.
It also updates the type of quotes used for one of the REST object
parameters to backticks, for consistency.
This releases the Data stream lifecycle feature as a
Technical Preview feature.
Data stream lifecycle, albeit in technical preview, will allow data streams
to take advantage of a native simplified and resilient lifecycle implementation.
* Nested dense_vector support
* Adjust nested support based on new lucene version
* fixing after rebase
* fixing some code
* fixing tests adding transport version
* spotless
* [Automated] Update Lucene snapshot to 9.9.0-snapshot-b3e67403aaf
* Adds new max_inner_product vector similarity function (#99527)
Adds new max_inner_product vector similarity function. This differs from dot_product in two ways: it doesn't require vectors to be normalized, and it scales the similarity between vectors differently to prevent negative scores.
* requiring top level filter to be parent filter
* adding docs & fixing tests
* adding and fixing docs
* adding changelog
* removing unnecessary file changes
* removing unused imports
* fixing test
* maybe fix doc tests
* continue tests in docs
* fixing more tests
* fixing tests
---------
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This commit adds non-null `sparse_vector` fields to `_field_names`, so that we can support `exists` queries & also introduces a new IndexVersion to ensure backwards compatibility.
Closes #99319
This commit skips settings validation during desired nodes updates.
The issue comes when a setting that needs to be validated depends
on a secure setting that cannot be read while the desired nodes are
updated. To avoid such issues, we'll skip the settings validations
completely.
Closes #96127
Adds new max_inner_product vector similarity function. This differs from dot_product in two ways: it doesn't require vectors to be normalized, and it scales the similarity between vectors differently to prevent negative scores.
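To make the scaling difference concrete, here is a small sketch of the two score transformations for float vectors. The function names are illustrative and the formulas are given as an understanding of how the underlying Lucene similarities scale raw inner products. Treat it as a sketch rather than the exact implementation.

```python
def dot(a, b):
    """Raw inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dot_product_score(a, b):
    # Assumes unit-length vectors, so the raw value lies in [-1, 1].
    return (1 + dot(a, b)) / 2


def max_inner_product_score(a, b):
    # No normalization required; negative raw values are squashed into (0, 1)
    # and non-negative ones are shifted to [1, inf), so scores stay positive.
    d = dot(a, b)
    return 1 / (1 - d) if d < 0 else d + 1
```

For example, a raw inner product of -3 maps to 0.25 and a raw inner product of 3 maps to 4, so ordering is preserved and no score is negative.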
Adds the _inference API for managing inference models and performing inference.
Inference is a new plugin in XPack that creates a new system index (.inference)
for storing the model configurations. Model configurations are managed with
the standard PUT, GET, and DELETE requests, and POST is used to perform inference.
This PR creates an inference service for deploying and inferring on the
ELSER model.
Adds back `sparse_vector` field type, as a copy of `rank_features`.
The main goal is to have the `sparse_vector` field type available so we
can switch ELSER queries to use the new type.