* Adds new default inference information
* Update docs/reference/mapping/types/semantic-text.asciidoc
* Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
* Update docs/reference/mapping/types/semantic-text.asciidoc
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
We will deprecate the `_source.mode` mapping level configuration
in favor of the index-level `index.mapping.source.mode` setting.
As a result, we go through the documentation and update it to reflect
the introduction of the setting.
(cherry picked from commit f6a1e36d6b)
* Adding new bbq index types behind a feature flag (#114439)
new index types of bbq_hnsw and bbq_flat which utilize the better binary quantization formats. A 32x reduction in memory, with nice recall properties.
(cherry picked from commit 6c752abc23)
* spotless
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
(cherry picked from commit 208a1fe571)
* Add support for multi-value dimensions (#112645)
Closes https://github.com/elastic/elasticsearch/issues/110387
Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
(cherry picked from commit 8d223cbf7a)
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
* Remove skip test for 8.x
This was just needed for 8.x to 9.0 compatibility tests
JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch
to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below.
This causes some differences in locale behaviour; this also adapts various tests to still work whether run on COMPAT or CLDR.
* [DOCS] Clarify copy_to behavior with strict dynamic mappings
* Add id
* De-verbosify
* Delete pesky comma
* More info about root and nest
* Fixes per review, clarify non-recursive explanation
* Skip tests for illustrative example
* Fix example syntax
* Fix typo
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization based codec works with this element type, this is
consistent with `byte` vectors.
`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexidecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.
`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.
For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.
Note, the dimensions expected by this element_type are always to be
divisible by `8`, and the `byte[]` vectors provided for index must be
have size `dim/8` size, where each byte element represents `8` bits of
the vectors.
closes: https://github.com/elastic/elasticsearch/issues/48322
PR #99445 introduced automatic normalization of dense vectors with
cosine similarity. This adds a note about this in the documentation.
Relates to #99445
This adds a new quantization mechanism for HNSW and flat indices. Here
we add `int4` quantization via the `int4_hnsw` and `int4_flat` index
types. This quantization methodology further reduces the memory required
for fast HNSW, meaning that the memory required is 8x smaller than with
regular float32 values.
8x reduction means that 1M 1024 dimension vectors goes from requiring
3.8GB to 477MB.
Recall continues to stay steady, there is some reduction that is
recoverable via slightly oversampling and reranking. For example over
500k CohereV3 vectors, only 5 extra vectors are required to be gathered
to achieve over 0.98 recall in a brute-force scenario.
