mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
Adds new bit element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for vectors. The new element type works for indexed and non-indexed vectors, and with the `hnsw` and `flat` index types. No quantization-based codec works with this element type, which is consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions and expect vectors being indexed to be encoded either as a hexadecimal string or as a `byte[]` array in which each element represents `8` bits of the vector.

`bit` vectors support script usage and regular query usage. When indexed, all comparisons are `xor` and `popcount` summations (that is, hamming distance), and the scores are transformed and normalized given the vector dimensions. Note that indexed bit vectors require `l2_norm` as the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is `sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note that the dimensions expected by this element_type must always be divisible by `8`, and the `byte[]` vectors provided for indexing must have size `dim/8`, where each byte element represents `8` bits of the vector.

closes: https://github.com/elastic/elasticsearch/issues/48322
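To make the encoding and distance rules above concrete, here is a minimal standalone sketch in Java. It is not the code added by this commit: the class `BitVectorSketch` and its methods `decode`, `hamming`, and `l2norm` are hypothetical names chosen for illustration. It assumes only what the message states: dimensions divisible by `8`, a packed array of `dim/8` bytes, hamming distance computed as `xor` plus `popcount`, and `l2norm` equal to `sqrt(l1norm)`. The score normalization applied to indexed vectors is not shown, since the message does not give its formula.

```java
// Hypothetical illustration; names are not taken from the Elasticsearch code base.
final class BitVectorSketch {
    private BitVectorSketch() {}

    /** Decodes a hex string into packed bytes and checks it against the declared bit dimensions. */
    public static byte[] decode(String hex, int dims) {
        if (dims <= 0 || dims % 8 != 0) {
            throw new IllegalArgumentException("dims must be a positive multiple of 8, got " + dims);
        }
        if (hex.length() != (dims / 8) * 2) {
            throw new IllegalArgumentException(
                "expected " + (dims / 8) + " bytes (" + (dims / 4) + " hex chars), got " + hex.length() + " chars");
        }
        byte[] packed = new byte[dims / 8];
        for (int i = 0; i < packed.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit near index " + (2 * i));
            }
            packed[i] = (byte) ((hi << 4) | lo);
        }
        return packed;
    }

    /** Hamming distance: xor each byte pair, then count the set bits (popcount). */
    public static int hamming(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same packed length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so sign extension of negative bytes does not add spurious set bits.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    /** For bit vectors the message defines l2norm as sqrt(l1norm), where l1norm is the hamming distance. */
    public static double l2norm(byte[] a, byte[] b) {
        return Math.sqrt(hamming(a, b));
    }
}
```

For example, `BitVectorSketch.hamming(BitVectorSketch.decode("ff00", 16), BitVectorSketch.decode("0f0f", 16))` evaluates to `8`, because each of the two byte pairs differs in four bits. A call such as `decode("ff", 12)` is rejected because `12` is not divisible by `8`.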
parent 97651dfb9f
commit 5add44d7d1
38 changed files with 2713 additions and 187 deletions
@@ -449,7 +449,10 @@ module org.elasticsearch.server {
         with
             org.elasticsearch.index.codec.vectors.ES813FlatVectorFormat,
             org.elasticsearch.index.codec.vectors.ES813Int8FlatVectorFormat,
-            org.elasticsearch.index.codec.vectors.ES814HnswScalarQuantizedVectorsFormat;
+            org.elasticsearch.index.codec.vectors.ES814HnswScalarQuantizedVectorsFormat,
+            org.elasticsearch.index.codec.vectors.ES815HnswBitVectorsFormat,
+            org.elasticsearch.index.codec.vectors.ES815BitFlatVectorFormat;

     provides org.apache.lucene.codecs.Codec with Elasticsearch814Codec;

     provides org.apache.logging.log4j.core.util.ContextDataProvider with org.elasticsearch.common.logging.DynamicContextDataProvider;