mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 01:22:26 -04:00
Make bbq_hnsw the default index option for dense-vector fields with more than 384 dimensions (#129825)
parent 1a0ab74323
commit b855266bd1
10 changed files with 210 additions and 70 deletions
|
@ -55,7 +55,7 @@ In many cases, a brute-force kNN search is not efficient enough. For this reason
|
||||||
|
|
||||||
Unmapped array fields of float elements with size between 128 and 4096 are dynamically mapped as `dense_vector` with a default similarity of `cosine`. You can override the default similarity by explicitly mapping the field as `dense_vector` with the desired similarity.
|
Unmapped array fields of float elements with size between 128 and 4096 are dynamically mapped as `dense_vector` with a default similarity of `cosine`. You can override the default similarity by explicitly mapping the field as `dense_vector` with the desired similarity.
|
||||||
|
|
||||||
Indexing is enabled by default for dense vector fields and indexed as `int8_hnsw`. When indexing is enabled, you can define the vector similarity to use in kNN search:
|
Indexing is enabled by default for dense vector fields and indexed as `bbq_hnsw` if dimensions are greater than or equal to 384, otherwise they are indexed as `int8_hnsw`. When indexing is enabled, you can define the vector similarity to use in kNN search:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
PUT my-index-2
|
PUT my-index-2
|
||||||
|
@ -105,7 +105,7 @@ The `dense_vector` type supports quantization to reduce the memory footprint req
|
||||||
|
|
||||||
When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) for more information.
|
When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) for more information.
|
||||||
|
|
||||||
To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default index type is `int8_hnsw`.
|
To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default index type is `bbq_hnsw` for vectors with greater than or equal to 384 dimensions, otherwise it's `int8_hnsw`.
|
||||||
|
|
||||||
Quantized vectors can use [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) to improve accuracy on approximate kNN search results.
|
Quantized vectors can use [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) to improve accuracy on approximate kNN search results.
|
||||||
|
|
||||||
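The reduction factors quoted for the quantized index types follow from the number of bits stored per dimension. The sketch below works through that arithmetic. It assumes raw vectors use one 32-bit float per dimension, and it ignores per-vector correction terms and the HNSW graph itself. The class and method names are illustrative only and are not Elasticsearch APIs.

```java
final class QuantizedFootprint {
    // Assumption: unquantized vectors store one 32-bit float per dimension.
    static final int FLOAT_BITS = 32;

    // Approximate bytes for one vector at the given bits per dimension,
    // rounded up to whole bytes.
    static long bytesPerVector(int dims, int bitsPerDim) {
        return ((long) dims * bitsPerDim + 7) / 8;
    }

    // Reduction factor relative to float32 storage.
    static int reductionFactor(int bitsPerDim) {
        return FLOAT_BITS / bitsPerDim;
    }

    public static void main(String[] args) {
        int dims = 1024;
        String[] names = { "hnsw (float32)", "int8_hnsw", "int4_hnsw", "bbq_hnsw" };
        int[] bits = { 32, 8, 4, 1 };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + bytesPerVector(dims, bits[i])
                + " bytes per vector, " + reductionFactor(bits[i]) + "x reduction");
        }
    }
}
```

For a 1024-dimension vector this gives 4096 bytes of raw floats against 128 bytes at one bit per dimension. That gap is why oversampling and rescoring are suggested to recover accuracy.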
|
@ -255,9 +255,9 @@ $$$dense-vector-index-options$$$
|
||||||
`type`
|
`type`
|
||||||
: (Required, string) The type of kNN algorithm to use. Can be any of:
|
: (Required, string) The type of kNN algorithm to use. Can be any of:
|
||||||
* `hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) for scalable approximate kNN search. This supports all `element_type` values.
|
* `hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) for scalable approximate kNN search. This supports all `element_type` values.
|
||||||
* `int8_hnsw` - The default index type for float vectors. This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatic scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 4x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
|
* `int8_hnsw` - The default index type for float vectors with fewer than 384 dimensions. This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatic scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 4x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
|
||||||
* `int4_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatic scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 8x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
|
* `int4_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatic scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 8x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
|
||||||
* `bbq_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatic binary quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 32x at the cost of accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
|
* `bbq_hnsw` - The default index type for float vectors with greater than or equal to 384 dimensions. This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatic binary quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 32x at the cost of accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
|
||||||
* `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
|
* `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
|
||||||
* `int8_flat` - This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports `element_type` of `float`.
|
* `int8_flat` - This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports `element_type` of `float`.
|
||||||
* `int4_flat` - This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only supports `element_type` of `float`.
|
* `int4_flat` - This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only supports `element_type` of `float`.
|
||||||
|
|
|
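The `type` list above can be read as a small lookup table: each index type is either approximate (HNSW) or exact (brute force), and the quantized variants accept only `float` elements. The sketch below encodes just the seven types named in this hunk. The enum and method names are hypothetical and do not mirror the Elasticsearch classes.

```java
final class IndexTypeSupport {
    enum ElementKind { FLOAT, BYTE, BIT }

    enum IndexType {
        // name as used in index_options.type, float-only?, approximate (HNSW)?
        HNSW("hnsw", false, true),
        INT8_HNSW("int8_hnsw", true, true),
        INT4_HNSW("int4_hnsw", true, true),
        BBQ_HNSW("bbq_hnsw", true, true),
        FLAT("flat", false, false),
        INT8_FLAT("int8_flat", true, false),
        INT4_FLAT("int4_flat", true, false);

        final String mappingName;
        final boolean floatOnly;
        final boolean approximate;

        IndexType(String mappingName, boolean floatOnly, boolean approximate) {
            this.mappingName = mappingName;
            this.floatOnly = floatOnly;
            this.approximate = approximate;
        }

        // Quantized types require float elements; hnsw and flat accept any element type.
        boolean supports(ElementKind kind) {
            return !floatOnly || kind == ElementKind.FLOAT;
        }
    }

    public static void main(String[] args) {
        for (IndexType type : IndexType.values()) {
            System.out.println(type.mappingName
                + " approximate=" + type.approximate
                + " supportsByte=" + type.supports(ElementKind.BYTE));
        }
    }
}
```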
@ -63,6 +63,7 @@ tasks.named("yamlRestCompatTestTransform").configure ({ task ->
|
||||||
task.skipTest("cat.shards/10_basic/Help", "sync_id is removed in 9.0")
|
task.skipTest("cat.shards/10_basic/Help", "sync_id is removed in 9.0")
|
||||||
task.skipTest("search/500_date_range/from, to, include_lower, include_upper deprecated", "deprecated parameters are removed in 9.0")
|
task.skipTest("search/500_date_range/from, to, include_lower, include_upper deprecated", "deprecated parameters are removed in 9.0")
|
||||||
task.skipTest("search.highlight/30_max_analyzed_offset/Plain highlighter with max_analyzed_offset < 0 should FAIL", "semantics of test has changed")
|
task.skipTest("search.highlight/30_max_analyzed_offset/Plain highlighter with max_analyzed_offset < 0 should FAIL", "semantics of test has changed")
|
||||||
|
task.skipTest("search.vectors/70_dense_vector_telemetry/Field mapping stats with field details", "default dense vector field mapping has changed")
|
||||||
task.skipTest("range/20_synthetic_source/Double range", "_source.mode mapping attribute is no-op since 9.0.0")
|
task.skipTest("range/20_synthetic_source/Double range", "_source.mode mapping attribute is no-op since 9.0.0")
|
||||||
task.skipTest("range/20_synthetic_source/Float range", "_source.mode mapping attribute is no-op since 9.0.0")
|
task.skipTest("range/20_synthetic_source/Float range", "_source.mode mapping attribute is no-op since 9.0.0")
|
||||||
task.skipTest("range/20_synthetic_source/Integer range", "_source.mode mapping attribute is no-op since 9.0.0")
|
task.skipTest("range/20_synthetic_source/Integer range", "_source.mode mapping attribute is no-op since 9.0.0")
|
||||||
|
|
|
@ -1,7 +1,8 @@
|
||||||
setup:
|
setup:
|
||||||
- requires:
|
- requires:
|
||||||
cluster_features: [ "gte_v8.4.0" ]
|
cluster_features: [ "search.vectors.mappers.default_bbq_hnsw" ]
|
||||||
reason: "Cluster mappings stats for indexed dense vector was added in 8.4"
|
reason: "Test cluster feature 'search.vectors.mappers.default_bbq_hnsw' is required for using bbq as default
|
||||||
|
indexing for vector fields."
|
||||||
- skip:
|
- skip:
|
||||||
features: headers
|
features: headers
|
||||||
|
|
||||||
|
@ -13,7 +14,7 @@ setup:
|
||||||
index.number_of_shards: 2
|
index.number_of_shards: 2
|
||||||
mappings:
|
mappings:
|
||||||
properties:
|
properties:
|
||||||
vector1:
|
vector_hnsw_explicit:
|
||||||
type: dense_vector
|
type: dense_vector
|
||||||
dims: 768
|
dims: 768
|
||||||
index: true
|
index: true
|
||||||
|
@ -23,12 +24,16 @@ setup:
|
||||||
type: hnsw
|
type: hnsw
|
||||||
m: 16
|
m: 16
|
||||||
ef_construction: 100
|
ef_construction: 100
|
||||||
vector2:
|
vector_bbq_default:
|
||||||
type: dense_vector
|
type: dense_vector
|
||||||
dims: 1024
|
dims: 1024
|
||||||
index: true
|
index: true
|
||||||
similarity: dot_product
|
similarity: dot_product
|
||||||
vector3:
|
vector_int8_hnsw_default:
|
||||||
|
type: dense_vector
|
||||||
|
dims: 100
|
||||||
|
index: true
|
||||||
|
vector_no_index:
|
||||||
type: dense_vector
|
type: dense_vector
|
||||||
dims: 100
|
dims: 100
|
||||||
index: false
|
index: false
|
||||||
|
@ -52,10 +57,10 @@ setup:
|
||||||
- do: { cluster.stats: { } }
|
- do: { cluster.stats: { } }
|
||||||
- length: { indices.mappings.field_types: 1 }
|
- length: { indices.mappings.field_types: 1 }
|
||||||
- match: { indices.mappings.field_types.0.name: dense_vector }
|
- match: { indices.mappings.field_types.0.name: dense_vector }
|
||||||
- match: { indices.mappings.field_types.0.count: 4 }
|
- match: { indices.mappings.field_types.0.count: 5 }
|
||||||
- match: { indices.mappings.field_types.0.index_count: 2 }
|
- match: { indices.mappings.field_types.0.index_count: 2 }
|
||||||
- match: { indices.mappings.field_types.0.indexed_vector_count: 3 }
|
- match: { indices.mappings.field_types.0.indexed_vector_count: 4 }
|
||||||
- match: { indices.mappings.field_types.0.indexed_vector_dim_min: 768 }
|
- match: { indices.mappings.field_types.0.indexed_vector_dim_min: 100 }
|
||||||
- match: { indices.mappings.field_types.0.indexed_vector_dim_max: 1024 }
|
- match: { indices.mappings.field_types.0.indexed_vector_dim_max: 1024 }
|
||||||
---
|
---
|
||||||
"Field mapping stats with field details":
|
"Field mapping stats with field details":
|
||||||
|
@ -70,15 +75,16 @@ setup:
|
||||||
- do: { cluster.stats: { } }
|
- do: { cluster.stats: { } }
|
||||||
- length: { indices.mappings.field_types: 1 }
|
- length: { indices.mappings.field_types: 1 }
|
||||||
- match: { indices.mappings.field_types.0.name: dense_vector }
|
- match: { indices.mappings.field_types.0.name: dense_vector }
|
||||||
- match: { indices.mappings.field_types.0.count: 4 }
|
- match: { indices.mappings.field_types.0.count: 5 }
|
||||||
- match: { indices.mappings.field_types.0.index_count: 2 }
|
- match: { indices.mappings.field_types.0.index_count: 2 }
|
||||||
- match: { indices.mappings.field_types.0.indexed_vector_count: 3 }
|
- match: { indices.mappings.field_types.0.indexed_vector_count: 4 }
|
||||||
- match: { indices.mappings.field_types.0.indexed_vector_dim_min: 768 }
|
- match: { indices.mappings.field_types.0.indexed_vector_dim_min: 100 }
|
||||||
- match: { indices.mappings.field_types.0.indexed_vector_dim_max: 1024 }
|
- match: { indices.mappings.field_types.0.indexed_vector_dim_max: 1024 }
|
||||||
- match: { indices.mappings.field_types.0.vector_index_type_count.hnsw: 1 }
|
- match: { indices.mappings.field_types.0.vector_index_type_count.hnsw: 1 }
|
||||||
- match: { indices.mappings.field_types.0.vector_index_type_count.int8_hnsw: 2 }
|
- match: { indices.mappings.field_types.0.vector_index_type_count.int8_hnsw: 1 }
|
||||||
|
- match: { indices.mappings.field_types.0.vector_index_type_count.bbq_hnsw: 2 }
|
||||||
- match: { indices.mappings.field_types.0.vector_index_type_count.not_indexed: 1 }
|
- match: { indices.mappings.field_types.0.vector_index_type_count.not_indexed: 1 }
|
||||||
- match: { indices.mappings.field_types.0.vector_similarity_type_count.l2_norm: 2 }
|
- match: { indices.mappings.field_types.0.vector_similarity_type_count.l2_norm: 2 }
|
||||||
- match: { indices.mappings.field_types.0.vector_similarity_type_count.dot_product: 1 }
|
- match: { indices.mappings.field_types.0.vector_similarity_type_count.dot_product: 1 }
|
||||||
- match: { indices.mappings.field_types.0.vector_element_type_count.float: 3 }
|
- match: { indices.mappings.field_types.0.vector_element_type_count.float: 4 }
|
||||||
- match: { indices.mappings.field_types.0.vector_element_type_count.byte: 1 }
|
- match: { indices.mappings.field_types.0.vector_element_type_count.byte: 1 }
|
||||||
|
|
|
@ -53,6 +53,7 @@ import java.util.function.Consumer;
|
||||||
import static org.elasticsearch.index.mapper.MapperService.INDEX_MAPPING_IGNORE_DYNAMIC_BEYOND_LIMIT_SETTING;
|
import static org.elasticsearch.index.mapper.MapperService.INDEX_MAPPING_IGNORE_DYNAMIC_BEYOND_LIMIT_SETTING;
|
||||||
import static org.elasticsearch.index.mapper.MapperService.INDEX_MAPPING_NESTED_FIELDS_LIMIT_SETTING;
|
import static org.elasticsearch.index.mapper.MapperService.INDEX_MAPPING_NESTED_FIELDS_LIMIT_SETTING;
|
||||||
import static org.elasticsearch.index.mapper.MapperService.INDEX_MAPPING_TOTAL_FIELDS_LIMIT_SETTING;
|
import static org.elasticsearch.index.mapper.MapperService.INDEX_MAPPING_TOTAL_FIELDS_LIMIT_SETTING;
|
||||||
|
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.BBQ_DIMS_DEFAULT_THRESHOLD;
|
||||||
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING;
|
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING;
|
||||||
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
|
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
|
||||||
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount;
|
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount;
|
||||||
|
@ -908,6 +909,59 @@ public class DynamicMappingIT extends ESIntegTestCase {
|
||||||
client().index(
|
client().index(
|
||||||
new IndexRequest("test").source("obj.vector", Randomness.get().doubles(MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING, 0.0, 5.0).toArray())
|
new IndexRequest("test").source("obj.vector", Randomness.get().doubles(MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING, 0.0, 5.0).toArray())
|
||||||
).get();
|
).get();
|
||||||
|
}
|
||||||
|
|
||||||
|
public void testDenseVectorDynamicMapping() throws Exception {
|
||||||
|
assertAcked(indicesAdmin().prepareCreate("test").setMapping("""
|
||||||
|
{
|
||||||
|
"dynamic": "true"
|
||||||
|
}
|
||||||
|
""").get());
|
||||||
|
|
||||||
|
client().index(
|
||||||
|
new IndexRequest("test").source("vector_int8", Randomness.get().doubles(BBQ_DIMS_DEFAULT_THRESHOLD - 1, 0.0, 5.0).toArray())
|
||||||
|
).get();
|
||||||
|
client().index(
|
||||||
|
new IndexRequest("test").source("vector_bbq", Randomness.get().doubles(BBQ_DIMS_DEFAULT_THRESHOLD, 0.0, 5.0).toArray())
|
||||||
|
).get();
|
||||||
|
Map<String, Object> mappings = indicesAdmin().prepareGetMappings(TEST_REQUEST_TIMEOUT, "test")
|
||||||
|
.get()
|
||||||
|
.mappings()
|
||||||
|
.get("test")
|
||||||
|
.sourceAsMap();
|
||||||
|
assertTrue(new WriteField("properties.vector_int8", () -> mappings).exists());
|
||||||
|
assertTrue(new WriteField("properties.vector_int8.index_options.type", () -> mappings).get(null).toString().equals("int8_hnsw"));
|
||||||
|
assertTrue(new WriteField("properties.vector_bbq", () -> mappings).exists());
|
||||||
|
assertTrue(new WriteField("properties.vector_bbq.index_options.type", () -> mappings).get(null).toString().equals("bbq_hnsw"));
|
||||||
|
}
|
||||||
|
|
||||||
|
public void testBBQDynamicMappingWhenFirstIngestingDoc() throws Exception {
|
||||||
|
assertAcked(indicesAdmin().prepareCreate("test").setMapping("""
|
||||||
|
{
|
||||||
|
"properties": {
|
||||||
|
"vector": {
|
||||||
|
"type": "dense_vector"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
""").get());
|
||||||
|
|
||||||
|
Map<String, Object> mappings = indicesAdmin().prepareGetMappings(TEST_REQUEST_TIMEOUT, "test")
|
||||||
|
.get()
|
||||||
|
.mappings()
|
||||||
|
.get("test")
|
||||||
|
.sourceAsMap();
|
||||||
|
assertTrue(new WriteField("properties.vector", () -> mappings).exists());
|
||||||
|
assertFalse(new WriteField("properties.vector.index_options.type", () -> mappings).exists());
|
||||||
|
|
||||||
|
client().index(new IndexRequest("test").source("vector", Randomness.get().doubles(BBQ_DIMS_DEFAULT_THRESHOLD, 0.0, 5.0).toArray()))
|
||||||
|
.get();
|
||||||
|
Map<String, Object> updatedMappings = indicesAdmin().prepareGetMappings(TEST_REQUEST_TIMEOUT, "test")
|
||||||
|
.get()
|
||||||
|
.mappings()
|
||||||
|
.get("test")
|
||||||
|
.sourceAsMap();
|
||||||
|
assertTrue(new WriteField("properties.vector", () -> updatedMappings).exists());
|
||||||
|
assertTrue(new WriteField("properties.vector.index_options.type", () -> updatedMappings).get(null).toString().equals("bbq_hnsw"));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
@ -177,6 +177,7 @@ public class IndexVersions {
|
||||||
public static final IndexVersion MAPPER_TEXT_MATCH_ONLY_MULTI_FIELDS_DEFAULT_NOT_STORED = def(9_029_0_00, Version.LUCENE_10_2_1);
|
public static final IndexVersion MAPPER_TEXT_MATCH_ONLY_MULTI_FIELDS_DEFAULT_NOT_STORED = def(9_029_0_00, Version.LUCENE_10_2_1);
|
||||||
public static final IndexVersion UPGRADE_TO_LUCENE_10_2_2 = def(9_030_0_00, Version.LUCENE_10_2_2);
|
public static final IndexVersion UPGRADE_TO_LUCENE_10_2_2 = def(9_030_0_00, Version.LUCENE_10_2_2);
|
||||||
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = def(9_031_0_00, Version.LUCENE_10_2_2);
|
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = def(9_031_0_00, Version.LUCENE_10_2_2);
|
||||||
|
public static final IndexVersion DEFAULT_DENSE_VECTOR_TO_BBQ_HNSW = def(9_032_0_00, Version.LUCENE_10_2_2);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* STOP! READ THIS FIRST! No, really,
|
* STOP! READ THIS FIRST! No, really,
|
||||||
|
|
|
@ -806,6 +806,7 @@ public final class DocumentParser {
|
||||||
fieldName,
|
fieldName,
|
||||||
context.indexSettings().getIndexVersionCreated()
|
context.indexSettings().getIndexVersionCreated()
|
||||||
);
|
);
|
||||||
|
builder.dimensions(mappers.size());
|
||||||
DenseVectorFieldMapper denseVectorFieldMapper = builder.build(builderContext);
|
DenseVectorFieldMapper denseVectorFieldMapper = builder.build(builderContext);
|
||||||
context.updateDynamicMappers(fullFieldName, List.of(denseVectorFieldMapper));
|
context.updateDynamicMappers(fullFieldName, List.of(denseVectorFieldMapper));
|
||||||
}
|
}
|
||||||
|
|
|
@ -193,6 +193,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
public static final IndexVersion INDEXED_BY_DEFAULT_INDEX_VERSION = IndexVersions.FIRST_DETACHED_INDEX_VERSION;
|
public static final IndexVersion INDEXED_BY_DEFAULT_INDEX_VERSION = IndexVersions.FIRST_DETACHED_INDEX_VERSION;
|
||||||
public static final IndexVersion NORMALIZE_COSINE = IndexVersions.NORMALIZED_VECTOR_COSINE;
|
public static final IndexVersion NORMALIZE_COSINE = IndexVersions.NORMALIZED_VECTOR_COSINE;
|
||||||
public static final IndexVersion DEFAULT_TO_INT8 = IndexVersions.DEFAULT_DENSE_VECTOR_TO_INT8_HNSW;
|
public static final IndexVersion DEFAULT_TO_INT8 = IndexVersions.DEFAULT_DENSE_VECTOR_TO_INT8_HNSW;
|
||||||
|
public static final IndexVersion DEFAULT_TO_BBQ = IndexVersions.DEFAULT_DENSE_VECTOR_TO_BBQ_HNSW;
|
||||||
public static final IndexVersion LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION = IndexVersions.V_8_9_0;
|
public static final IndexVersion LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION = IndexVersions.V_8_9_0;
|
||||||
|
|
||||||
public static final NodeFeature RESCORE_VECTOR_QUANTIZED_VECTOR_MAPPING = new NodeFeature("mapper.dense_vector.rescore_vector");
|
public static final NodeFeature RESCORE_VECTOR_QUANTIZED_VECTOR_MAPPING = new NodeFeature("mapper.dense_vector.rescore_vector");
|
||||||
|
@ -212,6 +213,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
public static final int MAGNITUDE_BYTES = 4;
|
public static final int MAGNITUDE_BYTES = 4;
|
||||||
public static final int OVERSAMPLE_LIMIT = 10_000; // Max oversample allowed
|
public static final int OVERSAMPLE_LIMIT = 10_000; // Max oversample allowed
|
||||||
public static final float DEFAULT_OVERSAMPLE = 3.0F; // Default oversample value
|
public static final float DEFAULT_OVERSAMPLE = 3.0F; // Default oversample value
|
||||||
|
public static final int BBQ_DIMS_DEFAULT_THRESHOLD = 384; // Lower bound for dimensions for using bbq_hnsw as default index options
|
||||||
|
|
||||||
private static DenseVectorFieldMapper toType(FieldMapper in) {
|
private static DenseVectorFieldMapper toType(FieldMapper in) {
|
||||||
return (DenseVectorFieldMapper) in;
|
return (DenseVectorFieldMapper) in;
|
||||||
|
@ -226,34 +228,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
}
|
}
|
||||||
return elementType;
|
return elementType;
|
||||||
}, m -> toType(m).fieldType().elementType, XContentBuilder::field, Objects::toString);
|
}, m -> toType(m).fieldType().elementType, XContentBuilder::field, Objects::toString);
|
||||||
|
private final Parameter<Integer> dims;
|
||||||
// This is defined as updatable because it can be updated once, from [null] to a valid dim size,
|
|
||||||
// by a dynamic mapping update. Once it has been set, however, the value cannot be changed.
|
|
||||||
private final Parameter<Integer> dims = new Parameter<>("dims", true, () -> null, (n, c, o) -> {
|
|
||||||
if (o instanceof Integer == false) {
|
|
||||||
throw new MapperParsingException("Property [dims] on field [" + n + "] must be an integer but got [" + o + "]");
|
|
||||||
}
|
|
||||||
|
|
||||||
return XContentMapValues.nodeIntegerValue(o);
|
|
||||||
}, m -> toType(m).fieldType().dims, XContentBuilder::field, Objects::toString).setSerializerCheck((id, ic, v) -> v != null)
|
|
||||||
.setMergeValidator((previous, current, c) -> previous == null || Objects.equals(previous, current))
|
|
||||||
.addValidator(dims -> {
|
|
||||||
if (dims == null) {
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
int maxDims = elementType.getValue() == ElementType.BIT ? MAX_DIMS_COUNT_BIT : MAX_DIMS_COUNT;
|
|
||||||
int minDims = elementType.getValue() == ElementType.BIT ? Byte.SIZE : 1;
|
|
||||||
if (dims < minDims || dims > maxDims) {
|
|
||||||
throw new MapperParsingException(
|
|
||||||
"The number of dimensions should be in the range [" + minDims + ", " + maxDims + "] but was [" + dims + "]"
|
|
||||||
);
|
|
||||||
}
|
|
||||||
if (elementType.getValue() == ElementType.BIT) {
|
|
||||||
if (dims % Byte.SIZE != 0) {
|
|
||||||
throw new MapperParsingException("The number of dimensions for should be a multiple of 8 but was [" + dims + "]");
|
|
||||||
}
|
|
||||||
}
|
|
||||||
});
|
|
||||||
private final Parameter<VectorSimilarity> similarity;
|
private final Parameter<VectorSimilarity> similarity;
|
||||||
|
|
||||||
private final Parameter<DenseVectorIndexOptions> indexOptions;
|
private final Parameter<DenseVectorIndexOptions> indexOptions;
|
||||||
|
@ -266,8 +241,38 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
public Builder(String name, IndexVersion indexVersionCreated) {
|
public Builder(String name, IndexVersion indexVersionCreated) {
|
||||||
super(name);
|
super(name);
|
||||||
this.indexVersionCreated = indexVersionCreated;
|
this.indexVersionCreated = indexVersionCreated;
|
||||||
|
// This is defined as updatable because it can be updated once, from [null] to a valid dim size,
|
||||||
|
// by a dynamic mapping update. Once it has been set, however, the value cannot be changed.
|
||||||
|
this.dims = new Parameter<>("dims", true, () -> null, (n, c, o) -> {
|
||||||
|
if (o instanceof Integer == false) {
|
||||||
|
throw new MapperParsingException("Property [dims] on field [" + n + "] must be an integer but got [" + o + "]");
|
||||||
|
}
|
||||||
|
|
||||||
|
return XContentMapValues.nodeIntegerValue(o);
|
||||||
|
}, m -> toType(m).fieldType().dims, XContentBuilder::field, Objects::toString).setSerializerCheck((id, ic, v) -> v != null)
|
||||||
|
.setMergeValidator((previous, current, c) -> previous == null || Objects.equals(previous, current))
|
||||||
|
.addValidator(dims -> {
|
||||||
|
if (dims == null) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
int maxDims = elementType.getValue() == ElementType.BIT ? MAX_DIMS_COUNT_BIT : MAX_DIMS_COUNT;
|
||||||
|
int minDims = elementType.getValue() == ElementType.BIT ? Byte.SIZE : 1;
|
||||||
|
if (dims < minDims || dims > maxDims) {
|
||||||
|
throw new MapperParsingException(
|
||||||
|
"The number of dimensions should be in the range [" + minDims + ", " + maxDims + "] but was [" + dims + "]"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
if (elementType.getValue() == ElementType.BIT) {
|
||||||
|
if (dims % Byte.SIZE != 0) {
|
||||||
|
throw new MapperParsingException(
|
||||||
|
"The number of dimensions for should be a multiple of 8 but was [" + dims + "]"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
final boolean indexedByDefault = indexVersionCreated.onOrAfter(INDEXED_BY_DEFAULT_INDEX_VERSION);
|
final boolean indexedByDefault = indexVersionCreated.onOrAfter(INDEXED_BY_DEFAULT_INDEX_VERSION);
|
||||||
final boolean defaultInt8Hnsw = indexVersionCreated.onOrAfter(IndexVersions.DEFAULT_DENSE_VECTOR_TO_INT8_HNSW);
|
final boolean defaultInt8Hnsw = indexVersionCreated.onOrAfter(IndexVersions.DEFAULT_DENSE_VECTOR_TO_INT8_HNSW);
|
||||||
|
final boolean defaultBBQ8Hnsw = indexVersionCreated.onOrAfter(IndexVersions.DEFAULT_DENSE_VECTOR_TO_BBQ_HNSW);
|
||||||
this.indexed = Parameter.indexParam(m -> toType(m).fieldType().indexed, indexedByDefault);
|
this.indexed = Parameter.indexParam(m -> toType(m).fieldType().indexed, indexedByDefault);
|
||||||
if (indexedByDefault) {
|
if (indexedByDefault) {
|
||||||
// Only serialize on newer index versions to prevent breaking existing indices when upgrading
|
// Only serialize on newer index versions to prevent breaking existing indices when upgrading
|
||||||
|
@ -297,14 +302,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
this.indexOptions = new Parameter<>(
|
this.indexOptions = new Parameter<>(
|
||||||
"index_options",
|
"index_options",
|
||||||
true,
|
true,
|
||||||
() -> defaultInt8Hnsw && elementType.getValue() == ElementType.FLOAT && this.indexed.getValue()
|
() -> defaultIndexOptions(defaultInt8Hnsw, defaultBBQ8Hnsw),
|
||||||
? new Int8HnswIndexOptions(
|
|
||||||
Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN,
|
|
||||||
Lucene99HnswVectorsFormat.DEFAULT_BEAM_WIDTH,
|
|
||||||
null,
|
|
||||||
null
|
|
||||||
)
|
|
||||||
: null,
|
|
||||||
(n, c, o) -> o == null ? null : parseIndexOptions(n, o, indexVersionCreated),
|
(n, c, o) -> o == null ? null : parseIndexOptions(n, o, indexVersionCreated),
|
||||||
m -> toType(m).indexOptions,
|
m -> toType(m).indexOptions,
|
||||||
(b, n, v) -> {
|
(b, n, v) -> {
|
||||||
|
@ -328,7 +326,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
|| Objects.equals(previous, current)
|
|| Objects.equals(previous, current)
|
||||||
|| previous.updatableTo(current)
|
|| previous.updatableTo(current)
|
||||||
);
|
);
|
||||||
if (defaultInt8Hnsw) {
|
if (defaultInt8Hnsw || defaultBBQ8Hnsw) {
|
||||||
this.indexOptions.alwaysSerialize();
|
this.indexOptions.alwaysSerialize();
|
||||||
}
|
}
|
||||||
this.indexed.addValidator(v -> {
|
this.indexed.addValidator(v -> {
|
||||||
|
@ -351,6 +349,26 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private DenseVectorIndexOptions defaultIndexOptions(boolean defaultInt8Hnsw, boolean defaultBBQHnsw) {
|
||||||
|
if (this.dims != null && this.dims.isConfigured() && elementType.getValue() == ElementType.FLOAT && this.indexed.getValue()) {
|
||||||
|
if (defaultBBQHnsw && this.dims.getValue() >= BBQ_DIMS_DEFAULT_THRESHOLD) {
|
||||||
|
return new BBQHnswIndexOptions(
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN,
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_BEAM_WIDTH,
|
||||||
|
new RescoreVector(DEFAULT_OVERSAMPLE)
|
||||||
|
);
|
||||||
|
} else if (defaultInt8Hnsw) {
|
||||||
|
return new Int8HnswIndexOptions(
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN,
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_BEAM_WIDTH,
|
||||||
|
null,
|
||||||
|
null
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
protected Parameter<?>[] getParameters() {
|
protected Parameter<?>[] getParameters() {
|
||||||
return new Parameter<?>[] { elementType, dims, indexed, similarity, indexOptions, meta };
|
return new Parameter<?>[] { elementType, dims, indexed, similarity, indexOptions, meta };
|
||||||
|
@ -2695,8 +2713,28 @@ public class DenseVectorFieldMapper extends FieldMapper {
|
||||||
}
|
}
|
||||||
if (fieldType().dims == null) {
|
if (fieldType().dims == null) {
|
||||||
int dims = fieldType().elementType.parseDimensionCount(context);
|
int dims = fieldType().elementType.parseDimensionCount(context);
|
||||||
if (fieldType().indexOptions != null) {
|
|
||||||
fieldType().indexOptions.validateDimension(dims);
|
final boolean defaultInt8Hnsw = indexCreatedVersion.onOrAfter(IndexVersions.DEFAULT_DENSE_VECTOR_TO_INT8_HNSW);
|
||||||
|
final boolean defaultBBQ8Hnsw = indexCreatedVersion.onOrAfter(IndexVersions.DEFAULT_DENSE_VECTOR_TO_BBQ_HNSW);
|
||||||
|
DenseVectorIndexOptions denseVectorIndexOptions = fieldType().indexOptions;
|
||||||
|
if (denseVectorIndexOptions == null && fieldType().getElementType() == ElementType.FLOAT && fieldType().isIndexed()) {
|
||||||
|
if (defaultBBQ8Hnsw && dims >= BBQ_DIMS_DEFAULT_THRESHOLD) {
|
||||||
|
denseVectorIndexOptions = new BBQHnswIndexOptions(
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN,
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_BEAM_WIDTH,
|
||||||
|
new RescoreVector(DEFAULT_OVERSAMPLE)
|
||||||
|
);
|
||||||
|
} else if (defaultInt8Hnsw) {
|
||||||
|
denseVectorIndexOptions = new Int8HnswIndexOptions(
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN,
|
||||||
|
Lucene99HnswVectorsFormat.DEFAULT_BEAM_WIDTH,
|
||||||
|
null,
|
||||||
|
null
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (denseVectorIndexOptions != null) {
|
||||||
|
denseVectorIndexOptions.validateDimension(dims);
|
||||||
}
|
}
|
||||||
DenseVectorFieldType updatedDenseVectorFieldType = new DenseVectorFieldType(
|
DenseVectorFieldType updatedDenseVectorFieldType = new DenseVectorFieldType(
|
||||||
fieldType().name(),
|
fieldType().name(),
|
||||||
|
@@ -2705,7 +2743,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
                 dims,
                 fieldType().indexed,
                 fieldType().similarity,
-                fieldType().indexOptions,
+                denseVectorIndexOptions,
                 fieldType().meta(),
                 fieldType().isSyntheticSource
             );
@@ -2713,7 +2751,7 @@ public class DenseVectorFieldMapper extends FieldMapper {
                 leafName(),
                 updatedDenseVectorFieldType,
                 builderParams,
-                indexOptions,
+                denseVectorIndexOptions,
                 indexCreatedVersion
             );
             context.addDynamicMapper(update);
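
The default-selection branch added to `DenseVectorFieldMapper` can be read in isolation as the sketch below. It is a simplified, hypothetical stand-in rather than the mapper's own code: the class `DefaultIndexOptionSketch`, the method `chooseDefault`, and the plain-string return values are invented for illustration, and the threshold of 384 is assumed from the commit title and documentation change, since the definition of `BBQ_DIMS_DEFAULT_THRESHOLD` is not shown in this part of the diff.

```java
import java.util.Optional;

// Illustrative sketch only; none of these names exist in Elasticsearch.
final class DefaultIndexOptionSketch {

    // Assumed value of BBQ_DIMS_DEFAULT_THRESHOLD (inferred from the commit title).
    static final int ASSUMED_BBQ_DIMS_THRESHOLD = 384;

    /**
     * Mirrors the branch order in the diff: explicitly configured index options win;
     * otherwise a default is chosen only for indexed float vectors, preferring
     * bbq_hnsw at or above the threshold and falling back to int8_hnsw.
     */
    static Optional<String> chooseDefault(
        String explicitType,           // stands in for fieldType().indexOptions (may be null)
        boolean floatElements,         // element type is FLOAT
        boolean indexed,               // fieldType().isIndexed()
        boolean versionDefaultsToInt8, // index created on or after the int8 default version
        boolean versionDefaultsToBbq,  // index created on or after the BBQ default version
        int dims
    ) {
        if (explicitType != null) {
            return Optional.of(explicitType);
        }
        if (floatElements == false || indexed == false) {
            return Optional.empty();
        }
        if (versionDefaultsToBbq && dims >= ASSUMED_BBQ_DIMS_THRESHOLD) {
            return Optional.of("bbq_hnsw");
        }
        if (versionDefaultsToInt8) {
            return Optional.of("int8_hnsw");
        }
        return Optional.empty();
    }
}
```

Under these assumptions, `chooseDefault(null, true, true, true, true, 384)` resolves to `bbq_hnsw`, while the same call with 128 dimensions falls back to `int8_hnsw`. Note that the comparison is `>=`, so a field with exactly the threshold number of dimensions takes the BBQ path.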
@@ -31,6 +31,7 @@ public final class SearchFeatures implements FeatureSpecification {
     public static final NodeFeature RESCORER_MISSING_FIELD_BAD_REQUEST = new NodeFeature("search.rescorer.missing.field.bad.request");
     public static final NodeFeature INT_SORT_FOR_INT_SHORT_BYTE_FIELDS = new NodeFeature("search.sort.int_sort_for_int_short_byte_fields");
     static final NodeFeature MULTI_MATCH_CHECKS_POSITIONS = new NodeFeature("search.multi.match.checks.positions");
+    public static final NodeFeature BBQ_HNSW_DEFAULT_INDEXING = new NodeFeature("search.vectors.mappers.default_bbq_hnsw");

     @Override
     public Set<NodeFeature> getTestFeatures() {
@@ -39,7 +40,8 @@ public final class SearchFeatures implements FeatureSpecification {
             COMPLETION_FIELD_SUPPORTS_DUPLICATE_SUGGESTIONS,
             RESCORER_MISSING_FIELD_BAD_REQUEST,
             INT_SORT_FOR_INT_SHORT_BYTE_FIELDS,
-            MULTI_MATCH_CHECKS_POSITIONS
+            MULTI_MATCH_CHECKS_POSITIONS,
+            BBQ_HNSW_DEFAULT_INDEXING
         );
     }
 }
@@ -24,6 +24,7 @@ import java.io.IOException;
 import java.time.Instant;
 import java.util.stream.Stream;

+import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.BBQ_DIMS_DEFAULT_THRESHOLD;
 import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.MAX_DIMS_COUNT;
 import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING;
 import static org.hamcrest.CoreMatchers.containsString;
@@ -980,7 +981,8 @@ public class DynamicMappingTests extends MapperServiceTestCase {
             builder.startObject()
                 .field("mapsToFloatTooSmall", Randomness.get().doubles(MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING - 1, 0.0, 5.0).toArray())
                 .field("mapsToFloatTooBig", Randomness.get().doubles(MAX_DIMS_COUNT + 1, 0.0, 5.0).toArray())
-                .field("mapsToDenseVector", Randomness.get().doubles(MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING, 0.0, 5.0).toArray())
+                .field("mapsToInt8HnswDenseVector", Randomness.get().doubles(MIN_DIMS_FOR_DYNAMIC_FLOAT_MAPPING, 0.0, 5.0).toArray())
+                .field("mapsToBBQHnswDenseVector", Randomness.get().doubles(BBQ_DIMS_DEFAULT_THRESHOLD, 0.0, 5.0).toArray())
                 .endObject()
         );
         ParsedDocument parsedDocument = mapper.parse(new SourceToParse("id", source, builder.contentType()));
@@ -988,8 +990,18 @@ public class DynamicMappingTests extends MapperServiceTestCase {
         assertNotNull(update);
         assertThat(((FieldMapper) update.getRoot().getMapper("mapsToFloatTooSmall")).fieldType().typeName(), equalTo("float"));
         assertThat(((FieldMapper) update.getRoot().getMapper("mapsToFloatTooBig")).fieldType().typeName(), equalTo("float"));
-        assertThat(((FieldMapper) update.getRoot().getMapper("mapsToDenseVector")).fieldType().typeName(), equalTo("dense_vector"));
-        DenseVectorFieldMapper dvFieldMapper = ((DenseVectorFieldMapper) update.getRoot().getMapper("mapsToDenseVector"));
+        assertThat(((FieldMapper) update.getRoot().getMapper("mapsToInt8HnswDenseVector")).fieldType().typeName(), equalTo("dense_vector"));
+        DenseVectorFieldMapper int8DVFieldMapper = ((DenseVectorFieldMapper) update.getRoot().getMapper("mapsToInt8HnswDenseVector"));
+        assertThat(
+            ((DenseVectorFieldMapper.DenseVectorIndexOptions) int8DVFieldMapper.fieldType().getIndexOptions()).getType().getName(),
+            equalTo("int8_hnsw")
+        );
+        assertThat(((FieldMapper) update.getRoot().getMapper("mapsToBBQHnswDenseVector")).fieldType().typeName(), equalTo("dense_vector"));
+        DenseVectorFieldMapper bbqDVFieldMapper = ((DenseVectorFieldMapper) update.getRoot().getMapper("mapsToBBQHnswDenseVector"));
+        assertThat(
+            ((DenseVectorFieldMapper.DenseVectorIndexOptions) bbqDVFieldMapper.fieldType().getIndexOptions()).getType().getName(),
+            equalTo("bbq_hnsw")
+        );
     }

     public void testDefaultDenseVectorMappingsObject() throws IOException {
@@ -66,6 +66,7 @@ import java.util.Set;
 import static org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat.DEFAULT_BEAM_WIDTH;
 import static org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN;
 import static org.elasticsearch.index.codec.vectors.IVFVectorsFormat.DYNAMIC_NPROBE;
+import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.DEFAULT_OVERSAMPLE;
 import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.IVF_FORMAT;
 import static org.hamcrest.Matchers.containsString;
 import static org.hamcrest.Matchers.equalTo;
@@ -85,7 +86,9 @@ public class DenseVectorFieldMapperTests extends MapperTestCase {
         this.elementType = randomFrom(ElementType.BYTE, ElementType.FLOAT, ElementType.BIT);
         this.indexed = randomBoolean();
         this.indexOptionsSet = this.indexed && randomBoolean();
-        this.dims = ElementType.BIT == elementType ? 4 * Byte.SIZE : 4;
+        int baseDims = ElementType.BIT == elementType ? 4 * Byte.SIZE : 4;
+        int randomMultiplier = ElementType.FLOAT == elementType ? randomIntBetween(1, 64) : 1;
+        this.dims = baseDims * randomMultiplier;
     }

     @Override
@@ -107,15 +110,28 @@ public class DenseVectorFieldMapperTests extends MapperTestCase {
             // Serialize if it's new index version, or it was not the default for previous indices
             b.field("index", indexed);
         }
-        if (indexVersion.onOrAfter(DenseVectorFieldMapper.DEFAULT_TO_INT8)
+        if ((indexVersion.onOrAfter(DenseVectorFieldMapper.DEFAULT_TO_INT8)
+            || indexVersion.onOrAfter(DenseVectorFieldMapper.DEFAULT_TO_BBQ))
             && indexed
             && elementType.equals(ElementType.FLOAT)
             && indexOptionsSet == false) {
-            b.startObject("index_options");
-            b.field("type", "int8_hnsw");
-            b.field("m", 16);
-            b.field("ef_construction", 100);
-            b.endObject();
+            if (indexVersion.onOrAfter(DenseVectorFieldMapper.DEFAULT_TO_BBQ)
+                && dims >= DenseVectorFieldMapper.BBQ_DIMS_DEFAULT_THRESHOLD) {
+                b.startObject("index_options");
+                b.field("type", "bbq_hnsw");
+                b.field("m", 16);
+                b.field("ef_construction", 100);
+                b.startObject("rescore_vector");
+                b.field("oversample", DEFAULT_OVERSAMPLE);
+                b.endObject();
+                b.endObject();
+            } else {
+                b.startObject("index_options");
+                b.field("type", "int8_hnsw");
+                b.field("m", 16);
+                b.field("ef_construction", 100);
+                b.endObject();
+            }
         }
         if (indexed) {
             b.field("similarity", elementType == ElementType.BIT ? "l2_norm" : "dot_product");
@@ -2038,15 +2054,24 @@ public class DenseVectorFieldMapperTests extends MapperTestCase {
     public void testValidateOnBuild() {
         final MapperBuilderContext context = MapperBuilderContext.root(false, false);

+        int dimensions = randomIntBetween(64, 1024);
         // Build a dense vector field mapper with float element type, which will trigger int8 HNSW index options
         DenseVectorFieldMapper mapper = new DenseVectorFieldMapper.Builder("test", IndexVersion.current()).elementType(ElementType.FLOAT)
+            .dimensions(dimensions)
             .build(context);

         // Change the element type to byte, which is incompatible with int8 HNSW index options
         DenseVectorFieldMapper.Builder builder = (DenseVectorFieldMapper.Builder) mapper.getMergeBuilder();
         builder.elementType(ElementType.BYTE);
         IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> builder.build(context));
-        assertThat(e.getMessage(), containsString("[element_type] cannot be [byte] when using index type [int8_hnsw]"));
+        assertThat(
+            e.getMessage(),
+            containsString(
+                dimensions >= DenseVectorFieldMapper.BBQ_DIMS_DEFAULT_THRESHOLD
+                    ? "[element_type] cannot be [byte] when using index type [bbq_hnsw]"
+                    : "[element_type] cannot be [byte] when using index type [int8_hnsw]"
+            )
+        );
     }

     private static float[] decodeDenseVector(IndexVersion indexVersion, BytesRef encodedVector) {